aapot committed
Commit 80013bd
Parent: 8fb0899

Update README.md

Files changed (1):
  1. README.md +52 -16

README.md CHANGED
@@ -30,9 +30,23 @@ model-index:
     - name: Test CER
       type: cer
       value: 1.2
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: FLEURS ASR
+      type: google/fleurs
+      args: fi_fi
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 20.34
+    - name: Test CER
+      type: cer
+      value: 6.97
 ---
 
-# Wav2Vec2 XLS-R for Finnish ASR
+# Wav2vec2-xls-r-1b for Finnish ASR
 
 This acoustic model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) for Finnish ASR. The model has been fine-tuned with 259.57 hours of Finnish transcribed speech data. Wav2Vec2 XLS-R was introduced in
 [this paper](https://arxiv.org/abs/2111.09296) and first released at [this page](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#wav2vec-20).
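
As context for the model card above, a checkpoint like this is normally used through the `transformers` ASR pipeline, which also picks up the attached n-gram language model when `pyctcdecode` and `kenlm` are installed. A minimal sketch (the audio file name is a placeholder for any 16 kHz mono recording):

```python
from transformers import pipeline

# Load the fine-tuned Finnish checkpoint; with pyctcdecode and kenlm
# installed, decoding runs through the bundled n-gram language model.
asr = pipeline(
    "automatic-speech-recognition",
    model="Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm",
)

# "speech_fi.wav" is a placeholder for any 16 kHz mono recording.
print(asr("speech_fi.wav")["text"])
```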
@@ -154,7 +168,9 @@ The pretrained `facebook/wav2vec2-xls-r-1b` model was initialized with following
 
 ## Evaluation results
 
-Evaluation was done with the [Common Voice 7.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) and with the [Common Voice 9.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0). This model's training data includes the training splits of Common Voice 7.0 but our newest `Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned` model includes the Common Voice 9.0 so we ran tests for both versions. Note: Common Voice doesn't seem to fully preserve the test split as fixed between the dataset versions so it is possible that some of the training examples of Common Voice 9.0 are in the test split of the Common Voice 7.0 and vice versa. Thus, test result comparisons are not fully accurate between the models trained with different Common Voice versions but the comparison should still be meaningful enough.
+Evaluation was done with the [Common Voice 7.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0), the [Common Voice 9.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0) and the [FLEURS ASR Finnish test split](https://huggingface.co/datasets/google/fleurs).
+
+This model's training data includes the training splits of Common Voice 7.0, but our newer `Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned` and `Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish` models include Common Voice 9.0, so we ran tests with both Common Voice versions. Note: Common Voice doesn't seem to keep the test split fixed between dataset versions, so it is possible that some training examples of Common Voice 9.0 are in the test split of Common Voice 7.0 and vice versa. Thus, Common Voice test result comparisons are not fully accurate between models trained with different Common Voice versions, but the comparison should still be meaningful enough.
 
 ### Common Voice 7.0 testing
 
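For reference, the WER and CER figures in the sections below are word- and character-level edit-distance rates. A minimal sketch of how they can be computed with the Hugging Face `evaluate` library (toy strings, not the actual test data):

```python
import evaluate

# Toy strings for illustration; the README's numbers come from full test splits.
predictions = ["moi miten menee", "hyvää huomenta kaikille"]
references = ["moi mitem menee", "hyvää huomenta kaikille"]

wer = evaluate.load("wer")  # word-level edit distance / reference word count
cer = evaluate.load("cer")  # character-level edit distance / reference char count
print("WER %:", 100 * wer.compute(predictions=predictions, references=references))
print("CER %:", 100 * cer.compute(predictions=predictions, references=references))
```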
 
@@ -164,14 +180,15 @@ To evaluate this model, run the `eval.py` script in this repository:
 python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm --dataset mozilla-foundation/common_voice_7_0 --config fi --split test
 ```
 
-This model (the second row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+This model (the fourth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
 
-| | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
-|----------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
-|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**4.09** |**9.73** |**0.88** |**1.65** |
-|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.65 |13.11 |1.20 |2.23 |
-|Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.85 |13.52 |1.35 |2.44 |
-|Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |8.16 |17.92 |1.97 |3.36 |
+| | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+|-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+|Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.85 |13.52 |1.35 |2.44 |
+|Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |4.13 |**9.66** |0.90 |1.66 |
+|Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |8.16 |17.92 |1.97 |3.36 |
+|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.65 |13.11 |1.20 |2.23 |
+|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**4.09** |9.73 |**0.88** |**1.65** |
 
 ### Common Voice 9.0 testing
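The "with LM" and "without LM" columns in these tables correspond to two decodings of the same CTC logits: beam search through the attached n-gram language model versus plain greedy decoding. A sketch of both paths, assuming `transformers`, `pyctcdecode`, `kenlm` and `librosa` are installed (the file name is again a placeholder):

```python
import librosa
import torch
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

model_id = "Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm"
model = AutoModelForCTC.from_pretrained(model_id)
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)

speech, _ = librosa.load("speech_fi.wav", sr=16_000)  # placeholder audio file
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# "With LM": beam-search decoding through the n-gram language model.
with_lm = processor.batch_decode(logits.numpy()).text[0]

# "Without LM": greedy CTC decoding of the same logits.
pred_ids = torch.argmax(logits, dim=-1)
without_lm = processor.tokenizer.batch_decode(pred_ids)[0]
print(with_lm, without_lm, sep="\n")
```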
 
@@ -181,14 +198,33 @@ To evaluate this model, run the `eval.py` script in this repository:
 python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm --dataset mozilla-foundation/common_voice_9_0 --config fi --split test
 ```
 
-This model (the second row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+This model (the fourth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
 
-| | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
-|----------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
-|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**3.72** |**8.96** |**0.80** |**1.52** |
-|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.35 |13.00 |1.14 |2.20 |
-|Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.93 |14.08 |1.40 |2.59 |
-|Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |7.42 |16.45 |1.79 |3.07 |
+| | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+|-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+|Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.93 |14.08 |1.40 |2.59 |
+|Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |4.13 |9.83 |0.92 |1.71 |
+|Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |7.42 |16.45 |1.79 |3.07 |
+|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.35 |13.00 |1.14 |2.20 |
+|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**3.72** |**8.96** |**0.80** |**1.52** |
+
+### FLEURS ASR testing
+
+To evaluate this model, run the `eval.py` script in this repository:
+
+```bash
+python3 eval.py --model_id Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm --dataset google/fleurs --config fi_fi --split test
+```
+
+This model (the fourth row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
 
+| | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+|-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+|Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |13.99 |17.16 |6.07 |6.61 |
+|Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |12.44 |**14.63** |5.77 |6.22 |
+|Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |17.72 |23.30 |6.78 |7.67 |
+|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |20.34 |16.67 |6.97 |6.35 |
+|Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**12.11** |14.89 |**5.65** |**6.06** |
 
 ## Team Members
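
Finally, as a companion to the FLEURS eval command above, an evaluation loop along these lines mirrors the general procedure; only the repository's `eval.py`, with its own text normalization, is authoritative for the exact numbers. A sketch assuming `datasets`, `evaluate` and `transformers`:

```python
import evaluate
from datasets import load_dataset
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm",
)
test = load_dataset("google/fleurs", "fi_fi", split="test")

# Transcribe each 16 kHz test clip and compare against the reference text.
predictions = [
    asr({"raw": s["audio"]["array"], "sampling_rate": s["audio"]["sampling_rate"]})["text"]
    for s in test
]
references = [s["transcription"] for s in test]

wer = evaluate.load("wer")
cer = evaluate.load("cer")
print("WER %:", 100 * wer.compute(predictions=predictions, references=references))
print("CER %:", 100 * cer.compute(predictions=predictions, references=references))
```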