wav2vec2-large-960h

This model is a fine-tuned version of facebook/wav2vec2-large-960h on the acc_dataset_v2 dataset (commit: a41c520).

It achieves the following results on the evaluation set:

  • Loss: 0.6286

  • Wer: 0.1538

Model description

This is a speech-to-text transcription model specialized for the acc dataset.
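
As a quick orientation, the sketch below shows one way to run inference with this checkpoint through the Transformers CTC API. It is a minimal example, not the notebook's exact code: the audio file name is a placeholder, and the input is assumed to be 16 kHz mono.

```python
# Minimal inference sketch (assumes a 16 kHz mono recording).
import soundfile as sf
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "monadical-labs/wav2vec2-large-960h"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# "example.wav" is a placeholder; any 16 kHz mono waveform works.
speech, sample_rate = sf.read("example.wav")

inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token at each frame, then collapse.
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
```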

Training and evaluation data

Training was based on the training set in acc_dataset_v2 and evaluation based on the validation and test sets in the same dataset.
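
For reference, a WER figure like the one reported above is obtained by comparing the model's transcriptions against the reference transcripts of the evaluation split. A minimal sketch with the evaluate library follows; the example strings are placeholders, since acc_dataset_v2 is not publicly described here.

```python
# Hedged sketch of the WER computation used for the reported metric.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["hello world", "speech recognition"]    # model transcriptions (placeholders)
references = ["hello world", "speech recognitions"]    # ground-truth transcripts (placeholders)

print(wer_metric.compute(predictions=predictions, references=references))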

Training procedure

See the Jupyter notebook Finetuning-notebook-wav2vec2-large-960h-on-acc-data for the full training procedure.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 15
  • num_epochs: 128
  • mixed_precision_training: Native AMP
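
The hyperparameters above map onto a Transformers TrainingArguments configuration roughly as sketched below. The Jupyter notebook referenced under "Training procedure" is the authoritative source; output_dir and any option not listed above are placeholders.

```python
# Hedged sketch: TrainingArguments mirroring the hyperparameter list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-960h-acc",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=15,
    num_train_epochs=128,
    fp16=True,  # Native AMP mixed-precision training
)
```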

Training results

Training Loss   Epoch      Step   Validation Loss   Wer
1.2379          8.3333     50     0.6501            0.3056
0.4865          16.6667    100    0.7069            0.2790
0.3054          25.0       150    0.6598            0.2369
0.2308          33.3333    200    0.6517            0.2215
0.1793          41.6667    250    0.6884            0.2103
0.1379          50.0       300    0.6418            0.1949
0.1253          58.3333    350    0.7004            0.1918
0.0988          66.6667    400    0.6059            0.1846
0.0880          75.0       450    0.6507            0.1826
0.0773          83.3333    500    0.5473            0.1682
0.0686          91.6667    550    0.6027            0.1682
0.0643          100.0      600    0.6192            0.1713
0.0595          108.3333   650    0.6119            0.1703
0.0562          116.6667   700    0.5953            0.1600
0.0507          125.0      750    0.6286            0.1538

Framework versions

  • Transformers 4.48.2
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0