wav2vec2-large-xls-r-300m-kaqchikel-with-bloom

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on a collection of Kaqchikel audio: recordings from Deditos videos provided by Viña Studios and audiobook recordings from Bloom Library. It achieves the following results on the evaluation set (a metric-computation sketch follows the list):

  • Loss: 0.6700
  • CER: 0.0854
  • WER: 0.3069
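
Loss is the CTC loss on the held-out set; CER and WER are character and word error rates. The card does not state the exact tooling used, but as a hedged sketch, such metrics can be computed with the Hugging Face evaluate library (the strings below are placeholders, not real outputs):

```python
import evaluate

# Placeholder predictions/references, purely for illustration.
predictions = ["the model transcript"]
references = ["the reference transcript"]

wer = evaluate.load("wer").compute(predictions=predictions, references=references)
cer = evaluate.load("cer").compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}  CER: {cer:.4f}")
```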

Model description

This model is a baseline fine-tuned from XLS-R 300m. Users should refer to the original model's documentation for tutorials on using a trained model for inference.
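
As a minimal sketch, inference might look like the following. It assumes the standard Wav2Vec2 CTC API, 16 kHz mono input, a placeholder file name, and a repository id inferred from this card's title:

```python
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Repository id inferred from this card's title; adjust if it differs.
model_id = "sil-ai/wav2vec2-large-xls-r-300m-kaqchikel-with-bloom"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# "example.wav" is a placeholder; the model expects 16 kHz mono audio.
speech, _ = librosa.load("example.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```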

Intended uses & limitations

Users of this model should abide by the UN Declaration on the Rights of Indigenous Peoples.

This model is released under the MIT license, and no guarantees are made regarding the performance of the model in specific situations.

Training and evaluation data

Training, validation, and test datasets were generated from the same corpus, ensuring that no duplicate files were used across splits.
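
The exact split procedure is not published. As a hedged sketch with the datasets library, a deduplicated split could look like this; the folder path and the 80/10/10 proportions are assumptions, and "audiofolder" loading requires a newer datasets release than the one listed under Framework versions:

```python
from datasets import load_dataset

# Hypothetical local folder of clips plus transcripts; the corpus
# used for this model is not published on the Hub.
corpus = load_dataset("audiofolder", data_dir="kaqchikel_audio", split="train")

# Keep only the first occurrence of each audio file so that no clip
# can appear in more than one split.
seen = set()
keep = []
for i, example in enumerate(corpus):
    path = example["audio"]["path"]
    if path not in seen:
        seen.add(path)
        keep.append(i)
corpus = corpus.select(keep)

# Assumed 80/10/10 train/validation/test proportions.
splits = corpus.train_test_split(test_size=0.2, seed=42)
heldout = splits["test"].train_test_split(test_size=0.5, seed=42)
train, validation, test = splits["train"], heldout["train"], heldout["test"]
```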

Training procedure

Standard fine-tuning of XLS-R was performed, following the speech recognition examples in the Hugging Face Transformers GitHub repository.

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after the list):

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 100
  • mixed_precision_training: Native AMP
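
As a sketch only (not the actual training script), these values map directly onto transformers TrainingArguments; output_dir is a placeholder:

```python
from transformers import TrainingArguments

# The listed hyperparameters expressed as TrainingArguments; the
# effective batch size is 8 * 4 gradient-accumulation steps = 32.
training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-kaqchikel-with-bloom",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    warmup_steps=200,
    num_train_epochs=100,
    lr_scheduler_type="linear",  # the Trainer default, shown for completeness
    seed=42,
    fp16=True,                   # Native AMP mixed precision
)
```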

Training results

| Training Loss | Epoch | Step | Validation Loss | CER    | WER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
| 11.1557       | 1.84  | 100  | 4.2251          | 1.0    | 1.0    |
| 3.7231        | 3.7   | 200  | 3.5794          | 1.0    | 1.0    |
| 3.3076        | 5.55  | 300  | 3.4362          | 1.0    | 1.0    |
| 3.2495        | 7.4   | 400  | 3.2553          | 1.0    | 1.0    |
| 3.2076        | 9.26  | 500  | 3.2932          | 1.0    | 1.0    |
| 3.1304        | 11.11 | 600  | 3.1100          | 1.0    | 1.0    |
| 2.899         | 12.95 | 700  | 2.4021          | 0.8477 | 1.0    |
| 2.2875        | 14.81 | 800  | 1.5473          | 0.4790 | 0.9984 |
| 1.7605        | 16.66 | 900  | 1.1034          | 0.3061 | 0.9192 |
| 1.3802        | 18.51 | 1000 | 0.9422          | 0.2386 | 0.8530 |
| 1.0989        | 20.37 | 1100 | 0.7429          | 0.1667 | 0.6042 |
| 0.857         | 22.22 | 1200 | 0.7490          | 0.1499 | 0.5751 |
| 0.6899        | 24.07 | 1300 | 0.6376          | 0.1286 | 0.4798 |
| 0.5927        | 25.92 | 1400 | 0.6887          | 0.1232 | 0.4443 |
| 0.4699        | 27.77 | 1500 | 0.6341          | 0.1184 | 0.4378 |
| 0.4029        | 29.62 | 1600 | 0.6341          | 0.1103 | 0.4216 |
| 0.3492        | 31.48 | 1700 | 0.6709          | 0.1121 | 0.4120 |
| 0.3019        | 33.33 | 1800 | 0.7665          | 0.1097 | 0.4136 |
| 0.2681        | 35.18 | 1900 | 0.6671          | 0.1085 | 0.4120 |
| 0.2491        | 37.04 | 2000 | 0.7049          | 0.1010 | 0.3748 |
| 0.2108        | 38.88 | 2100 | 0.6699          | 0.1064 | 0.3974 |
| 0.2146        | 40.73 | 2200 | 0.7037          | 0.1046 | 0.3780 |
| 0.1854        | 42.59 | 2300 | 0.6970          | 0.1055 | 0.4006 |
| 0.1693        | 44.44 | 2400 | 0.6593          | 0.0980 | 0.3764 |
| 0.1628        | 46.29 | 2500 | 0.7162          | 0.0998 | 0.3764 |
| 0.156         | 48.15 | 2600 | 0.6445          | 0.0998 | 0.3829 |
| 0.1439        | 49.99 | 2700 | 0.6437          | 0.1004 | 0.3845 |
| 0.1292        | 51.84 | 2800 | 0.6471          | 0.0944 | 0.3457 |
| 0.1287        | 53.7  | 2900 | 0.6411          | 0.0923 | 0.3538 |
| 0.1186        | 55.55 | 3000 | 0.6754          | 0.0992 | 0.3813 |
| 0.1175        | 57.4  | 3100 | 0.6741          | 0.0953 | 0.3538 |
| 0.1082        | 59.26 | 3200 | 0.6949          | 0.0977 | 0.3619 |
| 0.105         | 61.11 | 3300 | 0.6919          | 0.0983 | 0.3683 |
| 0.1048        | 62.95 | 3400 | 0.6802          | 0.0950 | 0.3425 |
| 0.092         | 64.81 | 3500 | 0.6830          | 0.0962 | 0.3263 |
| 0.0904        | 66.66 | 3600 | 0.6993          | 0.0971 | 0.3554 |
| 0.0914        | 68.51 | 3700 | 0.6932          | 0.0995 | 0.3554 |
| 0.0823        | 70.37 | 3800 | 0.6742          | 0.0950 | 0.3409 |
| 0.0799        | 72.22 | 3900 | 0.6852          | 0.0917 | 0.3279 |
| 0.0767        | 74.07 | 4000 | 0.6684          | 0.0929 | 0.3489 |
| 0.0736        | 75.92 | 4100 | 0.6611          | 0.0923 | 0.3393 |
| 0.0708        | 77.77 | 4200 | 0.7123          | 0.0944 | 0.3393 |
| 0.0661        | 79.62 | 4300 | 0.6577          | 0.0899 | 0.3247 |
| 0.0651        | 81.48 | 4400 | 0.6671          | 0.0869 | 0.3150 |
| 0.0607        | 83.33 | 4500 | 0.6980          | 0.0893 | 0.3231 |
| 0.0552        | 85.18 | 4600 | 0.6947          | 0.0884 | 0.3183 |
| 0.0574        | 87.04 | 4700 | 0.6652          | 0.0899 | 0.3183 |
| 0.0503        | 88.88 | 4800 | 0.6798          | 0.0863 | 0.3053 |
| 0.0479        | 90.73 | 4900 | 0.6690          | 0.0884 | 0.3166 |
| 0.0483        | 92.59 | 5000 | 0.6789          | 0.0872 | 0.3069 |
| 0.0437        | 94.44 | 5100 | 0.6758          | 0.0875 | 0.3069 |
| 0.0458        | 96.29 | 5200 | 0.6662          | 0.0884 | 0.3102 |
| 0.0434        | 98.15 | 5300 | 0.6699          | 0.0881 | 0.3069 |
| 0.0449        | 99.99 | 5400 | 0.6700          | 0.0854 | 0.3069 |

Framework versions

  • Transformers 4.11.3
  • Pytorch 1.10.0+cu113
  • Datasets 2.2.1
  • Tokenizers 0.10.3