Whisper Small GA-EN Speech Translation

This model, ymoslem/whisper-small-ga2en-v5.2, is a fine-tuned version of openai/whisper-small for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. The datasets are augmented in two ways: noise augmentation and truncation of low-amplitude samples. The best checkpoint by ChrF (this version) is at step 2000, epoch 0.4378, and achieves the following results on the evaluation set:

Loss: 1.2119
BLEU: 30.93
ChrF: 49.09
WER: 63.1247
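
WER is reported as a percentage, and early rows of the training log exceed 100. The sketch below uses the standard definition (word-level edit distance divided by the number of reference words) to show how that can happen when the output is much longer than the reference. The card does not say which WER implementation produced its numbers, so the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Four inserted words against a one-word reference give a WER of 400%.
print(word_error_rate("hello", "well hello there my friend"))  # 400.0
```

Because the denominator is the reference length only, a model that over-generates can score above 100 even when some of its words are correct.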

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
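
The introduction names two augmentations, noise augmentation and truncating low-amplitude samples, without giving details. The following is a minimal sketch of one possible reading, not the author's pipeline: it assumes white Gaussian noise at a chosen signal-to-noise ratio, and it reads "truncating" as trimming quiet samples from the start and end of a waveform. The function names, the 20 dB SNR, and the 0.01 threshold are all hypothetical.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def trim_low_amplitude(samples, threshold):
    """Drop leading and trailing samples whose absolute value is below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]: loud[-1] + 1]


rng = random.Random(42)  # hypothetical seed, chosen only for reproducibility
waveform = [0.0, 0.001, 0.5, -0.4, 0.3, 0.002, 0.0]
trimmed = trim_low_amplitude(waveform, threshold=0.01)  # [0.5, -0.4, 0.3]
noisy = add_noise(trimmed, snr_db=20.0, rng=rng)
```

Trimming before adding noise keeps the noise power tied to the speech portion of the signal rather than to surrounding silence; whether the original training code used that order is not stated.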

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 0.02
training_steps: 4000
mixed_precision_training: Native AMP
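
As a rough illustration of what a linear scheduler does with these settings, the sketch below computes the learning rate at a given step. It assumes the logged value 0.02 for lr_scheduler_warmup_steps is a warmup ratio, which would give 0.02 × 4000 = 80 warmup steps; the card does not confirm this reading. The function name `linear_lr` is hypothetical.

```python
def linear_lr(step, base_lr=1e-4, total_steps=4000, warmup_steps=80):
    """Linear warmup to base_lr, then linear decay to zero at total_steps.

    warmup_steps=80 assumes the logged 0.02 is a ratio of the 4000 training steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 40, 80, 2000, 4000):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 1e-4 at step 80 and has fallen to roughly half of that by step 2000, the step of the selected checkpoint.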

Training results

Training Loss	Epoch	Step	BLEU	ChrF	Validation Loss	WER
2.7017	0.02	100	2.83	14.96	2.4392	169.5182
2.6732	0.04	200	7.27	22.72	1.9552	103.2868
2.1622	0.07	300	11.43	30.01	1.7297	108.2395
2.0314	0.09	400	12.96	31.0	1.6499	106.4385
1.7219	0.11	500	12.94	33.67	1.5543	107.6092
1.577	0.13	600	12.84	35.03	1.4812	118.5502
1.3569	0.1532	700	19.94	38.08	1.4559	84.2864
1.3401	0.1751	800	13.39	36.11	1.3855	126.4295
1.2272	0.1970	900	24.39	41.75	1.3764	70.7789
1.2793	0.2189	1000	23.01	42.13	1.3389	80.6844
1.0383	0.2408	1100	23.42	43.59	1.3125	82.3953
1.0485	0.2627	1200	25.42	42.99	1.2996	69.4732
1.0427	0.2846	1300	29.24	45.36	1.2996	65.6461
0.8174	0.3065	1400	27.28	45.67	1.2522	68.3926
0.7345	0.3284	1500	26.35	46.78	1.2349	79.1986
0.7551	0.3503	1600	27.81	46.49	1.2317	70.6439
0.6765	0.3722	1700	27.62	47.46	1.2062	70.9140
0.6613	0.3940	1800	26.56	47.12	1.2087	72.8050
0.6181	0.4159	1900	29.91	48.76	1.2139	65.2859
0.5809	0.4378	2000	30.93	49.09	1.2119	63.1247
0.5898	0.4597	2100	25.91	46.24	1.2540	73.9307
0.5926	0.4816	2200	25.19	44.72	1.2479	78.7933
0.5158	0.5035	2300	28.9	46.76	1.2532	66.3665
0.4511	0.5254	2400	28.89	46.83	1.2517	66.3215
0.4329	0.5473	2500	26.19	45.91	1.2573	72.6700
0.4106	0.5692	2600	26.91	46.84	1.2615	72.4899
0.4002	0.5911	2700	27.77	46.93	1.2396	71.0491
0.4047	0.6130	2800	29.9	47.79	1.2450	66.9968
0.3719	0.6349	2900	30.5	48.78	1.2522	65.1959
0.327	0.6567	3000	31.22	49.0	1.2493	64.1153
0.3138	0.6786	3100	30.1	47.82	1.2653	65.1959
0.3349	0.7005	3200	30.37	48.64	1.2651	63.9802
0.2807	0.7224	3300	26.02	45.46	1.2762	76.8573
0.2648	0.7443	3400	30.65	47.58	1.2761	64.6105
0.2633	0.7662	3500	29.73	47.74	1.2890	65.5110
0.2316	0.7881	3600	29.94	47.33	1.2886	66.4566
0.233	0.8100	3700	27.82	48.01	1.2905	73.1202
0.2196	0.8319	3800	31.51	48.66	1.2994	63.7100
0.2119	0.8538	3900	30.09	48.44	1.2910	65.0158
0.2082	0.8757	4000	30.91	47.99	1.2924	65.1058
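
The choice of step 2000 depends on which metric is used for selection. The sketch below copies a few rows from the table above and picks the best checkpoint under three criteria; the variable names are illustrative, and how selection was actually configured during training is not stated in the card.

```python
# (step, BLEU, ChrF, validation loss, WER), copied from selected rows of the table
rows = [
    (1700, 27.62, 47.46, 1.2062, 70.9140),
    (2000, 30.93, 49.09, 1.2119, 63.1247),
    (3000, 31.22, 49.00, 1.2493, 64.1153),
    (3800, 31.51, 48.66, 1.2994, 63.7100),
    (4000, 30.91, 47.99, 1.2924, 65.1058),
]

best_by_chrf = max(rows, key=lambda r: r[2])  # higher ChrF is better
best_by_bleu = max(rows, key=lambda r: r[1])  # higher BLEU is better
best_by_loss = min(rows, key=lambda r: r[3])  # lower validation loss is better

print(best_by_chrf[0])  # 2000
print(best_by_bleu[0])  # 3800
print(best_by_loss[0])  # 1700
```

These rows include the table-wide best for each criterion, so the same steps come out when the full log is used. Step 2000 also has the lowest WER in the table, while validation loss bottoms out earlier and BLEU peaks later.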

Framework versions

Transformers 4.40.0
PyTorch 2.2.1+cu121
Datasets 2.18.0
Tokenizers 0.19.1
