Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set:

  • Loss: 1.1873
  • Bleu: 33.1
  • Chrf: 51.85
  • Wer: 62.4043
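
As a minimal sketch of how a checkpoint like this might be loaded for inference, the snippet below uses the Transformers `pipeline` API with the model ID shown on this page. The function name `translate_irish_audio` and the example file name are hypothetical, and the card does not document any specific decoding settings, so none are set here. Decoding an audio file from a path also assumes `ffmpeg` is installed.

```python
MODEL_ID = "ymoslem/whisper-medium-ga2en-v5.2-r"


def translate_irish_audio(audio_path: str) -> str:
    """Return the English text the model produces for one Irish audio file."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    # "automatic-speech-recognition" is the pipeline task used for Whisper checkpoints.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # The pipeline returns a dict whose "text" key holds the decoded output.
    return asr(audio_path)["text"]
```

A call such as `translate_irish_audio("sample_ga.wav")` (hypothetical file) downloads the weights on first use. Clips longer than Whisper's 30-second window may need chunked inference, which this sketch does not configure.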

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.02
  • training_steps: 4000
  • mixed_precision_training: Native AMP
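
To make the schedule settings above concrete, here is a small sketch of a linear schedule with warmup, written in plain Python. It assumes the usual behaviour of this scheduler type (linear ramp from 0 to the peak rate during warmup, then linear decay to 0 at the final step) and that warmup steps are the warmup ratio times the total steps, rounded up. The function name `linear_schedule_lr` is illustrative; the exact per-step values of the original run are not recorded in this card.

```python
import math

LEARNING_RATE = 1e-4    # learning_rate from the list above
TRAINING_STEPS = 4000   # training_steps
WARMUP_RATIO = 0.02     # lr_scheduler_warmup_ratio

# 2% of 4000 steps gives 80 warmup steps.
WARMUP_STEPS = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def linear_schedule_lr(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Linear decay from the peak down to 0 at the last step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


for step in (0, 40, 80, 2040, 4000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 80 and has fallen to half the peak by step 2040.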

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Chrf | Wer |
|---|---|---|---|---|---|---|
| 2.6291 | 0.0109 | 100 | 2.1971 | 2.33 | 16.34 | 175.5516 |
| 2.6591 | 0.0219 | 200 | 2.0357 | 5.57 | 22.49 | 122.2873 |
| 2.5637 | 0.0328 | 300 | 1.8690 | 7.67 | 26.29 | 133.0032 |
| 2.2954 | 0.0438 | 400 | 1.8062 | 11.2 | 30.03 | 114.2278 |
| 2.3292 | 0.0547 | 500 | 1.7421 | 9.85 | 29.28 | 117.2895 |
| 2.1223 | 0.0657 | 600 | 1.6739 | 14.56 | 32.56 | 84.2864 |
| 2.2398 | 0.0766 | 700 | 1.7187 | 13.86 | 34.74 | 98.9644 |
| 2.002 | 0.0876 | 800 | 1.6392 | 15.53 | 36.64 | 96.7582 |
| 1.8611 | 0.0985 | 900 | 1.6283 | 15.8 | 36.32 | 94.3719 |
| 1.8498 | 0.1095 | 1000 | 1.6102 | 17.58 | 36.0 | 85.5921 |
| 1.7585 | 0.1204 | 1100 | 1.6337 | 15.91 | 36.61 | 100.2251 |
| 1.6115 | 0.1314 | 1200 | 1.5381 | 22.21 | 39.94 | 76.8122 |
| 1.4415 | 0.1423 | 1300 | 1.5864 | 20.36 | 37.87 | 79.1986 |
| 1.5103 | 0.1533 | 1400 | 1.4925 | 23.2 | 41.26 | 75.2364 |
| 1.6576 | 0.1642 | 1500 | 1.4508 | 18.12 | 40.49 | 102.9266 |
| 1.3429 | 0.1752 | 1600 | 1.4399 | 27.88 | 43.74 | 69.7884 |
| 1.2522 | 0.1861 | 1700 | 1.4256 | 23.04 | 43.31 | 77.1724 |
| 1.2018 | 0.1970 | 1800 | 1.4072 | 21.06 | 40.39 | 78.6583 |
| 1.1945 | 0.2080 | 1900 | 1.4222 | 23.0 | 42.71 | 76.7222 |
| 1.1869 | 0.2189 | 2000 | 1.3992 | 22.54 | 42.02 | 75.8667 |
| 1.1752 | 0.2299 | 2100 | 1.3926 | 20.81 | 41.07 | 79.5137 |
| 1.0281 | 0.2408 | 2200 | 1.3633 | 27.24 | 45.55 | 69.6083 |
| 0.894 | 0.2518 | 2300 | 1.3287 | 28.6 | 45.58 | 65.8712 |
| 0.9788 | 0.2627 | 2400 | 1.3138 | 27.75 | 46.21 | 69.2931 |
| 0.8418 | 0.2737 | 2500 | 1.3064 | 27.85 | 46.17 | 68.3026 |
| 0.7559 | 0.2846 | 2600 | 1.2903 | 28.44 | 48.52 | 68.3476 |
| 0.8632 | 0.2956 | 2700 | 1.2834 | 27.87 | 46.86 | 68.3476 |
| 0.7501 | 0.3065 | 2800 | 1.2669 | 28.63 | 49.25 | 68.5277 |
| 0.6953 | 0.3175 | 2900 | 1.2615 | 30.46 | 48.83 | 64.4304 |
| 0.7195 | 0.3284 | 3000 | 1.2514 | 27.49 | 47.94 | 71.0941 |
| 0.6155 | 0.3394 | 3100 | 1.2428 | 30.06 | 49.64 | 66.5916 |
| 0.605 | 0.3503 | 3200 | 1.2040 | 31.64 | 50.27 | 63.8451 |
| 0.6349 | 0.3612 | 3300 | 1.2077 | 28.96 | 49.35 | 65.3760 |
| 0.4669 | 0.3722 | 3400 | 1.2219 | 31.17 | 48.95 | 64.2503 |
| 0.5196 | 0.3831 | 3500 | 1.2124 | 30.97 | 50.13 | 63.8001 |
| 0.5141 | 0.3941 | 3600 | 1.2026 | 31.97 | 50.8 | 63.0347 |
| 0.4221 | 0.4050 | 3700 | 1.1893 | 31.76 | 51.35 | 63.4399 |
| 0.2951 | 0.4160 | 3800 | 1.2049 | 32.4 | 51.08 | 63.1247 |
| 0.3898 | 0.4269 | 3900 | 1.1906 | 32.15 | 51.09 | 63.5299 |
| 0.4071 | 0.4379 | 4000 | 1.1873 | 33.1 | 51.85 | 62.4043 |
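
The Epoch column shows that training stopped well before one full pass over the data. As a rough back-of-the-envelope sketch, the final row can be combined with the batch size to estimate how many training examples make up one epoch. This assumes a single device and no gradient accumulation, neither of which is stated in this card, and the epoch value is rounded to four decimals, so the result is only approximate.

```python
TRAIN_BATCH_SIZE = 16   # train_batch_size from the hyperparameters
FINAL_STEP = 4000       # last logged step
FINAL_EPOCH = 0.4379    # epoch fraction logged at the last step

# Examples processed by the end of training (assumes one batch per step).
examples_seen = TRAIN_BATCH_SIZE * FINAL_STEP

# If examples_seen covers FINAL_EPOCH of one epoch, scale up to a full epoch.
approx_examples_per_epoch = examples_seen / FINAL_EPOCH

print(f"examples seen: {examples_seen}")
print(f"approx. examples per epoch: {approx_examples_per_epoch:,.0f}")
```

Under these assumptions the model saw 64,000 examples, which is a little under half of an estimated training set of roughly 146,000 examples.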

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Model size: 764M params (Safetensors, tensor type F32)

Model: ymoslem/whisper-medium-ga2en-v5.2-r (fine-tuned from openai/whisper-medium)

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 33.100
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 62.404
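
The Wer values in this card appear to be percentages, and several early checkpoints in the training log exceed 100. The following illustrative sketch shows how that can happen: word error rate divides the word-level edit distance by the number of reference words, so a hypothesis with many extra words can score above 100. This is a plain-Python illustration with a hypothetical function name, not the evaluation code used for this model, and it applies no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, from word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


print(word_error_rate("the cat sat", "the cat sat"))
print(word_error_rate("hello", "hello there my friend"))
```

The second call compares a one-word reference with a four-word hypothesis: three insertions against one reference word give a rate above 100.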