
whisper-medium-myanmar

This model is a fine-tuned version of openai/whisper-medium on the chuuhtetnaing/myanmar-speech-dataset-openslr-80 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2282
  • WER: 49.4657
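The WER above is a percentage. As a rough guide, a minimal sketch of how such a score can be computed with the evaluate library (an illustration only; the card does not include the actual evaluation script, and the reference/prediction strings below are placeholders):

from evaluate import load

# Hypothetical reference/prediction pairs; evaluate's "wer" metric
# returns a fraction, so multiply by 100 to match the score above.
wer_metric = load("wer")
references = ["the expected transcription"]
predictions = ["the model transcription"]
print(100 * wer_metric.compute(references=references, predictions=predictions))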

Usage

from datasets import Audio, load_dataset
from transformers import pipeline

# Load a sample audio
dataset = load_dataset("chuuhtetnaing/myanmar-speech-dataset-openslr-80")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
test_dataset = dataset['test']
input_speech = test_dataset[42]['audio']

# Build the ASR pipeline from the fine-tuned checkpoint
pipe = pipeline(model='chuuhtetnaing/whisper-medium-myanmar')

output = pipe(input_speech, generate_kwargs={"language": "myanmar", "task": "transcribe"})
print(output['text']) # α€€α€»α€™ α€•α€Όα€Šα€Ία€• မှာ α€•α€Šα€¬α€žα€„α€Ί တော့ စာမေးပွဲ α€€α€­α€― တပတ်တခါ α€…α€…α€Ία€α€šα€Ί
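The pipeline also accepts a plain file path (decoded via ffmpeg) or a raw NumPy array instead of a dataset sample; a minimal sketch, where the file name is a hypothetical placeholder:

# "my_recording.wav" is a hypothetical local file; ffmpeg decodes and
# resamples it to the 16 kHz rate the model expects.
output = pipe("my_recording.wav", generate_kwargs={"language": "myanmar", "task": "transcribe"})
print(output['text'])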

Training hyperparameters

The following hyperparameters were used during training (a sketch of equivalent training arguments follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 40
  • eval_batch_size: 40
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 50
  • mixed_precision_training: Native AMP
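The training script itself is not part of this card; as a rough guide, the settings above map onto transformers Seq2SeqTrainingArguments along these lines (the output_dir and the exact argument mapping are assumptions):

from transformers import Seq2SeqTrainingArguments

# Sketch only: reconstructed from the hyperparameter list above,
# not the author's actual training script.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-medium-myanmar",  # hypothetical output directory
    learning_rate=3e-4,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=40,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=50,
    fp16=True,  # "Native AMP" mixed-precision training
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer.
)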

Training results

Training Loss | Epoch | Step | Validation Loss | WER
------------- | ----- | ---- | --------------- | -------
0.8546 | 1.0 | 57 | 0.5703 | 98.0855
0.2643 | 2.0 | 114 | 0.2404 | 84.9510
0.1982 | 3.0 | 171 | 0.1889 | 71.6385
0.1608 | 4.0 | 228 | 0.1781 | 68.4773
0.1212 | 5.0 | 285 | 0.1511 | 63.7133
0.1067 | 6.0 | 342 | 0.1427 | 60.2404
0.0682 | 7.0 | 399 | 0.1330 | 59.3500
0.0413 | 8.0 | 456 | 0.1322 | 56.9902
0.0249 | 9.0 | 513 | 0.1271 | 55.6545
0.0158 | 10.0 | 570 | 0.1430 | 54.8085
0.0124 | 11.0 | 627 | 0.1486 | 55.0312
0.0099 | 12.0 | 684 | 0.1550 | 53.7845
0.0082 | 13.0 | 741 | 0.1486 | 55.1647
0.0057 | 14.0 | 798 | 0.1747 | 53.6955
0.0041 | 15.0 | 855 | 0.1608 | 53.3393
0.0029 | 16.0 | 912 | 0.1596 | 50.6233
0.0013 | 17.0 | 969 | 0.1798 | 51.2912
0.0005 | 18.0 | 1026 | 0.1796 | 50.3562
0.0006 | 19.0 | 1083 | 0.1799 | 50.0890
0.0 | 20.0 | 1140 | 0.1849 | 50.2671
0.0001 | 21.0 | 1197 | 0.1878 | 50.0445
0.0 | 22.0 | 1254 | 0.1907 | 50.1781
0.0 | 23.0 | 1311 | 0.1929 | 50.0890
0.0 | 24.0 | 1368 | 0.1942 | 49.8664
0.0 | 25.0 | 1425 | 0.2019 | 50.0445
0.0 | 26.0 | 1482 | 0.2068 | 49.9555
0.0 | 27.0 | 1539 | 0.2103 | 50.0
0.0 | 28.0 | 1596 | 0.2129 | 49.9555
0.0 | 29.0 | 1653 | 0.2150 | 50.0
0.0 | 30.0 | 1710 | 0.2168 | 49.9555
0.0 | 31.0 | 1767 | 0.2183 | 49.9555
0.0 | 32.0 | 1824 | 0.2196 | 49.8664
0.0 | 33.0 | 1881 | 0.2208 | 49.6438
0.0 | 34.0 | 1938 | 0.2218 | 49.7329
0.0 | 35.0 | 1995 | 0.2227 | 49.5993
0.0 | 36.0 | 2052 | 0.2234 | 49.5548
0.0 | 37.0 | 2109 | 0.2242 | 49.5548
0.0 | 38.0 | 2166 | 0.2248 | 49.5102
0.0 | 39.0 | 2223 | 0.2253 | 49.5548
0.0 | 40.0 | 2280 | 0.2259 | 49.5548
0.0 | 41.0 | 2337 | 0.2263 | 49.5548
0.0 | 42.0 | 2394 | 0.2267 | 49.4657
0.0 | 43.0 | 2451 | 0.2271 | 49.5102
0.0 | 44.0 | 2508 | 0.2274 | 49.5102
0.0 | 45.0 | 2565 | 0.2276 | 49.4657
0.0 | 46.0 | 2622 | 0.2278 | 49.4657
0.0 | 47.0 | 2679 | 0.2280 | 49.5548
0.0 | 48.0 | 2736 | 0.2281 | 49.5102
0.0 | 49.0 | 2793 | 0.2282 | 49.5102
0.0 | 50.0 | 2850 | 0.2282 | 49.4657

Framework versions

  • Transformers 4.35.2
  • PyTorch 2.1.1+cu121
  • Datasets 2.14.5
  • Tokenizers 0.15.1