metadata

license: apache-2.0
base_model: openai/whisper-small
tags:
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: whisper-small-myanmar
    results: []
datasets:
  - chuuhtetnaing/myanmar-speech-dataset-openslr-80
language:
  - my
pipeline_tag: automatic-speech-recognition
library_name: transformers

whisper-small-myanmar

This model is a fine-tuned version of openai/whisper-small on the chuuhtetnaing/myanmar-speech-dataset-openslr-80 dataset. After the final (30th) training epoch, it achieves the following results on the evaluation set:

Loss: 0.1904
WER: 49.0650
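The card does not say how the word error rate was computed. As a rough illustration only, the sketch below assumes the common definition: word-level edit distance divided by the number of reference words, scaled to a percentage, with words split on whitespace. `word_error_rate` is a hypothetical helper, not the evaluation code used for this model. Because inserted words count as errors, the score can exceed 100, which is consistent with the values above 100 reported for the first training epochs.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance over reference length, as a percentage.

    Assumption: whitespace tokenization and equal cost (1) for
    substitutions, deletions, and insertions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c d"))  # 25.0 (one substitution in four words)
print(word_error_rate("a b", "a b c d e"))    # 150.0 (three insertions, two reference words)
```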

Usage

from datasets import Audio, load_dataset
from transformers import pipeline

# Load the test split and resample the audio to 16 kHz, the rate Whisper expects
test_dataset = load_dataset("chuuhtetnaing/myanmar-speech-dataset-openslr-80", split="test")
test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16000))
input_speech = test_dataset[42]["audio"]

# Build a speech recognition pipeline from the fine-tuned checkpoint
pipe = pipeline("automatic-speech-recognition", model="chuuhtetnaing/whisper-small-myanmar")

# Set the language and task explicitly rather than relying on language detection
output = pipe(input_speech, generate_kwargs={"language": "myanmar", "task": "transcribe"})
print(output["text"])  # ကျွန်မ ပြည်ပ မှာ ပညာသင် တော့ စာမေးပွဲ ကို တပတ်တခါ စစ်တယ်

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 200
num_epochs: 30
mixed_precision_training: Native AMP
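To make the schedule settings concrete, here is a minimal sketch of how a linear scheduler with warmup typically behaves for these values: the learning rate climbs linearly from 0 to 0.0003 over the first 200 steps, then decays linearly to 0 at the last step. The total of 1080 steps is taken from the training results table (30 epochs of 36 steps each). `linear_schedule_lr` is a hypothetical helper written for illustration, assuming the decay reaches exactly zero at the final step; it is not the training code itself.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 3e-4,     # learning_rate from the list above
    warmup_steps: int = 200,   # lr_scheduler_warmup_steps from the list above
    total_steps: int = 1080,   # 30 epochs x 36 steps, per the results table
) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 100, 200, 640, 1080):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the peak rate is reached at step 200, partway through epoch 6, so the first five epochs run entirely inside the warmup phase.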

Training results

Training Loss	Epoch	Step	Validation Loss	WER
1.2566	1.0	36	0.8893	215.0045
0.8862	2.0	72	0.6243	388.6465
0.3546	3.0	108	0.2046	316.8744
0.1839	4.0	144	0.1695	81.3001
0.1198	5.0	180	0.1385	63.8914
0.0969	6.0	216	0.1583	66.0285
0.084	7.0	252	0.1539	70.6589
0.0628	8.0	288	0.1603	61.3090
0.0565	9.0	324	0.1424	60.3295
0.0355	10.0	360	0.1457	58.1478
0.0299	11.0	396	0.1547	57.7916
0.0183	12.0	432	0.1543	54.3633
0.0131	13.0	468	0.1532	54.1407
0.011	14.0	504	0.1604	53.8736
0.0083	15.0	540	0.1630	54.0516
0.0042	16.0	576	0.1711	52.1371
0.0034	17.0	612	0.1670	52.5824
0.0022	18.0	648	0.1649	52.5378
0.0013	19.0	684	0.1802	52.1817
0.0014	20.0	720	0.1820	53.1612
0.002	21.0	756	0.1792	52.7159
0.0016	22.0	792	0.1796	50.7124
0.0004	23.0	828	0.1803	50.4007
0.0003	24.0	864	0.1804	49.4657
0.0001	25.0	900	0.1819	49.2431
0.0	26.0	936	0.1857	49.0205
0.0	27.0	972	0.1879	49.1541
0.0	28.0	1008	0.1893	49.1095
0.0	29.0	1044	0.1901	49.1095
0.0	30.0	1080	0.1904	49.0650
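The table shows that validation loss and WER do not bottom out at the same epoch. The short sketch below copies the epoch, validation loss, and WER columns into a list (the name `ROWS` is just for illustration) and picks out the best row for each metric.

```python
# (epoch, validation_loss, wer) copied from the training results table
ROWS = [
    (1, 0.8893, 215.0045), (2, 0.6243, 388.6465), (3, 0.2046, 316.8744),
    (4, 0.1695, 81.3001), (5, 0.1385, 63.8914), (6, 0.1583, 66.0285),
    (7, 0.1539, 70.6589), (8, 0.1603, 61.3090), (9, 0.1424, 60.3295),
    (10, 0.1457, 58.1478), (11, 0.1547, 57.7916), (12, 0.1543, 54.3633),
    (13, 0.1532, 54.1407), (14, 0.1604, 53.8736), (15, 0.1630, 54.0516),
    (16, 0.1711, 52.1371), (17, 0.1670, 52.5824), (18, 0.1649, 52.5378),
    (19, 0.1802, 52.1817), (20, 0.1820, 53.1612), (21, 0.1792, 52.7159),
    (22, 0.1796, 50.7124), (23, 0.1803, 50.4007), (24, 0.1804, 49.4657),
    (25, 0.1819, 49.2431), (26, 0.1857, 49.0205), (27, 0.1879, 49.1541),
    (28, 0.1893, 49.1095), (29, 0.1901, 49.1095), (30, 0.1904, 49.0650),
]

best_loss = min(ROWS, key=lambda row: row[1])  # lowest validation loss
best_wer = min(ROWS, key=lambda row: row[2])   # lowest word error rate

print(f"Lowest validation loss: {best_loss[1]} at epoch {best_loss[0]}")  # epoch 5
print(f"Lowest WER: {best_wer[2]} at epoch {best_wer[0]}")                # epoch 26
```

Validation loss is lowest at epoch 5 (0.1385), while WER keeps improving until epoch 26 (49.0205). The headline metrics reported for the model (loss 0.1904, WER 49.0650) match the epoch 30 row, which suggests, although the card does not state it, that the final checkpoint is the one published.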

Framework versions

Transformers 4.35.2
PyTorch 2.1.1+cu121
Datasets 2.14.5
Tokenizers 0.15.1