test-whisper-tiny-th
This model is a fine-tuned version of openai/whisper-tiny on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.8875
- Cer: 34.9798
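For context, a minimal usage sketch with the Transformers ASR pipeline is shown below. It assumes the checkpoint is published as `kwanchiva/test-whisper-tiny-th` and that Thai transcription is the intended task; the `language`/`task` generation settings are assumptions, not part of this card.

```python
# Minimal usage sketch (assumed repo id and language/task settings).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="kwanchiva/test-whisper-tiny-th",
)

# Transcribe a local audio file; the kwargs are forwarded to Whisper's
# generate() and may need adjusting for your data.
result = asr(
    "sample.wav",
    generate_kwargs={"language": "th", "task": "transcribe"},
)
print(result["text"])
```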
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5.0
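A hedged sketch of how these settings map onto Transformers' `Seq2SeqTrainingArguments` follows; `output_dir`, `eval_strategy`, and anything else not listed above are illustrative assumptions rather than the actual training configuration.

```python
# Sketch: the listed hyperparameters expressed as Seq2SeqTrainingArguments.
# Values not listed in the card (output_dir, eval_strategy) are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./test-whisper-tiny-th",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=5.0,
    lr_scheduler_type="linear",
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the defaults
    # (adam_beta1, adam_beta2, adam_epsilon), so no explicit arguments needed.
    eval_strategy="epoch",  # assumed from the per-epoch validation results
)
```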
Training results
Training Loss | Epoch | Step | Validation Loss | Cer |
---|---|---|---|---|
No log | 1.0 | 7 | 0.9713 | 37.2984 |
1.1414 | 2.0 | 14 | 0.9285 | 34.4758 |
0.8953 | 3.0 | 21 | 0.9022 | 35.2823 |
0.8953 | 4.0 | 28 | 0.8911 | 52.9234 |
0.8159 | 5.0 | 35 | 0.8875 | 34.9798 |
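The Cer column is the character error rate in percent on the validation set. A minimal sketch of computing it with the Hugging Face `evaluate` library follows (an assumption; the card does not include the actual evaluation code, and any CER implementation such as jiwer would work similarly).

```python
# Illustrative CER computation; the `evaluate` library and the example
# strings are assumptions, not taken from this card's evaluation setup.
import evaluate

cer_metric = evaluate.load("cer")

predictions = ["สวัสดีครับ"]    # hypothetical model outputs
references = ["สวัสดีครับผม"]   # hypothetical ground-truth transcripts

# compute() returns a fraction; multiply by 100 to match the table above.
cer = 100 * cer_metric.compute(predictions=predictions, references=references)
print(f"CER: {cer:.4f}")
```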
Reference WER results for related Thai ASR models on several Thai evaluation sets:

Model | WER (CV18) | WER (Gowejee) | WER (LOTUS-TRD) | WER (Thai Dialect) | WER (Elderly) | WER (Gigaspeech2) | WER (Fleurs) | WER (Distant Meeting) | WER (Podcast) |
---|---|---|---|---|---|---|---|---|---|
whisper-large-v3 | 18.75 | 46.59 | 48.14 | 57.82 | 12.27 | 33.26 | 24.08 | 72.57 | 41.24 |
airesearch-wav2vec2-large-xlsr-53-th | 8.49 | 17.28 | 63.01 | 48.53 | 11.29 | 52.72 | 37.32 | 85.11 | 65.12 |
thonburian-whisper-th-large-v3-combined | 7.62 | 22.06 | 41.95 | 26.53 | 1.63 | 25.22 | 13.90 | 64.68 | 32.42 |
monsoon-whisper-medium-gigaspeech2 | 11.66 | 20.50 | 41.04 | 42.06 | 7.57 | 21.40 | 21.54 | 51.65 | 38.89 |
pathumma-whisper-th-large-v3 | 8.68 | 9.84 | 15.47 | 19.85 | 1.53 | 21.66 | 15.65 | 51.56 | 36.47 |
Reference results for related audio language models:

Model | ASR-th: CV18 (WER↓) | ASR-en: CV18 (WER↓) | ASR-en: LibriSpeech (WER↓) | ThaiSER Emotion (Acc↑, F1↑) | ThaiSER Gender (Acc↑, F1↑) |
---|---|---|---|---|---|
Typhoon-Audio-Preview | 13.26 | 13.34 (partial result) | 5.07 (partial result) | 41.50, 33.48 | 96.20, 96.69 |
DIVA | 69.15 (partial result) | 37.40 | 49.06 | 18.64, 8.16 | 47.50, 35.90 |
Gemini-1.5-Pro | 16.49 | 12.94 | 25.83 | 26.00, 18.26 | 79.66, 77.32 |
Pathumma-llm-audio-1.0.0 | 12.03 | 12.20 | 11.36 | 42.30, 36.88 | 90.30, 92.07 |
Framework versions
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
Citation
@misc{tipkasorn2024pathumma,
title = { {Pathumma-Audio} },
author = { Pattara Tipkasorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
url = { https://huggingface.co/nectec/Pathumma-llm-audio-1.0.0 },
publisher = { Hugging Face },
year = { 2024 },
}
@misc{tipkasorn2024PatWhisper,
title = { {Pathumma Whisper Large V3 (TH)} },
author = { Pattara Tipkasorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
url = { https://huggingface.co/nectec/Pathumma-whisper-th-large-v3 },
publisher = { Hugging Face },
year = { 2024 },
}