---
language:
- ms
- en
---
Malaysian Finetune Whisper Small
Whisper Small fine-tuned on the following Malaysian datasets:
- IMDA STT, https://huggingface.co/datasets/mesolitica/IMDA-STT
- Pseudolabelled Malaysian YouTube videos, https://huggingface.co/datasets/mesolitica/pseudolabel-malaysian-youtube-whisper-large-v3
- Malay Conversational Speech Corpus, https://huggingface.co/datasets/malaysia-ai/malay-conversational-speech-corpus
- Haqkiem TTS Dataset, which is private, but you can request access from https://www.linkedin.com/in/haqkiem-daim/
- Pseudolabelled Nusantara audiobooks, https://huggingface.co/datasets/mesolitica/nusantara-audiobook
Training scripts at https://github.com/mesolitica/malaya-speech/tree/malaysian-speech/session/whisper
Weights & Biases (wandb) training logs at https://wandb.ai/huseinzol05/malaysian-whisper-small?workspace=user-huseinzol05