Hands-on exercise
In this unit, we explored the challenges of fine-tuning ASR models, acknowledging the time and resources required to fine-tune a model like Whisper (even a small checkpoint) on a new language. To give you hands-on experience, we have designed an exercise that walks you through fine-tuning an ASR model on a smaller dataset. The main goal of this exercise is to familiarize you with the process, not to reach production-level results. We have intentionally set a lenient target for the metric so that you should be able to reach it even with limited resources.
Here are the instructions:
- Fine-tune the "openai/whisper-tiny" model using the American English ("en-US") subset of the "PolyAI/minds14" dataset.
- Use the first 450 examples for training, and the rest for evaluation. Ensure you set num_proc=1 when pre-processing the dataset using the .map method (this will ensure your model is submitted correctly for assessment).
- To evaluate the model, use the wer and wer_ortho metrics as described in this Unit. However, do not convert the metric into percentages by multiplying by 100 (e.g. if WER is 42%, we’ll expect to see the value of 0.42 in this exercise).
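The split and pre-processing steps above can be sketched as follows. This is a minimal sketch, not the reference solution: `split_and_preprocess` and `prepare_example` are hypothetical names, and the helper only assumes that the dataset object offers `select` and `map` the way a 🤗 Datasets `Dataset` does.

```python
def split_and_preprocess(dataset, prepare_example, num_train=450):
    """Split `dataset` by position, then pre-process both parts in a single process.

    `dataset` is assumed to behave like a 🤗 Datasets `Dataset`
    (supports len(), .select(indices) and .map(function, num_proc=...)).
    `prepare_example` is a hypothetical per-example function, for instance one
    that computes input features and label ids with a Whisper processor.
    """
    if len(dataset) <= num_train:
        raise ValueError("dataset needs more than num_train examples")

    # First 450 examples for training, the remainder for evaluation.
    train_set = dataset.select(range(num_train))
    eval_set = dataset.select(range(num_train, len(dataset)))

    # num_proc=1 as required by the exercise instructions.
    train_set = train_set.map(prepare_example, num_proc=1)
    eval_set = eval_set.map(prepare_example, num_proc=1)
    return train_set, eval_set


# Example usage (needs the `datasets` library and network access):
# from datasets import load_dataset
# minds = load_dataset("PolyAI/minds14", name="en-US", split="train")
# train_set, eval_set = split_and_preprocess(minds, prepare_example)
```

Splitting by position rather than at random keeps the training and evaluation sets identical for everyone, which is what makes submitted scores comparable.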
Once you have fine-tuned a model, make sure to upload it to the 🤗 Hub with the following kwargs:
kwargs = {
    "dataset_tags": "PolyAI/minds14",
    "finetuned_from": "openai/whisper-tiny",
    "tasks": "automatic-speech-recognition",
}
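One way to pass these kwargs along is to unpack them into the trainer's `push_to_hub` call. The sketch below assumes you trained with a 🤗 Transformers `Trainer` or `Seq2SeqTrainer` and that you are already logged in to the Hub; `upload_to_hub` is a hypothetical helper name used only for illustration.

```python
# Model-card metadata copied from the exercise instructions.
kwargs = {
    "dataset_tags": "PolyAI/minds14",
    "finetuned_from": "openai/whisper-tiny",
    "tasks": "automatic-speech-recognition",
}


def upload_to_hub(trainer, card_kwargs=kwargs):
    """Push the fine-tuned model and forward the metadata as keyword arguments.

    Assumes `trainer` exposes `push_to_hub(**kwargs)` as the Transformers
    `Trainer` does, where the extra keyword arguments feed the model card.
    """
    return trainer.push_to_hub(**card_kwargs)


# Example usage, after training has finished:
# upload_to_hub(trainer)
```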
You will pass this assignment if your model’s normalised WER (wer) is lower than 0.37.
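To make the fraction convention and the pass threshold concrete, here is a small pure-Python sketch of the two scores. It is a stand-in for illustration, not the metric implementation used in this Unit: `simple_normalise` is a hypothetical, much cruder normaliser than a real one, and the two example sentences are made up.

```python
import re

PASS_THRESHOLD = 0.37  # from the exercise: normalised WER must be below this


def word_edit_distance(reference_words, predicted_words):
    """Levenshtein distance between two word lists."""
    previous = list(range(len(predicted_words) + 1))
    for i, ref_word in enumerate(reference_words, start=1):
        current = [i]
        for j, pred_word in enumerate(predicted_words, start=1):
            cost = 0 if ref_word == pred_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(references, predictions):
    """Corpus-level WER as a fraction: total word edits / total reference words."""
    total_edits = 0
    total_words = 0
    for reference, prediction in zip(references, predictions):
        ref_words = reference.split()
        total_edits += word_edit_distance(ref_words, prediction.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


def simple_normalise(text):
    """Hypothetical stand-in normaliser: lowercase and drop punctuation."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


# Made-up transcripts, for illustration only.
references = ["Hello, I would like to pay my bill.", "What is my balance?"]
predictions = ["hello i would like to pay my bills", "what is my balance"]

# Orthographic WER: casing and punctuation count as errors.
wer_ortho = word_error_rate(references, predictions)
# Normalised WER: both sides are normalised before scoring.
wer = word_error_rate(
    [simple_normalise(r) for r in references],
    [simple_normalise(p) for p in predictions],
)

print(f"wer_ortho = {wer_ortho:.2f}")  # 0.42, reported as a fraction, not 42
print(f"wer = {wer:.2f}")              # 0.08
print("passes:", wer < PASS_THRESHOLD)  # True
```

Note that the values are left as fractions, so the comparison against 0.37 works directly without any scaling.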
Feel free to build a demo of your model, and share it on Discord! If you have questions, post them in the #audio-study-group channel.