This model is a fine-tuned version of whisper-large-v2, trained on 1M audio samples from the mitermix/audiosnippets dataset.
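
Because the model is a Whisper checkpoint, it can presumably be loaded the same way as the base model. The snippet below is a minimal sketch, assuming the Hugging Face `transformers` library is installed and that the checkpoint is published in a Whisper-compatible format. The helper name `build_transcriber` is hypothetical, and the default model id is the base `openai/whisper-large-v2` checkpoint, used only as a placeholder: replace it with this model's own repository id, which is not stated here.

```python
# Placeholder: the base checkpoint this model was fine-tuned from.
# Substitute the repository id of the fine-tuned model (not given in this card).
BASE_MODEL_ID = "openai/whisper-large-v2"


def build_transcriber(model_id: str = BASE_MODEL_ID):
    """Return a speech-recognition pipeline for a Whisper-style checkpoint.

    Hypothetical helper for illustration; transformers is imported lazily so
    that merely defining this function does not require the library.
    """
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


if __name__ == "__main__":
    # Assumption: "sample.wav" is a local audio file you supply yourself.
    asr = build_transcriber()
    result = asr("sample.wav")
    print(result["text"])
```

What the returned text looks like depends on how the fine-tuning data was labeled, so inspect the output on a few known samples before relying on it.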