OOM error in long-form audio transcription
Hey Team,
I used the model to transcribe long-form audio of over 1 hour and hit an OOM error. A 14-minute audio file transcribed successfully, but it used 88% of the VRAM on a 48 GB A6000.
Is there a way to transcribe long-form audio while keeping compute requirements consistent?
Yes, please check the steps described here: https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet/#long-form-speech-inference
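For quick reference, here is a minimal sketch of the approach from that section as I understand it: switch the encoder to limited-context (local) attention and enable chunking in the subsampling module before calling `transcribe`. The helper name `enable_long_form_inference` is hypothetical, and the `[128, 128]` context size and the checkpoint placeholder are assumptions; please confirm the exact method names and values against the linked post and your NeMo version.

```python
def enable_long_form_inference(asr_model, att_context_size=(128, 128), chunking_factor=1):
    """Hypothetical helper: apply long-audio settings to a NeMo Fast Conformer model.

    Assumes the model exposes change_attention_model and
    change_subsampling_conv_chunking_factor, as described in the linked post.
    """
    # Replace global self-attention with local attention over a limited
    # left/right context, so memory no longer grows quadratically with length.
    asr_model.change_attention_model("rel_pos_local_attn", list(att_context_size))
    # Process the convolutional subsampling module in chunks (1 = auto select).
    asr_model.change_subsampling_conv_chunking_factor(chunking_factor)
    return asr_model


# Usage sketch (requires NeMo and a GPU; checkpoint name and path are placeholders):
#   import nemo.collections.asr as nemo_asr
#   asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="<parakeet checkpoint name>")
#   enable_long_form_inference(asr_model)
#   asr_model.transcribe(["<path to a long audio file>.wav"])
```

Local attention trades some global context for bounded memory, so it is worth comparing accuracy on a shorter file against the default settings.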
thanks!
@nithinraok, are the pretraining code and the steps to reproduce Parakeet open-sourced? I ask because I have several private multilingual speech corpora, and it would be great to train a multilingual version of this model.
The code is available in NeMo. For fine-tuning, you can follow the steps in the tutorial called Fine-tuning ASR CTC in the NeMo tutorials for ASR: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_CTC_Language_Finetuning.ipynb
Once you've gone through the tutorial, you can use the fine-tuning script here for multi-node, multi-GPU fine-tuning: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_finetune.py
For pretraining, you can use the following tutorial: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_with_Transducers.ipynb
Then follow along with the script here for large-scale training: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py
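Both scripts read training data from manifest files, so a private corpus has to be converted first. The sketch below writes the JSON-lines manifest layout that NeMo ASR datasets use (one object per line with `audio_filepath`, `duration` in seconds, and `text`). The helper names `manifest_line` and `write_manifest` and the example paths are hypothetical; the tutorials above cover the remaining configuration, such as the tokenizer.

```python
import json


def manifest_line(audio_filepath, duration, text):
    """Return one manifest entry as a single-line JSON string."""
    record = {
        "audio_filepath": audio_filepath,
        "duration": float(duration),  # length of the clip in seconds
        "text": text,
    }
    # ensure_ascii=False keeps non-Latin scripts readable in the file.
    return json.dumps(record, ensure_ascii=False)


def write_manifest(entries, manifest_path):
    """Write (audio_filepath, duration, text) tuples, one JSON object per line."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_filepath, duration, text in entries:
            f.write(manifest_line(audio_filepath, duration, text) + "\n")
    return manifest_path


# Usage sketch (paths and transcripts are made-up examples):
#   write_manifest(
#       [("/data/es/clip_0001.wav", 4.2, "hola mundo"),
#        ("/data/hi/clip_0001.wav", 3.7, "नमस्ते दुनिया")],
#       "train_manifest.json",
#   )
```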
thanks @smajumdar94