nvidia/parakeet-rnnt-1.1b · OOM error in long-form audio transcription.

Jan 4

Hey Team,
I used the model to transcribe long-form audio of upwards of 1 hour and ran into an OOM error. A 14-minute audio file, however, did transcribe, though it used up 88% of VRAM (48 GB A6000).

Is there a way to transcribe long-form audio while keeping compute and memory usage consistent?
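One generic way to keep peak memory independent of file length is to transcribe fixed-length, overlapping chunks one at a time. This is plain chunked inference, a generic technique and not necessarily the approach in the NeMo blog post linked in this thread. The sketch below only computes the sample ranges; the helper name `chunk_spans` and the 60 s chunk / 5 s overlap defaults are hypothetical choices, and 16 kHz audio is assumed.

```python
from typing import List, Tuple


def chunk_spans(num_samples: int, sample_rate: int = 16000,
                chunk_s: float = 60.0, overlap_s: float = 5.0) -> List[Tuple[int, int]]:
    """Split [0, num_samples) into windows of at most chunk_s seconds.

    Consecutive windows overlap by overlap_s seconds, so each forward pass
    sees a bounded amount of audio no matter how long the recording is.
    (Hypothetical helper; defaults are illustrative, not tuned values.)
    """
    chunk = int(chunk_s * sample_rate)
    overlap = int(overlap_s * sample_rate)
    if chunk <= overlap:
        raise ValueError("chunk_s must be larger than overlap_s")
    step = chunk - overlap
    spans = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        spans.append((start, end))
        if end == num_samples:
            break  # last window reaches the end of the audio
        start += step
    return spans


if __name__ == "__main__":
    one_hour = 3600 * 16000
    spans = chunk_spans(one_hour)
    print(len(spans), spans[0], spans[-1])
```

Each span would then be sliced out of the waveform and transcribed on its own, so memory is set by `chunk_s` rather than by file length. Merging the per-chunk transcripts across the overlaps (to avoid duplicated or cut words) is the hard part and is not shown here.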

nithinraok

NVIDIA org Jan 13

Yes, please check the steps described here: https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet/#long-form-speech-inference
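As a rough illustration of why a 1-hour file can OOM when 14 minutes fits, the sketch below counts the entries in a single attention-score matrix under full attention versus limited-context (local) attention, which NeMo's FastConformer encoder supports for long inputs. It assumes a 10 ms feature hop and 8x subsampling (about 80 ms per encoder frame) and a hypothetical local window of 128 frames on each side. It ignores the number of heads and layers, activations, and kernel-level optimizations, so treat the numbers as orders of magnitude only.

```python
def encoder_frames(duration_s: float, hop_ms: int = 10, subsampling: int = 8) -> int:
    """Encoder frames for a clip, assuming a 10 ms feature hop and 8x subsampling."""
    return int(round(duration_s * 1000)) // hop_ms // subsampling


def full_scores(frames: int) -> int:
    """Entries in one head's attention-score matrix with full (global) attention."""
    return frames * frames


def local_scores(frames: int, left: int = 128, right: int = 128) -> int:
    """Upper bound on entries when each frame attends to at most left + right + 1 frames."""
    return frames * min(frames, left + right + 1)


if __name__ == "__main__":
    for minutes in (14, 60):
        t = encoder_frames(minutes * 60)
        gb_full = full_scores(t) * 4 / 1e9    # fp32: 4 bytes per entry
        gb_local = local_scores(t) * 4 / 1e9
        print(f"{minutes:>2} min: {t} frames, full ~{gb_full:.2f} GB, "
              f"local ~{gb_local:.3f} GB per head per layer")
```

Going from 14 to 60 minutes multiplies the full-attention term by (60/14)² ≈ 18.4 but the local-attention term only by 60/14 ≈ 4.3, which is why a bounded attention window keeps memory roughly linear in audio length.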

StephennFernandes

Jan 19

Thanks!

StephennFernandes

Jan 19

@nithinraok Are the pretraining code and the steps to reproduce Parakeet open-sourced? I'm asking because I have several private multilingual speech corpora, and it would be great to train a multilingual version of this model.

smajumdar94

NVIDIA org Jan 22

edited Jan 22

The code is available in NeMo. For fine-tuning, you can follow the steps in the Fine-tuning ASR CTC tutorial among the NeMo ASR tutorials: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_CTC_Language_Finetuning.ipynb
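NeMo's ASR training and fine-tuning scripts read data from JSON-lines "manifest" files, one utterance per line with `audio_filepath`, `duration` (seconds) and `text` fields, typically passed in via `model.train_ds.manifest_filepath`. A minimal sketch for turning a private corpus into such a manifest, assuming uncompressed PCM WAV files; `write_manifest` and `wav_duration` are hypothetical helper names, not NeMo APIs.

```python
import json
import wave
from pathlib import Path
from typing import Iterable, Tuple


def wav_duration(path: str) -> float:
    """Duration in seconds of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as w:
        # getnframes() counts per-channel frames, so this is correct for stereo too
        return w.getnframes() / w.getframerate()


def write_manifest(pairs: Iterable[Tuple[str, str]], manifest_path: str) -> int:
    """Write (audio_path, transcript) pairs as a JSON-lines manifest; return the line count."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, text in pairs:
            entry = {
                "audio_filepath": str(Path(audio_path).resolve()),
                "duration": wav_duration(audio_path),
                "text": text,
            }
            # ensure_ascii=False keeps non-Latin scripts readable in the file
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Usage would be something like `write_manifest([("clips/0001.wav", "transcript text")], "train_manifest.json")`. Text normalization and resampling to the model's expected sample rate (the tutorial covers both) should happen before this step.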

Once you've gone through the tutorial, you can use the fine-tuning script here for multi-node, multi-GPU fine-tuning: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_to_text_finetune.py

For pretraining, you can use the following tutorial: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_with_Transducers.ipynb

Then follow along with the script here for large-scale training: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py
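Before a large-scale run, it can help to check how many hours of a corpus survive the utterance-length filtering that NeMo's dataset configs expose through `min_duration` / `max_duration` settings. The sketch below mimics that filter offline on JSON-lines manifest entries; `manifest_stats` is a hypothetical helper, and the 0.1 s / 20 s defaults are illustrative placeholders, not the values used to train Parakeet.

```python
import json
from typing import Dict, List


def manifest_stats(lines: List[str], min_duration: float = 0.1,
                   max_duration: float = 20.0) -> Dict[str, float]:
    """Count utterances and hours kept vs. dropped by a duration filter."""
    kept = dropped = 0
    kept_s = dropped_s = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the manifest
        dur = float(json.loads(line)["duration"])
        if min_duration <= dur <= max_duration:
            kept += 1
            kept_s += dur
        else:
            dropped += 1
            dropped_s += dur
    return {"kept": kept, "dropped": dropped,
            "kept_hours": kept_s / 3600, "dropped_hours": dropped_s / 3600}


if __name__ == "__main__":
    sample = [json.dumps({"audio_filepath": f"{i}.wav", "duration": d, "text": "x"})
              for i, d in enumerate([5.0, 30.0, 0.05, 15.0])]
    print(manifest_stats(sample))
```

For a multilingual corpus, running this per language manifest shows whether a given cutoff disproportionately drops data from one language.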

StephennFernandes

Jan 23

Thanks @smajumdar94!

smajumdar94 changed discussion status to closed Feb 7