whisper-base-bn / README.md
metadata
license: apache-2.0
datasets:
  - mozilla-foundation/common_voice_11_0
language:
  - en
  - bn
metrics:
  - wer
library_name: transformers
pipeline_tag: automatic-speech-recognition

Results

  • WER 46
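For context, word error rate (WER) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the evaluation script used for this model; the function name `word_error_rate` and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample: one substituted word out of four reference words.
print(word_error_rate("the cat sat down", "the cat sat town"))  # 0.25
```

Assuming the figure above is reported as a percentage, a WER of 46 would correspond to a return value of about 0.46 from a function like this, averaged over the evaluation set.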

Use with BanglaSpeech2text

Test it in Google Colab

  • Open In Colab

Installation

You can install the library using pip:

pip install banglaspeech2text

Usage

Model Initialization

To use the library, initialize the Speech2Text class with the desired model. The default is "base"; the other pre-trained models are "tiny", "small", "medium", and "large". Here's an example:

from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

# You can also omit the model name (the default model is "base")
stt = Speech2Text()

Transcribing Audio Files

You can transcribe an audio file by calling the transcribe method and passing the path to the audio file. It will return the transcribed text as a string. Here's an example:

transcription = stt.transcribe("audio.wav")
print(transcription)
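To transcribe several recordings at once, the same call can be wrapped in a small loop. The helper below is only a sketch: `transcribe_folder` and the `*.wav` pattern are hypothetical names chosen for illustration, and the helper takes any callable that maps a file path to text, so it does not depend on a particular library version.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_folder(
    transcribe: Callable[[str], str],
    folder: str,
    pattern: str = "*.wav",
) -> Dict[str, str]:
    """Map each matching file name in `folder` to its transcription."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        # The callable receives a path string, as in stt.transcribe("audio.wav").
        results[path.name] = transcribe(str(path))
    return results
```

With a Speech2Text instance named stt, calling `transcribe_folder(stt.transcribe, "recordings")` would return a dictionary keyed by file name, where "recordings" is a placeholder directory.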

Use with SpeechRecognition

You can use the SpeechRecognition package to capture audio from a microphone and transcribe it. Here's an example:

import speech_recognition as sr
from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = stt.recognize(audio)

print(output)

Use GPU

You can use a GPU for faster inference. Here's an example:

stt = Speech2Text(model="base", use_gpu=True)

Advanced GPU Usage

For finer control over GPU placement, pass either the device or the device_map parameter. Here's an example:

# Pin the model to a specific GPU
stt = Speech2Text(model="base", device="cuda:0")

# Or request automatic placement
stt = Speech2Text(model="base", device_map="auto")

NOTE: Read more about PyTorch Device
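A hard-coded "cuda:0" fails on machines without a GPU, so it can help to pick the device string at runtime. The sketch below uses only `torch.cuda.is_available()` and `torch.cuda.device_count()`; the helper name `pick_device` is hypothetical, and passing "cpu" as the device is an assumption about how Speech2Text forwards the value rather than something this README states.

```python
def pick_device(preferred_index: int = 0) -> str:
    """Return "cuda:<index>" if PyTorch can see that GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so stay on the CPU
    if torch.cuda.is_available() and preferred_index < torch.cuda.device_count():
        return f"cuda:{preferred_index}"
    return "cpu"


print(pick_device())
```

The result could then be passed along as `Speech2Text(model="base", device=pick_device())`.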

Instantly Check with Gradio

You can quickly try the model in the browser with Gradio. Here's an example:

from banglaspeech2text import Speech2Text
import gradio as gr

stt = Speech2Text(model="base", use_gpu=True)

# share=True creates a public URL, so you can also open it on a mobile device
# Note: source="microphone" is the Gradio 3.x argument; Gradio 4+ uses sources=["microphone"]
gr.Interface(
    fn=stt.transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
).launch(share=True)

Note: For more use cases and models, see BanglaSpeech2Text.

Use with transformers

Installation

pip install transformers
pip install torch

Usage

Use with file

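from transformers import pipeline

# Load the model as a speech-recognition pipeline
pipe = pipeline("automatic-speech-recognition", model="shhossain/whisper-base-bn")


def transcribe(audio_path):
    """Return the transcribed text for the audio file at audio_path."""
    return pipe(audio_path)["text"]


audio_file = "test.wav"
print(transcribe(audio_file))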
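For recordings longer than about 30 seconds, the transformers pipeline can split the audio into chunks and return timestamps. The following is a minimal sketch under that assumption: `build_pipe` and `format_chunks` are hypothetical helper names, the 30-second chunk length is an illustrative choice, and the output layout assumed here is the `{"text": ..., "chunks": [{"timestamp": (start, end), "text": ...}]}` dictionary that the pipeline returns when called with `return_timestamps=True`.

```python
def build_pipe(model_id: str = "shhossain/whisper-base-bn", chunk_length_s: int = 30):
    """Create a chunked speech-recognition pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; needs transformers and torch

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,
    )


def format_chunks(result: dict) -> list:
    """Turn the pipeline's timestamped chunks into readable lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        # The final chunk may have no end time, so label it instead of formatting None.
        end_label = f"{end:.1f}s" if end is not None else "end"
        lines.append(f"[{start:.1f}s-{end_label}] {chunk['text'].strip()}")
    return lines


# Example usage (not run here because it downloads the model):
#   result = build_pipe()("test.wav", return_timestamps=True)
#   print("\n".join(format_chunks(result)))
```

Keeping the formatting separate from model loading means the formatter can be checked on a hand-written result dictionary without downloading anything.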