license: apache-2.0
datasets:
- mozilla-foundation/common_voice_11_0
language:
- en
- bn
metrics:
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
Results
- WER 46
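For readers unfamiliar with the metric: word error rate is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of reference words. The figure above is presumably a percentage, though this card does not state the unit or the evaluation split. The sketch below is a minimal illustration of the calculation; the function name is hypothetical and it is not the evaluation script used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Multiply the result by 100 to compare it with a percentage-style score.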
Use with BanglaSpeech2text
Test it in Google Colab
Installation
You can install the library using pip:
pip install banglaspeech2text
Usage
Model Initialization
To use the library, initialize the Speech2Text class with the desired model. The default is "base"; the other pre-trained options are "tiny", "small", "medium", and "large". Here's an example:
from banglaspeech2text import Speech2Text
stt = Speech2Text(model="base")
# You can also omit the model name (the default model is "base")
stt = Speech2Text()
Transcribing Audio Files
You can transcribe an audio file by calling the transcribe method and passing the path to the audio file. It will return the transcribed text as a string. Here's an example:
transcription = stt.transcribe("audio.wav")
print(transcription)
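To transcribe several files in one go, the same transcribe method can be called in a loop. The helper below is a small sketch, not part of the library: transcribe_many is a hypothetical name, and it accepts any callable that maps a file path to text, so stt.transcribe can be passed in directly.

```python
from typing import Callable, Dict, Iterable


def transcribe_many(
    transcribe: Callable[[str], str], paths: Iterable[str]
) -> Dict[str, str]:
    """Run `transcribe` on each path and return a {path: text} mapping."""
    results: Dict[str, str] = {}
    for path in paths:
        # Normalise to str so pathlib.Path objects work as well.
        results[str(path)] = transcribe(str(path))
    return results
```

For example, transcribe_many(stt.transcribe, ["first.wav", "second.wav"]) would return a dictionary keyed by file name; the file names here are placeholders.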
Use with SpeechRecognition
You can use the SpeechRecognition package to capture audio from a microphone and transcribe it. Here's an example:
import speech_recognition as sr
from banglaspeech2text import Speech2Text
stt = Speech2Text(model="base")
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
output = stt.recognize(audio)
print(output)
Use GPU
You can use a GPU for faster inference. Here's an example:
stt = Speech2Text(model="base", use_gpu=True)
Advanced GPU Usage
For more advanced GPU usage, you can use the device or device_map parameter. Here's an example:
stt = Speech2Text(model="base", device="cuda:0")
stt = Speech2Text(model="base", device_map="auto")
NOTE: Read more about PyTorch Device
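If the same script has to run on machines with and without a GPU, the device string can be chosen at runtime. The helper below is a sketch under two assumptions: pick_device is a hypothetical name, and the device parameter is assumed to accept "cpu" as well as "cuda:0", as PyTorch device strings normally do.

```python
def pick_device() -> str:
    """Return "cuda:0" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"
```

The result can then be passed along, for example Speech2Text(model="base", device=pick_device()).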
Instantly Check with gradio
You can try the model right away in a Gradio demo. Here's an example:
from banglaspeech2text import Speech2Text, available_models
import gradio as gr
stt = Speech2Text(model="base", use_gpu=True)
# share=True also creates a public URL, so you can open the demo on a phone
gr.Interface(
    fn=stt.transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
).launch(share=True)
Note: For more use cases and models, see BanglaSpeech2Text
Use with transformers
Installation
pip install transformers
pip install torch
Usage
Use with file
from transformers import pipeline
pipe = pipeline('automatic-speech-recognition', 'shhossain/whisper-base-bn')
def transcribe(audio_path):
    # The pipeline returns a dict; the transcript is under the 'text' key
    return pipe(audio_path)['text']
audio_file = "test.wav"
print(transcribe(audio_file))
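Whisper models process audio in 30-second windows, so recordings longer than that may need chunking. The sketch below builds the same pipeline with the chunk_length_s option; it assumes the installed transformers version supports that argument for this pipeline. build_asr_pipeline and transcribe_with are hypothetical helper names, and device=-1 selects the CPU.

```python
MODEL_ID = "shhossain/whisper-base-bn"  # model id from the example above


def build_asr_pipeline(model_id: str = MODEL_ID, chunk_length_s: int = 30, device: int = -1):
    """Create an ASR pipeline that splits long audio into fixed-length chunks."""
    # Imported lazily so this module can be loaded before transformers is installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,  # window length in seconds
        device=device,                  # -1 = CPU, 0 = first CUDA GPU
    )


def transcribe_with(pipe, audio_path: str) -> str:
    """Run the pipeline on one file and return only the transcript text."""
    return pipe(audio_path)["text"]
```

Building the pipeline once and reusing it across files avoids reloading the model for every call.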