---
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_11_0
language:
- en
- bn
metrics:
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

## Results

- WER: 46

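For context, word error rate (WER) counts the word-level substitutions, deletions, and insertions needed to turn a transcription into the reference text, divided by the number of reference words. The sketch below illustrates that definition only; the function name `word_error_rate` is hypothetical, and this is not necessarily the script used to produce the score above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words
print(word_error_rate("the cat sat down", "the cat sat up"))  # 0.25
```

On this scale a score of 46, if it is read as a percentage, would correspond to 0.46.
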
# Use with [BanglaSpeech2Text](https://github.com/shhossain/BanglaSpeech2Text)

## Test it in Google Colab

- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shhossain/BanglaSpeech2Text/blob/main/banglaspeech2text_in_colab.ipynb)

## Installation

Install the library with pip:

```bash
pip install banglaspeech2text
```

## Usage

### Model Initialization

To use the library, initialize the `Speech2Text` class with the model you want. The default is "base"; the other pre-trained models are "tiny", "small", "medium", and "large". For example:

```python
from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

# You can also omit the model name (the default model is "base")
stt = Speech2Text()
```

### Transcribing Audio Files

To transcribe an audio file, call the `transcribe` method with the path to the file. It returns the transcribed text as a string. For example:

```python
transcription = stt.transcribe("audio.wav")
print(transcription)
```

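To transcribe several recordings in one go, a small helper can loop over a folder and call the transcription function on each file. This is a minimal sketch, not part of the library: `transcribe_folder`, the `recordings` folder name, and the `*.wav` pattern are all hypothetical, and the helper accepts any callable that maps a file path to text.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_folder(
    transcribe: Callable[[str], str],
    folder: str,
    pattern: str = "*.wav",
) -> Dict[str, str]:
    """Run `transcribe` on every file in `folder` that matches `pattern`."""
    results = {}
    # Sort so the output order is stable across runs
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = transcribe(str(path))
    return results


# Example usage (assumes `stt` was created as shown earlier):
# for name, text in transcribe_folder(stt.transcribe, "recordings").items():
#     print(name, text)
```

Passing the transcription function in as an argument keeps the helper independent of how the model was loaded.
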
### Use with SpeechRecognition

You can use the [SpeechRecognition](https://pypi.org/project/SpeechRecognition/) package to capture audio from a microphone and transcribe it. For example:

```python
import speech_recognition as sr
from banglaspeech2text import Speech2Text

stt = Speech2Text(model="base")

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    output = stt.recognize(audio)

print(output)
```

### Use GPU

You can run inference on a GPU for faster transcription. For example:

```python
stt = Speech2Text(model="base", use_gpu=True)
```

### Advanced GPU Usage

For finer control over device placement, use the `device` or `device_map` parameter. For example:

```python
stt = Speech2Text(model="base", device="cuda:0")
```

```python
stt = Speech2Text(model="base", device_map="auto")
```

__NOTE__: Read more about [PyTorch device](https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.device)

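When the same script has to run on machines with and without a GPU, the device string can be chosen at runtime. The sketch below is one possible approach: `pick_device` is a hypothetical helper, and it assumes that the `device` parameter also accepts `"cpu"`, which is not confirmed above.

```python
def pick_device(preferred: str = "cuda:0") -> str:
    """Return `preferred` if CUDA is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no CUDA backend to query
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# Assumption: Speech2Text accepts the resulting string through `device`
# stt = Speech2Text(model="base", device=device)
```
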
### Instantly Check with Gradio

You can quickly try the model in a browser with Gradio. For example:

```python
from banglaspeech2text import Speech2Text
import gradio as gr

stt = Speech2Text(model="base", use_gpu=True)

# share=True prints a public URL, so you can also open it on a phone
# Note: Gradio 4+ replaces `source="microphone"` with `sources=["microphone"]`
gr.Interface(
    fn=stt.transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
).launch(share=True)
```

__Note__: For more use cases and models, see [BanglaSpeech2Text](https://github.com/shhossain/BanglaSpeech2Text)

# Use with transformers

## Installation

```bash
pip install transformers
pip install torch
```

## Usage

### Use with file

```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
pipe = pipeline("automatic-speech-recognition", "shhossain/whisper-base-bn")


def transcribe(audio_path):
    # The pipeline returns a dict; the transcription is under "text"
    return pipe(audio_path)["text"]


audio_file = "test.wav"

print(transcribe(audio_file))
```
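
Whisper models process audio in 30-second windows, so for longer recordings the transformers pipeline can split the input into chunks through its `chunk_length_s` argument. The sketch below shows one way to set that up; `build_long_audio_pipeline` and the file name `long_recording.wav` are hypothetical, and the 30-second chunk length is an assumption rather than a setting recommended for this model.

```python
def build_long_audio_pipeline(
    model_id: str = "shhossain/whisper-base-bn",
    chunk_length_s: int = 30,
):
    """Create an ASR pipeline that splits long audio into fixed-length chunks."""
    # Imported lazily so the helper can be defined without loading transformers
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,
    )


# Example usage (downloads the model on first call):
# pipe = build_long_audio_pipeline()
# print(pipe("long_recording.wav")["text"])
```
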