metadata
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- pytorch
- audio
- automatic-speech-recognition
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- 'no'
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
Versions:
- CUDA: 12.1
- cuDNN Version: 8.9.2.26_1.0-1_amd64
- tensorflow Version: 2.12.0
- torch Version: 2.1.0.dev20230606+cu12135
- transformers Version: 4.30.2
- accelerate Version: 0.20.3
BENCHMARK:
RAM: 2.8 GB (Original_Model: 5.5 GB)
VRAM: 1812 MB (Original_Model: 6 GB)
test.wav: 23 s (Multilingual Speech i.e. English+Hindi)
- Time in seconds for processing on each device

| Device Name | float32 (Original) | float16 | CudaCores | TensorCores |
| --- | --- | --- | --- | --- |
| 3060 | 1.7 | 1.1 | 3,584 | 112 |
| 1660 Super | OOM | 3.3 | 1,408 | - |
| Colab (Tesla T4) | 2.8 | 2.2 | 2,560 | 320 |
| Colab (CPU) | 35 | - | - | - |

- CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab GPU)
Punctuation: True
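The timings above can be turned into speedup and throughput figures with a little arithmetic. The following is a minimal sketch that uses only the numbers listed in this benchmark; the names `timings`, `speedup` and `realtime_factor` are illustrative and not part of this repo.

```python
# Timings (seconds) copied from the benchmark above for the 23 s test.wav clip.
# None marks a configuration that was not measured or ran out of memory.
AUDIO_SECONDS = 23.0

timings = {
    "3060": {"float32": 1.7, "float16": 1.1},
    "1660 Super": {"float32": None, "float16": 3.3},
    "Colab (Tesla T4)": {"float32": 2.8, "float16": 2.2},
    "Colab (CPU)": {"float32": 35.0, "float16": None},
}


def speedup(row):
    """float32 time divided by float16 time, or None if either is missing."""
    if row["float32"] is None or row["float16"] is None:
        return None
    return row["float32"] / row["float16"]


def realtime_factor(seconds):
    """Seconds of audio processed per second of compute, or None if unmeasured."""
    return None if seconds is None else AUDIO_SECONDS / seconds


for device, row in timings.items():
    s = speedup(row)
    rtf = realtime_factor(row["float16"])
    s_text = f"{s:.2f}x" if s is not None else "n/a"
    rtf_text = f"{rtf:.1f}x real time" if rtf is not None else "n/a"
    print(f"{device:18s} fp16 speedup: {s_text:6s} fp16 throughput: {rtf_text}")
```

Rows with an OOM or missing entry are reported as n/a rather than estimated.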
Usage
This repo contains an __init__.py file with all the code needed to use this model.
First, clone this repo and place all the files inside a folder.
Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/devasheeshG/whisper_medium_fp16_transformers
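If git-lfs is not set up when the repo is cloned, large files can be left as small text pointer files instead of the real binaries. The sketch below checks for that with the Python standard library only. It relies on Git LFS pointer files starting with the line `version https://git-lfs.github.com/spec/v1`; the helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical and not part of this repo.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an un-downloaded Git LFS pointer file."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(repo_dir):
    """List files under `repo_dir` that are still LFS pointers (skipping .git)."""
    root = Path(repo_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Folder name taken from the clone command above.
    repo = Path("whisper_medium_fp16_transformers")
    if repo.is_dir():
        leftover = find_lfs_pointers(repo)
        for p in leftover:
            print(f"still an LFS pointer: {p}")
        if not leftover:
            print("no LFS pointer files found")
    else:
        print(f"{repo} not found; clone the repo first")
```

If any file is reported, running `git lfs pull` inside the cloned folder should fetch the real contents.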
Please try the following in a Jupyter notebook.
# Import the Model
from whisper_medium_fp16_transformers import Model
# Initialize the model
model = Model(
    model_name_or_path='whisper_medium_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)
# Load Audio
audio = model.load_audio('whisper_medium_fp16_transformers/test.wav')
# Transcribe (the first transcription takes longer than later ones)
model.transcribe(audio)
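Because the first transcription is slower than later ones, timing measurements are more representative when a warm-up call is discarded first. Below is a minimal sketch of such a helper. It is not part of this repo: `time_calls` is a hypothetical name, and `fake_transcribe` is a stand-in workload so the sketch runs without a GPU or the model files.

```python
import time


def time_calls(fn, *args, warmup=1, runs=3):
    """Call fn(*args) `warmup` times untimed, then return per-run seconds."""
    for _ in range(warmup):
        fn(*args)  # discarded: absorbs one-time setup cost
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in for the real transcription call (assumption, for illustration only).
def fake_transcribe(audio):
    return sum(audio)


durations = time_calls(fake_transcribe, [0.0] * 1000, warmup=1, runs=3)
print(f"mean over {len(durations)} runs: {sum(durations) / len(durations):.6f} s")
```

With the model loaded as shown above, `time_calls(model.transcribe, audio)` would time the real call in the same way.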