
Versions:

  • CUDA: 12.1
  • cuDNN Version: 8.9.2.26_1.0-1_amd64
  • tensorflow Version: 2.12.0
  • torch Version: 2.1.0.dev20230606+cu12135
  • transformers Version: 4.30.2
  • accelerate Version: 0.20.3

Model Benchmarks:

  • RAM: 2.8 GB (Original_Model: 5.5GB)

  • VRAM: 1812 MB (Original_Model: 6GB)

  • test.wav: 23 s (Multilingual Speech i.e. English+Hindi)

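    • Time in seconds for processing on each device

    | Device Name       | float32 (Original) | float16 | CudaCores | TensorCores |
    |-------------------|--------------------|---------|-----------|-------------|
    | 3060              | 1.7                | 1.1     | 3,584     | 112         |
    | 1660 Super        | OOM                | 3.3     | 1,408     | -           |
    | Colab (Tesla T4)  | 2.8                | 2.2     | 2,560     | 320         |
    | Colab (CPU)       | 35                 | -       | -         | -           |
    | M1 (CPU)          | -                  | -       | -         | -           |
    | M1 (GPU -> 'mps') | -                  | -       | -         | -           |
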
    • NOTE: TensorCores are efficient in mixed-precision calculations
    • CPU -> torch.float16 is not supported on CPU (AMD Ryzen 5 3600 or Colab CPU)
  • Punctuation: True
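
To make the float32 vs. float16 comparison easier to read, the timings and RAM figures listed above can be turned into ratios. This is only a small arithmetic sketch using numbers copied from this card; the names `timings`, `speedup`, and `ram_ratio` are illustrative and not part of the repo.

```python
# Timings in seconds copied from the benchmark list above: (float32, float16).
# Only devices with both measurements are included.
timings = {
    "3060": (1.7, 1.1),
    "Tesla T4": (2.8, 2.2),
}


def speedup(fp32_seconds, fp16_seconds):
    """How many times faster the float16 run is than the float32 run."""
    return fp32_seconds / fp16_seconds


speedups = {name: round(speedup(*pair), 2) for name, pair in timings.items()}

# RAM of this model relative to the original model (2.8 GB vs 5.5 GB).
ram_ratio = round(2.8 / 5.5, 2)

for name, factor in speedups.items():
    print(f"{name}: float16 is {factor}x faster than float32")
print(f"RAM used relative to the original model: {ram_ratio}")
```

These are single measurements on a 23 s clip, so the ratios are indicative rather than precise.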

Model Error Benchmarks:

  • WER: Word Error Rate
  • MER: Match Error Rate
  • WIL: Word Information Lost
  • WIP: Word Information Preserved
  • CER: Character Error Rate
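
As a rough illustration of how these five metrics relate to each other, the sketch below computes them from the hit (H), substitution (S), deletion (D), and insertion (I) counts of a minimum-edit-distance alignment. It is a dependency-free approximation written for this explanation, not the code used for the benchmarks; the helper names `align_counts` and `error_metrics` are hypothetical, and results may differ slightly from a dedicated library when several alignments have the same cost.

```python
def align_counts(ref, hyp):
    """Return (hits, subs, dels, ins) for a minimum-edit-distance alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, hits, subs, dels, ins) for ref[:i] vs hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, 0, i, 0)  # delete every reference token
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, 0, j)  # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, h, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, h + 1, s, d, k)      # hit
            else:
                diag = (c + 1, h, s + 1, d, k)  # substitution
            c, h, s, d, k = dp[i - 1][j]
            delete = (c + 1, h, s, d + 1, k)
            c, h, s, d, k = dp[i][j - 1]
            insert = (c + 1, h, s, d, k + 1)
            dp[i][j] = min(diag, delete, insert, key=lambda t: t[0])
    return dp[n][m][1:]


def error_metrics(reference, hypothesis):
    """Compute WER, MER, WIL, WIP (word level) and CER (character level)."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    h, s, d, i = align_counts(ref_words, hyp_words)
    wer = (s + d + i) / (h + s + d)
    mer = (s + d + i) / (h + s + d + i)
    # WIP is the product of the hit rates seen from each side of the alignment.
    wip = (h / len(ref_words)) * (h / len(hyp_words)) if hyp_words else 0.0
    wil = 1.0 - wip
    ch, cs, cd, ci = align_counts(list(reference), list(hypothesis))
    cer = (cs + cd + ci) / (ch + cs + cd)
    return {"wer": wer, "mer": mer, "wil": wil, "wip": wip, "cer": cer}


metrics = error_metrics("the cat sat on the mat", "the cat sit on mat")
print({name: round(value, 4) for name, value in metrics.items()})
```

Lower is better for WER, MER, WIL, and CER; higher is better for WIP. Note that WER can exceed 1 when the hypothesis contains many insertions, while MER stays between 0 and 1.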

Hindi (test.tsv -> 2557 samples used) Common Voice 14.0

| Model          | WER | MER | WIL | WIP | CER |
|----------------|-----|-----|-----|-----|-----|
| Original_Model | -   | -   | -   | -   | -   |
| This_Model     | -   | -   | -   | -   | -   |

English

| Model          | WER | MER | WIL | WIP | CER |
|----------------|-----|-----|-----|-----|-----|
| Original_Model | -   | -   | -   | -   | -   |
| This_Model     | -   | -   | -   | -   | -   |
  • 'jiwer' library is used for calculations

Code:

Usage

This repo contains an __init__.py file with all the code needed to use this model.

First, clone this repo and keep all of its files together in one folder.

Make sure you have git-lfs installed (https://git-lfs.com)

git lfs install
git clone https://huggingface.co/devasheeshG/whisper_medium_fp16_transformers

Then try the following in a Jupyter notebook:

# Import the Model
from whisper_medium_fp16_transformers import Model
# Initialise the model
model = Model(
    model_name_or_path='whisper_medium_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)
# Load Audio
audio = model.load_audio('whisper_medium_fp16_transformers/test.wav')
# Transcribe (First transcription takes time)
model.transcribe(audio)
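
Because the first transcription is slower than later ones, any timing comparison should discard at least one warm-up call. The helper below is a minimal sketch of how such a measurement could be taken; `time_call` is a hypothetical name, not something this repo provides, and it is not necessarily how the benchmark numbers on this card were produced.

```python
import statistics
import time


def time_call(fn, *args, warmup=1, repeats=3, **kwargs):
    """Call fn a few times and return (median_seconds, last_result).

    The first `warmup` calls are executed but not timed, so one-off costs
    such as model loading or kernel compilation do not skew the result.
    """
    for _ in range(warmup):
        fn(*args, **kwargs)
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), result


# Stand-in workload so the sketch runs without a GPU or the model files.
seconds, total = time_call(sum, range(100_000), warmup=1, repeats=5)
print(f"median: {seconds:.6f} s, result: {total}")
```

With the model loaded as shown above, the same helper could be called as `time_call(model.transcribe, audio)`. On CUDA devices, GPU work is queued asynchronously, so if the timed function does not itself wait for the GPU to finish, a call to `torch.cuda.synchronize()` before reading the clock gives more reliable numbers.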