---
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- pytorch
---

## Versions:

- CUDA: 12.1
- cuDNN Version: 8.9.2.26_1.0-1_amd64
- tensorflow Version: 2.12.0
- torch Version: 2.1.0.dev20230606+cu12135
- transformers Version: 4.30.2
- accelerate Version: 0.20.3

## BENCHMARK:

- RAM: 2.8 GB (original model: 5.5 GB)
- VRAM: 1812 MB (original model: 6 GB)
- test.wav: 23 s (multilingual speech, i.e. English + Hindi)

| Device Name      | float32 (Original) | float16 | CUDA Cores | Tensor Cores |
| ---------------- | ------------------ | ------- | ---------- | ------------ |
| 3060             | 1.7                | 1.1     | 3,584      | 112          |
| 1660 Super       | OOM                | 3.3     | 1,408      | -            |
| Colab (Tesla T4) | 2.8                | 2.2     | 2,560      | 320          |
| Colab (CPU)      | 35                 | -       | -          | -            |

- CPU: torch.float16 is not supported on CPU (AMD Ryzen 5 3600 or Colab CPU)
- Punctuation: True

## Usage

This repo contains an ``__init__.py`` file with all the code needed to use this model.

First, clone this repo and place all the files inside a folder.

**Please try this in a Jupyter notebook.**

```python
# Import the model
from whisper_medium_fp16_transformers import Model
```

```python
# Initialize the model
model = Model(
    model_name_or_path='whisper_medium_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)
```

```python
# Load audio
audio = model.load_audio('test.wav')
```

```python
# Transcribe (the first transcription takes longer than later ones)
model.transcribe(audio)
```
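
Since the first transcription is slower than later ones, timings are easier to compare when a warm-up call is discarded first. The following is a minimal sketch of such a timing helper, not the script used for the benchmark above. `benchmark` is a hypothetical name that is not part of this repo, and the sketch assumes `model.transcribe` returns only after decoding has finished.

```python
import time
from statistics import mean


def benchmark(fn, *args, warmup=1, runs=3, **kwargs):
    """Time fn(*args, **kwargs) after discarding `warmup` initial calls.

    Returns a tuple (mean_seconds, last_result).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    result = None
    # Warm-up calls absorb one-off costs such as lazy initialization.
    for _ in range(warmup):
        result = fn(*args, **kwargs)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)

    return mean(timings), result


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without the model or a GPU.
    seconds, value = benchmark(sum, range(1_000_000))
    print(f"mean over 3 runs: {seconds:.4f} s, result: {value}")

    # With the objects from the Usage section, the call would look like:
    # seconds, text = benchmark(model.transcribe, audio)
```

Averaging over several runs smooths out jitter from other processes; raise `runs` for short clips where a single call takes well under a second.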