Automatic Speech Recognition
Transformers
PyTorch
distilwhisper
text2text-generation
metadata
license: mit
datasets:
  - mozilla-foundation/common_voice_13_0
language:
  - ca
  - bg
  - cs
  - fi
  - gl
  - hi
  - hu
  - pl
  - ro
  - sk
  - ta
  - th

About

Multilingual DistilWhisper improves ASR performance in the target languages listed above by adding lightweight conditional language-specific routing (CLSR) modules on top of whisper-small. These modules are trained with a combination of a cross-entropy (ASR) loss and a knowledge distillation loss, with whisper-large-v2 as the teacher.
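
The combined objective can be illustrated with a minimal PyTorch sketch. It is not the released training code: the function name `ce_kd_loss`, the weight `alpha`, the `temperature`, and the use of KL divergence as the distillation term are all assumptions made for illustration, since this card does not specify them. The student logits stand in for whisper-small with CLSR modules, and the teacher logits for whisper-large-v2.

```python
import torch
import torch.nn.functional as F


def ce_kd_loss(student_logits, teacher_logits, labels,
               alpha=0.5, temperature=1.0, ignore_index=-100):
    """Hypothetical mix of ASR cross-entropy and distillation losses.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    labels: (batch, seq_len), padding positions set to ignore_index
    alpha: assumed weight of the cross-entropy term (not from the source)
    """
    vocab = student_logits.size(-1)

    # Cross-entropy (ASR) loss against the reference transcript tokens.
    ce = F.cross_entropy(
        student_logits.reshape(-1, vocab),
        labels.reshape(-1),
        ignore_index=ignore_index,
    )

    # Distillation term: KL(teacher || student) per token.
    # KL divergence is an assumption chosen for this sketch.
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits.detach() / temperature, dim=-1)
    kd_per_token = F.kl_div(
        student_logp, teacher_logp, reduction="none", log_target=True
    ).sum(dim=-1)

    # Average the distillation term over non-padding positions only.
    mask = (labels != ignore_index).float()
    kd = (kd_per_token * mask).sum() / mask.sum().clamp(min=1.0)
    kd = kd * temperature ** 2

    return alpha * ce + (1.0 - alpha) * kd
```

When the student reproduces the teacher's distribution exactly, the distillation term vanishes and only the weighted cross-entropy remains, which makes for a quick sanity check of the implementation.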

Inference

A loader will be made available soon at https://github.com/naver

Citation (submitted to ICASSP 2024)

@article{ferraz2023distilwhisper,
  title={DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts},
  author={Ferraz, Thomas Palmeira and Boito, Marcely Zanon and Brun, Caroline and Nikoulina, Vassilina},
  journal={arXiv preprint arXiv:2311.01070},
  year={2023}
}