kanyekuthi committed
Commit 21e5e2d · Parent(s): ac4a245
Update README.md

README.md CHANGED
@@ -8,6 +8,60 @@ metrics:
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
finetuned_from: openai/whisper-small
tasks: automatic-speech-recognition
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
---

# Whisper Small Model Card

<!-- Provide a quick summary of what the model is/does. -->

Whisper Small is a pre-trained model for automatic speech recognition (ASR) and speech translation. It is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. The model has 244 million parameters and is multilingual.

## Performance

Whisper Small achieves high accuracy and generalizes well to many datasets and domains without the need for fine-tuning.
## Usage

To transcribe audio samples, the model has to be used together with a WhisperProcessor. The WhisperProcessor pre-processes the audio inputs (converting them to log-Mel spectrograms for the model) and post-processes the model outputs (converting them from tokens back to text).

## References

- https://huggingface.co/openai/whisper-small
- https://github.com/openai/whisper
- https://openai.com/research/whisper
- https://www.assemblyai.com/blog/how-to-run-openais-whisper-speech-recognition-model/

## Model Details

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision.

The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on both speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the same language as the audio. For speech translation, the model predicts transcriptions in a different language from the audio.

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

- Transcription
- Translation

## Training hyperparameters

- learning_rate: 1e-5
- train_batch_size: 8
- eval_batch_size: 8
- lr_scheduler_warmup_steps: 500
- max_steps: 4000
- metric_for_best_model: wer