kanyekuthi committed on
Commit 21e5e2d · 1 Parent(s): ac4a245

Update README.md

  1. README.md +56 -2
README.md CHANGED
@@ -8,6 +8,60 @@ metrics:
  - wer
  library_name: transformers
  pipeline_tag: automatic-speech-recognition
+ finetuned_from: openai/whisper-small
+ tasks: automatic-speech-recognition
  tags:
- - code
- ---
+ - audio
+ - automatic-speech-recognition
+ - hf-asr-leaderboard
+ ---
+ # Whisper Small Model Card
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ Whisper Small is a pre-trained model for automatic speech recognition (ASR) and speech translation.
+ It is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model.
+ It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision.
+ The model has 244 million parameters and is multilingual.
+
+ ##### Performance
+ Whisper Small achieves high accuracy and generalizes well to many datasets and domains without fine-tuning.
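Accuracy for this card is tracked with the `wer` metric listed in the metadata. As a rough illustration of what that number means, the sketch below computes word error rate as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the text is split on whitespace without any normalization, which is an assumption; evaluation libraries usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

A lower value is better, and the value can exceed 1.0 when the hypothesis contains many extra words.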
+
+ ##### Usage
+ To transcribe audio samples, the model has to be used alongside a WhisperProcessor.
+ The WhisperProcessor is used to pre-process the audio inputs (converting them to log-Mel spectrograms for the model)
+ and post-process the model outputs (converting them from tokens to text).
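The pre-process, generate, post-process flow described above can be sketched with the `transformers` library as follows. This is a minimal sketch, not the author's script: the helper name `transcribe` is hypothetical, the checkpoint `openai/whisper-small` is taken from the `finetuned_from` field (the fine-tuned repository id is not given in this commit), and the input is assumed to be a mono waveform sampled at 16 kHz.

```python
from typing import Sequence


def transcribe(
    audio: Sequence[float],
    sampling_rate: int = 16_000,
    model_id: str = "openai/whisper-small",  # assumption: base checkpoint from the card
) -> str:
    """Transcribe a mono waveform with a Whisper checkpoint."""
    # Imported lazily so this module loads even when the heavy
    # dependencies are not needed yet.
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)

    # Pre-process: raw waveform -> log-Mel spectrogram features.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")

    # Generate token ids from the spectrogram features.
    with torch.no_grad():
        predicted_ids = model.generate(inputs.input_features)

    # Post-process: token ids -> text.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Calling `transcribe(waveform)` downloads the checkpoint on first use. Audio recorded at another sampling rate would need to be resampled to 16 kHz before being passed in.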
+
+ ##### References
+ - https://huggingface.co/openai/whisper-small
+ - https://github.com/openai/whisper
+ - https://openai.com/research/whisper
+ - https://www.assemblyai.com/blog/how-to-run-openais-whisper-speech-recognition-model/
+
+
+ ## Model Details
+ Whisper is a transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model.
+ It was trained on 680k hours of weakly labeled audio.
+
+ The models were trained on either English-only data or multilingual data.
+ The English-only models were trained on the task of speech recognition.
+ The multilingual models were trained on both speech recognition and speech translation.
+ For speech recognition, the model predicts transcriptions in the same language as the audio.
+ For speech translation, the model predicts transcriptions in a different language from the audio.
+
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ - Transcription
+ - Translation
+
+
+ ## Training hyperparameters
+ <!-- Relevant interpretability work for the model goes here -->
+ - learning_rate: 1e-5
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - lr_scheduler_warmup_steps: 500
+ - max_steps: 4000
+ - metric_for_best_model: wer
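To make the interaction of `learning_rate`, `lr_scheduler_warmup_steps`, and `max_steps` concrete, the sketch below computes the learning rate at a given step. The card does not name the scheduler type, so a linear warmup followed by linear decay to zero is an assumption here, and the function name `linear_warmup_decay_lr` is hypothetical.

```python
def linear_warmup_decay_lr(
    step: int,
    base_lr: float = 1e-5,     # learning_rate from the list above
    warmup_steps: int = 500,   # lr_scheduler_warmup_steps
    max_steps: int = 4000,     # max_steps
) -> float:
    """Learning rate at `step` under an assumed linear warmup and linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr at the end of warmup to 0 at max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / (max_steps - warmup_steps)
```

Under these assumptions the rate peaks at 1e-5 at step 500 and reaches zero at step 4000.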