kamahori nielsr HF staff committed on
Commit 6444716 · verified · 1 Parent(s): 593d016

Improve model card: add link to code and example usage (#1)

- Improve model card: add link to code and example usage (e642600c900164946ec1791311955e78ae44d2a3)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +42 -7
README.md CHANGED
@@ -1,13 +1,13 @@
  ---
- license: apache-2.0
- library_name: transformers
  base_model: openai/whisper-large-v3
- tags:
- - audio
- - automatic-speech-recognition
- - whisper
- - hf-asr-leaderboard
+ library_name: transformers
+ license: apache-2.0
  pipeline_tag: automatic-speech-recognition
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - whisper
+ - hf-asr-leaderboard
  ---

  # Model Card for Lite-Whisper large-v3
@@ -16,6 +16,41 @@ pipeline_tag: automatic-speech-recognition

  Lite-Whisper is a compressed version of OpenAI Whisper with LiteASR. See our [GitHub repository](https://github.com/efeslab/LiteASR) and [paper](https://arxiv.org/abs/2502.20583) for details.

+ Here's a code snippet to get started:
+ ```python
+ import librosa
+ import torch
+ from transformers import AutoProcessor, AutoModel
+
+ device = "cuda:0"
+ dtype = torch.float16
+
+ # load the compressed Whisper model
+ model = AutoModel.from_pretrained(
+     "efficient-speech/lite-whisper-large-v3-turbo",
+     trust_remote_code=True,
+ )
+ model.to(dtype).to(device)
+
+ # we use the same processor as the original model
+ processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
+
+ # set the path to your audio file
+ path = "path/to/audio.wav"
+ audio, _ = librosa.load(path, sr=16000)
+
+ input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
+ input_features = input_features.to(dtype).to(device)
+
+ predicted_ids = model.generate(input_features)
+ transcription = processor.batch_decode(
+     predicted_ids,
+     skip_special_tokens=True
+ )[0]
+
+ print(transcription)
+ ```
+
  ## Benchmark Results

  Following is the average word error rate (WER) evaluated on the [ESB datasets](https://huggingface.co/datasets/hf-audio/esb-datasets-test-only-sorted):
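
For readers unfamiliar with the metric named in the benchmark section, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The helper name `word_error_rate` is hypothetical and not part of this commit or of LiteASR. The sketch also skips the text normalization that benchmark pipelines usually apply before scoring, so its numbers are not comparable to the reported ESB results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Hypothetical helper for illustration only; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.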