devasheeshG committed on
Commit
9c5bcaf
1 Parent(s): 130a95f

Delete README.md

Files changed (1)
  1. README.md +0 -330
README.md DELETED
@@ -1,330 +0,0 @@
- ---
- license: apache-2.0
- pipeline_tag: automatic-speech-recognition
- tags:
- - pytorch
- - audio
- - speech
- - automatic-speech-recognition
- - whisper
- - wav2vec2
-
- model-index:
- - name: whisper_medium_fp16_transformers
-   results:
-   - task:
-       type: automatic-speech-recognition
-       name: Automatic Speech Recognition
-     dataset:
-       type: librispeech_asr
-       name: LibriSpeech (clean)
-       config: clean
-       split: test
-       args:
-         language: en
-     metrics:
-     - type: wer
-       value: 0
-       name: Test WER
-       description: Word Error Rate
-     - type: mer
-       value: 0
-       name: Test MER
-       description: Match Error Rate
-     - type: wil
-       value: 0
-       name: Test WIL
-       description: Word Information Lost
-     - type: wip
-       value: 0
-       name: Test WIP
-       description: Word Information Preserved
-     - type: cer
-       value: 0
-       name: Test CER
-       description: Character Error Rate
-
-   - task:
-       type: automatic-speech-recognition
-       name: Automatic Speech Recognition
-     dataset:
-       type: librispeech_asr
-       name: LibriSpeech (other)
-       config: other
-       split: test
-       args:
-         language: en
-     metrics:
-     - type: wer
-       value: 0
-       name: Test WER
-       description: Word Error Rate
-     - type: mer
-       value: 0
-       name: Test MER
-       description: Match Error Rate
-     - type: wil
-       value: 0
-       name: Test WIL
-       description: Word Information Lost
-     - type: wip
-       value: 0
-       name: Test WIP
-       description: Word Information Preserved
-     - type: cer
-       value: 0
-       name: Test CER
-       description: Character Error Rate
-
-   - task:
-       type: automatic-speech-recognition
-       name: Automatic Speech Recognition
-     dataset:
-       type: mozilla-foundation/common_voice_14_0
-       name: Common Voice (14.0) (Hindi)
-       config: hi
-       split: test
-       args:
-         language: hi
-     metrics:
-     - type: wer
-       value: 44.64
-       name: Test WER
-       description: Word Error Rate
-     - type: mer
-       value: 41.69
-       name: Test MER
-       description: Match Error Rate
-     - type: wil
-       value: 59.53
-       name: Test WIL
-       description: Word Information Lost
-     - type: wip
-       value: 40.46
-       name: Test WIP
-       description: Word Information Preserved
-     - type: cer
-       value: 16.80
-       name: Test CER
-       description: Character Error Rate
-
- widget:
- - example_title: Hinglish Sample
-   src: https://huggingface.co/devasheeshG/whisper_medium_fp16_transformers/resolve/main/test.wav
- - example_title: Librispeech sample 1
-   src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- - example_title: Librispeech sample 2
-   src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
-
- language:
- - en
- - zh
- - de
- - es
- - ru
- - ko
- - fr
- - ja
- - pt
- - tr
- - pl
- - ca
- - nl
- - ar
- - sv
- - it
- - id
- - hi
- - fi
- - vi
- - he
- - uk
- - el
- - ms
- - cs
- - ro
- - da
- - hu
- - ta
- - "no"
- - th
- - ur
- - hr
- - bg
- - lt
- - la
- - mi
- - ml
- - cy
- - sk
- - te
- - fa
- - lv
- - bn
- - sr
- - az
- - sl
- - kn
- - et
- - mk
- - br
- - eu
- - is
- - hy
- - ne
- - mn
- - bs
- - kk
- - sq
- - sw
- - gl
- - mr
- - pa
- - si
- - km
- - sn
- - yo
- - so
- - af
- - oc
- - ka
- - be
- - tg
- - sd
- - gu
- - am
- - yi
- - lo
- - uz
- - fo
- - ht
- - ps
- - tk
- - nn
- - mt
- - sa
- - lb
- - my
- - bo
- - tl
- - mg
- - as
- - tt
- - haw
- - ln
- - ha
- - ba
- - jw
- - su
- ---
-
- ## Versions:
-
- - CUDA: 12.1
- - cuDNN Version: 8.9.2.26_1.0-1_amd64
- - tensorflow Version: 2.12.0
- - torch Version: 2.1.0.dev20230606+cu12135
- - transformers Version: 4.30.2
- - accelerate Version: 0.20.3
-
- ## Model Benchmarks:
-
- - RAM: 3 GB (Original_Model: 6 GB)
- - VRAM: 3.7 GB (Original_Model: 11 GB)
- - test.wav: 23 s (multilingual speech, i.e. English + Hindi)
-
- - **Processing time in seconds on each device**
-
- | Device Name       | float32 (Original) | float16 | CUDA Cores | Tensor Cores |
- | ----------------- | ------------------ | ------- | ---------- | ------------ |
- | RTX 3060          | 2.2                | 1.3     | 3,584      | 112          |
- | GTX 1660 Super    | OOM                | 6       | 1,408      | N/A          |
- | Colab (Tesla T4)  | -                  | -       | 2,560      | 320          |
- | Colab (CPU)       | -                  | N/A     | N/A        | N/A          |
- | M1 (CPU)          | -                  | -       | N/A        | N/A          |
- | M1 (GPU -> 'mps') | -                  | -       | N/A        | N/A          |
-
- - **NOTE: Tensor Cores are efficient in mixed-precision calculations**
- - **CPU -> torch.float16 is not supported on CPU (AMD Ryzen 5 3600 or Colab CPU)**
-
- - Punctuation: False ('I don't know the exact reason why this is happening :)')
-
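As a reading aid for the timing table, here is a minimal sketch of how the two measured RTX 3060 times and the 23 s clip length translate into a speed-up and a real-time factor. The helper names `speedup` and `real_time_factor` are illustrative assumptions and are not part of this repo; only the three input numbers come from the table and the `test.wav` entry above.

```python
# Illustrative helpers (not part of this repo) for reading the timing table.

def speedup(baseline_s: float, optimized_s: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_s / audio_s


# RTX 3060 row: 2.2 s (float32) vs 1.3 s (float16) for the 23 s test.wav.
fp16_speedup = speedup(2.2, 1.3)
fp16_rtf = real_time_factor(1.3, 23.0)
print(f"float16 speed-up over float32: {fp16_speedup:.2f}x")
print(f"float16 real-time factor: {fp16_rtf:.3f}")
```

A single clip on a single device is a small sample, so these ratios are best read as rough indications rather than general benchmarks.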
- ## Model Error Benchmarks:
-
- - **WER: Word Error Rate**
- - **MER: Match Error Rate**
- - **WIL: Word Information Lost**
- - **WIP: Word Information Preserved**
- - **CER: Character Error Rate**
-
- ### Hindi (test.tsv) [Common Voice 14.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_14_0)
-
- **Tested on an RTX 3060 with 1000 samples**
-
- |                         | WER   | MER   | WIL   | WIP   | CER   |
- | ----------------------- | ----- | ----- | ----- | ----- | ----- |
- | Original_Model (30 min) | 43.99 | 41.65 | 59.47 | 40.52 | 16.23 |
- | This_Model (20 min)     | 44.64 | 41.69 | 59.53 | 40.46 | 16.80 |
-
- ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-clean)
-
- **Test done on RTX 3060 on \_\_\_ Samples**
-
- |                | WER | MER | WIL | WIP | CER |
- | -------------- | --- | --- | --- | --- | --- |
- | Original_Model | -   | -   | -   | -   | -   |
- | This_Model     | -   | -   | -   | -   | -   |
-
- ### English ([LibriSpeech](https://huggingface.co/datasets/librispeech_asr) -> test-other)
-
- **Test done on RTX 3060 on \_\_\_ Samples**
-
- |                | WER | MER | WIL | WIP | CER |
- | -------------- | --- | --- | --- | --- | --- |
- | Original_Model | -   | -   | -   | -   | -   |
- | This_Model     | -   | -   | -   | -   | -   |
-
- - **The `jiwer` library is used to compute these metrics**
-
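To make the five metrics concrete, here is a minimal from-scratch sketch that derives them from hit, substitution, deletion, and insertion counts of a Levenshtein alignment. It is not the README's evaluation script and does not call `jiwer`; the names `Counts`, `align_counts`, and `error_metrics` are illustrative assumptions. It assumes a non-empty reference and whitespace tokenization, and when several alignments are equally cheap its tie-breaking may differ from a library's, which can shift MER, WIL, and WIP slightly.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    hits: int
    substitutions: int
    deletions: int
    insertions: int


def align_counts(ref: list, hyp: list) -> Counts:
    """Levenshtein-align two token sequences and return the edit counts."""
    # dp[i][j] = (cost, hits, subs, dels, ins) for ref[:i] versus hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, 0, i, 0)  # delete every reference token
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, 0, j)  # insert every hypothesis token
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, h, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, h + 1, s, d, n)      # match
            else:
                diag = (c + 1, h, s + 1, d, n)  # substitution
            c, h, s, d, n = dp[i - 1][j]
            delete = (c + 1, h, s, d + 1, n)
            c, h, s, d, n = dp[i][j - 1]
            insert = (c + 1, h, s, d, n + 1)
            # Cheapest option wins; ties prefer the diagonal, then deletion.
            dp[i][j] = min(diag, delete, insert, key=lambda t: t[0])
    _, h, s, d, n = dp[-1][-1]
    return Counts(h, s, d, n)


def error_metrics(reference: str, hypothesis: str) -> dict:
    """Return WER, MER, WIL, WIP (word level) and CER (character level) as fractions."""
    w = align_counts(reference.split(), hypothesis.split())
    c = align_counts(list(reference), list(hypothesis))
    errors = w.substitutions + w.deletions + w.insertions
    ref_len = w.hits + w.substitutions + w.deletions
    hyp_len = w.hits + w.substitutions + w.insertions
    wip = (w.hits / ref_len) * (w.hits / hyp_len) if hyp_len else 0.0
    char_errors = c.substitutions + c.deletions + c.insertions
    return {
        "wer": errors / ref_len,
        "mer": errors / (w.hits + errors),
        "wil": 1.0 - wip,
        "wip": wip,
        "cer": char_errors / (c.hits + c.substitutions + c.deletions),
    }


metrics = error_metrics("the cat sat on the mat", "the cat sit on mat")
for name, value in metrics.items():
    print(f"{name.upper()}: {100 * value:.2f}")
```

The tables above report these quantities as percentages, which is why the sketch multiplies by 100 when printing.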
- ## Code for conversion:
-
- - ### [Will soon be uploaded to GitHub](https://github.com/devasheeshG)
-
- ## Usage
-
- This repo contains an `__init__.py` file with all the code needed to use this model.
-
- First, clone this repo and place all the files inside a folder.
-
- ### Make sure you have git-lfs installed (https://git-lfs.com)
-
- ```bash
- git lfs install
- git clone https://huggingface.co/devasheeshG/whisper_large_v2_fp16_transformers
- ```
-
- **Please try this in a Jupyter notebook**
-
- ```python
- # Import the model wrapper defined in the cloned repo's __init__.py
- from whisper_large_v2_fp16_transformers import Model
- ```
-
- ```python
- # Initialise the model
- model = Model(
-     model_name_or_path='whisper_large_v2_fp16_transformers',
-     cuda_visible_device="0",
-     device='cuda',
- )
- ```
-
- ```python
- # Load audio
- audio = model.load_audio('whisper_large_v2_fp16_transformers/test.wav')
- ```
-
- ```python
- # Transcribe (the first transcription takes longer)
- model.transcribe(audio)
- ```
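Because the first transcription is slower than later ones, a timing measurement is easier to interpret if it discards a warm-up call first. The sketch below shows one way this could be done. `time_calls` is a hypothetical helper, not part of this repo, and the workload is a stand-in so that the snippet runs on its own; with the model and audio loaded as in the usage steps, the callable would be `lambda: model.transcribe(audio)`.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], warmup: int = 1, runs: int = 3) -> List[float]:
    """Call fn `warmup` times untimed, then return wall-clock seconds for `runs` timed calls."""
    for _ in range(warmup):
        fn()  # the result is discarded; this only absorbs one-time setup cost
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload (an assumption for illustration); replace it with
# `lambda: model.transcribe(audio)` once the model is loaded.
timings = time_calls(lambda: sum(range(100_000)), warmup=1, runs=3)
print(f"best of {len(timings)} runs: {min(timings):.4f} s")
```

Reporting the minimum or the median of several runs tends to be less noisy than a single measurement.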