---
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- 'no'
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
- yue
tags:
- audio
- automatic-speech-recognition
license: mit
library_name: ctranslate2
---
faster-whisper now officially supports the large-v3 model: [Systran/faster-whisper-large-v3](https://huggingface.co/Systran/faster-whisper-large-v3)
___
**This README.md is based on "[guillaumekln/faster-whisper-large-v2](https://huggingface.co/guillaumekln/faster-whisper-large-v2)" and has been updated for the large-v3 model.**
# Whisper large-v3 model for CTranslate2
This repository contains the conversion of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.
This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/guillaumekln/faster-whisper).
## Example
```python
from faster_whisper import WhisperModel

# Download and load the CTranslate2 model (weights are stored in FP16)
model = WhisperModel("large-v3")

# transcribe() returns a generator of segments plus transcription info
segments, info = model.transcribe("audio.mp3")

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
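The `transcribe` call also returns an `info` object describing the detected language. A brief sketch of how it might be inspected (the `beam_size` value here is just an illustrative choice):
```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v3")

# Decoding options such as beam_size can be passed to transcribe()
segments, info = model.transcribe("audio.mp3", beam_size=5)

# info reports the detected language and its probability
print("Detected language '%s' (probability %.2f)" % (info.language, info.language_probability))
```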
## Conversion details
The original model was converted with the following command:
```
ct2-transformers-converter --model openai/whisper-large-v3 --output_dir faster-whisper-large-v3 \
--copy_files added_tokens.json special_tokens_map.json tokenizer_config.json vocab.json --quantization float16
```
Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the [`compute_type` option in CTranslate2](https://opennmt.net/CTranslate2/quantization.html).
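For example, in faster-whisper the weights can be loaded in a different type by passing `compute_type` to `WhisperModel`. A minimal sketch, assuming a CUDA device is available (on CPU, `compute_type="int8"` can be used instead):
```python
from faster_whisper import WhisperModel

# Load the FP16 weights as 8-bit with FP16 activations on GPU
model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")
```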
Note that "openai/whisper-large-v3" does not ship a "tokenizer.json" file, but you can generate it with `AutoTokenizer`:
```python
from transformers import AutoTokenizer

# Load the fast tokenizer and save it, which writes a tokenizer.json file
hf_tokenizer = AutoTokenizer.from_pretrained("openai/whisper-large-v3")
hf_tokenizer.save_pretrained("whisper-large-v3-test")
```
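As a quick check, the generated file can be loaded back with the `tokenizers` library (the directory name "whisper-large-v3-test" is just the one used above):
```python
from tokenizers import Tokenizer

# Load the tokenizer.json written by save_pretrained() above
tokenizer = Tokenizer.from_file("whisper-large-v3-test/tokenizer.json")
print(tokenizer.get_vocab_size())
```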
## How faster-whisper works with Whisper-large-v3
**As of faster-whisper version 0.10.0, this workaround is no longer needed.**
~~[Working with Whisper-large-v3 #547](https://github.com/guillaumekln/faster-whisper/issues/547) by UmarRamzan~~
```diff
- from faster_whisper import WhisperModel
- model = WhisperModel(model_url)
- if "large-v3" in model_url:
- model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128)
```
## More information
**For more information about the original model, see its [model card](https://huggingface.co/openai/whisper-large-v3).**