Adding more information to the model card.
README.md
# whisper-large-v3-ca-3catparla

- **Paper:** [3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition](https://iberspeech.tech/)

## Table of Contents
<details>
<summary>Click to expand</summary>

- [Model description](#model-description)
- [Intended uses and limitations](#intended-uses-and-limitations)
- [How to use](#how-to-use)
- [Training](#training)
- [Evaluation](#evaluation)
- [Citation](#citation)
- [Additional information](#additional-information)

</details>

## Summary

The "whisper-large-v3-ca-3catparla" is an acoustic model based on ["openai/whisper-large-v3"](https://huggingface.co/openai/whisper-large-v3) suitable for Automatic Speech Recognition in Catalan.

## Model Description

The "whisper-large-v3-ca-3catparla" is an acoustic model suitable for Automatic Speech Recognition in Catalan. It is the result of finetuning the model ["openai/whisper-large-v3"](https://huggingface.co/openai/whisper-large-v3) with 710 hours of Catalan data released by the [Projecte AINA](https://projecteaina.cat/) from Barcelona, Spain.

## Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan. It is intended to transcribe audio files in Catalan to plain text without punctuation.

## How to Get Started with the Model

### Installation

In order to use this model, you need to install [datasets](https://huggingface.co/docs/datasets/installation) and [transformers](https://huggingface.co/docs/transformers/installation):

Create a virtual environment:

```bash
python -m venv /path/to/venv
```

Activate the environment:

```bash
source /path/to/venv/bin/activate
```

Install the modules:

```bash
pip install datasets transformers
```

### For Inference

In order to transcribe audio in Catalan using this model, you can follow this example:

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
```
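A minimal sketch of how these two classes are typically used for transcription is shown below. The repo ID and the silent placeholder waveform are assumptions for illustration, not the card's original script:

```python
import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Assumed repo ID for this model (illustrative)
model_id = "projecte-aina/whisper-large-v3-ca-3catparla"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)
model.eval()

# Placeholder input: one second of silence at the model's 16 kHz sample rate;
# replace with a real mono waveform loaded from an audio file
waveform = np.zeros(16000, dtype=np.float32)

inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(
        inputs.input_features, language="ca", task="transcribe"
    )
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```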

**Test Result (WER)**: 0.96
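The figure above is a word error rate (WER): the word-level edit distance between reference and hypothesis transcripts, divided by the number of reference words. A minimal pure-Python sketch of the metric (illustrative; not the card's actual evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table over the hypothesis words
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            if ref[i - 1] == hyp[j - 1]:
                d[j] = prev  # match: no edit
            else:
                d[j] = 1 + min(prev, cur, d[j - 1])  # sub, del, ins
            prev = cur
    return d[len(hyp)] / len(ref)

# One substitution out of four reference words -> 0.25
print(wer("el gat és aquí", "el gat es aquí"))  # 0.25
```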

## Training Details

### Training data

The specific dataset used to create the model is called ["3CatParla"](https://huggingface.co/datasets/projecte-aina/3catparla_asr).
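A minimal sketch of loading this corpus with the `datasets` library (the split name and the streaming flag are assumptions, not the card's original code):

```python
from datasets import load_dataset

# Assumed split name; streaming avoids downloading the full corpus up front
ds = load_dataset("projecte-aina/3catparla_asr", split="test", streaming=True)
sample = next(iter(ds))
print(sample.keys())
```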

### Training procedure

This model is the result of finetuning the model ["openai/whisper-large-v3"](https://huggingface.co/openai/whisper-large-v3) by following this [tutorial](https://huggingface.co/blog/fine-tune-whisper) provided by Hugging Face.

### Training Hyperparameters

* language: catalan
* hours of training audio: 710
* learning rate: 1.95e-07
* sample rate: 16000
* train batch size: 32 (x4 GPUs)
* gradient accumulation steps: 1
* eval batch size: 32
* save total limit: 3
* max steps: 19842
* warmup steps: 1984
* eval steps: 3307
* save steps: 3307
* shuffle buffer size: 480
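As an illustrative aid, the list above maps naturally onto common Hugging Face training-argument names (an assumption, not the card's actual training script), and the effective batch size follows from it:

```python
# Hypothetical mapping of the hyperparameters above onto common
# Hugging Face argument names (illustrative; not the actual script)
training_config = {
    "learning_rate": 1.95e-07,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "max_steps": 19842,
    "warmup_steps": 1984,
    "eval_steps": 3307,
    "save_steps": 3307,
    "save_total_limit": 3,
}
num_gpus = 4

# Effective batch size = per-device batch * GPUs * accumulation steps
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * num_gpus
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 128
```

Note that the warmup (1984 steps) is roughly 10% of the 19842 total steps.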

## Citation

If this code contributes to your research, please cite the work:

```bibtex
@misc{mena2024whisperlarge3catparla,
      title={Acoustic Model in Catalan: whisper-large-v3-ca-3catparla.},
      year={2024}
}
```

## Additional Information

### Author

The fine-tuning process was performed in July 2024 in the [Language Technologies Unit](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Carlos Daniel Hernández Mena](https://huggingface.co/carlosdanielhernandezmena).

### Contact

For further information, please send an email to <[email protected]>.

### Copyright

Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.

### License

[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Funding

This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).

The training of the model was possible thanks to the compute time provided by the [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.