xezpeleta
/

whisper-base-eu

@@ -16,70 +16,85 @@ model-index:
       name: Automatic Speech Recognition
       type: automatic-speech-recognition
     dataset:
-      name: asierhv/composite_corpus_eu_v2.1
-      type: asierhv/composite_corpus_eu_v2.1
     metrics:
     - name: Wer
       type: wer
-      value: 12.307980517047582
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # Whisper Base Basque
-This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on the asierhv/composite_corpus_eu_v2.1 dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.3183
-- Wer: 12.3080
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 2.5e-05
-- train_batch_size: 32
-- eval_batch_size: 16
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- training_steps: 10000
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch | Step  | Validation Loss | Wer     |
-|:-------------:|:-----:|:-----:|:---------------:|:-------:|
-| 0.4816        | 0.1   | 1000  | 0.5136          | 25.7525 |
-| 0.2515        | 0.2   | 2000  | 0.4336          | 19.9950 |
-| 0.1792        | 0.3   | 3000  | 0.4054          | 17.6408 |
-| 0.2485        | 0.4   | 4000  | 0.3804          | 16.3794 |
-| 0.1007        | 0.5   | 5000  | 0.4056          | 15.2554 |
-| 0.1296        | 0.6   | 6000  | 0.3731          | 15.3241 |
-| 0.1555        | 0.7   | 7000  | 0.3764          | 13.3820 |
-| 0.114         | 0.8   | 8000  | 0.3097          | 12.7513 |
-| 0.0775        | 0.9   | 9000  | 0.3170          | 12.4578 |
-| 0.0836        | 1.0   | 10000 | 0.3183          | 12.3080 |
 ### Framework versions
-- Transformers 4.49.0.dev0
-- Pytorch 2.6.0+cu124
-- Datasets 3.3.1.dev0
-- Tokenizers 0.21.0

       name: Automatic Speech Recognition
       type: automatic-speech-recognition
     dataset:
+      name: Mozilla Common Voice 18.0
+      type: mozilla-foundation/common_voice_18_0
     metrics:
     - name: Wer
       type: wer
+      value: 10.78
+language:
+- eu
 ---
 # Whisper Base Basque
+This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) specifically for Basque (eu) language Automatic Speech Recognition (ASR). It was trained on the [asierhv/composite_corpus_eu_v2.1](https://huggingface.co/datasets/asierhv/composite_corpus_eu_v2.1) dataset, which is a composite corpus designed to improve Basque ASR performance.
+**Key improvements and results compared to the base model:**
+* **Significant WER reduction:** The fine-tuned model achieves a Word Error Rate (WER) of 12.3080 on the validation set of the `asierhv/composite_corpus_eu_v2.1` dataset, demonstrating improved accuracy compared to the base `whisper-base` model for Basque.
+* **Performance on Common Voice:** When evaluated on the Mozilla Common Voice 18.0 dataset, the model achieved a WER of 10.78. This demonstrates the model's ability to generalize to other Basque speech datasets, and highlights the improvement in accuracy due to the larger base model.
 ## Model description
+This model builds upon the `whisper-base` architecture, known for its strong performance in multilingual speech recognition. By fine-tuning this model on a dedicated Basque speech corpus, it specializes in accurately transcribing Basque speech. The `whisper-base` model offers a larger capacity than `whisper-tiny`, resulting in higher accuracy, albeit with increased computational requirements.
 ## Intended uses & limitations
+**Intended uses:**
+* High-accuracy automatic transcription of Basque speech.
+* Development of advanced Basque speech-based applications.
+* Research in Basque speech processing requiring higher accuracy.
+* Professional transcription services for Basque language.
+* Applications where slightly higher computational cost is acceptable for improved accuracy.
+**Limitations:**
+* Performance remains dependent on audio quality, with challenges posed by background noise and poor recording conditions.
+* Accuracy may still be affected by highly dialectal or informal Basque speech.
+* While demonstrating improved performance, the model may still produce errors, especially with complex linguistic structures.
+* The base model is larger than the tiny, so inference will be slower and require more resources.
 ## Training and evaluation data
+* **Training dataset:** [asierhv/composite_corpus_eu_v2.1](https://huggingface.co/datasets/asierhv/composite_corpus_eu_v2.1). This dataset is a carefully curated compilation of Basque speech data, designed to enhance the effectiveness of Basque ASR systems.
+* **Evaluation Dataset:** The `test` portion of `asierhv/composite_corpus_eu_v2.1`.
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
+* **learning_rate:** 2.5e-05
+* **train_batch_size:** 32
+* **eval_batch_size:** 16
+* **seed:** 42
+* **optimizer:** AdamW with betas=(0.9, 0.999) and epsilon=1e-08
+* **lr_scheduler_type:** linear
+* **lr_scheduler_warmup_steps:** 500
+* **training_steps:** 10000
+* **mixed_precision_training:** Native AMP
+### Training results
+| Training Loss | Epoch | Step  | Validation Loss | WER      |
+|---------------|-------|-------|-----------------|----------|
+| 0.4816        | 0.1   | 1000  | 0.5136          | 25.7525  |
+| 0.2515        | 0.2   | 2000  | 0.4336          | 19.9950  |
+| 0.1792        | 0.3   | 3000  | 0.4054          | 17.6408  |
+| 0.2485        | 0.4   | 4000  | 0.3804          | 16.3794  |
+| 0.1007        | 0.5   | 5000  | 0.4056          | 15.2554  |
+| 0.1296        | 0.6   | 6000  | 0.3731          | 15.3241  |
+| 0.1555        | 0.7   | 7000  | 0.3764          | 13.3820  |
+| 0.114         | 0.8   | 8000  | 0.3097          | 12.7513  |
+| 0.0775        | 0.9   | 9000  | 0.3170          | 12.4578  |
+| 0.0836        | 1.0   | 10000 | 0.3183          | 12.3080  |
 ### Framework versions
+* Transformers 4.49.0.dev0
+* Pytorch 2.6.0+cu124
+* Datasets 3.3.1.dev0
+* Tokenizers 0.21.0