xezpeleta committed
Commit b6567cc · verified · 1 Parent(s): 0d1f7dd

Training in progress, step 1000

Files changed (4)
  1. README.md +15 -1
  2. model.safetensors +1 -1
  3. run.sh +2 -2
  4. training_args.bin +1 -1
README.md CHANGED
@@ -2,6 +2,9 @@
  library_name: transformers
  license: apache-2.0
  base_model: openai/whisper-tiny
+ language: eu
+ task_categories:
+ - automatic-speech-recognition
  tags:
  - whisper-event
  - generated_from_trainer
@@ -81,7 +84,18 @@ should probably proofread and complete it, then remove this comment. -->

  # Whisper Tiny Basque

- This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the asierhv/composite_corpus_eu_v2.1 dataset.
+ This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the [Composite Corpus eu v2.1](asierhv/composite_corpus_eu_v2.1), which is a mix of the following datasets:
+
+ | Split tag | Source | Hours | Sentences |
+ |:---------:|:--------------------:|:------------:|:----------:|
+ | - | [Mozilla Common Voice 18.0](https://commonvoice.mozilla.org/eu/datasets) | 300.05 h | 198498 |
+ | - | [Basque Parliament 1](https://www.mdpi.com/2076-3417/14/5/1951) | 369.65 h | 185699 |
+ | - | [OpenSLR](https://openslr.org/76/) | 6.28 h | 3229 |
+ | train | **Total** | **675.98 h** | **387426** |
+
+
+
+
  It achieves the following results on the evaluation set:
  - Loss: 0.3002
  - Wer: 14.9855
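The **Total** row of the table added to README.md can be checked against its three source rows. The sketch below only re-adds the figures shown in the diff; the dictionary name `sources` is illustrative and not part of the commit.

```python
# Rows copied from the table this commit adds to README.md: (hours, sentences).
sources = {
    "Mozilla Common Voice 18.0": (300.05, 198498),
    "Basque Parliament 1": (369.65, 185699),
    "OpenSLR": (6.28, 3229),
}

# Round the float sum to the two decimals used in the table.
total_hours = round(sum(hours for hours, _ in sources.values()), 2)
total_sentences = sum(sentences for _, sentences in sources.values())

print(f"{total_hours:.2f} h, {total_sentences} sentences")
# prints "675.98 h, 387426 sentences", matching the Total row
```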
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:fcebbc1a74d7f85df51b51865b36a2eea643706d380a9109e5042e2fa9dabae1
+ oid sha256:9de0167984a9b5d63188a5b5377ca1c7574a0dbba1d32eb0b66180fe2320b689
  size 151061672
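The file shown here is a Git LFS pointer rather than the weights themselves: `oid` is the SHA-256 digest of the real file's bytes and `size` is its length in bytes. That is why a checkpoint update changes the `oid` while `size` stays at 151061672. A minimal sketch of how such a pointer is derived, using short placeholder byte strings in place of the actual checkpoint (the helper name `lfs_pointer` is hypothetical):

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Build Git LFS pointer text (spec v1) for the given file content."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


# Placeholder contents: same length, different bytes, like two checkpoints
# of a model whose tensor shapes have not changed.
old_pointer = lfs_pointer(b"weights at step 0")
new_pointer = lfs_pointer(b"weights at step 1")

print(old_pointer)
print(new_pointer)
```

The two pointers differ only in their `oid` line, which mirrors the one-line change in this hunk.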
run.sh CHANGED
@@ -4,7 +4,7 @@ python run_speech_recognition_seq2seq_streaming.py \
  --dataset_name="asierhv/composite_corpus_eu_v2.1" \
  --language="basque" \
  --train_split_name="train" \
- --eval_split_name="dev_parl+test_parl+test_cv+test_oslr" \
+ --eval_split_name="dev" \
  --model_index_name="Whisper Tiny Basque" \
  --max_steps="10000" \
  --output_dir="./" \
@@ -39,4 +39,4 @@ python run_speech_recognition_seq2seq_streaming.py \
  --use_auth_token \
  --push_to_hub \
  --report_to "wandb" \
- --run_name "whisper-tiny-eu-2025.02"
+ --run_name "whisper-tiny-eu-25.02-r1"
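Because this hunk switches evaluation from four combined splits to `dev`, the WER reported in README.md will be measured on a different set of utterances from now on. As a reminder of what that figure measures, here is a minimal sketch of word error rate. It assumes plain whitespace tokenization and a percentage scale (the README value of 14.9855 is presumably a percentage). The training script's own text normalization is not shown in this diff, and the function name and sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of four.
print(word_error_rate("kaixo mundua zer moduz", "kaixo mundu zer moduz"))  # 25.0
```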
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c8fab03bf304fb55d38b90b211076f3d0e1c4fa363c503e1759ca46e2d581f5d
+ oid sha256:4389fb57a99dec97399adc10b1e73349730a2d707c30fadbc0f182203ba3467d
  size 5432