mms-1b-swagen-combined-30hrs-model

This model is a fine-tuned version of facebook/mms-1b-all on the SWAGEN - SWA dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2278
  • WER: 0.1922
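
WER (word error rate) is the word-level edit distance between hypothesis and reference divided by the number of reference words; it can exceed 1.0 when insertions dominate, as in the early rows of the training table below. A minimal self-contained sketch of the metric (not necessarily the exact scorer used for this card):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev_row[j] = edit distance between the current prefix of ref and hyp[:j]
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur_row = [i]
        for j, h in enumerate(hyp, start=1):
            cur_row.append(min(
                prev_row[j] + 1,             # deletion
                cur_row[j - 1] + 1,          # insertion
                prev_row[j - 1] + (r != h),  # substitution (free if words match)
            ))
        prev_row = cur_row
    return prev_row[-1] / len(ref)
```

For example, `wer("the cat sat", "the bat sat")` is one substitution over three reference words, i.e. about 0.333.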

Model description

More information needed

Intended uses & limitations

More information needed
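
The card gives no usage example. Assuming this checkpoint follows the standard MMS/Wav2Vec2 conventions of its base model facebook/mms-1b-all (not verified here), it could be loaded for transcription roughly like this:

```python
# Hedged usage sketch: loads this checkpoint with the standard Wav2Vec2/MMS
# API from the transformers library. The model id comes from this card; the
# silent placeholder audio and the 16 kHz sampling rate are illustrative
# assumptions, not part of the card.
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "csikasote/mms-1b-swagen-combined-30hrs-model"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Replace with real speech: a 1-D float32 array sampled at 16 kHz.
audio = np.zeros(16_000, dtype=np.float32)

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # (batch, frames, vocab)
pred_ids = torch.argmax(logits, dim=-1)    # greedy per-frame CTC labels
print(processor.batch_decode(pred_ids)[0]) # collapsed transcription
```

Downloading the ~965M-parameter checkpoint is required, so this is a sketch rather than a verified snippet.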

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 30.0
  • mixed_precision_training: Native AMP
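
The hyperparameters above can be reconstructed as a `transformers.TrainingArguments` configuration; this is a hedged sketch, with `output_dir` and any field not listed on the card being placeholders:

```python
# Reconstruction of the hyperparameter list above; not the author's actual script.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mms-1b-swagen-combined-30hrs-model",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=30.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```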

Training results

| Training Loss | Epoch  | Step | Validation Loss | WER    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 16.9208       | 0.0399 | 100  | 4.2681          | 0.9999 |
| 7.2021        | 0.0798 | 200  | 3.4068          | 1.0027 |
| 6.725         | 0.1196 | 300  | 3.1580          | 1.0146 |
| 6.2214        | 0.1595 | 400  | 3.0439          | 1.0027 |
| 5.9996        | 0.1994 | 500  | 3.0050          | 1.0016 |
| 5.8133        | 0.2393 | 600  | 2.9415          | 1.0    |
| 5.805         | 0.2792 | 700  | 2.8951          | 0.9997 |
| 5.7621        | 0.3190 | 800  | 2.9054          | 0.9996 |
| 5.6973        | 0.3589 | 900  | 2.8109          | 0.9909 |
| 5.6541        | 0.3988 | 1000 | 2.8355          | 0.9911 |
| 5.4159        | 0.4387 | 1100 | 2.7632          | 0.9783 |
| 5.4112        | 0.4786 | 1200 | 2.6499          | 0.9793 |
| 4.2059        | 0.5184 | 1300 | 0.3675          | 0.2617 |
| 0.6493        | 0.5583 | 1400 | 0.2747          | 0.2077 |
| 0.5624        | 0.5982 | 1500 | 0.2649          | 0.2023 |
| 0.5197        | 0.6381 | 1600 | 0.2619          | 0.1988 |
| 0.4715        | 0.6780 | 1700 | 0.2589          | 0.1982 |
| 0.5126        | 0.7178 | 1800 | 0.2518          | 0.1979 |
| 0.4916        | 0.7577 | 1900 | 0.2549          | 0.1958 |
| 0.4667        | 0.7976 | 2000 | 0.2501          | 0.1947 |
| 0.4713        | 0.8375 | 2100 | 0.2479          | 0.1943 |
| 0.4875        | 0.8774 | 2200 | 0.2449          | 0.1931 |
| 0.4611        | 0.9172 | 2300 | 0.2436          | 0.1935 |
| 0.4587        | 0.9571 | 2400 | 0.2434          | 0.1928 |
| 0.4679        | 0.9970 | 2500 | 0.2409          | 0.1895 |
| 0.4141        | 1.0367 | 2600 | 0.2331          | 0.1896 |
| 0.4263        | 1.0766 | 2700 | 0.2329          | 0.1920 |
| 0.4142        | 1.1165 | 2800 | 0.2324          | 0.1918 |
| 0.4606        | 1.1563 | 2900 | 0.2257          | 0.1943 |
| 0.4048        | 1.1962 | 3000 | 0.2289          | 0.1928 |
| 0.4172        | 1.2361 | 3100 | 0.2326          | 0.1938 |
| 0.4294        | 1.2760 | 3200 | 0.2327          | 0.1941 |
| 0.468         | 1.3159 | 3300 | 0.2277          | 0.1922 |
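
Wav2Vec2/MMS checkpoints like this one emit per-frame CTC logits; before a WER such as the one above can be computed, the frame labels are typically decoded greedily by collapsing consecutive repeats and dropping blanks. A minimal sketch with hypothetical label ids (not this model's actual vocabulary):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Standard CTC greedy decoding: collapse consecutive repeats, drop blanks."""
    out = []
    prev = None
    for label in frame_ids:
        if label != prev and label != blank_id:
            out.append(label)
        prev = label
    return out
```

Note that a blank between two identical labels keeps them distinct: `ctc_greedy_decode([0, 3, 3, 0, 3, 2, 2, 0])` yields `[3, 3, 2]`.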

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Model size

  • 965M parameters (Safetensors, F32)

Model tree for csikasote/mms-1b-swagen-combined-30hrs-model

  • Fine-tuned from facebook/mms-1b-all