Hubert-common_voice-phoneme-onlyJSUT

This model is a fine-tuned version of rinna/japanese-hubert-base on the Japanese (ja) configuration of the mozilla-foundation/common_voice_13_0 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1563
  • Wer: 1.0
  • Cer: 0.1052
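
A WER of 1.0 next to a CER of about 0.105 can look contradictory. One possible reading, which this card does not confirm, is that each reference transcript is a single unsegmented string with no spaces, so a single wrong character makes the whole "word" wrong. The sketch below illustrates that effect with a plain Levenshtein distance. The helper names and the example strings are hypothetical, and this is not the metric code used to produce the numbers above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance over whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: edit distance over individual characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example strings (not taken from the evaluation set):
# an unsegmented reference and a hypothesis with one wrong character.
reference = "きょうはいいてんきです"
hypothesis = "きょうはいいてんきてす"

example_wer = wer(reference, hypothesis)
example_cer = cer(reference, hypothesis)
print(f"WER={example_wer:.3f} CER={example_cer:.3f}")
```

Under this assumption the WER column says little about transcription quality, and CER is the more informative number to track.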

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 12500
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
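
Two of the values above interact in a way that is easy to miss. The total batch size is the per-device batch size times the gradient accumulation steps (times the device count), and the warmup length of 12500 steps is longer than the whole run if the run has roughly 2820 optimizer steps. The sketch below works through both under stated assumptions: a single device, about 141 steps per epoch (as the Epoch and Step columns of the training results suggest), and a schedule with linear warmup followed by half-cosine decay. The function names are hypothetical, and this is not the training script itself.

```python
import math

# Values copied from the hyperparameter list.
PEAK_LR = 3e-4
WARMUP_STEPS = 12_500
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 2

# Assumptions (not stated in the card): one device, and about 141 optimizer
# steps per epoch, so 20 epochs give roughly 2,820 steps.
NUM_DEVICES = 1
TOTAL_STEPS = 2_820


def effective_batch_size(per_device, grad_accum, devices):
    """Number of examples contributing to one optimizer step."""
    return per_device * grad_accum * devices


def lr_at(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate under linear warmup followed by half-cosine decay to zero."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES)
final_lr = lr_at(TOTAL_STEPS)
print(f"effective batch size: {batch}")
print(f"learning rate at the last step: {final_lr:.3e}")
```

If these assumptions hold, the run ends while still in the warmup phase, so the learning rate rises linearly throughout and reaches only about 23% of the nominal 0.0003, and the cosine decay never takes effect.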

Training results

| Training Loss | Epoch   | Step | Validation Loss | Wer   | Cer    |
|---------------|---------|------|-----------------|-------|--------|
| No log        | 0.7092  | 100  | 11.3614         | 1.054 | 0.9861 |
| No log        | 1.4184  | 200  | 5.9358          | 1.0   | 0.9851 |
| No log        | 2.1277  | 300  | 5.3101          | 1.0   | 0.9851 |
| No log        | 2.8369  | 400  | 4.8953          | 1.0   | 0.9851 |
| 6.9061        | 3.5461  | 500  | 4.4021          | 1.0   | 0.9851 |
| 6.9061        | 4.2553  | 600  | 3.9323          | 1.0   | 0.9851 |
| 6.9061        | 4.9645  | 700  | 3.4932          | 1.0   | 0.9851 |
| 6.9061        | 5.6738  | 800  | 3.2092          | 1.0   | 0.9850 |
| 6.9061        | 6.3830  | 900  | 3.0484          | 1.0   | 0.9851 |
| 3.4303        | 7.0922  | 1000 | 2.9961          | 1.0   | 0.9850 |
| 3.4303        | 7.8014  | 1100 | 2.8000          | 1.0   | 0.9850 |
| 3.4303        | 8.5106  | 1200 | 1.9061          | 1.0   | 0.5949 |
| 3.4303        | 9.2199  | 1300 | 0.8767          | 1.0   | 0.1547 |
| 3.4303        | 9.9291  | 1400 | 0.5386          | 1.0   | 0.1268 |
| 1.6163        | 10.6383 | 1500 | 0.3820          | 1.0   | 0.1190 |
| 1.6163        | 11.3475 | 1600 | 0.2983          | 1.0   | 0.1138 |
| 1.6163        | 12.0567 | 1700 | 0.2524          | 1.0   | 0.1117 |
| 1.6163        | 12.7660 | 1800 | 0.2260          | 1.0   | 0.1104 |
| 1.6163        | 13.4752 | 1900 | 0.2096          | 1.0   | 0.1110 |
| 0.332         | 14.1844 | 2000 | 0.1896          | 0.998 | 0.1092 |
| 0.332         | 14.8936 | 2100 | 0.1838          | 1.0   | 0.1095 |
| 0.332         | 15.6028 | 2200 | 0.1766          | 1.0   | 0.1081 |
| 0.332         | 16.3121 | 2300 | 0.1688          | 0.998 | 0.1071 |
| 0.332         | 17.0213 | 2400 | 0.1667          | 0.998 | 0.1069 |
| 0.2296        | 17.7305 | 2500 | 0.1643          | 1.0   | 0.1069 |
| 0.2296        | 18.4397 | 2600 | 0.1602          | 1.0   | 0.1071 |
| 0.2296        | 19.1489 | 2700 | 0.1654          | 1.0   | 0.1068 |
| 0.2296        | 19.8582 | 2800 | 0.1617          | 0.998 | 0.1060 |
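
The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch, which in turn bounds the size of the training set. The sketch below does that arithmetic from the first and last rows. It assumes the total train batch size of 32 from the hyperparameter list and that the final batch of each epoch may be partial, so the example count is an upper bound and not a reported figure.

```python
# (epoch, step) pairs copied from the first and last rows of the results.
rows = [(0.7092, 100), (19.8582, 2800)]

# Taken from the hyperparameter list (total_train_batch_size).
TOTAL_TRAIN_BATCH_SIZE = 32


def steps_per_epoch(epoch, step):
    """Estimate optimizer steps per epoch from one logged (epoch, step) pair."""
    return step / epoch


estimates = [steps_per_epoch(epoch, step) for epoch, step in rows]
rounded_steps = round(sum(estimates) / len(estimates))

# Upper bound: every step is assumed to consume a full batch.
max_examples_per_epoch = rounded_steps * TOTAL_TRAIN_BATCH_SIZE

print(f"steps per epoch: about {rounded_steps}")
print(f"training examples per epoch: at most about {max_examples_per_epoch}")
```

Both rows give about 141 steps per epoch, which corresponds to at most about 4512 training examples per epoch under these assumptions.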

Framework versions

  • Transformers 4.47.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3

Model size

  • Parameters: 94.4M
  • Tensor type: F32
  • Format: Safetensors