Hubert-common_voice-phoneme-ctc_zero_infinity

This model is a fine-tuned version of rinna/japanese-hubert-base, trained on the Japanese (ja) subset of the mozilla-foundation/common_voice_13_0 dataset. It achieves the following results on the evaluation set:

Loss: 0.5230
WER (word error rate): 1.0
CER (character error rate): 0.1953
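
The evaluation script is not included in this card, so the following is only a minimal sketch of the standard definitions of the two error rates: total edit distance divided by total reference length, counted over whitespace-separated words for WER and over characters for CER. The function names and the example strings are illustrative and do not come from the training code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level error rate: total edits / total reference length."""
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return edits / total


# Hypothetical example: one unsegmented "word" with a single wrong character.
refs, hyps = ["konnichiwa"], ["konnichiha"]
print(error_rate(refs, hyps, "word"))  # the only word is wrong
print(error_rate(refs, hyps, "char"))  # one substitution in ten characters
```

Under these definitions a single wrong character in an unsegmented transcript counts as a full word error, so a WER of 1.0 can coexist with a much lower CER. Whether that is what happens in this evaluation is not stated in the card. Insertions also allow either rate to exceed 1, as in the first row of the training results.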

Model description

More information needed
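
The model name points to a phoneme-level CTC model. As a rough illustration of how frame-level CTC predictions are usually turned into a label sequence, the sketch below applies greedy decoding: take the best label per frame, merge consecutive repeats, then drop the blank symbol. The blank token and the phoneme labels are assumptions chosen for the example; the model's actual vocabulary is not given in this card.

```python
from itertools import groupby

BLANK = "<pad>"  # assumption: blank symbol; the real vocabulary is not listed here


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Hypothetical per-frame argmax labels for a short utterance.
frames = ["<pad>", "k", "k", "<pad>", "o", "o", "o", "<pad>", "<pad>", "n", "n"]
print(ctc_greedy_decode(frames))
```

A blank between two identical labels keeps them separate, which is how CTC represents genuinely repeated phonemes.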

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 12500
num_epochs: 20.0
mixed_precision_training: Native AMP
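
To make the interaction between these settings concrete, here is a small sketch of a linear-warmup cosine schedule, which is assumed to be what lr_scheduler_type: cosine with warmup steps means here. The total of 7,520 optimizer steps is not stated in the card; it is inferred from the results table, where 100 steps correspond to about 0.266 epoch (roughly 376 steps per epoch over 20 epochs). The function name is illustrative.

```python
import math

PEAK_LR = 3e-4         # learning_rate from the card
WARMUP_STEPS = 12_500  # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 7_520    # assumption: 20 epochs x ~376 steps, inferred from the table


def cosine_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay towards zero."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# train_batch_size x gradient_accumulation_steps (a single device is assumed)
effective_batch = 16 * 2

for step in (500, 4500, 7500):
    print(step, f"{cosine_lr(step):.2e}")
```

If these assumptions hold, the warmup phase is longer than the whole run, so the learning rate would still be rising linearly at the last logged step and would never reach the configured 0.0003.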

Training results

Training Loss	Epoch	Step	Validation Loss	WER	CER
No log	0.2660	100	18.2381	1.1471	1.8153
No log	0.5319	200	8.1726	1.0	0.9817
No log	0.7979	300	6.9386	1.0	0.9817
No log	1.0638	400	6.2389	1.0	0.9817
8.8178	1.3298	500	5.4653	1.0	0.9817
8.8178	1.5957	600	4.6745	1.0	0.9817
8.8178	1.8617	700	3.9771	1.0	0.9817
8.8178	2.1277	800	3.4579	1.0	0.9817
8.8178	2.3936	900	3.1745	1.0	0.9817
3.6858	2.6596	1000	3.0675	1.0	0.9817
3.6858	2.9255	1100	3.0343	1.0	0.9817
3.6858	3.1915	1200	3.0102	1.0	0.9817
3.6858	3.4574	1300	2.9925	1.0	0.9817
3.6858	3.7234	1400	2.5595	1.0	0.9367
2.7891	3.9894	1500	1.5432	1.0	0.3742
2.7891	4.2553	1600	1.0799	1.0	0.2972
2.7891	4.5213	1700	0.8670	1.0	0.2639
2.7891	4.7872	1800	0.7350	1.0	0.2559
2.7891	5.0532	1900	0.6753	1.0	0.2468
0.9179	5.3191	2000	0.6171	1.0	0.2389
0.9179	5.5851	2100	0.5866	1.0	0.2386
0.9179	5.8511	2200	0.5649	1.0	0.2389
0.9179	6.1170	2300	0.5368	1.0	0.2321
0.9179	6.3830	2400	0.5225	1.0	0.2289
0.563	6.6489	2500	0.5042	1.0	0.2293
0.563	6.9149	2600	0.4918	1.0	0.2247
0.563	7.1809	2700	0.4881	1.0	0.2208
0.563	7.4468	2800	0.4787	1.0	0.2198
0.563	7.7128	2900	0.4692	1.0	0.2181
0.4453	7.9787	3000	0.4733	1.0	0.2151
0.4453	8.2447	3100	0.4585	1.0	0.2147
0.4453	8.5106	3200	0.4463	1.0	0.2116
0.4453	8.7766	3300	0.4183	1.0	0.2055
0.4453	9.0426	3400	0.4308	0.9998	0.2032
0.3596	9.3085	3500	0.4070	1.0	0.2022
0.3596	9.5745	3600	0.4259	1.0	0.2024
0.3596	9.8404	3700	0.4038	1.0	0.1985
0.3596	10.1064	3800	0.4272	1.0	0.1976
0.3596	10.3723	3900	0.3961	0.9998	0.1969
0.2945	10.6383	4000	0.4180	1.0	0.1943
0.2945	10.9043	4100	0.3999	1.0	0.1975
0.2945	11.1702	4200	0.3879	1.0	0.1930
0.2945	11.4362	4300	0.3799	1.0	0.1918
0.2945	11.7021	4400	0.3764	0.9998	0.1927
0.2605	11.9681	4500	0.3725	1.0	0.1919
0.2605	12.2340	4600	0.3910	1.0	0.1919
0.2605	12.5	4700	0.3851	0.9996	0.1908
0.2605	12.7660	4800	0.4115	1.0	0.1906
0.2605	13.0319	4900	0.3779	1.0	0.1894
0.2223	13.2979	5000	0.3956	1.0	0.1904
0.2223	13.5638	5100	0.4001	1.0	0.1907
0.2223	13.8298	5200	0.3891	1.0	0.1948
0.2223	14.0957	5300	0.3940	1.0	0.1902
0.2223	14.3617	5400	0.4056	1.0	0.1909
0.211	14.6277	5500	0.4000	0.9998	0.1929
0.211	14.8936	5600	0.3926	1.0	0.1895
0.211	15.1596	5700	0.3852	0.9998	0.1930
0.211	15.4255	5800	0.3864	1.0	0.1886
0.211	15.6915	5900	0.3951	0.9998	0.1909
0.1983	15.9574	6000	0.3951	1.0	0.1882
0.1983	16.2234	6100	0.4087	1.0	0.1918
0.1983	16.4894	6200	0.4150	1.0	0.1891
0.1983	16.7553	6300	0.4008	0.9998	0.1907
0.1983	17.0213	6400	0.4220	1.0	0.1943
0.1829	17.2872	6500	0.4154	1.0	0.1925
0.1829	17.5532	6600	0.4482	1.0	0.1959
0.1829	17.8191	6700	0.4217	0.9998	0.1939
0.1829	18.0851	6800	0.4383	0.9998	0.1916
0.1829	18.3511	6900	0.4226	1.0	0.1926
0.1757	18.6170	7000	0.4170	0.9998	0.1916
0.1757	18.8830	7100	0.4162	1.0	0.1918
0.1757	19.1489	7200	0.4350	0.9998	0.1910
0.1757	19.4149	7300	0.4403	1.0	0.2022
0.1757	19.6809	7400	0.4325	0.9998	0.1944
0.1801	19.9468	7500	0.5488	1.0	0.1977
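
The table can be scanned programmatically to find the strongest checkpoints. The sketch below parses a five-row excerpt of the tab-separated rows above (columns in the order training loss, epoch, step, validation loss, WER, CER) and picks the rows with the lowest validation loss and the lowest CER. The helper name is illustrative, and "No log" is treated as a missing training loss.

```python
import csv
import io

# Excerpt of the results rows (tab-separated), copied from the table above.
ROWS = (
    "No log\t0.2660\t100\t18.2381\t1.1471\t1.8153\n"
    "2.7891\t3.9894\t1500\t1.5432\t1.0\t0.3742\n"
    "0.2605\t11.9681\t4500\t0.3725\t1.0\t0.1919\n"
    "0.1983\t15.9574\t6000\t0.3951\t1.0\t0.1882\n"
    "0.1801\t19.9468\t7500\t0.5488\t1.0\t0.1977\n"
)


def parse_rows(text):
    """Turn tab-separated result rows into dictionaries with numeric fields."""
    rows = []
    for train, epoch, step, val, wer, cer in csv.reader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "train_loss": None if train == "No log" else float(train),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val),
            "wer": float(wer),
            "cer": float(cer),
        })
    return rows


rows = parse_rows(ROWS)
best_loss = min(rows, key=lambda r: r["val_loss"])
best_cer = min(rows, key=lambda r: r["cer"])
print("lowest validation loss:", best_loss["step"], best_loss["val_loss"])
print("lowest CER:", best_cer["step"], best_cer["cer"])
```

Applied to the full table, the same selection gives the lowest validation loss at step 4500 (0.3725) and the lowest CER at step 6000 (0.1882), while the last logged step (7500) has a validation loss of 0.5488.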

Framework versions

Transformers 4.47.0.dev0
PyTorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3

Model repository: utakumi/Hubert-common_voice-phoneme-ctc_zero_infinity
