mc4-xglm-tagged-base

This model is a fine-tuned version of bowphs/xglm-257M (the training dataset is not named in the card's metadata). It achieves the following results on the evaluation set:

  • Accuracy: 0.3726
  • Loss: 3.7613
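Since this is a causal language model, the reported evaluation loss (cross-entropy in nats) can be read as a perplexity via exp(loss). A quick sanity check, using the final loss from the table below:

```python
import math

eval_loss = 3.7613  # final evaluation loss reported above

# Perplexity is the exponential of the cross-entropy loss (in nats).
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # → perplexity ≈ 43.0
```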

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 1000000
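The total train batch size of 32 follows directly from the per-device batch size and gradient accumulation. A minimal sketch in plain Python (the dict is illustrative, not a Trainer API):

```python
# Hyperparameters as reported above (plain dict, not tied to any library).
hparams = {
    "learning_rate": 5e-05,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "lr_scheduler_type": "linear",
    "training_steps": 1_000_000,
}

# Effective (total) train batch size = per-device batch size × accumulation steps.
total_train_batch_size = (
    hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # → 32, matching the reported value
```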

Training results

| Training Loss | Epoch | Step | Accuracy | Validation Loss |
|---|---|---|---|---|
| No log | 1e-06 | 1 | 0.0294 | 12.3133 |
| No log | 2e-06 | 2 | 0.0307 | 12.2422 |
| No log | 4e-06 | 4 | 0.0034 | 12.2271 |
| No log | 8e-06 | 8 | 0.0306 | 12.0499 |
| No log | 0.0000 | 16 | 0.0280 | 11.8602 |
| No log | 0.0000 | 32 | 0.0306 | 11.4877 |
| No log | 0.0001 | 64 | 0.0307 | 10.8011 |
| No log | 0.0001 | 128 | 0.0306 | 9.8703 |
| No log | 0.0003 | 256 | 0.0301 | 9.3693 |
| No log | 0.0005 | 512 | 0.0416 | 9.0687 |
| No log | 0.0010 | 1024 | 0.0525 | 8.3611 |
| 17.2276 | 0.002 | 2000 | 0.0678 | 7.6542 |
| 17.2276 | 0.0020 | 2048 | 0.0678 | 7.6283 |
| 14.6302 | 0.004 | 4000 | 0.0892 | 7.0092 |
| 14.6302 | 0.0041 | 4096 | 0.0913 | 6.9777 |
| 13.6739 | 0.006 | 6000 | 0.1067 | 6.6596 |
| 13.0658 | 0.008 | 8000 | 0.1232 | 6.3721 |
| 13.0658 | 0.0082 | 8192 | 0.1237 | 6.3621 |
| 12.6223 | 0.01 | 10000 | 0.1376 | 6.1424 |
| 12.1785 | 0.012 | 12000 | 0.1563 | 5.8987 |
| 11.6802 | 0.014 | 14000 | 0.1857 | 5.6095 |
| 11.1743 | 0.016 | 16000 | 0.2213 | 5.3249 |
| 11.1743 | 0.0164 | 16384 | 0.2281 | 5.2768 |
| 10.7738 | 0.018 | 18000 | 0.2480 | 5.1219 |
| 10.4178 | 0.02 | 20000 | 0.2640 | 4.9929 |
| 10.1491 | 0.022 | 22000 | 0.2732 | 4.8845 |
| 9.934 | 0.024 | 24000 | 0.2802 | 4.7947 |
| 9.7615 | 0.026 | 26000 | 0.2860 | 4.7243 |
| 9.5947 | 0.028 | 28000 | 0.2918 | 4.6485 |
| 9.5259 | 0.03 | 30000 | 0.2956 | 4.5979 |
| 9.3701 | 0.032 | 32000 | 0.2993 | 4.5421 |
| 9.3701 | 0.0328 | 32768 | 0.3012 | 4.5245 |
| 9.3106 | 0.034 | 34000 | 0.3035 | 4.4962 |
| 9.2248 | 0.036 | 36000 | 0.3069 | 4.4722 |
| 9.1151 | 0.038 | 38000 | 0.3099 | 4.4183 |
| 9.0054 | 0.04 | 40000 | 0.3132 | 4.3826 |
| 8.9254 | 0.042 | 42000 | 0.3167 | 4.3473 |
| 8.899 | 0.044 | 44000 | 0.3191 | 4.3118 |
| 8.7951 | 0.046 | 46000 | 0.3213 | 4.2810 |
| 8.7668 | 0.048 | 48000 | 0.3238 | 4.2601 |
| 8.6323 | 0.05 | 50000 | 0.3259 | 4.2360 |
| 8.5954 | 1.0001 | 52000 | 0.3278 | 4.2179 |
| 8.6282 | 1.0021 | 54000 | 0.3309 | 4.1831 |
| 8.5655 | 1.0041 | 56000 | 0.3323 | 4.1695 |
| 8.4928 | 1.0061 | 58000 | 0.3342 | 4.1452 |
| 8.5173 | 1.0081 | 60000 | 0.3356 | 4.1297 |
| 8.4115 | 1.0101 | 62000 | 0.3377 | 4.1152 |
| 8.4264 | 1.0121 | 64000 | 0.3390 | 4.0969 |
| 8.4264 | 1.0136 | 65536 | 0.3401 | 4.0832 |
| 8.3665 | 1.0141 | 66000 | 0.3403 | 4.0796 |
| 8.3269 | 1.0161 | 68000 | 0.3424 | 4.0587 |
| 8.3235 | 1.0181 | 70000 | 0.3442 | 4.0449 |
| 8.3084 | 1.0201 | 72000 | 0.3447 | 4.0344 |
| 8.2626 | 1.0221 | 74000 | 0.3465 | 4.0173 |
| 8.2159 | 1.0241 | 76000 | 0.3481 | 4.0052 |
| 8.1909 | 1.0261 | 78000 | 0.3470 | 4.0121 |
| 8.1668 | 1.0281 | 80000 | 0.3482 | 3.9881 |
| 8.1834 | 1.0301 | 82000 | 0.3502 | 3.9693 |
| 8.1224 | 1.0321 | 84000 | 0.3525 | 3.9584 |
| 8.1373 | 1.0341 | 86000 | 0.3532 | 3.9491 |
| 8.1292 | 1.0361 | 88000 | 0.3533 | 3.9458 |
| 8.0558 | 1.0381 | 90000 | 0.3550 | 3.9267 |
| 8.0256 | 1.0401 | 92000 | 0.3571 | 3.9133 |
| 7.9978 | 1.0421 | 94000 | 0.3578 | 3.9021 |
| 8.0016 | 1.0441 | 96000 | 0.3583 | 3.8952 |
| 7.9416 | 1.0461 | 98000 | 0.3591 | 3.8897 |
| 7.9475 | 1.0481 | 100000 | 0.3600 | 3.8779 |
| 7.8433 | 1.0501 | 102000 | 0.3612 | 3.8705 |
| 7.8756 | 2.0002 | 104000 | 0.3620 | 3.8657 |
| 7.9009 | 2.0022 | 106000 | 0.3628 | 3.8557 |
| 7.89 | 2.0042 | 108000 | 0.3633 | 3.8487 |
| 7.8288 | 2.0062 | 110000 | 0.3651 | 3.8367 |
| 7.8747 | 2.0082 | 112000 | 0.3656 | 3.8324 |
| 7.8107 | 2.0102 | 114000 | 0.3654 | 3.8288 |
| 7.8301 | 2.0122 | 116000 | 0.3672 | 3.8152 |
| 7.7904 | 2.0142 | 118000 | 0.3671 | 3.8140 |
| 7.7856 | 2.0162 | 120000 | 0.3680 | 3.8058 |
| 7.7961 | 2.0182 | 122000 | 0.3688 | 3.7996 |
| 7.7909 | 2.0202 | 124000 | 0.3694 | 3.7952 |
| 7.7698 | 2.0222 | 126000 | 0.3701 | 3.7863 |
| 7.7299 | 2.0242 | 128000 | 0.3703 | 3.7823 |
| 7.718 | 2.0262 | 130000 | 0.3697 | 3.7823 |
| 7.718 | 2.0272 | 131072 | 0.3707 | 3.7746 |
| 7.7039 | 2.0282 | 132000 | 0.3708 | 3.7754 |
| 7.7561 | 2.0302 | 134000 | 0.3726 | 3.7613 |
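Note that with a linear scheduler configured for 1,000,000 training steps, the learning rate had only decayed modestly by the last logged step (134,000). A quick sketch of the implied rate at that point, assuming a plain linear decay to zero with no warmup (warmup steps are not listed in the hyperparameters):

```python
# Linear decay (no warmup assumed): lr falls from the initial value to 0
# over `training_steps`. Values taken from the hyperparameters above.
initial_lr = 5e-05
training_steps = 1_000_000
final_logged_step = 134_000  # last step in the results table

lr_at_step = initial_lr * (1 - final_logged_step / training_steps)
print(f"{lr_at_step:.2e}")  # → 4.33e-05
```

So training stopped while the schedule was still near its initial learning rate, consistent with the validation loss still trending downward in the table.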

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0