mc4-xglm-tagged

This model is a fine-tuned version of bowphs/xglm-163M (the training dataset is not specified in this card). It achieves the following results on the evaluation set:

  • Loss: 4.0534
  • Accuracy: 0.3398
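
Since the base checkpoint is an XGLM causal language model, it can presumably be loaded through the standard transformers auto classes. A minimal usage sketch, assuming the model is published on the Hub (the repository id below is a placeholder, not confirmed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; substitute the actual Hub path of this model.
model_id = "your-namespace/mc4-xglm-tagged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```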

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 200000
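
These settings map directly onto transformers.TrainingArguments. A minimal sketch of an equivalent configuration, where only the hyperparameters listed above come from this card and everything else (output path, model, data) is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mc4-xglm-tagged",   # placeholder output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=200_000,
)
```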

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Accuracy |
|:-------------:|:------:|:------:|:---------------:|:--------:|
| No log        | 5e-06  | 1      | 12.4240         | 0.0020   |
| No log        | 1e-05  | 2      | 12.3374         | 0.0029   |
| No log        | 2e-05  | 4      | 12.2448         | 0.0231   |
| No log        | 4e-05  | 8      | 12.0572         | 0.0306   |
| No log        | 8e-05  | 16     | 11.8282         | 0.0307   |
| No log        | 0.0002 | 32     | 11.4492         | 0.0306   |
| No log        | 0.0003 | 64     | 10.7609         | 0.0306   |
| No log        | 0.0006 | 128    | 9.8285          | 0.0306   |
| No log        | 0.0013 | 256    | 9.3843          | 0.0306   |
| No log        | 0.0026 | 512    | 9.1485          | 0.0378   |
| No log        | 0.0051 | 1024   | 8.4360          | 0.0518   |
| 17.3414       | 0.01   | 2000   | 7.6883          | 0.0680   |
| 17.3414       | 0.0102 | 2048   | 7.6605          | 0.0683   |
| 14.6968       | 0.02   | 4000   | 7.0249          | 0.0914   |
| 14.6968       | 0.0205 | 4096   | 6.9956          | 0.0931   |
| 13.7374       | 0.03   | 6000   | 6.6715          | 0.1091   |
| 13.1405       | 0.04   | 8000   | 6.4087          | 0.1233   |
| 13.1405       | 0.0410 | 8192   | 6.4003          | 0.1236   |
| 12.7338       | 0.05   | 10000  | 6.2102          | 0.1349   |
| 12.3665       | 0.06   | 12000  | 6.0366          | 0.1468   |
| 12.0031       | 0.07   | 14000  | 5.8640          | 0.1590   |
| 11.6931       | 0.08   | 16000  | 5.6796          | 0.1745   |
| 11.6931       | 0.0819 | 16384  | 5.6404          | 0.1781   |
| 11.4682       | 0.09   | 18000  | 5.5012          | 0.1930   |
| 11.1718       | 0.1    | 20000  | 5.3323          | 0.2140   |
| 10.8851       | 0.11   | 22000  | 5.1822          | 0.2342   |
| 10.6261       | 0.12   | 24000  | 5.0688          | 0.2479   |
| 10.4129       | 0.13   | 26000  | 4.9804          | 0.2584   |
| 10.2244       | 0.14   | 28000  | 4.9035          | 0.2658   |
| 10.1275       | 0.15   | 30000  | 4.8421          | 0.2715   |
| 9.9696        | 0.16   | 32000  | 4.7917          | 0.2757   |
| 9.9696        | 0.1638 | 32768  | 4.7730          | 0.2771   |
| 9.9088        | 0.17   | 34000  | 4.7403          | 0.2797   |
| 9.8163        | 0.18   | 36000  | 4.7128          | 0.2829   |
| 9.714         | 0.19   | 38000  | 4.6710          | 0.2860   |
| 9.6061        | 0.2    | 40000  | 4.6433          | 0.2885   |
| 9.5284        | 0.21   | 42000  | 4.6084          | 0.2915   |
| 9.5082        | 0.22   | 44000  | 4.5818          | 0.2931   |
| 9.3385        | 0.23   | 46000  | 4.5475          | 0.2959   |
| 9.3818        | 0.24   | 48000  | 4.5309          | 0.2975   |
| 9.2562        | 0.25   | 50000  | 4.5150          | 0.2994   |
| 9.2209        | 1.0004 | 52000  | 4.4888          | 0.3011   |
| 9.2619        | 1.0104 | 54000  | 4.4623          | 0.3034   |
| 9.1992        | 1.0204 | 56000  | 4.4588          | 0.3042   |
| 9.132         | 1.0304 | 58000  | 4.4341          | 0.3060   |
| 9.1641        | 1.0404 | 60000  | 4.4246          | 0.3071   |
| 9.0598        | 1.0504 | 62000  | 4.4065          | 0.3085   |
| 9.0775        | 1.0604 | 64000  | 4.3928          | 0.3097   |
| 9.0233        | 1.0704 | 66000  | 4.3776          | 0.3108   |
| 8.992         | 1.0804 | 68000  | 4.3605          | 0.3122   |
| 8.9967        | 1.0904 | 70000  | 4.3472          | 0.3136   |
| 8.9847        | 1.1004 | 72000  | 4.3430          | 0.3138   |
| 8.9378        | 1.1104 | 74000  | 4.3249          | 0.3155   |
| 8.899         | 1.1204 | 76000  | 4.3146          | 0.3160   |
| 8.8777        | 1.1304 | 78000  | 4.3077          | 0.3164   |
| 8.8575        | 1.1404 | 80000  | 4.2992          | 0.3175   |
| 8.8785        | 1.1504 | 82000  | 4.2820          | 0.3185   |
| 8.825         | 1.1604 | 84000  | 4.2734          | 0.3200   |
| 8.8469        | 1.1704 | 86000  | 4.2661          | 0.3204   |
| 8.8329        | 1.1804 | 88000  | 4.2586          | 0.3214   |
| 8.7714        | 1.1904 | 90000  | 4.2488          | 0.3223   |
| 8.7412        | 1.2004 | 92000  | 4.2404          | 0.3230   |
| 8.7164        | 1.2104 | 94000  | 4.2331          | 0.3239   |
| 8.7257        | 1.2204 | 96000  | 4.2220          | 0.3242   |
| 8.675         | 1.2304 | 98000  | 4.2185          | 0.3248   |
| 8.6753        | 1.2404 | 100000 | 4.2104          | 0.3256   |
| 8.5776        | 1.2504 | 102000 | 4.2026          | 0.3260   |
| 8.6105        | 2.0008 | 104000 | 4.1961          | 0.3266   |
| 8.6428        | 2.0109 | 106000 | 4.1912          | 0.3275   |
| 8.6302        | 2.0208 | 108000 | 4.1849          | 0.3275   |
| 8.5735        | 2.0309 | 110000 | 4.1791          | 0.3285   |
| 8.6267        | 2.0408 | 112000 | 4.1791          | 0.3291   |
| 8.5621        | 2.0509 | 114000 | 4.1696          | 0.3292   |
| 8.5838        | 2.0608 | 116000 | 4.1626          | 0.3298   |
| 8.5444        | 2.0709 | 118000 | 4.1546          | 0.3302   |
| 8.5542        | 2.0808 | 120000 | 4.1502          | 0.3312   |
| 8.5692        | 2.0909 | 122000 | 4.1496          | 0.3311   |
| 8.5635        | 2.1008 | 124000 | 4.1431          | 0.3313   |
| 8.5386        | 2.1109 | 126000 | 4.1394          | 0.3323   |
| 8.5085        | 2.1208 | 128000 | 4.1350          | 0.3322   |
| 8.5007        | 2.1309 | 130000 | 4.1291          | 0.3328   |
| 8.4887        | 2.1408 | 132000 | 4.1282          | 0.3329   |
| 8.5454        | 2.1509 | 134000 | 4.1185          | 0.3334   |
| 8.4889        | 2.1608 | 136000 | 4.1187          | 0.3340   |
| 8.526         | 2.1709 | 138000 | 4.1142          | 0.3339   |
| 8.5198        | 2.1808 | 140000 | 4.1111          | 0.3343   |
| 8.4588        | 2.1909 | 142000 | 4.1063          | 0.3347   |
| 8.455         | 2.2008 | 144000 | 4.1030          | 0.3352   |
| 8.4381        | 2.2109 | 146000 | 4.0991          | 0.3355   |
| 8.444         | 2.2208 | 148000 | 4.0952          | 0.3354   |
| 8.4109        | 2.2309 | 150000 | 4.0936          | 0.3361   |
| 8.4216        | 2.2409 | 152000 | 4.0916          | 0.3362   |
| 8.3161        | 2.2508 | 154000 | 4.0881          | 0.3367   |
| 8.3631        | 3.0013 | 156000 | 4.0874          | 0.3368   |
| 8.4031        | 3.0113 | 158000 | 4.0881          | 0.3368   |
| 8.4078        | 3.0213 | 160000 | 4.0827          | 0.3371   |
| 8.3695        | 3.0313 | 162000 | 4.0804          | 0.3373   |
| 8.4121        | 3.0413 | 164000 | 4.0744          | 0.3376   |
| 8.3457        | 3.0513 | 166000 | 4.0762          | 0.3382   |
| 8.3958        | 3.0613 | 168000 | 4.0742          | 0.3381   |
| 8.3473        | 3.0713 | 170000 | 4.0705          | 0.3383   |
| 8.3675        | 3.0813 | 172000 | 4.0683          | 0.3387   |
| 8.3904        | 3.0913 | 174000 | 4.0692          | 0.3385   |
| 8.3825        | 3.1013 | 176000 | 4.0659          | 0.3386   |
| 8.3694        | 3.1113 | 178000 | 4.0635          | 0.3390   |
| 8.3522        | 3.1213 | 180000 | 4.0630          | 0.3391   |
| 8.352         | 3.1313 | 182000 | 4.0617          | 0.3392   |
| 8.3289        | 3.1413 | 184000 | 4.0601          | 0.3393   |
| 8.3981        | 3.1513 | 186000 | 4.0582          | 0.3394   |
| 8.3441        | 3.1613 | 188000 | 4.0576          | 0.3396   |
| 8.3991        | 3.1713 | 190000 | 4.0568          | 0.3396   |
| 8.3922        | 3.1813 | 192000 | 4.0555          | 0.3396   |
| 8.3236        | 3.1913 | 194000 | 4.0556          | 0.3398   |
| 8.3345        | 3.2013 | 196000 | 4.0541          | 0.3398   |
| 8.3259        | 3.2113 | 198000 | 4.0534          | 0.3400   |
| 8.3298        | 3.2213 | 200000 | 4.0534          | 0.3398   |
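
Assuming the reported loss is the usual mean token-level cross-entropy in nats (the Trainer default for causal language modeling), the final validation loss of 4.0534 corresponds to a perplexity of exp(4.0534) ≈ 57.6:

```python
import math

# Perplexity from mean cross-entropy loss, assuming natural-log (nats).
final_loss = 4.0534
print(math.exp(final_loss))  # ≈ 57.6
```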

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0