# mc4-xglm-tagged
This model is a fine-tuned version of [bowphs/xglm-163M](https://huggingface.co/bowphs/xglm-163M) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 4.0534
- Accuracy: 0.3398
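Assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language models trained with `Trainer`), it can be converted to a perplexity figure with a one-liner. This is a sketch under that assumption, not a number reported by the card:

```python
import math

# Assuming the evaluation loss is mean per-token cross-entropy (in nats),
# perplexity is its exponential.
eval_loss = 4.0534
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")  # ~57.6
```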
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 200000
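As a sanity check on these settings, the effective batch size and the `linear` learning-rate schedule (linear decay to zero over `training_steps`, shown here without warmup since none is listed) can be sketched in plain Python. The helper `linear_lr` is illustrative, not part of the training code:

```python
# Effective batch size: per-device batch size times gradient accumulation steps.
train_batch_size = 16
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 32

def linear_lr(step, base_lr=5e-5, total_steps=200_000, warmup_steps=0):
    """Linear schedule: ramp up over warmup_steps, then decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(linear_lr(0))        # 5e-05 at the start (no warmup)
print(linear_lr(100_000))  # 2.5e-05 halfway through
print(linear_lr(200_000))  # 0.0 at the final step
```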
### Training results
Training Loss | Epoch | Step | Accuracy | Validation Loss |
---|---|---|---|---|
No log | 5e-06 | 1 | 0.0020 | 12.4240 |
No log | 1e-05 | 2 | 0.0029 | 12.3374 |
No log | 2e-05 | 4 | 0.0231 | 12.2448 |
No log | 4e-05 | 8 | 0.0306 | 12.0572 |
No log | 8e-05 | 16 | 0.0307 | 11.8282 |
No log | 0.0002 | 32 | 0.0306 | 11.4492 |
No log | 0.0003 | 64 | 0.0306 | 10.7609 |
No log | 0.0006 | 128 | 0.0306 | 9.8285 |
No log | 0.0013 | 256 | 0.0306 | 9.3843 |
No log | 0.0026 | 512 | 0.0378 | 9.1485 |
No log | 0.0051 | 1024 | 0.0518 | 8.4360 |
17.3414 | 0.01 | 2000 | 0.0680 | 7.6883 |
17.3414 | 0.0102 | 2048 | 0.0683 | 7.6605 |
14.6968 | 0.02 | 4000 | 0.0914 | 7.0249 |
14.6968 | 0.0205 | 4096 | 0.0931 | 6.9956 |
13.7374 | 0.03 | 6000 | 0.1091 | 6.6715 |
13.1405 | 0.04 | 8000 | 0.1233 | 6.4087 |
13.1405 | 0.0410 | 8192 | 0.1236 | 6.4003 |
12.7338 | 0.05 | 10000 | 0.1349 | 6.2102 |
12.3665 | 0.06 | 12000 | 0.1468 | 6.0366 |
12.0031 | 0.07 | 14000 | 0.1590 | 5.8640 |
11.6931 | 0.08 | 16000 | 0.1745 | 5.6796 |
11.6931 | 0.0819 | 16384 | 0.1781 | 5.6404 |
11.4682 | 0.09 | 18000 | 0.1930 | 5.5012 |
11.1718 | 0.1 | 20000 | 0.2140 | 5.3323 |
10.8851 | 0.11 | 22000 | 0.2342 | 5.1822 |
10.6261 | 0.12 | 24000 | 0.2479 | 5.0688 |
10.4129 | 0.13 | 26000 | 0.2584 | 4.9804 |
10.2244 | 0.14 | 28000 | 0.2658 | 4.9035 |
10.1275 | 0.15 | 30000 | 0.2715 | 4.8421 |
9.9696 | 0.16 | 32000 | 0.2757 | 4.7917 |
9.9696 | 0.1638 | 32768 | 0.2771 | 4.7730 |
9.9088 | 0.17 | 34000 | 0.2797 | 4.7403 |
9.8163 | 0.18 | 36000 | 0.2829 | 4.7128 |
9.714 | 0.19 | 38000 | 0.2860 | 4.6710 |
9.6061 | 0.2 | 40000 | 0.2885 | 4.6433 |
9.5284 | 0.21 | 42000 | 0.2915 | 4.6084 |
9.5082 | 0.22 | 44000 | 0.2931 | 4.5818 |
9.3385 | 0.23 | 46000 | 0.2959 | 4.5475 |
9.3818 | 0.24 | 48000 | 0.2975 | 4.5309 |
9.2562 | 0.25 | 50000 | 0.2994 | 4.5150 |
9.2209 | 1.0004 | 52000 | 0.3011 | 4.4888 |
9.2619 | 1.0104 | 54000 | 0.3034 | 4.4623 |
9.1992 | 1.0204 | 56000 | 0.3042 | 4.4588 |
9.132 | 1.0304 | 58000 | 0.3060 | 4.4341 |
9.1641 | 1.0404 | 60000 | 0.3071 | 4.4246 |
9.0598 | 1.0504 | 62000 | 0.3085 | 4.4065 |
9.0775 | 1.0604 | 64000 | 0.3097 | 4.3928 |
9.0233 | 1.0704 | 66000 | 0.3108 | 4.3776 |
8.992 | 1.0804 | 68000 | 0.3122 | 4.3605 |
8.9967 | 1.0904 | 70000 | 0.3136 | 4.3472 |
8.9847 | 1.1004 | 72000 | 0.3138 | 4.3430 |
8.9378 | 1.1104 | 74000 | 0.3155 | 4.3249 |
8.899 | 1.1204 | 76000 | 0.3160 | 4.3146 |
8.8777 | 1.1304 | 78000 | 0.3164 | 4.3077 |
8.8575 | 1.1404 | 80000 | 0.3175 | 4.2992 |
8.8785 | 1.1504 | 82000 | 0.3185 | 4.2820 |
8.825 | 1.1604 | 84000 | 0.3200 | 4.2734 |
8.8469 | 1.1704 | 86000 | 0.3204 | 4.2661 |
8.8329 | 1.1804 | 88000 | 0.3214 | 4.2586 |
8.7714 | 1.1904 | 90000 | 0.3223 | 4.2488 |
8.7412 | 1.2004 | 92000 | 0.3230 | 4.2404 |
8.7164 | 1.2104 | 94000 | 0.3239 | 4.2331 |
8.7257 | 1.2204 | 96000 | 0.3242 | 4.2220 |
8.675 | 1.2304 | 98000 | 0.3248 | 4.2185 |
8.6753 | 1.2404 | 100000 | 0.3256 | 4.2104 |
8.5776 | 1.2504 | 102000 | 0.3260 | 4.2026 |
8.6105 | 2.0008 | 104000 | 0.3266 | 4.1961 |
8.6428 | 2.0109 | 106000 | 0.3275 | 4.1912 |
8.6302 | 2.0208 | 108000 | 0.3275 | 4.1849 |
8.5735 | 2.0309 | 110000 | 0.3285 | 4.1791 |
8.6267 | 2.0408 | 112000 | 0.3291 | 4.1791 |
8.5621 | 2.0509 | 114000 | 0.3292 | 4.1696 |
8.5838 | 2.0608 | 116000 | 0.3298 | 4.1626 |
8.5444 | 2.0709 | 118000 | 0.3302 | 4.1546 |
8.5542 | 2.0808 | 120000 | 0.3312 | 4.1502 |
8.5692 | 2.0909 | 122000 | 0.3311 | 4.1496 |
8.5635 | 2.1008 | 124000 | 0.3313 | 4.1431 |
8.5386 | 2.1109 | 126000 | 0.3323 | 4.1394 |
8.5085 | 2.1208 | 128000 | 0.3322 | 4.1350 |
8.5007 | 2.1309 | 130000 | 0.3328 | 4.1291 |
8.4887 | 2.1408 | 132000 | 0.3329 | 4.1282 |
8.5454 | 2.1509 | 134000 | 0.3334 | 4.1185 |
8.4889 | 2.1608 | 136000 | 0.3340 | 4.1187 |
8.526 | 2.1709 | 138000 | 0.3339 | 4.1142 |
8.5198 | 2.1808 | 140000 | 0.3343 | 4.1111 |
8.4588 | 2.1909 | 142000 | 0.3347 | 4.1063 |
8.455 | 2.2008 | 144000 | 0.3352 | 4.1030 |
8.4381 | 2.2109 | 146000 | 0.3355 | 4.0991 |
8.444 | 2.2208 | 148000 | 0.3354 | 4.0952 |
8.4109 | 2.2309 | 150000 | 0.3361 | 4.0936 |
8.4216 | 2.2409 | 152000 | 0.3362 | 4.0916 |
8.3161 | 2.2508 | 154000 | 0.3367 | 4.0881 |
8.3631 | 3.0013 | 156000 | 0.3368 | 4.0874 |
8.4031 | 3.0113 | 158000 | 0.3368 | 4.0881 |
8.4078 | 3.0213 | 160000 | 0.3371 | 4.0827 |
8.3695 | 3.0313 | 162000 | 0.3373 | 4.0804 |
8.4121 | 3.0413 | 164000 | 0.3376 | 4.0744 |
8.3457 | 3.0513 | 166000 | 0.3382 | 4.0762 |
8.3958 | 3.0613 | 168000 | 0.3381 | 4.0742 |
8.3473 | 3.0713 | 170000 | 0.3383 | 4.0705 |
8.3675 | 3.0813 | 172000 | 0.3387 | 4.0683 |
8.3904 | 3.0913 | 174000 | 0.3385 | 4.0692 |
8.3825 | 3.1013 | 176000 | 0.3386 | 4.0659 |
8.3694 | 3.1113 | 178000 | 0.3390 | 4.0635 |
8.3522 | 3.1213 | 180000 | 0.3391 | 4.0630 |
8.352 | 3.1313 | 182000 | 0.3392 | 4.0617 |
8.3289 | 3.1413 | 184000 | 0.3393 | 4.0601 |
8.3981 | 3.1513 | 186000 | 0.3394 | 4.0582 |
8.3441 | 3.1613 | 188000 | 0.3396 | 4.0576 |
8.3991 | 3.1713 | 190000 | 0.3396 | 4.0568 |
8.3922 | 3.1813 | 192000 | 0.3396 | 4.0555 |
8.3236 | 3.1913 | 194000 | 0.3398 | 4.0556 |
8.3345 | 3.2013 | 196000 | 0.3398 | 4.0541 |
8.3259 | 3.2113 | 198000 | 0.3400 | 4.0534 |
8.3298 | 3.2213 | 200000 | 0.3398 | 4.0534 |
### Framework versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0