# mc4-xglm-tagged-base
This model is a fine-tuned version of [bowphs/xglm-257M](https://huggingface.co/bowphs/xglm-257M) on an unspecified dataset. It achieves the following results on the evaluation set:
- Accuracy: 0.3726
- Loss: 3.7613
## Model description
More information needed
## Intended uses & limitations
More information needed
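Pending fuller documentation, the checkpoint should load like any other causal language model in 🤗 Transformers (a minimal sketch; the prompt and generation settings below are illustrative, not taken from the model card):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bowphs/mc4-xglm-tagged-base"

# Load the tokenizer and the XGLM-based causal LM from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy generation from a short prompt (settings are illustrative)
inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```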
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 1000000
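The reported total train batch size follows from the per-device batch size and gradient accumulation. A quick sanity check of the numbers above:

```python
# Effective (total) train batch size = per-device batch size x accumulation steps
train_batch_size = 16
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 32, matching the reported total_train_batch_size
```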
### Training results

| Training Loss | Epoch | Step | Accuracy | Validation Loss |
|:-------------:|:-----:|:----:|:--------:|:---------------:|
No log | 1e-06 | 1 | 0.0294 | 12.3133 |
No log | 2e-06 | 2 | 0.0307 | 12.2422 |
No log | 4e-06 | 4 | 0.0034 | 12.2271 |
No log | 8e-06 | 8 | 0.0306 | 12.0499 |
No log | 0.0000 | 16 | 0.0280 | 11.8602 |
No log | 0.0000 | 32 | 0.0306 | 11.4877 |
No log | 0.0001 | 64 | 0.0307 | 10.8011 |
No log | 0.0001 | 128 | 0.0306 | 9.8703 |
No log | 0.0003 | 256 | 0.0301 | 9.3693 |
No log | 0.0005 | 512 | 0.0416 | 9.0687 |
No log | 0.0010 | 1024 | 0.0525 | 8.3611 |
17.2276 | 0.002 | 2000 | 0.0678 | 7.6542 |
17.2276 | 0.0020 | 2048 | 0.0678 | 7.6283 |
14.6302 | 0.004 | 4000 | 0.0892 | 7.0092 |
14.6302 | 0.0041 | 4096 | 0.0913 | 6.9777 |
13.6739 | 0.006 | 6000 | 0.1067 | 6.6596 |
13.0658 | 0.008 | 8000 | 0.1232 | 6.3721 |
13.0658 | 0.0082 | 8192 | 0.1237 | 6.3621 |
12.6223 | 0.01 | 10000 | 0.1376 | 6.1424 |
12.1785 | 0.012 | 12000 | 0.1563 | 5.8987 |
11.6802 | 0.014 | 14000 | 0.1857 | 5.6095 |
11.1743 | 0.016 | 16000 | 0.2213 | 5.3249 |
11.1743 | 0.0164 | 16384 | 0.2281 | 5.2768 |
10.7738 | 0.018 | 18000 | 0.2480 | 5.1219 |
10.4178 | 0.02 | 20000 | 0.2640 | 4.9929 |
10.1491 | 0.022 | 22000 | 0.2732 | 4.8845 |
9.934 | 0.024 | 24000 | 0.2802 | 4.7947 |
9.7615 | 0.026 | 26000 | 0.2860 | 4.7243 |
9.5947 | 0.028 | 28000 | 0.2918 | 4.6485 |
9.5259 | 0.03 | 30000 | 0.2956 | 4.5979 |
9.3701 | 0.032 | 32000 | 0.2993 | 4.5421 |
9.3701 | 0.0328 | 32768 | 0.3012 | 4.5245 |
9.3106 | 0.034 | 34000 | 0.3035 | 4.4962 |
9.2248 | 0.036 | 36000 | 0.3069 | 4.4722 |
9.1151 | 0.038 | 38000 | 0.3099 | 4.4183 |
9.0054 | 0.04 | 40000 | 0.3132 | 4.3826 |
8.9254 | 0.042 | 42000 | 0.3167 | 4.3473 |
8.899 | 0.044 | 44000 | 0.3191 | 4.3118 |
8.7951 | 0.046 | 46000 | 0.3213 | 4.2810 |
8.7668 | 0.048 | 48000 | 0.3238 | 4.2601 |
8.6323 | 0.05 | 50000 | 0.3259 | 4.2360 |
8.5954 | 1.0001 | 52000 | 0.3278 | 4.2179 |
8.6282 | 1.0021 | 54000 | 0.3309 | 4.1831 |
8.5655 | 1.0041 | 56000 | 0.3323 | 4.1695 |
8.4928 | 1.0061 | 58000 | 0.3342 | 4.1452 |
8.5173 | 1.0081 | 60000 | 0.3356 | 4.1297 |
8.4115 | 1.0101 | 62000 | 0.3377 | 4.1152 |
8.4264 | 1.0121 | 64000 | 0.3390 | 4.0969 |
8.4264 | 1.0136 | 65536 | 0.3401 | 4.0832 |
8.3665 | 1.0141 | 66000 | 0.3403 | 4.0796 |
8.3269 | 1.0161 | 68000 | 0.3424 | 4.0587 |
8.3235 | 1.0181 | 70000 | 0.3442 | 4.0449 |
8.3084 | 1.0201 | 72000 | 0.3447 | 4.0344 |
8.2626 | 1.0221 | 74000 | 0.3465 | 4.0173 |
8.2159 | 1.0241 | 76000 | 0.3481 | 4.0052 |
8.1909 | 1.0261 | 78000 | 0.3470 | 4.0121 |
8.1668 | 1.0281 | 80000 | 0.3482 | 3.9881 |
8.1834 | 1.0301 | 82000 | 0.3502 | 3.9693 |
8.1224 | 1.0321 | 84000 | 0.3525 | 3.9584 |
8.1373 | 1.0341 | 86000 | 0.3532 | 3.9491 |
8.1292 | 1.0361 | 88000 | 0.3533 | 3.9458 |
8.0558 | 1.0381 | 90000 | 0.3550 | 3.9267 |
8.0256 | 1.0401 | 92000 | 0.3571 | 3.9133 |
7.9978 | 1.0421 | 94000 | 0.3578 | 3.9021 |
8.0016 | 1.0441 | 96000 | 0.3583 | 3.8952 |
7.9416 | 1.0461 | 98000 | 0.3591 | 3.8897 |
7.9475 | 1.0481 | 100000 | 0.3600 | 3.8779 |
7.8433 | 1.0501 | 102000 | 0.3612 | 3.8705 |
7.8756 | 2.0002 | 104000 | 0.3620 | 3.8657 |
7.9009 | 2.0022 | 106000 | 0.3628 | 3.8557 |
7.89 | 2.0042 | 108000 | 0.3633 | 3.8487 |
7.8288 | 2.0062 | 110000 | 0.3651 | 3.8367 |
7.8747 | 2.0082 | 112000 | 0.3656 | 3.8324 |
7.8107 | 2.0102 | 114000 | 0.3654 | 3.8288 |
7.8301 | 2.0122 | 116000 | 0.3672 | 3.8152 |
7.7904 | 2.0142 | 118000 | 0.3671 | 3.8140 |
7.7856 | 2.0162 | 120000 | 0.3680 | 3.8058 |
7.7961 | 2.0182 | 122000 | 0.3688 | 3.7996 |
7.7909 | 2.0202 | 124000 | 0.3694 | 3.7952 |
7.7698 | 2.0222 | 126000 | 0.3701 | 3.7863 |
7.7299 | 2.0242 | 128000 | 0.3703 | 3.7823 |
7.718 | 2.0262 | 130000 | 0.3697 | 3.7823 |
7.718 | 2.0272 | 131072 | 0.3707 | 3.7746 |
7.7039 | 2.0282 | 132000 | 0.3708 | 3.7754 |
7.7561 | 2.0302 | 134000 | 0.3726 | 3.7613 |
### Framework versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0