mc4-xglm-tagged-base

This model is a fine-tuned version of bowphs/xglm-257M (the training dataset is not named in the card's metadata). It achieves the following results on the evaluation set:

  • Accuracy: 0.3726
  • Loss: 3.7613
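Since this is a causal language model, the reported evaluation loss (cross-entropy in nats) can be read as a perplexity via exp(loss). A quick sanity check, using the final loss from the table below:

```python
import math

eval_loss = 3.7613  # final evaluation loss reported above

# Perplexity is the exponential of the cross-entropy loss (in nats).
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # → perplexity ≈ 43.0
```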

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 1000000
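The total train batch size of 32 follows directly from the per-device batch size and gradient accumulation. A minimal sketch in plain Python (the dict is illustrative, not a Trainer API):

```python
# Hyperparameters as reported above (plain dict, not tied to any library).
hparams = {
    "learning_rate": 5e-05,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "lr_scheduler_type": "linear",
    "training_steps": 1_000_000,
}

# Effective (total) train batch size = per-device batch size × accumulation steps.
total_train_batch_size = (
    hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # → 32, matching the reported value
```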

Training results

| Training Loss | Epoch | Step | Accuracy | Validation Loss |
|---|---|---|---|---|
| No log | 1e-06 | 1 | 0.0294 | 12.3133 |
| No log | 2e-06 | 2 | 0.0307 | 12.2422 |
| No log | 4e-06 | 4 | 0.0034 | 12.2271 |
| No log | 8e-06 | 8 | 0.0306 | 12.0499 |
| No log | 0.0000 | 16 | 0.0280 | 11.8602 |
| No log | 0.0000 | 32 | 0.0306 | 11.4877 |
| No log | 0.0001 | 64 | 0.0307 | 10.8011 |
| No log | 0.0001 | 128 | 0.0306 | 9.8703 |
| No log | 0.0003 | 256 | 0.0301 | 9.3693 |
| No log | 0.0005 | 512 | 0.0416 | 9.0687 |
| No log | 0.0010 | 1024 | 0.0525 | 8.3611 |
| 17.2276 | 0.002 | 2000 | 0.0678 | 7.6542 |
| 17.2276 | 0.0020 | 2048 | 0.0678 | 7.6283 |
| 14.6302 | 0.004 | 4000 | 0.0892 | 7.0092 |
| 14.6302 | 0.0041 | 4096 | 0.0913 | 6.9777 |
| 13.6739 | 0.006 | 6000 | 0.1067 | 6.6596 |
| 13.0658 | 0.008 | 8000 | 0.1232 | 6.3721 |
| 13.0658 | 0.0082 | 8192 | 0.1237 | 6.3621 |
| 12.6223 | 0.01 | 10000 | 0.1376 | 6.1424 |
| 12.1785 | 0.012 | 12000 | 0.1563 | 5.8987 |
| 11.6802 | 0.014 | 14000 | 0.1857 | 5.6095 |
| 11.1743 | 0.016 | 16000 | 0.2213 | 5.3249 |
| 11.1743 | 0.0164 | 16384 | 0.2281 | 5.2768 |
| 10.7738 | 0.018 | 18000 | 0.2480 | 5.1219 |
| 10.4178 | 0.02 | 20000 | 0.2640 | 4.9929 |
| 10.1491 | 0.022 | 22000 | 0.2732 | 4.8845 |
| 9.934 | 0.024 | 24000 | 0.2802 | 4.7947 |
| 9.7615 | 0.026 | 26000 | 0.2860 | 4.7243 |
| 9.5947 | 0.028 | 28000 | 0.2918 | 4.6485 |
| 9.5259 | 0.03 | 30000 | 0.2956 | 4.5979 |
| 9.3701 | 0.032 | 32000 | 0.2993 | 4.5421 |
| 9.3701 | 0.0328 | 32768 | 0.3012 | 4.5245 |
| 9.3106 | 0.034 | 34000 | 0.3035 | 4.4962 |
| 9.2248 | 0.036 | 36000 | 0.3069 | 4.4722 |
| 9.1151 | 0.038 | 38000 | 0.3099 | 4.4183 |
| 9.0054 | 0.04 | 40000 | 0.3132 | 4.3826 |
| 8.9254 | 0.042 | 42000 | 0.3167 | 4.3473 |
| 8.899 | 0.044 | 44000 | 0.3191 | 4.3118 |
| 8.7951 | 0.046 | 46000 | 0.3213 | 4.2810 |
| 8.7668 | 0.048 | 48000 | 0.3238 | 4.2601 |
| 8.6323 | 0.05 | 50000 | 0.3259 | 4.2360 |
| 8.5954 | 1.0001 | 52000 | 0.3278 | 4.2179 |
| 8.6282 | 1.0021 | 54000 | 0.3309 | 4.1831 |
| 8.5655 | 1.0041 | 56000 | 0.3323 | 4.1695 |
| 8.4928 | 1.0061 | 58000 | 0.3342 | 4.1452 |
| 8.5173 | 1.0081 | 60000 | 0.3356 | 4.1297 |
| 8.4115 | 1.0101 | 62000 | 0.3377 | 4.1152 |
| 8.4264 | 1.0121 | 64000 | 0.3390 | 4.0969 |
| 8.4264 | 1.0136 | 65536 | 0.3401 | 4.0832 |
| 8.3665 | 1.0141 | 66000 | 0.3403 | 4.0796 |
| 8.3269 | 1.0161 | 68000 | 0.3424 | 4.0587 |
| 8.3235 | 1.0181 | 70000 | 0.3442 | 4.0449 |
| 8.3084 | 1.0201 | 72000 | 0.3447 | 4.0344 |
| 8.2626 | 1.0221 | 74000 | 0.3465 | 4.0173 |
| 8.2159 | 1.0241 | 76000 | 0.3481 | 4.0052 |
| 8.1909 | 1.0261 | 78000 | 0.3470 | 4.0121 |
| 8.1668 | 1.0281 | 80000 | 0.3482 | 3.9881 |
| 8.1834 | 1.0301 | 82000 | 0.3502 | 3.9693 |
| 8.1224 | 1.0321 | 84000 | 0.3525 | 3.9584 |
| 8.1373 | 1.0341 | 86000 | 0.3532 | 3.9491 |
| 8.1292 | 1.0361 | 88000 | 0.3533 | 3.9458 |
| 8.0558 | 1.0381 | 90000 | 0.3550 | 3.9267 |
| 8.0256 | 1.0401 | 92000 | 0.3571 | 3.9133 |
| 7.9978 | 1.0421 | 94000 | 0.3578 | 3.9021 |
| 8.0016 | 1.0441 | 96000 | 0.3583 | 3.8952 |
| 7.9416 | 1.0461 | 98000 | 0.3591 | 3.8897 |
| 7.9475 | 1.0481 | 100000 | 0.3600 | 3.8779 |
| 7.8433 | 1.0501 | 102000 | 0.3612 | 3.8705 |
| 7.8756 | 2.0002 | 104000 | 0.3620 | 3.8657 |
| 7.9009 | 2.0022 | 106000 | 0.3628 | 3.8557 |
| 7.89 | 2.0042 | 108000 | 0.3633 | 3.8487 |
| 7.8288 | 2.0062 | 110000 | 0.3651 | 3.8367 |
| 7.8747 | 2.0082 | 112000 | 0.3656 | 3.8324 |
| 7.8107 | 2.0102 | 114000 | 0.3654 | 3.8288 |
| 7.8301 | 2.0122 | 116000 | 0.3672 | 3.8152 |
| 7.7904 | 2.0142 | 118000 | 0.3671 | 3.8140 |
| 7.7856 | 2.0162 | 120000 | 0.3680 | 3.8058 |
| 7.7961 | 2.0182 | 122000 | 0.3688 | 3.7996 |
| 7.7909 | 2.0202 | 124000 | 0.3694 | 3.7952 |
| 7.7698 | 2.0222 | 126000 | 0.3701 | 3.7863 |
| 7.7299 | 2.0242 | 128000 | 0.3703 | 3.7823 |
| 7.718 | 2.0262 | 130000 | 0.3697 | 3.7823 |
| 7.718 | 2.0272 | 131072 | 0.3707 | 3.7746 |
| 7.7039 | 2.0282 | 132000 | 0.3708 | 3.7754 |
| 7.7561 | 2.0302 | 134000 | 0.3726 | 3.7613 |
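Note that with a linear scheduler configured for 1,000,000 training steps, the learning rate had only decayed modestly by the last logged step (134,000). A quick sketch of the implied rate at that point, assuming a plain linear decay to zero with no warmup (warmup steps are not listed in the hyperparameters):

```python
# Linear decay (no warmup assumed): lr falls from the initial value to 0
# over `training_steps`. Values taken from the hyperparameters above.
initial_lr = 5e-05
training_steps = 1_000_000
final_logged_step = 134_000  # last step in the results table

lr_at_step = initial_lr * (1 - final_logged_step / training_steps)
print(f"{lr_at_step:.2e}")  # → 4.33e-05
```

So training stopped while the schedule was still near its initial learning rate, consistent with the validation loss still trending downward in the table.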

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0