llm3br256
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the spinny dataset. It achieves the following results on the evaluation set:
- Loss: 0.0079
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
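For reference, the sketch below shows how these values could be expressed with `transformers.TrainingArguments`. Only the hyperparameter values come from this card; the output directory is a placeholder, and the optimizer betas/epsilon rely on the library defaults matching the values listed above.

```python
from transformers import TrainingArguments

# Hyperparameters as listed above; output_dir is an illustrative placeholder.
training_args = TrainingArguments(
    output_dir="llm3br256",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,   # 4 x 8 = total train batch size of 32 on a single device
    optim="adamw_torch",             # AdamW defaults: betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
)
```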
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.0598 | 0.0578 | 5 | 0.0557 |
0.0388 | 0.1156 | 10 | 0.0346 |
0.0275 | 0.1734 | 15 | 0.0276 |
0.0218 | 0.2312 | 20 | 0.0228 |
0.0236 | 0.2890 | 25 | 0.0203 |
0.0182 | 0.3468 | 30 | 0.0179 |
0.019 | 0.4046 | 35 | 0.0162 |
0.017 | 0.4624 | 40 | 0.0147 |
0.0147 | 0.5202 | 45 | 0.0137 |
0.0118 | 0.5780 | 50 | 0.0132 |
0.0107 | 0.6358 | 55 | 0.0127 |
0.016 | 0.6936 | 60 | 0.0123 |
0.0144 | 0.7514 | 65 | 0.0116 |
0.0119 | 0.8092 | 70 | 0.0113 |
0.0111 | 0.8671 | 75 | 0.0109 |
0.012 | 0.9249 | 80 | 0.0107 |
0.0139 | 0.9827 | 85 | 0.0102 |
0.0085 | 1.0405 | 90 | 0.0104 |
0.01 | 1.0983 | 95 | 0.0102 |
0.009 | 1.1561 | 100 | 0.0099 |
0.0094 | 1.2139 | 105 | 0.0098 |
0.0069 | 1.2717 | 110 | 0.0099 |
0.0108 | 1.3295 | 115 | 0.0096 |
0.0066 | 1.3873 | 120 | 0.0095 |
0.0089 | 1.4451 | 125 | 0.0094 |
0.0084 | 1.5029 | 130 | 0.0093 |
0.0102 | 1.5607 | 135 | 0.0093 |
0.01 | 1.6185 | 140 | 0.0091 |
0.0098 | 1.6763 | 145 | 0.0088 |
0.0071 | 1.7341 | 150 | 0.0087 |
0.0094 | 1.7919 | 155 | 0.0086 |
0.008 | 1.8497 | 160 | 0.0086 |
0.01 | 1.9075 | 165 | 0.0085 |
0.0084 | 1.9653 | 170 | 0.0086 |
0.0058 | 2.0231 | 175 | 0.0087 |
0.0056 | 2.0809 | 180 | 0.0090 |
0.0077 | 2.1387 | 185 | 0.0086 |
0.0061 | 2.1965 | 190 | 0.0086 |
0.008 | 2.2543 | 195 | 0.0083 |
0.0058 | 2.3121 | 200 | 0.0083 |
0.0047 | 2.3699 | 205 | 0.0084 |
0.0066 | 2.4277 | 210 | 0.0084 |
0.0055 | 2.4855 | 215 | 0.0082 |
0.0056 | 2.5434 | 220 | 0.0083 |
0.005 | 2.6012 | 225 | 0.0082 |
0.0065 | 2.6590 | 230 | 0.0082 |
0.0061 | 2.7168 | 235 | 0.0081 |
0.0052 | 2.7746 | 240 | 0.0082 |
0.0053 | 2.8324 | 245 | 0.0081 |
0.0058 | 2.8902 | 250 | 0.0079 |
0.0052 | 2.9480 | 255 | 0.0078 |
0.0071 | 3.0058 | 260 | 0.0080 |
0.0051 | 3.0636 | 265 | 0.0082 |
0.0033 | 3.1214 | 270 | 0.0086 |
0.004 | 3.1792 | 275 | 0.0084 |
0.0032 | 3.2370 | 280 | 0.0082 |
0.0042 | 3.2948 | 285 | 0.0082 |
0.0035 | 3.3526 | 290 | 0.0082 |
0.0041 | 3.4104 | 295 | 0.0081 |
0.0048 | 3.4682 | 300 | 0.0080 |
0.0046 | 3.5260 | 305 | 0.0080 |
0.004 | 3.5838 | 310 | 0.0080 |
0.0032 | 3.6416 | 315 | 0.0081 |
0.0039 | 3.6994 | 320 | 0.0084 |
0.0042 | 3.7572 | 325 | 0.0083 |
0.0046 | 3.8150 | 330 | 0.0080 |
0.0035 | 3.8728 | 335 | 0.0081 |
0.0048 | 3.9306 | 340 | 0.0081 |
0.0056 | 3.9884 | 345 | 0.0080 |
0.0025 | 4.0462 | 350 | 0.0080 |
0.0035 | 4.1040 | 355 | 0.0082 |
0.0028 | 4.1618 | 360 | 0.0083 |
0.0028 | 4.2197 | 365 | 0.0084 |
0.003 | 4.2775 | 370 | 0.0085 |
0.0033 | 4.3353 | 375 | 0.0085 |
0.003 | 4.3931 | 380 | 0.0086 |
0.0022 | 4.4509 | 385 | 0.0086 |
0.0028 | 4.5087 | 390 | 0.0086 |
0.0028 | 4.5665 | 395 | 0.0085 |
0.0031 | 4.6243 | 400 | 0.0085 |
0.0038 | 4.6821 | 405 | 0.0084 |
0.0024 | 4.7399 | 410 | 0.0084 |
0.0024 | 4.7977 | 415 | 0.0084 |
0.0024 | 4.8555 | 420 | 0.0084 |
0.0026 | 4.9133 | 425 | 0.0084 |
0.0029 | 4.9711 | 430 | 0.0084 |
Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
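Since the card lists PEFT among the framework versions, the adapter is presumably loaded on top of the base model with PEFT. A minimal inference sketch, assuming the adapter weights are hosted under the repository id `sizhkhy/spinny` named on this page:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/spinny"  # assumption: repository id named on this page

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```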