# llama-3.2-3B-lora-r2
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the gommt-oneshot-train dataset. It achieves the following results on the evaluation set:
- Loss: 0.0120
## Model description
More information needed
## Intended uses & limitations
More information needed
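Since this section has not been filled in yet, the snippet below is only a minimal sketch of how the adapter could be loaded for inference with Transformers and PEFT. It assumes the adapter is published as `sizhkhy/llama-3.2-3B-lora-r2` and that you have access to the base model `meta-llama/Llama-3.2-3B-Instruct`; adjust device and dtype settings for your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/llama-3.2-3B-lora-r2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

# Simple chat-style generation with the instruct template.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```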
## Training and evaluation data
More information needed
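No details are documented beyond the dataset name given in the summary above (gommt-oneshot-train). If you have access to the data, inspecting it with the Datasets library might look like the sketch below; the file path and JSONL format are purely placeholders, since the card does not say where the data lives or how it is structured.

```python
from datasets import load_dataset

# Hypothetical path/format: the card only names "gommt-oneshot-train".
dataset = load_dataset("json", data_files={"train": "gommt-oneshot-train.jsonl"})
print(dataset["train"][0])
```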
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
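For reference, these settings correspond roughly to the `TrainingArguments` sketch below. This is not the exact training script: LoRA-specific settings are not listed in the card (the model name suggests rank 2, but that is an inference from the name, not a documented value), and the output directory is a placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.2-3B-lora-r2",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,      # 4 * 8 = total train batch size 32
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                # AdamW with default betas=(0.9, 0.999), eps=1e-8
    seed=42,
)
```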
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.0578 | 0.0372 | 5 | 0.0572 |
0.0546 | 0.0745 | 10 | 0.0571 |
0.0544 | 0.1117 | 15 | 0.0569 |
0.0546 | 0.1490 | 20 | 0.0561 |
0.0584 | 0.1862 | 25 | 0.0540 |
0.0461 | 0.2235 | 30 | 0.0504 |
0.0472 | 0.2607 | 35 | 0.0462 |
0.0422 | 0.2980 | 40 | 0.0430 |
0.0354 | 0.3352 | 45 | 0.0408 |
0.038 | 0.3724 | 50 | 0.0386 |
0.034 | 0.4097 | 55 | 0.0363 |
0.0325 | 0.4469 | 60 | 0.0345 |
0.0299 | 0.4842 | 65 | 0.0331 |
0.0286 | 0.5214 | 70 | 0.0313 |
0.0299 | 0.5587 | 75 | 0.0292 |
0.028 | 0.5959 | 80 | 0.0276 |
0.025 | 0.6331 | 85 | 0.0261 |
0.024 | 0.6704 | 90 | 0.0248 |
0.0231 | 0.7076 | 95 | 0.0241 |
0.0222 | 0.7449 | 100 | 0.0234 |
0.021 | 0.7821 | 105 | 0.0229 |
0.0228 | 0.8194 | 110 | 0.0222 |
0.0184 | 0.8566 | 115 | 0.0216 |
0.0227 | 0.8939 | 120 | 0.0211 |
0.0188 | 0.9311 | 125 | 0.0209 |
0.0238 | 0.9683 | 130 | 0.0205 |
0.0328 | 1.0056 | 135 | 0.0201 |
0.0208 | 1.0428 | 140 | 0.0198 |
0.0199 | 1.0801 | 145 | 0.0194 |
0.0201 | 1.1173 | 150 | 0.0191 |
0.0163 | 1.1546 | 155 | 0.0189 |
0.0164 | 1.1918 | 160 | 0.0186 |
0.0174 | 1.2291 | 165 | 0.0185 |
0.0175 | 1.2663 | 170 | 0.0182 |
0.0175 | 1.3035 | 175 | 0.0180 |
0.0184 | 1.3408 | 180 | 0.0175 |
0.0185 | 1.3780 | 185 | 0.0174 |
0.0155 | 1.4153 | 190 | 0.0173 |
0.016 | 1.4525 | 195 | 0.0170 |
0.0191 | 1.4898 | 200 | 0.0167 |
0.0186 | 1.5270 | 205 | 0.0165 |
0.0171 | 1.5642 | 210 | 0.0164 |
0.0192 | 1.6015 | 215 | 0.0165 |
0.0154 | 1.6387 | 220 | 0.0159 |
0.0179 | 1.6760 | 225 | 0.0160 |
0.0153 | 1.7132 | 230 | 0.0157 |
0.0162 | 1.7505 | 235 | 0.0155 |
0.0166 | 1.7877 | 240 | 0.0154 |
0.0147 | 1.8250 | 245 | 0.0153 |
0.016 | 1.8622 | 250 | 0.0153 |
0.0153 | 1.8994 | 255 | 0.0150 |
0.0157 | 1.9367 | 260 | 0.0149 |
0.0165 | 1.9739 | 265 | 0.0150 |
0.0153 | 2.0112 | 270 | 0.0148 |
0.015 | 2.0484 | 275 | 0.0149 |
0.0159 | 2.0857 | 280 | 0.0148 |
0.0166 | 2.1229 | 285 | 0.0146 |
0.0153 | 2.1601 | 290 | 0.0146 |
0.013 | 2.1974 | 295 | 0.0143 |
0.0139 | 2.2346 | 300 | 0.0143 |
0.016 | 2.2719 | 305 | 0.0145 |
0.0142 | 2.3091 | 310 | 0.0144 |
0.0138 | 2.3464 | 315 | 0.0143 |
0.0151 | 2.3836 | 320 | 0.0143 |
0.0152 | 2.4209 | 325 | 0.0140 |
0.0141 | 2.4581 | 330 | 0.0142 |
0.0137 | 2.4953 | 335 | 0.0139 |
0.0132 | 2.5326 | 340 | 0.0138 |
0.0132 | 2.5698 | 345 | 0.0136 |
0.0162 | 2.6071 | 350 | 0.0135 |
0.0133 | 2.6443 | 355 | 0.0135 |
0.0134 | 2.6816 | 360 | 0.0134 |
0.0147 | 2.7188 | 365 | 0.0135 |
0.0127 | 2.7561 | 370 | 0.0134 |
0.0144 | 2.7933 | 375 | 0.0132 |
0.0166 | 2.8305 | 380 | 0.0131 |
0.0136 | 2.8678 | 385 | 0.0131 |
0.0158 | 2.9050 | 390 | 0.0132 |
0.0118 | 2.9423 | 395 | 0.0131 |
0.0133 | 2.9795 | 400 | 0.0130 |
0.0126 | 3.0168 | 405 | 0.0128 |
0.0121 | 3.0540 | 410 | 0.0128 |
0.0127 | 3.0912 | 415 | 0.0128 |
0.0128 | 3.1285 | 420 | 0.0127 |
0.0121 | 3.1657 | 425 | 0.0127 |
0.0121 | 3.2030 | 430 | 0.0127 |
0.0141 | 3.2402 | 435 | 0.0127 |
0.013 | 3.2775 | 440 | 0.0126 |
0.0123 | 3.3147 | 445 | 0.0125 |
0.0153 | 3.3520 | 450 | 0.0126 |
0.0148 | 3.3892 | 455 | 0.0126 |
0.0136 | 3.4264 | 460 | 0.0126 |
0.0175 | 3.4637 | 465 | 0.0125 |
0.0143 | 3.5009 | 470 | 0.0125 |
0.0116 | 3.5382 | 475 | 0.0124 |
0.012 | 3.5754 | 480 | 0.0123 |
0.0116 | 3.6127 | 485 | 0.0124 |
0.0127 | 3.6499 | 490 | 0.0125 |
0.0158 | 3.6872 | 495 | 0.0124 |
0.0126 | 3.7244 | 500 | 0.0124 |
0.0138 | 3.7616 | 505 | 0.0124 |
0.0135 | 3.7989 | 510 | 0.0124 |
0.0126 | 3.8361 | 515 | 0.0124 |
0.0138 | 3.8734 | 520 | 0.0123 |
0.0117 | 3.9106 | 525 | 0.0123 |
0.0126 | 3.9479 | 530 | 0.0123 |
0.0132 | 3.9851 | 535 | 0.0123 |
0.013 | 4.0223 | 540 | 0.0123 |
0.0131 | 4.0596 | 545 | 0.0122 |
0.0156 | 4.0968 | 550 | 0.0122 |
0.0129 | 4.1341 | 555 | 0.0122 |
0.0128 | 4.1713 | 560 | 0.0122 |
0.0098 | 4.2086 | 565 | 0.0121 |
0.0109 | 4.2458 | 570 | 0.0121 |
0.0128 | 4.2831 | 575 | 0.0121 |
0.0116 | 4.3203 | 580 | 0.0121 |
0.0126 | 4.3575 | 585 | 0.0121 |
0.0118 | 4.3948 | 590 | 0.0121 |
0.0136 | 4.4320 | 595 | 0.0121 |
0.0122 | 4.4693 | 600 | 0.0121 |
0.0142 | 4.5065 | 605 | 0.0120 |
0.011 | 4.5438 | 610 | 0.0120 |
0.011 | 4.5810 | 615 | 0.0120 |
0.011 | 4.6182 | 620 | 0.0120 |
0.0104 | 4.6555 | 625 | 0.0120 |
0.0113 | 4.6927 | 630 | 0.0120 |
0.0123 | 4.7300 | 635 | 0.0120 |
0.011 | 4.7672 | 640 | 0.0120 |
0.0125 | 4.8045 | 645 | 0.0120 |
0.0121 | 4.8417 | 650 | 0.0120 |
0.0124 | 4.8790 | 655 | 0.0120 |
0.0102 | 4.9162 | 660 | 0.0120 |
0.0151 | 4.9534 | 665 | 0.0121 |
0.0125 | 4.9907 | 670 | 0.0120 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3