llm3br256
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the relianceada-oneshot-train dataset. It achieves the following results on the evaluation set:
- Loss: 0.0057
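For reference, a minimal usage sketch (not part of the original card): it assumes the weights are a PEFT/LoRA adapter published under the repository id neel-nanonets/relianceada shown in the model tree below, and that the meta-llama base weights are available locally or via the Hub.
```python
# Minimal sketch: load the base model, then attach this LoRA adapter with PEFT.
# The adapter repo id is assumed from the model tree below; adjust if needed.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "neel-nanonets/relianceada"  # assumed adapter repository id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```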
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
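As a rough reconstruction (not the original training script), these settings map onto transformers TrainingArguments as sketched below; the dataset preparation, LoRA configuration, and Trainer wiring are omitted, and the output directory name is assumed.
```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above. With 4 samples per device and
# 8 gradient-accumulation steps, the effective train batch size is 32.
training_args = TrainingArguments(
    output_dir="llm3br256",          # assumed output directory
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
)
```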
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.0536 | 0.0291 | 5 | 0.0522 |
0.0383 | 0.0582 | 10 | 0.0333 |
0.0217 | 0.0873 | 15 | 0.0211 |
0.0192 | 0.1164 | 20 | 0.0160 |
0.0144 | 0.1456 | 25 | 0.0133 |
0.0127 | 0.1747 | 30 | 0.0119 |
0.0113 | 0.2038 | 35 | 0.0109 |
0.0115 | 0.2329 | 40 | 0.0104 |
0.0105 | 0.2620 | 45 | 0.0098 |
0.0099 | 0.2911 | 50 | 0.0093 |
0.0102 | 0.3202 | 55 | 0.0090 |
0.0092 | 0.3493 | 60 | 0.0087 |
0.0093 | 0.3785 | 65 | 0.0085 |
0.0084 | 0.4076 | 70 | 0.0083 |
0.0088 | 0.4367 | 75 | 0.0083 |
0.0084 | 0.4658 | 80 | 0.0079 |
0.0086 | 0.4949 | 85 | 0.0079 |
0.0079 | 0.5240 | 90 | 0.0078 |
0.0083 | 0.5531 | 95 | 0.0077 |
0.0086 | 0.5822 | 100 | 0.0078 |
0.0089 | 0.6114 | 105 | 0.0076 |
0.0075 | 0.6405 | 110 | 0.0076 |
0.0078 | 0.6696 | 115 | 0.0075 |
0.0079 | 0.6987 | 120 | 0.0074 |
0.0078 | 0.7278 | 125 | 0.0073 |
0.008 | 0.7569 | 130 | 0.0072 |
0.0077 | 0.7860 | 135 | 0.0070 |
0.0079 | 0.8151 | 140 | 0.0070 |
0.0071 | 0.8443 | 145 | 0.0070 |
0.0072 | 0.8734 | 150 | 0.0071 |
0.0076 | 0.9025 | 155 | 0.0070 |
0.0075 | 0.9316 | 160 | 0.0070 |
0.0074 | 0.9607 | 165 | 0.0069 |
0.0073 | 0.9898 | 170 | 0.0069 |
0.007 | 1.0189 | 175 | 0.0069 |
0.0072 | 1.0480 | 180 | 0.0069 |
0.0068 | 1.0771 | 185 | 0.0068 |
0.0067 | 1.1063 | 190 | 0.0069 |
0.0075 | 1.1354 | 195 | 0.0068 |
0.0072 | 1.1645 | 200 | 0.0068 |
0.0075 | 1.1936 | 205 | 0.0068 |
0.0066 | 1.2227 | 210 | 0.0067 |
0.0068 | 1.2518 | 215 | 0.0068 |
0.007 | 1.2809 | 220 | 0.0069 |
0.0065 | 1.3100 | 225 | 0.0068 |
0.0063 | 1.3392 | 230 | 0.0068 |
0.0068 | 1.3683 | 235 | 0.0067 |
0.0066 | 1.3974 | 240 | 0.0067 |
0.0063 | 1.4265 | 245 | 0.0068 |
0.0069 | 1.4556 | 250 | 0.0068 |
0.0068 | 1.4847 | 255 | 0.0067 |
0.0067 | 1.5138 | 260 | 0.0067 |
0.0063 | 1.5429 | 265 | 0.0065 |
0.0066 | 1.5721 | 270 | 0.0067 |
0.0063 | 1.6012 | 275 | 0.0066 |
0.0064 | 1.6303 | 280 | 0.0066 |
0.0066 | 1.6594 | 285 | 0.0066 |
0.0068 | 1.6885 | 290 | 0.0065 |
0.0065 | 1.7176 | 295 | 0.0064 |
0.0064 | 1.7467 | 300 | 0.0064 |
0.0068 | 1.7758 | 305 | 0.0064 |
0.0063 | 1.8049 | 310 | 0.0064 |
0.0067 | 1.8341 | 315 | 0.0064 |
0.0065 | 1.8632 | 320 | 0.0065 |
0.006 | 1.8923 | 325 | 0.0064 |
0.0064 | 1.9214 | 330 | 0.0064 |
0.0065 | 1.9505 | 335 | 0.0064 |
0.0061 | 1.9796 | 340 | 0.0063 |
0.006 | 2.0087 | 345 | 0.0063 |
0.0058 | 2.0378 | 350 | 0.0063 |
0.0057 | 2.0670 | 355 | 0.0063 |
0.0059 | 2.0961 | 360 | 0.0062 |
0.0061 | 2.1252 | 365 | 0.0063 |
0.006 | 2.1543 | 370 | 0.0063 |
0.0062 | 2.1834 | 375 | 0.0063 |
0.0064 | 2.2125 | 380 | 0.0063 |
0.006 | 2.2416 | 385 | 0.0062 |
0.0062 | 2.2707 | 390 | 0.0061 |
0.0061 | 2.2999 | 395 | 0.0062 |
0.0063 | 2.3290 | 400 | 0.0063 |
0.006 | 2.3581 | 405 | 0.0062 |
0.006 | 2.3872 | 410 | 0.0063 |
0.0057 | 2.4163 | 415 | 0.0062 |
0.0063 | 2.4454 | 420 | 0.0063 |
0.0065 | 2.4745 | 425 | 0.0062 |
0.006 | 2.5036 | 430 | 0.0062 |
0.0059 | 2.5328 | 435 | 0.0062 |
0.0058 | 2.5619 | 440 | 0.0062 |
0.0061 | 2.5910 | 445 | 0.0061 |
0.0061 | 2.6201 | 450 | 0.0061 |
0.0059 | 2.6492 | 455 | 0.0062 |
0.0057 | 2.6783 | 460 | 0.0062 |
0.0059 | 2.7074 | 465 | 0.0061 |
0.0058 | 2.7365 | 470 | 0.0062 |
0.0057 | 2.7656 | 475 | 0.0061 |
0.0058 | 2.7948 | 480 | 0.0061 |
0.0057 | 2.8239 | 485 | 0.0060 |
0.0059 | 2.8530 | 490 | 0.0060 |
0.0058 | 2.8821 | 495 | 0.0061 |
0.0059 | 2.9112 | 500 | 0.0060 |
0.0058 | 2.9403 | 505 | 0.0060 |
0.0057 | 2.9694 | 510 | 0.0061 |
0.0066 | 2.9985 | 515 | 0.0061 |
0.0055 | 3.0277 | 520 | 0.0060 |
0.005 | 3.0568 | 525 | 0.0060 |
0.0055 | 3.0859 | 530 | 0.0060 |
0.0054 | 3.1150 | 535 | 0.0060 |
0.0055 | 3.1441 | 540 | 0.0060 |
0.0056 | 3.1732 | 545 | 0.0060 |
0.0057 | 3.2023 | 550 | 0.0060 |
0.0058 | 3.2314 | 555 | 0.0060 |
0.0052 | 3.2606 | 560 | 0.0060 |
0.0058 | 3.2897 | 565 | 0.0060 |
0.0051 | 3.3188 | 570 | 0.0058 |
0.0051 | 3.3479 | 575 | 0.0059 |
0.0053 | 3.3770 | 580 | 0.0059 |
0.0053 | 3.4061 | 585 | 0.0059 |
0.0055 | 3.4352 | 590 | 0.0058 |
0.0051 | 3.4643 | 595 | 0.0059 |
0.0051 | 3.4934 | 600 | 0.0059 |
0.0055 | 3.5226 | 605 | 0.0059 |
0.0055 | 3.5517 | 610 | 0.0058 |
0.0055 | 3.5808 | 615 | 0.0058 |
0.0051 | 3.6099 | 620 | 0.0058 |
0.0054 | 3.6390 | 625 | 0.0057 |
0.0053 | 3.6681 | 630 | 0.0057 |
0.0052 | 3.6972 | 635 | 0.0057 |
0.0052 | 3.7263 | 640 | 0.0057 |
0.0052 | 3.7555 | 645 | 0.0058 |
0.0049 | 3.7846 | 650 | 0.0057 |
0.0055 | 3.8137 | 655 | 0.0057 |
0.0052 | 3.8428 | 660 | 0.0057 |
0.005 | 3.8719 | 665 | 0.0057 |
0.005 | 3.9010 | 670 | 0.0057 |
0.0051 | 3.9301 | 675 | 0.0057 |
0.0054 | 3.9592 | 680 | 0.0057 |
0.0052 | 3.9884 | 685 | 0.0057 |
0.0046 | 4.0175 | 690 | 0.0057 |
0.0047 | 4.0466 | 695 | 0.0057 |
0.0044 | 4.0757 | 700 | 0.0057 |
0.0047 | 4.1048 | 705 | 0.0057 |
0.0046 | 4.1339 | 710 | 0.0057 |
0.0046 | 4.1630 | 715 | 0.0057 |
0.0048 | 4.1921 | 720 | 0.0057 |
0.0047 | 4.2213 | 725 | 0.0057 |
0.0048 | 4.2504 | 730 | 0.0057 |
0.0047 | 4.2795 | 735 | 0.0057 |
0.0047 | 4.3086 | 740 | 0.0057 |
0.0046 | 4.3377 | 745 | 0.0057 |
0.0047 | 4.3668 | 750 | 0.0057 |
0.005 | 4.3959 | 755 | 0.0057 |
0.0043 | 4.4250 | 760 | 0.0057 |
0.0047 | 4.4541 | 765 | 0.0057 |
0.0047 | 4.4833 | 770 | 0.0057 |
0.0046 | 4.5124 | 775 | 0.0057 |
0.0047 | 4.5415 | 780 | 0.0057 |
0.0047 | 4.5706 | 785 | 0.0057 |
0.0048 | 4.5997 | 790 | 0.0057 |
0.0045 | 4.6288 | 795 | 0.0057 |
0.0045 | 4.6579 | 800 | 0.0057 |
0.0049 | 4.6870 | 805 | 0.0057 |
0.0045 | 4.7162 | 810 | 0.0057 |
0.0045 | 4.7453 | 815 | 0.0057 |
0.0044 | 4.7744 | 820 | 0.0057 |
0.0045 | 4.8035 | 825 | 0.0057 |
0.0046 | 4.8326 | 830 | 0.0057 |
0.0044 | 4.8617 | 835 | 0.0057 |
0.0044 | 4.8908 | 840 | 0.0057 |
0.0048 | 4.9199 | 845 | 0.0057 |
0.0047 | 4.9491 | 850 | 0.0057 |
0.0044 | 4.9782 | 855 | 0.0057 |
Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
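A small sketch for checking that a local environment matches the pinned versions above (the import names are assumed to be the standard ones for these libraries):
```python
# Sketch: compare installed versions against the ones listed in this card.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": ("0.12.0", peft.__version__),
    "transformers": ("4.46.1", transformers.__version__),
    "torch": ("2.4.0+cu121", torch.__version__),
    "datasets": ("3.1.0", datasets.__version__),
    "tokenizers": ("0.20.3", tokenizers.__version__),
}
for name, (want, have) in expected.items():
    status = "OK" if have.startswith(want.split("+")[0]) else "MISMATCH"
    print(f"{name}: expected {want}, found {have} -> {status}")
```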
Model tree for neel-nanonets/relianceada
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Fine-tuned from: unsloth/Llama-3.2-3B-Instruct