llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the goavanto2-oneshot-train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0032
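
Since the framework versions below include PEFT and the card lists this checkpoint as an adapter, it can presumably be loaded as a LoRA-style adapter on top of the base model. The sketch below is a minimal illustration, not the author's exact usage: the repository id neel-nanonets/goavanto_2 is taken from this card's adapter listing, and the prompt and generation settings are placeholders.

```python
# Minimal sketch: load the PEFT adapter on top of the base model (assumptions noted above).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "neel-nanonets/goavanto_2"  # assumed adapter repository id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Illustrative prompt; replace with an input matching the training task.
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```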

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
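
For reference, a `TrainingArguments` configuration matching the hyperparameters above might look like the following. This is a sketch only: the output directory is illustrative, and the dataset/model wiring of the actual training script is not shown.

```python
# Sketch of Trainer arguments equivalent to the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256",          # illustrative path
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,   # effective total train batch size: 4 * 8 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
)
```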

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0512        | 0.0741 | 5    | 0.0457          |
| 0.034         | 0.1481 | 10   | 0.0291          |
| 0.0258        | 0.2222 | 15   | 0.0234          |
| 0.0185        | 0.2963 | 20   | 0.0186          |
| 0.014         | 0.3704 | 25   | 0.0155          |
| 0.0178        | 0.4444 | 30   | 0.0133          |
| 0.0157        | 0.5185 | 35   | 0.0116          |
| 0.014         | 0.5926 | 40   | 0.0102          |
| 0.0098        | 0.6667 | 45   | 0.0091          |
| 0.0074        | 0.7407 | 50   | 0.0082          |
| 0.007         | 0.8148 | 55   | 0.0076          |
| 0.0078        | 0.8889 | 60   | 0.0073          |
| 0.0095        | 0.9630 | 65   | 0.0070          |
| 0.0064        | 1.0370 | 70   | 0.0067          |
| 0.0114        | 1.1111 | 75   | 0.0064          |
| 0.0059        | 1.1852 | 80   | 0.0060          |
| 0.0091        | 1.2593 | 85   | 0.0059          |
| 0.0051        | 1.3333 | 90   | 0.0055          |
| 0.0093        | 1.4074 | 95   | 0.0054          |
| 0.0048        | 1.4815 | 100  | 0.0051          |
| 0.0042        | 1.5556 | 105  | 0.0050          |
| 0.0044        | 1.6296 | 110  | 0.0049          |
| 0.0047        | 1.7037 | 115  | 0.0048          |
| 0.0047        | 1.7778 | 120  | 0.0047          |
| 0.0054        | 1.8519 | 125  | 0.0046          |
| 0.0042        | 1.9259 | 130  | 0.0043          |
| 0.0053        | 2.0    | 135  | 0.0043          |
| 0.0023        | 2.0741 | 140  | 0.0043          |
| 0.0053        | 2.1481 | 145  | 0.0043          |
| 0.0029        | 2.2222 | 150  | 0.0042          |
| 0.0036        | 2.2963 | 155  | 0.0041          |
| 0.0035        | 2.3704 | 160  | 0.0041          |
| 0.0031        | 2.4444 | 165  | 0.0041          |
| 0.003         | 2.5185 | 170  | 0.0040          |
| 0.0039        | 2.5926 | 175  | 0.0040          |
| 0.0036        | 2.6667 | 180  | 0.0038          |
| 0.0042        | 2.7407 | 185  | 0.0037          |
| 0.0032        | 2.8148 | 190  | 0.0036          |
| 0.0041        | 2.8889 | 195  | 0.0036          |
| 0.0053        | 2.9630 | 200  | 0.0035          |
| 0.0036        | 3.0370 | 205  | 0.0034          |
| 0.0054        | 3.1111 | 210  | 0.0035          |
| 0.0047        | 3.1852 | 215  | 0.0036          |
| 0.0022        | 3.2593 | 220  | 0.0034          |
| 0.003         | 3.3333 | 225  | 0.0034          |
| 0.0019        | 3.4074 | 230  | 0.0033          |
| 0.0034        | 3.4815 | 235  | 0.0034          |
| 0.0025        | 3.5556 | 240  | 0.0033          |
| 0.002         | 3.6296 | 245  | 0.0033          |
| 0.0015        | 3.7037 | 250  | 0.0033          |
| 0.0027        | 3.7778 | 255  | 0.0033          |
| 0.0015        | 3.8519 | 260  | 0.0032          |
| 0.0017        | 3.9259 | 265  | 0.0032          |
| 0.0027        | 4.0    | 270  | 0.0031          |
| 0.0014        | 4.0741 | 275  | 0.0031          |
| 0.0015        | 4.1481 | 280  | 0.0032          |
| 0.0014        | 4.2222 | 285  | 0.0032          |
| 0.002         | 4.2963 | 290  | 0.0033          |
| 0.0021        | 4.3704 | 295  | 0.0033          |
| 0.0035        | 4.4444 | 300  | 0.0032          |
| 0.0014        | 4.5185 | 305  | 0.0032          |
| 0.0023        | 4.5926 | 310  | 0.0032          |
| 0.0016        | 4.6667 | 315  | 0.0032          |
| 0.0016        | 4.7407 | 320  | 0.0032          |
| 0.0015        | 4.8148 | 325  | 0.0032          |
| 0.0014        | 4.8889 | 330  | 0.0032          |
| 0.0017        | 4.9630 | 335  | 0.0032          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • PyTorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
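
To recreate a compatible environment, a requirements pin matching the versions above might look like the following sketch. Note the training run used the `+cu121` PyTorch build, which comes from the CUDA 12.1 wheel index rather than a plain PyPI pin.

```text
peft==0.12.0
transformers==4.46.1
torch==2.4.0        # trained with 2.4.0+cu121; install the cu121 build for CUDA parity
datasets==3.1.0
tokenizers==0.20.3
```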