llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the brasingh_publicis_f5f dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0205
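
The framework versions listed below include PEFT, so this repository is assumed to host a LoRA-style adapter on top of the base model. The following is a minimal, hedged loading sketch under that assumption (the adapter id `sizhkhy/brasingh_publicis_f5f` is taken from this repo; dtype, device placement, and generation settings are illustrative and may need adjusting):

```python
# Hedged sketch: assumes this repo is a PEFT adapter for meta-llama/Llama-3.2-3B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/brasingh_publicis_f5f"  # this adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Llama-3.2 Instruct expects chat-formatted input; apply_chat_template builds it.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```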

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
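
For reference, these settings map roughly onto the following `transformers.TrainingArguments`. This is an illustrative sketch only: the actual launcher and fine-tuning script are not part of this card, and the output directory is a placeholder.

```python
# Illustrative only: TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/llm3br256",     # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,      # 4 x 8 = 32 effective train batch size
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```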

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.1038        | 0.0606 | 5    | 0.0979          |
| 0.0759        | 0.1212 | 10   | 0.0765          |
| 0.069         | 0.1818 | 15   | 0.0683          |
| 0.0729        | 0.2424 | 20   | 0.0620          |
| 0.0545        | 0.3030 | 25   | 0.0571          |
| 0.0589        | 0.3636 | 30   | 0.0528          |
| 0.0461        | 0.4242 | 35   | 0.0501          |
| 0.0522        | 0.4848 | 40   | 0.0493          |
| 0.052         | 0.5455 | 45   | 0.0483          |
| 0.0459        | 0.6061 | 50   | 0.0458          |
| 0.0363        | 0.6667 | 55   | 0.0434          |
| 0.0553        | 0.7273 | 60   | 0.0418          |
| 0.0444        | 0.7879 | 65   | 0.0403          |
| 0.0469        | 0.8485 | 70   | 0.0397          |
| 0.0417        | 0.9091 | 75   | 0.0386          |
| 0.0388        | 0.9697 | 80   | 0.0372          |
| 0.0309        | 1.0303 | 85   | 0.0358          |
| 0.0487        | 1.0909 | 90   | 0.0354          |
| 0.0348        | 1.1515 | 95   | 0.0340          |
| 0.0308        | 1.2121 | 100  | 0.0334          |
| 0.0318        | 1.2727 | 105  | 0.0330          |
| 0.028         | 1.3333 | 110  | 0.0322          |
| 0.0311        | 1.3939 | 115  | 0.0321          |
| 0.0382        | 1.4545 | 120  | 0.0315          |
| 0.0316        | 1.5152 | 125  | 0.0304          |
| 0.0278        | 1.5758 | 130  | 0.0299          |
| 0.0285        | 1.6364 | 135  | 0.0292          |
| 0.0257        | 1.6970 | 140  | 0.0285          |
| 0.0244        | 1.7576 | 145  | 0.0281          |
| 0.0256        | 1.8182 | 150  | 0.0278          |
| 0.0338        | 1.8788 | 155  | 0.0270          |
| 0.0309        | 1.9394 | 160  | 0.0262          |
| 0.0378        | 2.0    | 165  | 0.0261          |
| 0.0275        | 2.0606 | 170  | 0.0263          |
| 0.0225        | 2.1212 | 175  | 0.0259          |
| 0.0232        | 2.1818 | 180  | 0.0256          |
| 0.0193        | 2.2424 | 185  | 0.0255          |
| 0.0251        | 2.3030 | 190  | 0.0253          |
| 0.0228        | 2.3636 | 195  | 0.0249          |
| 0.0195        | 2.4242 | 200  | 0.0249          |
| 0.0219        | 2.4848 | 205  | 0.0241          |
| 0.0184        | 2.5455 | 210  | 0.0238          |
| 0.0199        | 2.6061 | 215  | 0.0236          |
| 0.023         | 2.6667 | 220  | 0.0232          |
| 0.0227        | 2.7273 | 225  | 0.0234          |
| 0.0206        | 2.7879 | 230  | 0.0230          |
| 0.0217        | 2.8485 | 235  | 0.0225          |
| 0.0186        | 2.9091 | 240  | 0.0224          |
| 0.0201        | 2.9697 | 245  | 0.0220          |
| 0.0147        | 3.0303 | 250  | 0.0220          |
| 0.0142        | 3.0909 | 255  | 0.0226          |
| 0.0149        | 3.1515 | 260  | 0.0218          |
| 0.0151        | 3.2121 | 265  | 0.0215          |
| 0.0174        | 3.2727 | 270  | 0.0217          |
| 0.0172        | 3.3333 | 275  | 0.0213          |
| 0.017         | 3.3939 | 280  | 0.0211          |
| 0.0223        | 3.4545 | 285  | 0.0212          |
| 0.0144        | 3.5152 | 290  | 0.0211          |
| 0.0125        | 3.5758 | 295  | 0.0208          |
| 0.0163        | 3.6364 | 300  | 0.0207          |
| 0.015         | 3.6970 | 305  | 0.0207          |
| 0.0154        | 3.7576 | 310  | 0.0206          |
| 0.0186        | 3.8182 | 315  | 0.0203          |
| 0.0135        | 3.8788 | 320  | 0.0202          |
| 0.0159        | 3.9394 | 325  | 0.0201          |
| 0.0211        | 4.0    | 330  | 0.0200          |
| 0.0134        | 4.0606 | 335  | 0.0202          |
| 0.0113        | 4.1212 | 340  | 0.0206          |
| 0.0117        | 4.1818 | 345  | 0.0208          |
| 0.0108        | 4.2424 | 350  | 0.0209          |
| 0.012         | 4.3030 | 355  | 0.0207          |
| 0.0111        | 4.3636 | 360  | 0.0206          |
| 0.0118        | 4.4242 | 365  | 0.0205          |
| 0.0099        | 4.4848 | 370  | 0.0206          |
| 0.0118        | 4.5455 | 375  | 0.0206          |
| 0.0119        | 4.6061 | 380  | 0.0206          |
| 0.0114        | 4.6667 | 385  | 0.0206          |
| 0.0109        | 4.7273 | 390  | 0.0206          |
| 0.0124        | 4.7879 | 395  | 0.0205          |
| 0.0111        | 4.8485 | 400  | 0.0206          |
| 0.012         | 4.9091 | 405  | 0.0206          |
| 0.0104        | 4.9697 | 410  | 0.0205          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3