Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V5

This model is a DPO fine-tuned version of meta-llama/Llama-2-7b-hf; the training dataset is not specified in this card. It achieves the following results on the evaluation set (see the note after the list for how these reward metrics relate to each other):

  • Loss: 0.8926
  • Rewards/chosen: -1.4220
  • Rewards/rejected: -1.3388
  • Rewards/accuracies: 0.4000
  • Rewards/margins: -0.0832
  • Logps/rejected: -160.5441
  • Logps/chosen: -150.5625
  • Logits/rejected: -0.0984
  • Logits/chosen: -0.0941
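
These metric names match TRL's DPO convention, so as a hedged reference (the card does not include the training code), the implicit reward of a completion is the scaled log-probability ratio between the policy and the reference model, and the margin is simply the chosen reward minus the rejected reward, which is consistent with the numbers above (-1.4220 - (-1.3388) ≈ -0.0832):

```latex
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
```

Rewards/accuracies is the fraction of evaluation pairs for which this margin is positive.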

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of a matching training setup follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
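
The card does not ship the training script, so the sketch below is only a rough, non-authoritative mapping of the hyperparameters above onto a TRL DPOTrainer run. The dataset, LoRA settings, DPO beta, and exact TRL version are not reported in the card; everything marked "assumed" is a placeholder.

```python
# Hedged sketch only: hyperparameters mirror the card; dataset, LoRA settings,
# and DPO beta are NOT reported in the card and are placeholders here.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id)

# Assumed: any preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized")

training_args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V5",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # total train batch size 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
    eval_strategy="steps",
    eval_steps=64,                   # evaluation ran every 64 steps per the results table
    # Adam betas/epsilon left at the Transformers defaults (0.9, 0.999, 1e-8),
    # matching the card; DPO beta left at TRL's default since it is not reported.
)

# Assumed LoRA settings: the card only confirms a PEFT adapter was trained,
# not its rank, alpha, or target modules.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```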

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6642        | 0.2998 | 64   | 0.6861          | 0.1096         | 0.0712           | 0.7000             | 0.0384          | -146.4445      | -135.2473    | 0.3892          | 0.3857        |
| 0.7679        | 0.5995 | 128  | 0.6702          | 0.2151         | 0.1325           | 0.5000             | 0.0826          | -145.8312      | -134.1918    | 0.3616          | 0.3586        |
| 0.6956        | 0.8993 | 192  | 0.7032          | 0.0993         | 0.1090           | 0.5000             | -0.0098         | -146.0662      | -135.3502    | 0.3503          | 0.3473        |
| 0.4280        | 1.1991 | 256  | 0.7001          | 0.0275         | -0.0647          | 0.5000             | 0.0922          | -147.8036      | -136.0676    | 0.2753          | 0.2734        |
| 0.3326        | 1.4988 | 320  | 0.7460          | -0.5011        | -0.5860          | 0.6000             | 0.0849          | -153.0164      | -141.3538    | 0.1433          | 0.1439        |
| 0.4980        | 1.7986 | 384  | 0.7965          | -0.6044        | -0.6122          | 0.5000             | 0.0078          | -153.2779      | -142.3867    | 0.0688          | 0.0703        |
| 0.3640        | 2.0984 | 448  | 0.8243          | -0.7682        | -0.6945          | 0.4000             | -0.0737         | -154.1017      | -144.0248    | 0.0654          | 0.0667        |
| 0.2876        | 2.3981 | 512  | 0.8566          | -1.3864        | -1.3678          | 0.4000             | -0.0186         | -160.8344      | -150.2071    | -0.0854         | -0.0815       |
| 0.0473        | 2.6979 | 576  | 0.8926          | -1.4220        | -1.3388          | 0.4000             | -0.0832         | -160.5441      | -150.5625    | -0.0984         | -0.0941       |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • Pytorch 2.4.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.20.3
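
The card lists no usage code, so the following is only a minimal loading sketch based on the versions above: it combines the listed base model with this PEFT adapter via the standard Transformers and PEFT APIs. The example prompt is illustrative.

```python
# Hedged sketch: loads the base model and applies this DPO LoRA adapter on top.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V5"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative prompt; the card does not define an intended prompt format.
inputs = tokenizer("Write a short poem about rivers.", return_tensors="pt").to(base_model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```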