Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V5
This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.8926
- Rewards/chosen: -1.4220
- Rewards/rejected: -1.3388
- Rewards/accuracies: 0.4000
- Rewards/margins: -0.0832
- Logps/rejected: -160.5441
- Logps/chosen: -150.5625
- Logits/rejected: -0.0984
- Logits/chosen: -0.0941
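The reward entries above follow the usual DPO convention (an assumption, since the training script is not included in this card): the implicit reward is the β-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the chosen reward minus the rejected reward:

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

With the values above the margin is consistent: -1.4220 - (-1.3388) = -0.0832, i.e. the final checkpoint slightly prefers the rejected completions on the evaluation set.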
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
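The hyperparameters above map directly onto TRL's `DPOTrainer`. The sketch below is a minimal, hypothetical reconstruction only: the actual training script, dataset, and LoRA configuration are not part of this card, so the dataset contents and `LoraConfig` values are placeholders.

```python
# Hedged sketch: reproducing the listed hyperparameters with TRL's DPOTrainer.
# Dataset contents and LoRA settings are illustrative assumptions, not taken from this card.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# Tiny placeholder preference dataset; the card does not name the real training data.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["Paris."],
    "rejected": ["London."],
})
eval_dataset = train_dataset

# Hyperparameters taken from the list above; the default AdamW optimizer already uses
# betas=(0.9, 0.999) and epsilon=1e-08.
args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V5",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # total train batch size: 2 * 2 = 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
)

# Illustrative LoRA config; the actual PEFT setup is not documented here.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,  # newer TRL releases use processing_class= instead
    peft_config=peft_config,
)
trainer.train()
```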
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6642 | 0.2998 | 64 | 0.6861 | 0.1096 | 0.0712 | 0.7000 | 0.0384 | -146.4445 | -135.2473 | 0.3892 | 0.3857 |
| 0.7679 | 0.5995 | 128 | 0.6702 | 0.2151 | 0.1325 | 0.5000 | 0.0826 | -145.8312 | -134.1918 | 0.3616 | 0.3586 |
| 0.6956 | 0.8993 | 192 | 0.7032 | 0.0993 | 0.1090 | 0.5000 | -0.0098 | -146.0662 | -135.3502 | 0.3503 | 0.3473 |
| 0.4280 | 1.1991 | 256 | 0.7001 | 0.0275 | -0.0647 | 0.5000 | 0.0922 | -147.8036 | -136.0676 | 0.2753 | 0.2734 |
| 0.3326 | 1.4988 | 320 | 0.7460 | -0.5011 | -0.5860 | 0.6000 | 0.0849 | -153.0164 | -141.3538 | 0.1433 | 0.1439 |
| 0.4980 | 1.7986 | 384 | 0.7965 | -0.6044 | -0.6122 | 0.5000 | 0.0078 | -153.2779 | -142.3867 | 0.0688 | 0.0703 |
| 0.3640 | 2.0984 | 448 | 0.8243 | -0.7682 | -0.6945 | 0.4000 | -0.0737 | -154.1017 | -144.0248 | 0.0654 | 0.0667 |
| 0.2876 | 2.3981 | 512 | 0.8566 | -1.3864 | -1.3678 | 0.4000 | -0.0186 | -160.8344 | -150.2071 | -0.0854 | -0.0815 |
| 0.0473 | 2.6979 | 576 | 0.8926 | -1.4220 | -1.3388 | 0.4000 | -0.0832 | -160.5441 | -150.5625 | -0.0984 | -0.0941 |
Framework versions
- PEFT 0.12.0
- Transformers 4.45.2
- Pytorch 2.4.0+cu121
- Datasets 3.2.0
- Tokenizers 0.20.3
Model tree for LBK95/Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V5
- Base model: meta-llama/Llama-2-7b-hf
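The PEFT framework version listed above suggests this repository hosts a LoRA adapter rather than full model weights. Assuming that, it can be loaded on top of the base model roughly as follows (a usage sketch, not an official snippet from this card):

```python
# Hedged sketch: loading the adapter on top of meta-llama/Llama-2-7b-hf.
# Assumes the repo contains a PEFT/LoRA adapter, as the PEFT 0.12.0 entry suggests.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_Q2_TTree1.4_TT0.9_TP0.7_TE0.2_V5"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```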