llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_top_2_1024_r_64_alpha_16
This model is a fine-tuned version of dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6538
- Rewards/chosen: 0.1408
- Rewards/rejected: -0.0291
- Rewards/accuracies: 0.6248
- Rewards/margins: 0.1699
- Logps/rejected: -199.6676
- Logps/chosen: -203.9681
- Logits/rejected: 0.8159
- Logits/chosen: 0.8393
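These metric names match the ones logged by TRL's `DPOTrainer`. The card does not say which trainer was used, so the following assumes TRL. Under that assumption:

- Each reward is β times the difference between the policy's and the frozen reference model's log-probability of a response.
- Rewards/margins is chosen minus rejected: 0.1408 - (-0.0291) = 0.1699.
- Rewards/accuracies is the fraction of pairs where the chosen reward is higher.

β is not reported in this card, so the sketch below uses 0.1 as a hypothetical placeholder. It computes the loss and rewards for a single preference pair:

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for either sign of x.
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def dpo_metrics(policy_chosen_logp: float, policy_rejected_logp: float,
                ref_chosen_logp: float, ref_rejected_logp: float,
                beta: float = 0.1) -> dict:
    """Per-pair DPO loss and implicit rewards (sketch, not the trainer's code).

    beta=0.1 is a hypothetical placeholder; this card does not report beta.
    Each log-probability is assumed to be summed over the response tokens.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    return {
        "loss": -log_sigmoid(margin),  # -log sigma(margin)
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        # 1.0 when the chosen response out-scores the rejected one
        "rewards/accuracies": 1.0 if reward_chosen > reward_rejected else 0.0,
    }

# Example pair: policy moved +1 nat on chosen and -1 nat on rejected vs. reference.
m = dpo_metrics(-200.0, -205.0, -201.0, -204.0)
print(m["rewards/margins"])  # 0.2
```

The reported values are means over the evaluation pairs. The loss is therefore the mean of the per-pair losses, not the loss of the mean margin. For comparison, -log σ(0.1699) ≈ 0.612, which is below the reported 0.6538. That gap is expected because -log σ is convex.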
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2
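The batch and scheduler settings above fit together as follows. First, 32 × 4 = 128 matches total_train_batch_size, which implies a single device. Second, the cosine schedule warms up linearly and then decays to zero.

The sketch below mirrors the formula used by `transformers`' `get_cosine_schedule_with_warmup`, with warmup steps rounded up from `warmup_ratio` as the `Trainer` does. The total step count is not reported in this card, so `TOTAL_STEPS = 360` is a hypothetical value chosen only for illustration.

```python
import math

def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    # Examples per optimizer update; num_devices=1 is an assumption.
    return per_device * grad_accum * num_devices

def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-4,
              warmup_ratio: float = 0.03) -> float:
    """Linear warmup followed by cosine decay to zero (sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

TOTAL_STEPS = 360  # hypothetical; not stated in this card
print(effective_batch_size(32, 4))     # 128
print(cosine_lr(11, TOTAL_STEPS))      # 0.0002 (end of an 11-step warmup)
```

With the hypothetical 360 steps, the warmup lasts ceil(360 × 0.03) = 11 steps. After that, the learning rate falls from 2e-4 to 0 along a half cosine.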
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6828 | 0.2 | 37 | 0.6867 | -0.3470 | -0.4719 | 0.5792 | 0.1249 | -201.8816 | -206.4072 | 0.7977 | 0.8213 |
0.6666 | 0.41 | 74 | 0.6731 | -0.1233 | -0.2593 | 0.5855 | 0.1361 | -200.8187 | -205.2885 | 0.8159 | 0.8381 |
0.6713 | 0.61 | 111 | 0.6645 | 0.0492 | -0.1110 | 0.6019 | 0.1602 | -200.0772 | -204.4260 | 0.8299 | 0.8526 |
0.6749 | 0.82 | 148 | 0.6593 | 0.2291 | 0.0917 | 0.5912 | 0.1374 | -199.0636 | -203.5266 | 0.8189 | 0.8414 |
0.6688 | 1.02 | 185 | 0.6538 | 0.1408 | -0.0291 | 0.6248 | 0.1699 | -199.6676 | -203.9681 | 0.8159 | 0.8393 |
0.3721 | 1.23 | 222 | 0.6911 | -0.3548 | -0.6171 | 0.6007 | 0.2623 | -202.6077 | -206.4462 | 0.8193 | 0.8406 |
0.2845 | 1.43 | 259 | 0.6989 | -0.3528 | -0.5968 | 0.5984 | 0.2441 | -202.5062 | -206.4359 | 0.7886 | 0.8059 |
0.2646 | 1.64 | 296 | 0.6991 | -0.4016 | -0.6359 | 0.5880 | 0.2343 | -202.7015 | -206.6800 | 0.7696 | 0.7875 |
0.2263 | 1.84 | 333 | 0.7063 | -0.4773 | -0.7137 | 0.5925 | 0.2365 | -203.0908 | -207.0584 | 0.7653 | 0.7833 |
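The evaluation results at the top of this card match the step-185 row (epoch 1.02), which has the lowest validation loss in this log. After that row, training loss drops sharply (0.6688 to 0.3721) while validation loss rises, which suggests overfitting during the second epoch.

The sketch below uses values copied from the table (a subset of the columns). It selects the row with the lowest validation loss and checks that Rewards/margins equals Rewards/chosen minus Rewards/rejected, up to four-decimal rounding:

```python
from typing import NamedTuple

class EvalRow(NamedTuple):
    step: int
    epoch: float
    train_loss: float
    val_loss: float
    rewards_chosen: float
    rewards_rejected: float
    rewards_margins: float

# Values copied from the training-results table above.
LOG = [
    EvalRow(37, 0.20, 0.6828, 0.6867, -0.3470, -0.4719, 0.1249),
    EvalRow(74, 0.41, 0.6666, 0.6731, -0.1233, -0.2593, 0.1361),
    EvalRow(111, 0.61, 0.6713, 0.6645, 0.0492, -0.1110, 0.1602),
    EvalRow(148, 0.82, 0.6749, 0.6593, 0.2291, 0.0917, 0.1374),
    EvalRow(185, 1.02, 0.6688, 0.6538, 0.1408, -0.0291, 0.1699),
    EvalRow(222, 1.23, 0.3721, 0.6911, -0.3548, -0.6171, 0.2623),
    EvalRow(259, 1.43, 0.2845, 0.6989, -0.3528, -0.5968, 0.2441),
    EvalRow(296, 1.64, 0.2646, 0.6991, -0.4016, -0.6359, 0.2343),
    EvalRow(333, 1.84, 0.2263, 0.7063, -0.4773, -0.7137, 0.2365),
]

def best_checkpoint(rows: list) -> EvalRow:
    """Return the evaluation row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)

def margins_consistent(rows: list, tol: float = 2e-4) -> bool:
    """True if every margin equals chosen - rejected within rounding tolerance."""
    return all(abs((r.rewards_chosen - r.rewards_rejected) - r.rewards_margins) <= tol
               for r in rows)

best = best_checkpoint(LOG)
print(best.step, best.val_loss)  # 185 0.6538
print(margins_consistent(LOG))   # True
```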
Framework versions
- Transformers 4.32.1
- Pytorch 2.0.1+cu118
- Datasets 2.14.4
- Tokenizers 0.13.3
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 44.03 |
ARC (25-shot) | 54.1 |
HellaSwag (10-shot) | 78.74 |
MMLU (5-shot) | 45.44 |
TruthfulQA (0-shot) | 43.4 |
Winogrande (5-shot) | 73.64 |
GSM8K (5-shot) | 4.55 |
DROP (3-shot) | 8.35 |
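The Avg. row is consistent with an unweighted mean of the seven benchmark scores: 308.22 / 7 ≈ 44.03. A short check using the values from the table above:

```python
# Scores copied from the Open LLM Leaderboard table above.
SCORES = {
    "ARC (25-shot)": 54.1,
    "HellaSwag (10-shot)": 78.74,
    "MMLU (5-shot)": 45.44,
    "TruthfulQA (0-shot)": 43.4,
    "Winogrande (5-shot)": 73.64,
    "GSM8K (5-shot)": 4.55,
    "DROP (3-shot)": 8.35,
}

def leaderboard_average(scores: dict) -> float:
    """Unweighted mean across benchmarks."""
    return sum(scores.values()) / len(scores)

print(f"{leaderboard_average(SCORES):.2f}")  # 44.03
```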