---
license: apache-2.0
library_name: peft
tags:
- generated_from_trainer
base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
model-index:
- name: mistral-dpo
  results: []
pipeline_tag: text-generation
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mistral-dpo

This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.8911
- Rewards/chosen: 0.5387
- Rewards/rejected: 0.4878
- Rewards/accuracies: 0.5096
- Rewards/margins: 0.0509
- Logps/rejected: -174.3804
- Logps/chosen: -178.5185
- Logits/rejected: -2.5028
- Logits/chosen: -2.5350
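
For context, and assuming the standard DPO formulation as implemented in TRL, the reward columns above are the implicit DPO rewards, i.e. the scaled log-probability ratio between the trained policy $\pi_\theta$ and the frozen reference model $\pi_{\mathrm{ref}}$; the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of evaluation pairs for which the chosen completion receives the higher reward:

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\text{margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$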

## Model description

More information needed

## Intended uses & limitations

More information needed
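
As a hedged usage sketch (not a documented workflow for this adapter), the LoRA weights in this repo can be loaded on top of the GPTQ base model with `transformers` and `peft`. The repo id `username/mistral-dpo` below is a placeholder for wherever the adapter is actually hosted, and the prompt uses the ChatML format of the OpenHermes-2 base model:

```python
# Hedged sketch: apply this DPO-trained LoRA adapter to the GPTQ base model.
# Assumes transformers>=4.36, peft>=0.7, and optimum + auto-gptq are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "username/mistral-dpo"  # placeholder: actual adapter repo id or local path

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# OpenHermes-2 was trained with the ChatML prompt format.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain DPO in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```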

## Training and evaluation data

More information needed
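
The preference dataset used for training is not documented. For reference, TRL's `DPOTrainer` expects a dataset with `prompt`, `chosen`, and `rejected` string columns; the example below is invented purely to illustrate that schema:

```python
from datasets import Dataset

# Illustrative schema only: each row pairs a prompt with a preferred ("chosen")
# and a dispreferred ("rejected") completion.
preference_data = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["France does not have a capital city."],
})
```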

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto a `DPOTrainer` run):
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 250
- mixed_precision_training: Native AMP
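
A minimal sketch of how these hyperparameters map onto a TRL `DPOTrainer` run (assuming TRL ~0.7 alongside the PEFT/Transformers versions listed under "Framework versions"; the LoRA settings, toy dataset, and `beta` value are assumptions, since this card does not record them):

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Toy preference data with prompt/chosen/rejected columns
# (see the schema sketch under "Training and evaluation data").
train_dataset = eval_dataset = Dataset.from_dict({
    "prompt": ["Explain DPO briefly."],
    "chosen": ["DPO fine-tunes a model directly on preference pairs."],
    "rejected": ["DPO is a kind of database."],
})

# LoRA rank/alpha/target modules are assumptions; they are not recorded in this card.
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"],
)

# These mirror the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="mistral-dpo",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=250,
    fp16=True,                     # "Native AMP" mixed precision
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                # with a peft_config, TRL uses the frozen base model as the reference
    args=training_args,
    beta=0.1,                      # assumed default; not recorded in this card
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```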

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6703        | 0.0   | 10   | 0.6842          | -0.0001        | -0.0268          | 0.5865             | 0.0267          | -179.5257      | -183.9063    | -2.4290         | -2.4720       |
| 0.7119        | 0.0   | 20   | 0.6751          | 0.1584         | 0.0990           | 0.5769             | 0.0594          | -178.2678      | -182.3211    | -2.4542         | -2.4988       |
| 0.647         | 0.0   | 30   | 0.6702          | 0.3569         | 0.2540           | 0.5769             | 0.1029          | -176.7180      | -180.3367    | -2.4886         | -2.5306       |
| 0.6748        | 0.0   | 40   | 0.6712          | 0.3439         | 0.2229           | 0.5288             | 0.1210          | -177.0292      | -180.4664    | -2.5206         | -2.5581       |
| 0.6513        | 0.0   | 50   | 0.6707          | 0.4403         | 0.2838           | 0.5577             | 0.1565          | -176.4200      | -179.5021    | -2.5608         | -2.5853       |
| 0.6103        | 0.0   | 60   | 0.6695          | 0.6831         | 0.4769           | 0.5577             | 0.2063          | -174.4892      | -177.0740    | -2.5719         | -2.5933       |
| 1.0313        | 0.01  | 70   | 0.6724          | 0.7062         | 0.5084           | 0.5577             | 0.1978          | -174.1739      | -176.8436    | -2.5543         | -2.5843       |
| 0.6876        | 0.01  | 80   | 0.6804          | 0.6995         | 0.5144           | 0.5385             | 0.1850          | -174.1135      | -176.9104    | -2.5443         | -2.5829       |
| 0.9661        | 0.01  | 90   | 0.6828          | 0.7118         | 0.5376           | 0.5385             | 0.1742          | -173.8821      | -176.7873    | -2.5479         | -2.5846       |
| 0.7354        | 0.01  | 100  | 0.6757          | 0.6765         | 0.5039           | 0.5577             | 0.1726          | -174.2186      | -177.1401    | -2.5399         | -2.5758       |
| 1.0127        | 0.01  | 110  | 0.7129          | 0.6089         | 0.4855           | 0.5288             | 0.1234          | -174.4033      | -177.8165    | -2.5464         | -2.5760       |
| 1.0366        | 0.01  | 120  | 0.7440          | 0.6068         | 0.4946           | 0.5481             | 0.1122          | -174.3115      | -177.8369    | -2.5516         | -2.5804       |
| 1.2145        | 0.01  | 130  | 0.7564          | 0.6521         | 0.5396           | 0.5673             | 0.1125          | -173.8620      | -177.3846    | -2.5608         | -2.5878       |
| 0.8342        | 0.01  | 140  | 0.7649          | 0.6639         | 0.5519           | 0.5385             | 0.1119          | -173.7388      | -177.2668    | -2.5547         | -2.5828       |
| 0.7402        | 0.01  | 150  | 0.7991          | 0.5831         | 0.4883           | 0.5                | 0.0948          | -174.3747      | -178.0745    | -2.5498         | -2.5775       |
| 0.7162        | 0.01  | 160  | 0.8396          | 0.6134         | 0.5474           | 0.5096             | 0.0659          | -173.7835      | -177.7718    | -2.5445         | -2.5713       |
| 0.9396        | 0.01  | 170  | 0.8573          | 0.5700         | 0.5144           | 0.5288             | 0.0556          | -174.1144      | -178.2057    | -2.5326         | -2.5629       |
| 0.5958        | 0.01  | 180  | 0.8708          | 0.5526         | 0.5017           | 0.5288             | 0.0509          | -174.2406      | -178.3789    | -2.5227         | -2.5540       |
| 0.7588        | 0.02  | 190  | 0.8865          | 0.5428         | 0.4977           | 0.5288             | 0.0450          | -174.2806      | -178.4775    | -2.5207         | -2.5493       |
| 0.7811        | 0.02  | 200  | 0.8933          | 0.5797         | 0.5429           | 0.5192             | 0.0368          | -173.8286      | -178.1080    | -2.5171         | -2.5434       |
| 0.5735        | 0.02  | 210  | 0.8907          | 0.5577         | 0.5174           | 0.5288             | 0.0403          | -174.0838      | -178.3279    | -2.5069         | -2.5366       |
| 0.7709        | 0.02  | 220  | 0.8886          | 0.5602         | 0.5167           | 0.5192             | 0.0435          | -174.0907      | -178.3035    | -2.5041         | -2.5361       |
| 0.4914        | 0.02  | 230  | 0.8884          | 0.5237         | 0.4766           | 0.5192             | 0.0471          | -174.4924      | -178.6684    | -2.5050         | -2.5375       |
| 0.739         | 0.02  | 240  | 0.8910          | 0.5281         | 0.4796           | 0.5192             | 0.0485          | -174.4621      | -178.6240    | -2.5027         | -2.5351       |
| 0.5743        | 0.02  | 250  | 0.8911          | 0.5387         | 0.4878           | 0.5096             | 0.0509          | -174.3804      | -178.5185    | -2.5028         | -2.5350       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.0
- Pytorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.0