just1nseo committed
Commit f1cea8d · verified · 1 Parent(s): 611eebf

Model save

README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ base_model: alignment-handbook/zephyr-7b-sft-full
+ library_name: peft
+ license: apache-2.0
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: zephyr-dpo-qlora-gpt4-5e-7
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-dpo-qlora-gpt4-5e-7
+
+ This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained with DPO (QLoRA adapters) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6862
+ - Rewards/chosen: -0.0539
+ - Rewards/rejected: -0.0744
+ - Rewards/accuracies: 0.5870
+ - Rewards/margins: 0.0206
+ - Rewards/margins Max: 0.1479
+ - Rewards/margins Min: -0.0932
+ - Rewards/margins Std: 0.0795
+ - Logps/rejected: -266.0218
+ - Logps/chosen: -289.9789
+ - Logits/rejected: -2.7191
+ - Logits/chosen: -2.7573
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-07
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 16
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6666 | 0.28 | 100 | 0.6907 | -0.0061 | -0.0128 | 0.5780 | 0.0067 | 0.0489 | -0.0295 | 0.0259 | -259.8633 | -285.2067 | -2.7596 | -2.7978 |
+ | 0.6251 | 0.56 | 200 | 0.6876 | -0.0360 | -0.0522 | 0.5870 | 0.0162 | 0.1195 | -0.0752 | 0.0641 | -263.7989 | -288.1948 | -2.7319 | -2.7700 |
+ | 0.6029 | 0.85 | 300 | 0.6862 | -0.0539 | -0.0744 | 0.5870 | 0.0206 | 0.1479 | -0.0932 | 0.0795 | -266.0218 | -289.9789 | -2.7191 | -2.7573 |
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.39.0.dev0
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
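The "Model description" and "Intended uses & limitations" sections above are still placeholders. As a minimal sketch of how an adapter saved in this format is typically used, the snippet below attaches the LoRA weights to the base model with `peft` and `transformers` (versions as listed under "Framework versions"). The repository id `just1nseo/zephyr-dpo-qlora-gpt4-5e-7` is an assumption pieced together from the commit author and model name, and the base model is loaded in bfloat16 rather than re-creating the 4-bit QLoRA setup used for training.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpo-qlora-gpt4-5e-7"  # assumed repo id (commit author + model name)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the DPO-trained LoRA weights stored in adapter_model.safetensors.
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```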
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1f8d90cf81614e95328cd8a081c8144118eb13c193fc72becca698198d35f162
+ oid sha256:7a4a037c90b4c405b59a5974e15735c71b54ccc9029b823605aad60e65a95c10
  size 671150064
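The `oid sha256:` field of a Git LFS pointer is the SHA-256 digest of the actual file contents, so the updated adapter can be checked against this commit after pulling it locally. A minimal sketch, assuming `adapter_model.safetensors` has been fetched via `git lfs pull` into the working directory:

```python
import hashlib

# SHA-256 recorded in the new LFS pointer for adapter_model.safetensors.
expected = "7a4a037c90b4c405b59a5974e15735c71b54ccc9029b823605aad60e65a95c10"

digest = hashlib.sha256()
with open("adapter_model.safetensors", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        digest.update(chunk)

print(digest.hexdigest() == expected)  # True if the local file matches this revision
```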
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.6394854995566355,
+ "train_runtime": 4022.9516,
+ "train_samples": 5678,
+ "train_samples_per_second": 1.411,
+ "train_steps_per_second": 0.088
+ }
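The throughput fields in `all_results.json` are derived quantities (samples and optimizer steps divided by wall-clock runtime). A minimal sanity-check sketch, assuming the file is read from the repository root; the step count of 355 is taken from `trainer_state.json` below:

```python
import json

with open("all_results.json") as f:
    results = json.load(f)

# 5678 samples / 4022.95 s ≈ 1.41 samples/s; 355 steps / 4022.95 s ≈ 0.088 steps/s.
print(results["train_samples"] / results["train_runtime"])  # ~train_samples_per_second
print(355 / results["train_runtime"])                       # ~train_steps_per_second
```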
runs/Jul28_07-09-30_notebook-deployment-48-7d9b6c99-khd85/events.out.tfevents.1722150666.notebook-deployment-48-7d9b6c99-khd85.1186379.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bd4035af4c6c9626d3826038fd6dd59b9d6471c2f3364ef50a287af248b9028f
- size 35193
+ oid sha256:49f4a432632994ade570fdbc2be07835bdd7de53be862412ca172ffb3c6072e8
+ size 39947
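The updated TensorBoard event file can be inspected without the TensorBoard UI via its Python API; a minimal sketch, assuming the `runs/` directory has been pulled locally (scalar tag names are discovered at runtime rather than assumed):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Directory containing the events.out.tfevents.* file changed in this commit.
log_dir = "runs/Jul28_07-09-30_notebook-deployment-48-7d9b6c99-khd85"

acc = EventAccumulator(log_dir)
acc.Reload()  # parse all event files found in the directory

scalar_tags = acc.Tags()["scalars"]
print(scalar_tags)
for event in acc.Scalars(scalar_tags[0]):
    print(event.step, event.value)
```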
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.6394854995566355,
+ "train_runtime": 4022.9516,
+ "train_samples": 5678,
+ "train_samples_per_second": 1.411,
+ "train_steps_per_second": 0.088
+ }
trainer_state.json ADDED
@@ -0,0 +1,735 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "eval_steps": 100,
6
+ "global_step": 355,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0,
13
+ "grad_norm": 2.158629831791895,
14
+ "learning_rate": 1.3888888888888887e-08,
15
+ "logits/chosen": -2.804708957672119,
16
+ "logits/rejected": -2.8150453567504883,
17
+ "logps/chosen": -217.97438049316406,
18
+ "logps/rejected": -216.58865356445312,
19
+ "loss": 0.6931,
20
+ "rewards/accuracies": 0.0,
21
+ "rewards/chosen": 0.0,
22
+ "rewards/margins": 0.0,
23
+ "rewards/margins_max": 0.0,
24
+ "rewards/margins_min": 0.0,
25
+ "rewards/margins_std": 0.0,
26
+ "rewards/rejected": 0.0,
27
+ "step": 1
28
+ },
29
+ {
30
+ "epoch": 0.03,
31
+ "grad_norm": 8.21578821692664,
32
+ "learning_rate": 1.3888888888888888e-07,
33
+ "logits/chosen": -2.8844423294067383,
34
+ "logits/rejected": -2.799159526824951,
35
+ "logps/chosen": -366.7507629394531,
36
+ "logps/rejected": -275.4356384277344,
37
+ "loss": 0.6932,
38
+ "rewards/accuracies": 0.4027777910232544,
39
+ "rewards/chosen": -0.00016000178584363312,
40
+ "rewards/margins": -0.0003212409501429647,
41
+ "rewards/margins_max": 0.002417487557977438,
42
+ "rewards/margins_min": -0.004127700813114643,
43
+ "rewards/margins_std": 0.0029805246740579605,
44
+ "rewards/rejected": 0.00016123917885124683,
45
+ "step": 10
46
+ },
47
+ {
48
+ "epoch": 0.06,
49
+ "grad_norm": 2.0515925664507124,
50
+ "learning_rate": 2.7777777777777776e-07,
51
+ "logits/chosen": -2.74739670753479,
52
+ "logits/rejected": -2.6935513019561768,
53
+ "logps/chosen": -329.11138916015625,
54
+ "logps/rejected": -216.7494659423828,
55
+ "loss": 0.6929,
56
+ "rewards/accuracies": 0.625,
57
+ "rewards/chosen": 0.00035773837589658797,
58
+ "rewards/margins": 0.00041456689359620214,
59
+ "rewards/margins_max": 0.0031081512570381165,
60
+ "rewards/margins_min": -0.002314414829015732,
61
+ "rewards/margins_std": 0.0023837233893573284,
62
+ "rewards/rejected": -5.682848859578371e-05,
63
+ "step": 20
64
+ },
65
+ {
66
+ "epoch": 0.08,
67
+ "grad_norm": 2.310838256933408,
68
+ "learning_rate": 4.1666666666666667e-07,
69
+ "logits/chosen": -2.835951328277588,
70
+ "logits/rejected": -2.7546262741088867,
71
+ "logps/chosen": -329.10211181640625,
72
+ "logps/rejected": -233.05508422851562,
73
+ "loss": 0.6917,
74
+ "rewards/accuracies": 0.75,
75
+ "rewards/chosen": 0.0018310332670807838,
76
+ "rewards/margins": 0.002559047192335129,
77
+ "rewards/margins_max": 0.006475468166172504,
78
+ "rewards/margins_min": -0.0006590075790882111,
79
+ "rewards/margins_std": 0.0032629654742777348,
80
+ "rewards/rejected": -0.0007280135178007185,
81
+ "step": 30
82
+ },
83
+ {
84
+ "epoch": 0.11,
85
+ "grad_norm": 1.9718465521886885,
86
+ "learning_rate": 4.998060489154965e-07,
87
+ "logits/chosen": -2.8140110969543457,
88
+ "logits/rejected": -2.76088285446167,
89
+ "logps/chosen": -285.4794006347656,
90
+ "logps/rejected": -227.7167205810547,
91
+ "loss": 0.6905,
92
+ "rewards/accuracies": 0.800000011920929,
93
+ "rewards/chosen": 0.0035794698633253574,
94
+ "rewards/margins": 0.0046408590860664845,
95
+ "rewards/margins_max": 0.011065036058425903,
96
+ "rewards/margins_min": -0.0008336328901350498,
97
+ "rewards/margins_std": 0.005423419643193483,
98
+ "rewards/rejected": -0.0010613898048177361,
99
+ "step": 40
100
+ },
101
+ {
102
+ "epoch": 0.14,
103
+ "grad_norm": 2.3429157043740894,
104
+ "learning_rate": 4.976275538042932e-07,
105
+ "logits/chosen": -2.813694477081299,
106
+ "logits/rejected": -2.7310705184936523,
107
+ "logps/chosen": -317.00640869140625,
108
+ "logps/rejected": -234.43209838867188,
109
+ "loss": 0.688,
110
+ "rewards/accuracies": 0.887499988079071,
111
+ "rewards/chosen": 0.007789776660501957,
112
+ "rewards/margins": 0.010368594899773598,
113
+ "rewards/margins_max": 0.021328028291463852,
114
+ "rewards/margins_min": 0.0012870692880824208,
115
+ "rewards/margins_std": 0.009302936494350433,
116
+ "rewards/rejected": -0.00257881754077971,
117
+ "step": 50
118
+ },
119
+ {
120
+ "epoch": 0.17,
121
+ "grad_norm": 2.326595034093221,
122
+ "learning_rate": 4.930493069997119e-07,
123
+ "logits/chosen": -2.7512717247009277,
124
+ "logits/rejected": -2.7030184268951416,
125
+ "logps/chosen": -343.24273681640625,
126
+ "logps/rejected": -264.2438049316406,
127
+ "loss": 0.6845,
128
+ "rewards/accuracies": 0.925000011920929,
129
+ "rewards/chosen": 0.015377024188637733,
130
+ "rewards/margins": 0.01811446249485016,
131
+ "rewards/margins_max": 0.03716810420155525,
132
+ "rewards/margins_min": 0.003869078354910016,
133
+ "rewards/margins_std": 0.01510803122073412,
134
+ "rewards/rejected": -0.0027374387718737125,
135
+ "step": 60
136
+ },
137
+ {
138
+ "epoch": 0.2,
139
+ "grad_norm": 1.802371962995753,
140
+ "learning_rate": 4.861156761634013e-07,
141
+ "logits/chosen": -2.8008124828338623,
142
+ "logits/rejected": -2.7141239643096924,
143
+ "logps/chosen": -360.14227294921875,
144
+ "logps/rejected": -237.1912841796875,
145
+ "loss": 0.6809,
146
+ "rewards/accuracies": 0.9375,
147
+ "rewards/chosen": 0.02196549065411091,
148
+ "rewards/margins": 0.026430394500494003,
149
+ "rewards/margins_max": 0.05226575583219528,
150
+ "rewards/margins_min": 0.005380354821681976,
151
+ "rewards/margins_std": 0.02180148847401142,
152
+ "rewards/rejected": -0.0044649080373346806,
153
+ "step": 70
154
+ },
155
+ {
156
+ "epoch": 0.23,
157
+ "grad_norm": 2.0226092318731266,
158
+ "learning_rate": 4.768938549177392e-07,
159
+ "logits/chosen": -2.842362403869629,
160
+ "logits/rejected": -2.778277635574341,
161
+ "logps/chosen": -329.4476318359375,
162
+ "logps/rejected": -288.3177795410156,
163
+ "loss": 0.6774,
164
+ "rewards/accuracies": 0.9624999761581421,
165
+ "rewards/chosen": 0.02400829829275608,
166
+ "rewards/margins": 0.03225512057542801,
167
+ "rewards/margins_max": 0.06589716672897339,
168
+ "rewards/margins_min": 0.006357196718454361,
169
+ "rewards/margins_std": 0.027716059237718582,
170
+ "rewards/rejected": -0.008246822282671928,
171
+ "step": 80
172
+ },
173
+ {
174
+ "epoch": 0.25,
175
+ "grad_norm": 2.439721586727848,
176
+ "learning_rate": 4.654732116743193e-07,
177
+ "logits/chosen": -2.7840921878814697,
178
+ "logits/rejected": -2.700878620147705,
179
+ "logps/chosen": -336.05194091796875,
180
+ "logps/rejected": -200.1630096435547,
181
+ "loss": 0.672,
182
+ "rewards/accuracies": 0.949999988079071,
183
+ "rewards/chosen": 0.029071927070617676,
184
+ "rewards/margins": 0.04093674570322037,
185
+ "rewards/margins_max": 0.08617201447486877,
186
+ "rewards/margins_min": 0.006826425436884165,
187
+ "rewards/margins_std": 0.0362737737596035,
188
+ "rewards/rejected": -0.011864816769957542,
189
+ "step": 90
190
+ },
191
+ {
192
+ "epoch": 0.28,
193
+ "grad_norm": 2.2225465127306605,
194
+ "learning_rate": 4.519644235671752e-07,
195
+ "logits/chosen": -2.8582470417022705,
196
+ "logits/rejected": -2.7655489444732666,
197
+ "logps/chosen": -342.58416748046875,
198
+ "logps/rejected": -265.08441162109375,
199
+ "loss": 0.6666,
200
+ "rewards/accuracies": 0.862500011920929,
201
+ "rewards/chosen": 0.037609733641147614,
202
+ "rewards/margins": 0.050220172852277756,
203
+ "rewards/margins_max": 0.10150803625583649,
204
+ "rewards/margins_min": 0.007549063768237829,
205
+ "rewards/margins_std": 0.0440022274851799,
206
+ "rewards/rejected": -0.01261043269187212,
207
+ "step": 100
208
+ },
209
+ {
210
+ "epoch": 0.28,
211
+ "eval_logits/chosen": -2.7978174686431885,
212
+ "eval_logits/rejected": -2.7595677375793457,
213
+ "eval_logps/chosen": -285.2066650390625,
214
+ "eval_logps/rejected": -259.86334228515625,
215
+ "eval_loss": 0.6906961798667908,
216
+ "eval_rewards/accuracies": 0.578000009059906,
217
+ "eval_rewards/chosen": -0.00613220501691103,
218
+ "eval_rewards/margins": 0.006711836438626051,
219
+ "eval_rewards/margins_max": 0.04891812801361084,
220
+ "eval_rewards/margins_min": -0.02950645610690117,
221
+ "eval_rewards/margins_std": 0.025911005213856697,
222
+ "eval_rewards/rejected": -0.012844040989875793,
223
+ "eval_runtime": 428.4446,
224
+ "eval_samples_per_second": 4.668,
225
+ "eval_steps_per_second": 0.292,
226
+ "step": 100
227
+ },
228
+ {
229
+ "epoch": 0.31,
230
+ "grad_norm": 2.4760394100510057,
231
+ "learning_rate": 4.364984038837727e-07,
232
+ "logits/chosen": -2.8690743446350098,
233
+ "logits/rejected": -2.7577908039093018,
234
+ "logps/chosen": -385.70233154296875,
235
+ "logps/rejected": -288.461669921875,
236
+ "loss": 0.6591,
237
+ "rewards/accuracies": 0.9375,
238
+ "rewards/chosen": 0.0566016249358654,
239
+ "rewards/margins": 0.06779664754867554,
240
+ "rewards/margins_max": 0.13572999835014343,
241
+ "rewards/margins_min": 0.010938728228211403,
242
+ "rewards/margins_std": 0.05764765292406082,
243
+ "rewards/rejected": -0.011195014230906963,
244
+ "step": 110
245
+ },
246
+ {
247
+ "epoch": 0.34,
248
+ "grad_norm": 2.035920276115207,
249
+ "learning_rate": 4.1922503338800447e-07,
250
+ "logits/chosen": -2.8610854148864746,
251
+ "logits/rejected": -2.7858219146728516,
252
+ "logps/chosen": -387.9818115234375,
253
+ "logps/rejected": -267.68585205078125,
254
+ "loss": 0.657,
255
+ "rewards/accuracies": 0.925000011920929,
256
+ "rewards/chosen": 0.06763629615306854,
257
+ "rewards/margins": 0.07911892235279083,
258
+ "rewards/margins_max": 0.16764040291309357,
259
+ "rewards/margins_min": 0.013401249423623085,
260
+ "rewards/margins_std": 0.07113669812679291,
261
+ "rewards/rejected": -0.01148262806236744,
262
+ "step": 120
263
+ },
264
+ {
265
+ "epoch": 0.37,
266
+ "grad_norm": 2.010676971608138,
267
+ "learning_rate": 4.003117078299021e-07,
268
+ "logits/chosen": -2.818753957748413,
269
+ "logits/rejected": -2.741856098175049,
270
+ "logps/chosen": -396.28985595703125,
271
+ "logps/rejected": -302.45050048828125,
272
+ "loss": 0.6454,
273
+ "rewards/accuracies": 0.9750000238418579,
274
+ "rewards/chosen": 0.08936750888824463,
275
+ "rewards/margins": 0.10461701452732086,
276
+ "rewards/margins_max": 0.20179173350334167,
277
+ "rewards/margins_min": 0.02413741685450077,
278
+ "rewards/margins_std": 0.08073713630437851,
279
+ "rewards/rejected": -0.015249502845108509,
280
+ "step": 130
281
+ },
282
+ {
283
+ "epoch": 0.39,
284
+ "grad_norm": 1.7425219980216828,
285
+ "learning_rate": 3.799417157181075e-07,
286
+ "logits/chosen": -2.7920029163360596,
287
+ "logits/rejected": -2.7359843254089355,
288
+ "logps/chosen": -364.29058837890625,
289
+ "logps/rejected": -272.58355712890625,
290
+ "loss": 0.6467,
291
+ "rewards/accuracies": 0.9125000238418579,
292
+ "rewards/chosen": 0.08406248688697815,
293
+ "rewards/margins": 0.10730169713497162,
294
+ "rewards/margins_max": 0.22186696529388428,
295
+ "rewards/margins_min": 0.012349050492048264,
296
+ "rewards/margins_std": 0.09653683751821518,
297
+ "rewards/rejected": -0.02323923259973526,
298
+ "step": 140
299
+ },
300
+ {
301
+ "epoch": 0.42,
302
+ "grad_norm": 2.0933384277297957,
303
+ "learning_rate": 3.583124620760659e-07,
304
+ "logits/chosen": -2.825629711151123,
305
+ "logits/rejected": -2.7282826900482178,
306
+ "logps/chosen": -315.4014892578125,
307
+ "logps/rejected": -216.2842254638672,
308
+ "loss": 0.6435,
309
+ "rewards/accuracies": 0.9375,
310
+ "rewards/chosen": 0.07449642568826675,
311
+ "rewards/margins": 0.09953755140304565,
312
+ "rewards/margins_max": 0.21898682415485382,
313
+ "rewards/margins_min": 0.014027351513504982,
314
+ "rewards/margins_std": 0.09459034353494644,
315
+ "rewards/rejected": -0.0250411219894886,
316
+ "step": 150
317
+ },
318
+ {
319
+ "epoch": 0.45,
320
+ "grad_norm": 1.769669348441601,
321
+ "learning_rate": 3.356335553954679e-07,
322
+ "logits/chosen": -2.74135684967041,
323
+ "logits/rejected": -2.6822197437286377,
324
+ "logps/chosen": -335.69464111328125,
325
+ "logps/rejected": -237.88046264648438,
326
+ "loss": 0.6336,
327
+ "rewards/accuracies": 0.9750000238418579,
328
+ "rewards/chosen": 0.09816019237041473,
329
+ "rewards/margins": 0.1330377608537674,
330
+ "rewards/margins_max": 0.2625694274902344,
331
+ "rewards/margins_min": 0.02169904112815857,
332
+ "rewards/margins_std": 0.1116378903388977,
333
+ "rewards/rejected": -0.03487757220864296,
334
+ "step": 160
335
+ },
336
+ {
337
+ "epoch": 0.48,
338
+ "grad_norm": 1.8260362870579057,
339
+ "learning_rate": 3.121247763262235e-07,
340
+ "logits/chosen": -2.8216443061828613,
341
+ "logits/rejected": -2.7401599884033203,
342
+ "logps/chosen": -364.33587646484375,
343
+ "logps/rejected": -299.15887451171875,
344
+ "loss": 0.635,
345
+ "rewards/accuracies": 0.887499988079071,
346
+ "rewards/chosen": 0.11069444566965103,
347
+ "rewards/margins": 0.1395573914051056,
348
+ "rewards/margins_max": 0.2983313202857971,
349
+ "rewards/margins_min": 0.007289635483175516,
350
+ "rewards/margins_std": 0.13639435172080994,
351
+ "rewards/rejected": -0.02886294387280941,
352
+ "step": 170
353
+ },
354
+ {
355
+ "epoch": 0.51,
356
+ "grad_norm": 2.082827875491237,
357
+ "learning_rate": 2.880139477883347e-07,
358
+ "logits/chosen": -2.789100408554077,
359
+ "logits/rejected": -2.700629949569702,
360
+ "logps/chosen": -339.28125,
361
+ "logps/rejected": -296.9674377441406,
362
+ "loss": 0.6302,
363
+ "rewards/accuracies": 0.9125000238418579,
364
+ "rewards/chosen": 0.08692200481891632,
365
+ "rewards/margins": 0.11842750012874603,
366
+ "rewards/margins_max": 0.23567883670330048,
367
+ "rewards/margins_min": 0.011810391210019588,
368
+ "rewards/margins_std": 0.10012297332286835,
369
+ "rewards/rejected": -0.03150549530982971,
370
+ "step": 180
371
+ },
372
+ {
373
+ "epoch": 0.54,
374
+ "grad_norm": 2.575609586836664,
375
+ "learning_rate": 2.635347271463544e-07,
376
+ "logits/chosen": -2.787972927093506,
377
+ "logits/rejected": -2.6533846855163574,
378
+ "logps/chosen": -349.08880615234375,
379
+ "logps/rejected": -242.5450897216797,
380
+ "loss": 0.6257,
381
+ "rewards/accuracies": 0.987500011920929,
382
+ "rewards/chosen": 0.10282168537378311,
383
+ "rewards/margins": 0.14908696711063385,
384
+ "rewards/margins_max": 0.28802552819252014,
385
+ "rewards/margins_min": 0.025781046599149704,
386
+ "rewards/margins_std": 0.1190432757139206,
387
+ "rewards/rejected": -0.04626528546214104,
388
+ "step": 190
389
+ },
390
+ {
391
+ "epoch": 0.56,
392
+ "grad_norm": 2.049618345880406,
393
+ "learning_rate": 2.3892434184240534e-07,
394
+ "logits/chosen": -2.857001543045044,
395
+ "logits/rejected": -2.7506966590881348,
396
+ "logps/chosen": -387.255126953125,
397
+ "logps/rejected": -270.194091796875,
398
+ "loss": 0.6251,
399
+ "rewards/accuracies": 0.9750000238418579,
400
+ "rewards/chosen": 0.11739673465490341,
401
+ "rewards/margins": 0.1590125858783722,
402
+ "rewards/margins_max": 0.32226094603538513,
403
+ "rewards/margins_min": 0.03523118048906326,
404
+ "rewards/margins_std": 0.12896928191184998,
405
+ "rewards/rejected": -0.04161586984992027,
406
+ "step": 200
407
+ },
408
+ {
409
+ "epoch": 0.56,
410
+ "eval_logits/chosen": -2.769979953765869,
411
+ "eval_logits/rejected": -2.7318813800811768,
412
+ "eval_logps/chosen": -288.19482421875,
413
+ "eval_logps/rejected": -263.79888916015625,
414
+ "eval_loss": 0.6876310110092163,
415
+ "eval_rewards/accuracies": 0.5870000123977661,
416
+ "eval_rewards/chosen": -0.03601397946476936,
417
+ "eval_rewards/margins": 0.016185704618692398,
418
+ "eval_rewards/margins_max": 0.11952462792396545,
419
+ "eval_rewards/margins_min": -0.07521206140518188,
420
+ "eval_rewards/margins_std": 0.0641048476099968,
421
+ "eval_rewards/rejected": -0.05219968408346176,
422
+ "eval_runtime": 427.8872,
423
+ "eval_samples_per_second": 4.674,
424
+ "eval_steps_per_second": 0.292,
425
+ "step": 200
426
+ },
427
+ {
428
+ "epoch": 0.59,
429
+ "grad_norm": 1.9845466839870691,
430
+ "learning_rate": 2.1442129043167873e-07,
431
+ "logits/chosen": -2.751984119415283,
432
+ "logits/rejected": -2.6815638542175293,
433
+ "logps/chosen": -344.3485412597656,
434
+ "logps/rejected": -262.97393798828125,
435
+ "loss": 0.6188,
436
+ "rewards/accuracies": 0.9750000238418579,
437
+ "rewards/chosen": 0.10976312309503555,
438
+ "rewards/margins": 0.16779468953609467,
439
+ "rewards/margins_max": 0.34073972702026367,
440
+ "rewards/margins_min": 0.03692127764225006,
441
+ "rewards/margins_std": 0.14643600583076477,
442
+ "rewards/rejected": -0.05803157761693001,
443
+ "step": 210
444
+ },
445
+ {
446
+ "epoch": 0.62,
447
+ "grad_norm": 2.0373527988549145,
448
+ "learning_rate": 1.9026303129961048e-07,
449
+ "logits/chosen": -2.8502397537231445,
450
+ "logits/rejected": -2.7268834114074707,
451
+ "logps/chosen": -393.9187927246094,
452
+ "logps/rejected": -280.2196960449219,
453
+ "loss": 0.6142,
454
+ "rewards/accuracies": 0.9375,
455
+ "rewards/chosen": 0.13092514872550964,
456
+ "rewards/margins": 0.18590331077575684,
457
+ "rewards/margins_max": 0.3502216637134552,
458
+ "rewards/margins_min": 0.03089449368417263,
459
+ "rewards/margins_std": 0.14906269311904907,
460
+ "rewards/rejected": -0.0549781434237957,
461
+ "step": 220
462
+ },
463
+ {
464
+ "epoch": 0.65,
465
+ "grad_norm": 2.1828985591845402,
466
+ "learning_rate": 1.6668368145931396e-07,
467
+ "logits/chosen": -2.875049114227295,
468
+ "logits/rejected": -2.744711399078369,
469
+ "logps/chosen": -390.4495849609375,
470
+ "logps/rejected": -268.98565673828125,
471
+ "loss": 0.6067,
472
+ "rewards/accuracies": 0.949999988079071,
473
+ "rewards/chosen": 0.12363851070404053,
474
+ "rewards/margins": 0.17946995794773102,
475
+ "rewards/margins_max": 0.34318283200263977,
476
+ "rewards/margins_min": 0.036444298923015594,
477
+ "rewards/margins_std": 0.13844837248325348,
478
+ "rewards/rejected": -0.05583144351840019,
479
+ "step": 230
480
+ },
481
+ {
482
+ "epoch": 0.68,
483
+ "grad_norm": 1.8193698783653034,
484
+ "learning_rate": 1.4391174773015834e-07,
485
+ "logits/chosen": -2.802640199661255,
486
+ "logits/rejected": -2.71109938621521,
487
+ "logps/chosen": -333.38397216796875,
488
+ "logps/rejected": -289.92462158203125,
489
+ "loss": 0.6224,
490
+ "rewards/accuracies": 0.9125000238418579,
491
+ "rewards/chosen": 0.09798085689544678,
492
+ "rewards/margins": 0.14519774913787842,
493
+ "rewards/margins_max": 0.293338418006897,
494
+ "rewards/margins_min": 0.01530275959521532,
495
+ "rewards/margins_std": 0.12239019572734833,
496
+ "rewards/rejected": -0.047216884791851044,
497
+ "step": 240
498
+ },
499
+ {
500
+ "epoch": 0.7,
501
+ "grad_norm": 1.9709337022102749,
502
+ "learning_rate": 1.2216791228457775e-07,
503
+ "logits/chosen": -2.7975411415100098,
504
+ "logits/rejected": -2.6804046630859375,
505
+ "logps/chosen": -351.70257568359375,
506
+ "logps/rejected": -260.0617370605469,
507
+ "loss": 0.6084,
508
+ "rewards/accuracies": 0.9624999761581421,
509
+ "rewards/chosen": 0.12191393226385117,
510
+ "rewards/margins": 0.1914171278476715,
511
+ "rewards/margins_max": 0.36713144183158875,
512
+ "rewards/margins_min": 0.05136305093765259,
513
+ "rewards/margins_std": 0.142560213804245,
514
+ "rewards/rejected": -0.06950321048498154,
515
+ "step": 250
516
+ },
517
+ {
518
+ "epoch": 0.73,
519
+ "grad_norm": 1.752772587188972,
520
+ "learning_rate": 1.0166289402331391e-07,
521
+ "logits/chosen": -2.8487606048583984,
522
+ "logits/rejected": -2.737738847732544,
523
+ "logps/chosen": -345.0237731933594,
524
+ "logps/rejected": -265.47198486328125,
525
+ "loss": 0.6074,
526
+ "rewards/accuracies": 0.9375,
527
+ "rewards/chosen": 0.11421672999858856,
528
+ "rewards/margins": 0.17133933305740356,
529
+ "rewards/margins_max": 0.3663348853588104,
530
+ "rewards/margins_min": 0.02202555350959301,
531
+ "rewards/margins_std": 0.15875253081321716,
532
+ "rewards/rejected": -0.057122599333524704,
533
+ "step": 260
534
+ },
535
+ {
536
+ "epoch": 0.76,
537
+ "grad_norm": 2.056956615608033,
538
+ "learning_rate": 8.259540650444734e-08,
539
+ "logits/chosen": -2.8006067276000977,
540
+ "logits/rejected": -2.7100348472595215,
541
+ "logps/chosen": -365.325927734375,
542
+ "logps/rejected": -270.2814636230469,
543
+ "loss": 0.6098,
544
+ "rewards/accuracies": 0.9125000238418579,
545
+ "rewards/chosen": 0.1257423460483551,
546
+ "rewards/margins": 0.20250901579856873,
547
+ "rewards/margins_max": 0.3718946874141693,
548
+ "rewards/margins_min": 0.04439568892121315,
549
+ "rewards/margins_std": 0.1491011530160904,
550
+ "rewards/rejected": -0.07676666229963303,
551
+ "step": 270
552
+ },
553
+ {
554
+ "epoch": 0.79,
555
+ "grad_norm": 1.9417182779069821,
556
+ "learning_rate": 6.515023221586721e-08,
557
+ "logits/chosen": -2.7494287490844727,
558
+ "logits/rejected": -2.7017343044281006,
559
+ "logps/chosen": -320.38360595703125,
560
+ "logps/rejected": -279.5456848144531,
561
+ "loss": 0.6125,
562
+ "rewards/accuracies": 0.8999999761581421,
563
+ "rewards/chosen": 0.10271165519952774,
564
+ "rewards/margins": 0.16274654865264893,
565
+ "rewards/margins_max": 0.3613172173500061,
566
+ "rewards/margins_min": 0.03712720423936844,
567
+ "rewards/margins_std": 0.14939478039741516,
568
+ "rewards/rejected": -0.060034893453121185,
569
+ "step": 280
570
+ },
571
+ {
572
+ "epoch": 0.82,
573
+ "grad_norm": 2.159880856830845,
574
+ "learning_rate": 4.949643185335287e-08,
575
+ "logits/chosen": -2.7616562843322754,
576
+ "logits/rejected": -2.6814732551574707,
577
+ "logps/chosen": -331.0811462402344,
578
+ "logps/rejected": -272.906982421875,
579
+ "loss": 0.6168,
580
+ "rewards/accuracies": 0.9375,
581
+ "rewards/chosen": 0.0991094559431076,
582
+ "rewards/margins": 0.16621682047843933,
583
+ "rewards/margins_max": 0.34492212533950806,
584
+ "rewards/margins_min": 0.02063518390059471,
585
+ "rewards/margins_std": 0.14825591444969177,
586
+ "rewards/rejected": -0.06710737198591232,
587
+ "step": 290
588
+ },
589
+ {
590
+ "epoch": 0.85,
591
+ "grad_norm": 2.247267229121733,
592
+ "learning_rate": 3.578570595810274e-08,
593
+ "logits/chosen": -2.805422306060791,
594
+ "logits/rejected": -2.7308857440948486,
595
+ "logps/chosen": -351.537109375,
596
+ "logps/rejected": -296.57861328125,
597
+ "loss": 0.6029,
598
+ "rewards/accuracies": 0.9375,
599
+ "rewards/chosen": 0.12097059190273285,
600
+ "rewards/margins": 0.19219207763671875,
601
+ "rewards/margins_max": 0.3688820004463196,
602
+ "rewards/margins_min": 0.05189325660467148,
603
+ "rewards/margins_std": 0.14433594048023224,
604
+ "rewards/rejected": -0.07122147083282471,
605
+ "step": 300
606
+ },
607
+ {
608
+ "epoch": 0.85,
609
+ "eval_logits/chosen": -2.757275342941284,
610
+ "eval_logits/rejected": -2.7190775871276855,
611
+ "eval_logps/chosen": -289.97894287109375,
612
+ "eval_logps/rejected": -266.02178955078125,
613
+ "eval_loss": 0.6861926913261414,
614
+ "eval_rewards/accuracies": 0.5870000123977661,
615
+ "eval_rewards/chosen": -0.05385516211390495,
616
+ "eval_rewards/margins": 0.020573224872350693,
617
+ "eval_rewards/margins_max": 0.14790384471416473,
618
+ "eval_rewards/margins_min": -0.09322728216648102,
619
+ "eval_rewards/margins_std": 0.079450324177742,
620
+ "eval_rewards/rejected": -0.07442838698625565,
621
+ "eval_runtime": 427.9454,
622
+ "eval_samples_per_second": 4.673,
623
+ "eval_steps_per_second": 0.292,
624
+ "step": 300
625
+ },
626
+ {
627
+ "epoch": 0.87,
628
+ "grad_norm": 1.8974226134430692,
629
+ "learning_rate": 2.415092479103503e-08,
630
+ "logits/chosen": -2.840935230255127,
631
+ "logits/rejected": -2.709672212600708,
632
+ "logps/chosen": -345.2643737792969,
633
+ "logps/rejected": -222.6641082763672,
634
+ "loss": 0.6093,
635
+ "rewards/accuracies": 0.9624999761581421,
636
+ "rewards/chosen": 0.1101798266172409,
637
+ "rewards/margins": 0.1844477355480194,
638
+ "rewards/margins_max": 0.3727789521217346,
639
+ "rewards/margins_min": 0.046926215291023254,
640
+ "rewards/margins_std": 0.1535920351743698,
641
+ "rewards/rejected": -0.0742679089307785,
642
+ "step": 310
643
+ },
644
+ {
645
+ "epoch": 0.9,
646
+ "grad_norm": 1.7695491697233519,
647
+ "learning_rate": 1.4704840690808656e-08,
648
+ "logits/chosen": -2.796245813369751,
649
+ "logits/rejected": -2.7119815349578857,
650
+ "logps/chosen": -339.24664306640625,
651
+ "logps/rejected": -268.58984375,
652
+ "loss": 0.6037,
653
+ "rewards/accuracies": 0.949999988079071,
654
+ "rewards/chosen": 0.1139553040266037,
655
+ "rewards/margins": 0.17954252660274506,
656
+ "rewards/margins_max": 0.37754225730895996,
657
+ "rewards/margins_min": 0.0253077894449234,
658
+ "rewards/margins_std": 0.16321782767772675,
659
+ "rewards/rejected": -0.06558724492788315,
660
+ "step": 320
661
+ },
662
+ {
663
+ "epoch": 0.93,
664
+ "grad_norm": 2.0645664151522136,
665
+ "learning_rate": 7.538995394063995e-09,
666
+ "logits/chosen": -2.8658013343811035,
667
+ "logits/rejected": -2.760768175125122,
668
+ "logps/chosen": -386.96258544921875,
669
+ "logps/rejected": -275.399658203125,
670
+ "loss": 0.6073,
671
+ "rewards/accuracies": 0.887499988079071,
672
+ "rewards/chosen": 0.13314954936504364,
673
+ "rewards/margins": 0.20348164439201355,
674
+ "rewards/margins_max": 0.3888625502586365,
675
+ "rewards/margins_min": 0.05053550750017166,
676
+ "rewards/margins_std": 0.15889115631580353,
677
+ "rewards/rejected": -0.0703321173787117,
678
+ "step": 330
679
+ },
680
+ {
681
+ "epoch": 0.96,
682
+ "grad_norm": 2.0081843762801617,
683
+ "learning_rate": 2.7228329070159705e-09,
684
+ "logits/chosen": -2.7621803283691406,
685
+ "logits/rejected": -2.6747400760650635,
686
+ "logps/chosen": -334.4164123535156,
687
+ "logps/rejected": -258.71417236328125,
688
+ "loss": 0.607,
689
+ "rewards/accuracies": 0.8999999761581421,
690
+ "rewards/chosen": 0.10603541135787964,
691
+ "rewards/margins": 0.17292913794517517,
692
+ "rewards/margins_max": 0.366424024105072,
693
+ "rewards/margins_min": 0.027338892221450806,
694
+ "rewards/margins_std": 0.15261869132518768,
695
+ "rewards/rejected": -0.06689374148845673,
696
+ "step": 340
697
+ },
698
+ {
699
+ "epoch": 0.99,
700
+ "grad_norm": 3.396556816184061,
701
+ "learning_rate": 3.0302652553296226e-10,
702
+ "logits/chosen": -2.754178285598755,
703
+ "logits/rejected": -2.6804542541503906,
704
+ "logps/chosen": -348.5409851074219,
705
+ "logps/rejected": -294.7231750488281,
706
+ "loss": 0.6046,
707
+ "rewards/accuracies": 0.9624999761581421,
708
+ "rewards/chosen": 0.10722661018371582,
709
+ "rewards/margins": 0.18621695041656494,
710
+ "rewards/margins_max": 0.36825960874557495,
711
+ "rewards/margins_min": 0.045198000967502594,
712
+ "rewards/margins_std": 0.14424237608909607,
713
+ "rewards/rejected": -0.07899035513401031,
714
+ "step": 350
715
+ },
716
+ {
717
+ "epoch": 1.0,
718
+ "step": 355,
719
+ "total_flos": 0.0,
720
+ "train_loss": 0.6394854995566355,
721
+ "train_runtime": 4022.9516,
722
+ "train_samples_per_second": 1.411,
723
+ "train_steps_per_second": 0.088
724
+ }
725
+ ],
726
+ "logging_steps": 10,
727
+ "max_steps": 355,
728
+ "num_input_tokens_seen": 0,
729
+ "num_train_epochs": 1,
730
+ "save_steps": 100,
731
+ "total_flos": 0.0,
732
+ "train_batch_size": 4,
733
+ "trial_name": null,
734
+ "trial_params": null
735
+ }
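`trainer_state.json` interleaves training logs (every 10 steps, per `logging_steps`) with evaluation logs (every 100 steps, per `eval_steps`); the evaluation entries are the ones carrying `eval_*` keys. A minimal sketch for pulling the evaluation curve out of `log_history`, assuming the file is read from the repository root:

```python
import json

with open("trainer_state.json") as f:
    state = json.load(f)

# Keep only the evaluation entries (they contain "eval_loss"; training logs do not).
eval_logs = [entry for entry in state["log_history"] if "eval_loss" in entry]
for entry in eval_logs:
    print(entry["step"], round(entry["eval_loss"], 4), entry["eval_rewards/accuracies"])
```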