BraylonDash committed on
Commit eec7ad9 · verified · 1 Parent(s): 705d619

Model save

README.md ADDED
@@ -0,0 +1,61 @@
+ ---
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: DUAL-GPO/zephyr-7b-ipo-qlora-v0-merged
+ model-index:
+ - name: zephyr-7b-ipo-0k-15k-i1
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-ipo-0k-15k-i1
+
+ This model is a fine-tuned version of [DUAL-GPO/zephyr-7b-ipo-qlora-v0-merged](https://huggingface.co/DUAL-GPO/zephyr-7b-ipo-qlora-v0-merged) on the None dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 3
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 12
+ - total_eval_batch_size: 6
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
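The batch-size and scheduler settings in the README hunk can be cross-checked against the values logged in trainer_state.json. The following is a minimal Python sketch, not the Trainer's own code: it assumes a linear warmup followed by a half-cosine decay to zero, and it assumes the warmup length is the ceiling of the warmup ratio times the step count. The constant and function names are hypothetical.

```python
import math

# Values copied from the README hyperparameters and all_results.json.
PER_DEVICE_BATCH = 2
NUM_DEVICES = 3
GRAD_ACCUM_STEPS = 2
TRAIN_SAMPLES = 15000
PEAK_LR = 5e-06
WARMUP_RATIO = 0.1

# Effective batch size and number of optimizer steps for one epoch.
total_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_steps = TRAIN_SAMPLES // total_batch
# Assumption: warmup length is ceil(ratio * steps).
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step: int) -> float:
    """Assumed schedule: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(total_batch, total_steps, warmup_steps)
    for step in (1, 10, 130, 500):
        print(step, learning_rate(step))
```

Under these assumptions the sketch gives a total batch size of 12 and 1250 steps, matching total_train_batch_size and global_step, and the computed rates agree with the logged learning_rate values at steps 10, 130 and 500 up to floating-point rounding.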
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:292002b6fe69f98790c1c80f6568636cd538387dea710ef9293e2297c52cee06
+ oid sha256:dabb54b0af44b6a72514e758d1508a00493cf5748b8afe79b0691c1be63d84ce
  size 671150064
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.47997802200317385,
+ "train_runtime": 13155.732,
+ "train_samples": 15000,
+ "train_samples_per_second": 1.14,
+ "train_steps_per_second": 0.095
+ }
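The throughput figures in all_results.json follow from the runtime, the sample count, and the step count. The short Python check below is only an illustration; it assumes the reported rates are rounded to three decimals and takes the step count from global_step in trainer_state.json. The constant names are hypothetical.

```python
# Values copied from all_results.json and trainer_state.json.
TRAIN_RUNTIME_S = 13155.732
TRAIN_SAMPLES = 15000
GLOBAL_STEP = 1250

# Assumption: the reported rates are rounded to three decimals.
samples_per_second = round(TRAIN_SAMPLES / TRAIN_RUNTIME_S, 3)
steps_per_second = round(GLOBAL_STEP / TRAIN_RUNTIME_S, 3)
runtime_hours = TRAIN_RUNTIME_S / 3600

if __name__ == "__main__":
    print(samples_per_second, steps_per_second, round(runtime_hours, 2))
```

The recomputed rates match train_samples_per_second and train_steps_per_second, and the runtime works out to roughly 3.65 hours.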
emissions.csv ADDED
@@ -0,0 +1,2 @@
+ timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
+ 2024-09-20T19:24:49,codecarbon,ba2c61e1-2e7d-419d-b922-799bb02ae527,13155.737360954285,0.006773657887472573,5.148824198616586e-07,42.5,630.561,188.74309015274048,0.1553077856000925,2.007423683997746,0.6866731924424456,2.8494046620402878,Canada,CAN,quebec,,,Linux-5.15.0-84-generic-x86_64-with-glibc2.35,3.10.14,2.2.3,32,Intel(R) Xeon(R) W-3335 CPU @ 3.40GHz,4,4 x NVIDIA GeForce RTX 4090,-71.2,46.8,503.3149070739746,machine,N,1.0
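The energy and emissions columns in the emissions.csv row are internally consistent, which can be verified with a few lines of Python. This sketch assumes the usual CodeCarbon units (energy in kWh, emissions in kg CO2-equivalent, rate in kg per second); the constant names are hypothetical and the values are copied from the row above.

```python
import math

# Values copied from the single data row of emissions.csv.
CPU_ENERGY_KWH = 0.1553077856000925
GPU_ENERGY_KWH = 2.007423683997746
RAM_ENERGY_KWH = 0.6866731924424456
ENERGY_CONSUMED_KWH = 2.8494046620402878
EMISSIONS_KG = 0.006773657887472573
DURATION_S = 13155.737360954285
EMISSIONS_RATE_KG_PER_S = 5.148824198616586e-07

# Total energy should be the sum of the three component energies.
energy_sum = CPU_ENERGY_KWH + GPU_ENERGY_KWH + RAM_ENERGY_KWH
# Emissions rate should be total emissions divided by the run duration.
rate = EMISSIONS_KG / DURATION_S
# Implied carbon intensity of the electricity, in kg CO2eq per kWh.
intensity = EMISSIONS_KG / ENERGY_CONSUMED_KWH

if __name__ == "__main__":
    print(math.isclose(energy_sum, ENERGY_CONSUMED_KWH, rel_tol=1e-9))
    print(math.isclose(rate, EMISSIONS_RATE_KG_PER_S, rel_tol=1e-4))
    print(intensity)
```

The GPU accounts for most of the energy consumed, and the implied intensity is a few grams of CO2-equivalent per kWh.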
runs/Sep20_15-03-45_gpu4-119-5/events.out.tfevents.1726811133.gpu4-119-5.1089928.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:63e6bddda3d42662838f9a40d42575bb6a7044052e0b59b98a142d71554d46e3
- size 82441
+ oid sha256:d340e3e7feffac4674349358a0a07500fa66d98e092d231f5ef8a1b130729194
+ size 84697
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.47997802200317385,
+ "train_runtime": 13155.732,
+ "train_samples": 15000,
+ "train_samples_per_second": 1.14,
+ "train_steps_per_second": 0.095
+ }
trainer_state.json ADDED
@@ -0,0 +1,1794 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 1.0,
+ "eval_steps": 500,
+ "global_step": 1250,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0,
+ "learning_rate": 4e-08,
+ "logits/chosen": -2.683027744293213,
+ "logits/rejected": -2.0717973709106445,
+ "logps/chosen": -497.5299987792969,
+ "logps/rejected": -340.85333251953125,
+ "loss": 0.6931,
+ "rewards/accuracies": 0.0,
+ "rewards/chosen": 0.0,
+ "rewards/margins": 0.0,
+ "rewards/rejected": 0.0,
+ "step": 1
+ },
+ {
+ "epoch": 0.01,
+ "learning_rate": 4.0000000000000003e-07,
+ "logits/chosen": -2.517321825027466,
+ "logits/rejected": -2.1676418781280518,
+ "logps/chosen": -288.0818176269531,
+ "logps/rejected": -199.1251678466797,
+ "loss": 0.6932,
+ "rewards/accuracies": 0.3611111044883728,
+ "rewards/chosen": 0.00022377756249625236,
+ "rewards/margins": 0.00016948273696471006,
+ "rewards/rejected": 5.429480006569065e-05,
+ "step": 10
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 8.000000000000001e-07,
+ "logits/chosen": -2.39406156539917,
+ "logits/rejected": -2.1605257987976074,
+ "logps/chosen": -271.68157958984375,
+ "logps/rejected": -219.1865234375,
+ "loss": 0.6934,
+ "rewards/accuracies": 0.3499999940395355,
+ "rewards/chosen": 0.0002538819098845124,
+ "rewards/margins": -0.0007037109462544322,
+ "rewards/rejected": 0.0009575928561389446,
+ "step": 20
+ },
+ {
+ "epoch": 0.02,
+ "learning_rate": 1.2000000000000002e-06,
+ "logits/chosen": -2.306056261062622,
+ "logits/rejected": -2.278916358947754,
+ "logps/chosen": -270.09515380859375,
+ "logps/rejected": -301.93194580078125,
+ "loss": 0.6926,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": 0.0033609136007726192,
+ "rewards/margins": 0.0015898284036666155,
+ "rewards/rejected": 0.0017710853135213256,
+ "step": 30
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.6000000000000001e-06,
+ "logits/chosen": -2.5502350330352783,
+ "logits/rejected": -2.383606433868408,
+ "logps/chosen": -211.55270385742188,
+ "logps/rejected": -190.15623474121094,
+ "loss": 0.6919,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": 0.005155195482075214,
+ "rewards/margins": 0.0021972700487822294,
+ "rewards/rejected": 0.002957924734801054,
+ "step": 40
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 2.0000000000000003e-06,
+ "logits/chosen": -2.3993449211120605,
+ "logits/rejected": -2.355790615081787,
+ "logps/chosen": -196.9150390625,
+ "logps/rejected": -221.62014770507812,
+ "loss": 0.69,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": 0.008768909610807896,
+ "rewards/margins": 0.005697342567145824,
+ "rewards/rejected": 0.0030715656466782093,
+ "step": 50
+ },
+ {
+ "epoch": 0.05,
+ "learning_rate": 2.4000000000000003e-06,
+ "logits/chosen": -2.525311231613159,
+ "logits/rejected": -2.3309919834136963,
+ "logps/chosen": -243.81521606445312,
+ "logps/rejected": -289.21868896484375,
+ "loss": 0.6895,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": 0.012168792076408863,
+ "rewards/margins": 0.00664276909083128,
+ "rewards/rejected": 0.005526022985577583,
+ "step": 60
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 2.8000000000000003e-06,
+ "logits/chosen": -2.2812366485595703,
+ "logits/rejected": -2.306039810180664,
+ "logps/chosen": -225.4685516357422,
+ "logps/rejected": -229.1845703125,
+ "loss": 0.683,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": 0.016351569443941116,
+ "rewards/margins": 0.022075170651078224,
+ "rewards/rejected": -0.0057236007414758205,
+ "step": 70
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 3.2000000000000003e-06,
+ "logits/chosen": -2.44425892829895,
+ "logits/rejected": -2.432558536529541,
+ "logps/chosen": -261.52703857421875,
+ "logps/rejected": -270.8040466308594,
+ "loss": 0.6798,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": 0.03062610700726509,
+ "rewards/margins": 0.02709539607167244,
+ "rewards/rejected": 0.003530709771439433,
+ "step": 80
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 3.6000000000000003e-06,
+ "logits/chosen": -2.3680663108825684,
+ "logits/rejected": -2.022505283355713,
+ "logps/chosen": -264.2672424316406,
+ "logps/rejected": -186.8569793701172,
+ "loss": 0.6624,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": 0.023214900866150856,
+ "rewards/margins": 0.05782170966267586,
+ "rewards/rejected": -0.03460680693387985,
+ "step": 90
+ },
+ {
+ "epoch": 0.08,
+ "learning_rate": 4.000000000000001e-06,
+ "logits/chosen": -2.32338547706604,
+ "logits/rejected": -2.332152843475342,
+ "logps/chosen": -283.2625732421875,
+ "logps/rejected": -274.24365234375,
+ "loss": 0.6627,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.10683544725179672,
+ "rewards/margins": 0.053194332867860794,
+ "rewards/rejected": -0.160029798746109,
+ "step": 100
+ },
+ {
+ "epoch": 0.09,
+ "learning_rate": 4.4e-06,
+ "logits/chosen": -2.220726490020752,
+ "logits/rejected": -2.0992746353149414,
+ "logps/chosen": -226.12930297851562,
+ "logps/rejected": -215.71804809570312,
+ "loss": 0.6366,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.13904300332069397,
+ "rewards/margins": 0.1515841782093048,
+ "rewards/rejected": -0.2906271815299988,
+ "step": 110
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 4.800000000000001e-06,
+ "logits/chosen": -2.012620449066162,
+ "logits/rejected": -2.0512988567352295,
+ "logps/chosen": -296.2652893066406,
+ "logps/rejected": -368.3299560546875,
+ "loss": 0.5823,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -0.5935014486312866,
+ "rewards/margins": 0.31774038076400757,
+ "rewards/rejected": -0.9112418293952942,
+ "step": 120
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 4.999756310023261e-06,
+ "logits/chosen": -2.3318862915039062,
+ "logits/rejected": -2.2144980430603027,
+ "logps/chosen": -300.13494873046875,
+ "logps/rejected": -285.32867431640625,
+ "loss": 0.5917,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.6550928950309753,
+ "rewards/margins": 0.21246926486492157,
+ "rewards/rejected": -0.8675621151924133,
+ "step": 130
+ },
+ {
+ "epoch": 0.11,
+ "learning_rate": 4.997807075247147e-06,
+ "logits/chosen": -2.1210927963256836,
+ "logits/rejected": -1.8453128337860107,
+ "logps/chosen": -247.13992309570312,
+ "logps/rejected": -273.41650390625,
+ "loss": 0.5644,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.5666564702987671,
+ "rewards/margins": 0.40540236234664917,
+ "rewards/rejected": -0.972058892250061,
+ "step": 140
+ },
+ {
+ "epoch": 0.12,
+ "learning_rate": 4.993910125649561e-06,
+ "logits/chosen": -2.1635901927948,
+ "logits/rejected": -2.0486695766448975,
+ "logps/chosen": -275.05364990234375,
+ "logps/rejected": -299.96148681640625,
+ "loss": 0.5361,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -0.45165014266967773,
+ "rewards/margins": 0.5924302935600281,
+ "rewards/rejected": -1.0440804958343506,
+ "step": 150
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 4.988068499954578e-06,
+ "logits/chosen": -2.0027568340301514,
+ "logits/rejected": -2.079007148742676,
+ "logps/chosen": -488.09222412109375,
+ "logps/rejected": -544.9346923828125,
+ "loss": 0.5923,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -2.358773946762085,
+ "rewards/margins": 0.36690616607666016,
+ "rewards/rejected": -2.725680112838745,
+ "step": 160
+ },
+ {
+ "epoch": 0.14,
+ "learning_rate": 4.980286753286196e-06,
+ "logits/chosen": -2.197312831878662,
+ "logits/rejected": -1.8223193883895874,
+ "logps/chosen": -473.13067626953125,
+ "logps/rejected": -477.058837890625,
+ "loss": 0.5539,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -2.0573782920837402,
+ "rewards/margins": 0.36220186948776245,
+ "rewards/rejected": -2.4195804595947266,
+ "step": 170
+ },
+ {
+ "epoch": 0.14,
+ "learning_rate": 4.970570953616383e-06,
+ "logits/chosen": -2.0688188076019287,
+ "logits/rejected": -2.0571534633636475,
+ "logps/chosen": -367.75189208984375,
+ "logps/rejected": -408.49432373046875,
+ "loss": 0.5837,
+ "rewards/accuracies": 0.44999998807907104,
+ "rewards/chosen": -1.4084174633026123,
+ "rewards/margins": 0.27590009570121765,
+ "rewards/rejected": -1.6843173503875732,
+ "step": 180
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 4.958928677033465e-06,
+ "logits/chosen": -1.9835456609725952,
+ "logits/rejected": -1.9227497577667236,
+ "logps/chosen": -397.43719482421875,
+ "logps/rejected": -443.9778747558594,
+ "loss": 0.4748,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -1.6137508153915405,
+ "rewards/margins": 0.6342134475708008,
+ "rewards/rejected": -2.247964382171631,
+ "step": 190
+ },
+ {
+ "epoch": 0.16,
+ "learning_rate": 4.9453690018345144e-06,
+ "logits/chosen": -1.8892637491226196,
+ "logits/rejected": -1.7664573192596436,
+ "logps/chosen": -440.25885009765625,
+ "logps/rejected": -551.2183837890625,
+ "loss": 0.4752,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -1.8693583011627197,
+ "rewards/margins": 0.8833104968070984,
+ "rewards/rejected": -2.752668857574463,
+ "step": 200
+ },
+ {
+ "epoch": 0.17,
+ "learning_rate": 4.9299025014463665e-06,
+ "logits/chosen": -2.1745524406433105,
+ "logits/rejected": -1.8148044347763062,
+ "logps/chosen": -423.63128662109375,
+ "logps/rejected": -463.9573669433594,
+ "loss": 0.4876,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -1.4858825206756592,
+ "rewards/margins": 1.151026964187622,
+ "rewards/rejected": -2.6369097232818604,
+ "step": 210
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 4.912541236180779e-06,
+ "logits/chosen": -2.0313258171081543,
+ "logits/rejected": -1.6316728591918945,
+ "logps/chosen": -486.4668884277344,
+ "logps/rejected": -533.1800537109375,
+ "loss": 0.547,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -2.4689016342163086,
+ "rewards/margins": 0.798893928527832,
+ "rewards/rejected": -3.2677950859069824,
+ "step": 220
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 4.893298743830168e-06,
+ "logits/chosen": -2.042020320892334,
+ "logits/rejected": -2.0924315452575684,
+ "logps/chosen": -414.12152099609375,
+ "logps/rejected": -496.2425231933594,
+ "loss": 0.5683,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -2.165775775909424,
+ "rewards/margins": 0.4269164502620697,
+ "rewards/rejected": -2.5926921367645264,
+ "step": 230
+ },
+ {
+ "epoch": 0.19,
+ "learning_rate": 4.8721900291112415e-06,
+ "logits/chosen": -1.9427902698516846,
+ "logits/rejected": -1.7720750570297241,
+ "logps/chosen": -426.7705078125,
+ "logps/rejected": -476.8197326660156,
+ "loss": 0.5076,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -2.199129104614258,
+ "rewards/margins": 0.538731575012207,
+ "rewards/rejected": -2.7378602027893066,
+ "step": 240
+ },
+ {
+ "epoch": 0.2,
+ "learning_rate": 4.849231551964771e-06,
+ "logits/chosen": -1.916007399559021,
+ "logits/rejected": -1.787649154663086,
+ "logps/chosen": -392.2034912109375,
+ "logps/rejected": -512.083984375,
+ "loss": 0.4932,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": -2.2146475315093994,
+ "rewards/margins": 1.0699217319488525,
+ "rewards/rejected": -3.2845687866210938,
+ "step": 250
+ },
+ {
+ "epoch": 0.21,
+ "learning_rate": 4.824441214720629e-06,
+ "logits/chosen": -1.9078289270401,
+ "logits/rejected": -1.676805853843689,
+ "logps/chosen": -607.4827270507812,
+ "logps/rejected": -694.8690185546875,
+ "loss": 0.4684,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -3.3061842918395996,
+ "rewards/margins": 0.9638306498527527,
+ "rewards/rejected": -4.270014762878418,
+ "step": 260
+ },
+ {
+ "epoch": 0.22,
+ "learning_rate": 4.7978383481380865e-06,
+ "logits/chosen": -1.841152548789978,
+ "logits/rejected": -1.639878511428833,
+ "logps/chosen": -522.9131469726562,
+ "logps/rejected": -613.7696533203125,
+ "loss": 0.4322,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -2.718308448791504,
+ "rewards/margins": 1.2125293016433716,
+ "rewards/rejected": -3.930838108062744,
+ "step": 270
+ },
+ {
+ "epoch": 0.22,
+ "learning_rate": 4.769443696332272e-06,
+ "logits/chosen": -1.622367262840271,
+ "logits/rejected": -1.6904557943344116,
+ "logps/chosen": -468.45947265625,
+ "logps/rejected": -664.0346069335938,
+ "loss": 0.4407,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -3.01562762260437,
+ "rewards/margins": 1.5072427988052368,
+ "rewards/rejected": -4.5228705406188965,
+ "step": 280
+ },
+ {
+ "epoch": 0.23,
+ "learning_rate": 4.7392794005985324e-06,
+ "logits/chosen": -2.0655009746551514,
+ "logits/rejected": -1.782928466796875,
+ "logps/chosen": -439.0694274902344,
+ "logps/rejected": -528.5975341796875,
+ "loss": 0.4046,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -1.6179521083831787,
+ "rewards/margins": 1.3643434047698975,
+ "rewards/rejected": -2.982295513153076,
+ "step": 290
+ },
+ {
+ "epoch": 0.24,
+ "learning_rate": 4.707368982147318e-06,
+ "logits/chosen": -1.8944809436798096,
+ "logits/rejected": -1.8623746633529663,
+ "logps/chosen": -409.47662353515625,
+ "logps/rejected": -446.8701171875,
+ "loss": 0.4616,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -1.1979713439941406,
+ "rewards/margins": 1.0424644947052002,
+ "rewards/rejected": -2.2404356002807617,
+ "step": 300
+ },
+ {
+ "epoch": 0.25,
+ "learning_rate": 4.673737323763048e-06,
+ "logits/chosen": -1.6961355209350586,
+ "logits/rejected": -1.738201379776001,
+ "logps/chosen": -395.11370849609375,
+ "logps/rejected": -542.3433837890625,
+ "loss": 0.5152,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -2.0377097129821777,
+ "rewards/margins": 0.9327165484428406,
+ "rewards/rejected": -2.970426082611084,
+ "step": 310
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 4.638410650401267e-06,
+ "logits/chosen": -1.6403663158416748,
+ "logits/rejected": -1.3353043794631958,
+ "logps/chosen": -535.1236572265625,
+ "logps/rejected": -570.8885498046875,
+ "loss": 0.4839,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": -2.7877275943756104,
+ "rewards/margins": 1.0707926750183105,
+ "rewards/rejected": -3.8585205078125,
+ "step": 320
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 4.601416508739211e-06,
+ "logits/chosen": -1.6940498352050781,
+ "logits/rejected": -1.7366819381713867,
+ "logps/chosen": -538.8651733398438,
+ "logps/rejected": -588.4771118164062,
+ "loss": 0.46,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -2.6663858890533447,
+ "rewards/margins": 0.7428363561630249,
+ "rewards/rejected": -3.40922212600708,
+ "step": 330
+ },
+ {
+ "epoch": 0.27,
+ "learning_rate": 4.562783745695738e-06,
+ "logits/chosen": -1.7110121250152588,
+ "logits/rejected": -1.4488928318023682,
+ "logps/chosen": -372.3664855957031,
+ "logps/rejected": -503.14288330078125,
+ "loss": 0.4308,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -1.781057596206665,
+ "rewards/margins": 1.5787122249603271,
+ "rewards/rejected": -3.359769821166992,
+ "step": 340
+ },
+ {
+ "epoch": 0.28,
+ "learning_rate": 4.522542485937369e-06,
+ "logits/chosen": -1.8226385116577148,
+ "logits/rejected": -1.5300580263137817,
+ "logps/chosen": -463.80889892578125,
+ "logps/rejected": -537.3763427734375,
+ "loss": 0.4554,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -2.0642762184143066,
+ "rewards/margins": 0.9215444326400757,
+ "rewards/rejected": -2.985820770263672,
+ "step": 350
+ },
+ {
+ "epoch": 0.29,
+ "learning_rate": 4.4807241083879774e-06,
+ "logits/chosen": -1.781518578529358,
+ "logits/rejected": -1.5981372594833374,
+ "logps/chosen": -506.3741760253906,
+ "logps/rejected": -607.2337646484375,
+ "loss": 0.4323,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -2.474335193634033,
+ "rewards/margins": 1.3016973733901978,
+ "rewards/rejected": -3.7760322093963623,
+ "step": 360
+ },
+ {
+ "epoch": 0.3,
+ "learning_rate": 4.437361221760449e-06,
+ "logits/chosen": -1.6915562152862549,
+ "logits/rejected": -1.6212940216064453,
+ "logps/chosen": -330.9052429199219,
+ "logps/rejected": -441.5316467285156,
+ "loss": 0.4468,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -1.4203054904937744,
+ "rewards/margins": 1.071720838546753,
+ "rewards/rejected": -2.4920265674591064,
+ "step": 370
+ },
+ {
+ "epoch": 0.3,
+ "learning_rate": 4.3924876391293915e-06,
+ "logits/chosen": -1.5807178020477295,
+ "logits/rejected": -1.541355013847351,
+ "logps/chosen": -340.5176086425781,
+ "logps/rejected": -440.81634521484375,
+ "loss": 0.4907,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -1.505903959274292,
+ "rewards/margins": 0.8253719210624695,
+ "rewards/rejected": -2.3312761783599854,
+ "step": 380
+ },
+ {
+ "epoch": 0.31,
+ "learning_rate": 4.346138351564711e-06,
+ "logits/chosen": -1.572176218032837,
+ "logits/rejected": -1.440059781074524,
+ "logps/chosen": -388.8272399902344,
+ "logps/rejected": -453.6480407714844,
+ "loss": 0.5046,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": -1.9561408758163452,
+ "rewards/margins": 1.0182749032974243,
+ "rewards/rejected": -2.9744157791137695,
+ "step": 390
+ },
+ {
+ "epoch": 0.32,
+ "learning_rate": 4.2983495008466285e-06,
+ "logits/chosen": -1.6752668619155884,
+ "logits/rejected": -1.6999976634979248,
+ "logps/chosen": -423.57000732421875,
+ "logps/rejected": -530.806884765625,
+ "loss": 0.5645,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -2.094298839569092,
+ "rewards/margins": 0.8297155499458313,
+ "rewards/rejected": -2.9240143299102783,
+ "step": 400
+ },
+ {
+ "epoch": 0.33,
+ "learning_rate": 4.249158351283414e-06,
+ "logits/chosen": -1.8126693964004517,
+ "logits/rejected": -1.6003907918930054,
+ "logps/chosen": -494.4754943847656,
+ "logps/rejected": -602.4152221679688,
+ "loss": 0.5194,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": -2.341548442840576,
+ "rewards/margins": 0.9126373529434204,
+ "rewards/rejected": -3.254185914993286,
+ "step": 410
+ },
+ {
+ "epoch": 0.34,
+ "learning_rate": 4.198603260653792e-06,
+ "logits/chosen": -1.9527698755264282,
+ "logits/rejected": -1.7327282428741455,
+ "logps/chosen": -508.19873046875,
+ "logps/rejected": -566.5889892578125,
+ "loss": 0.3991,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": -2.3916351795196533,
+ "rewards/margins": 0.7339528799057007,
+ "rewards/rejected": -3.1255879402160645,
+ "step": 420
+ },
+ {
+ "epoch": 0.34,
+ "learning_rate": 4.146723650296701e-06,
+ "logits/chosen": -1.7782537937164307,
+ "logits/rejected": -1.8380506038665771,
+ "logps/chosen": -556.6307373046875,
+ "logps/rejected": -737.5545654296875,
+ "loss": 0.4901,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -2.557034492492676,
+ "rewards/margins": 1.55734121799469,
+ "rewards/rejected": -4.114375591278076,
+ "step": 430
+ },
+ {
+ "epoch": 0.35,
+ "learning_rate": 4.093559974371725e-06,
+ "logits/chosen": -1.7782948017120361,
+ "logits/rejected": -1.4533838033676147,
+ "logps/chosen": -427.7296447753906,
+ "logps/rejected": -507.8409729003906,
+ "loss": 0.3925,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -2.4289357662200928,
+ "rewards/margins": 1.1018160581588745,
+ "rewards/rejected": -3.530752182006836,
+ "step": 440
+ },
+ {
+ "epoch": 0.36,
+ "learning_rate": 4.039153688314146e-06,
+ "logits/chosen": -1.875288724899292,
+ "logits/rejected": -1.6939185857772827,
+ "logps/chosen": -444.589599609375,
+ "logps/rejected": -552.37353515625,
+ "loss": 0.5043,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": -1.8052520751953125,
+ "rewards/margins": 1.3201160430908203,
+ "rewards/rejected": -3.125368118286133,
+ "step": 450
+ },
+ {
+ "epoch": 0.37,
+ "learning_rate": 3.983547216509254e-06,
+ "logits/chosen": -1.8725506067276,
+ "logits/rejected": -1.7871854305267334,
+ "logps/chosen": -455.09185791015625,
+ "logps/rejected": -519.0117797851562,
+ "loss": 0.4801,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -2.264061450958252,
+ "rewards/margins": 0.8552305102348328,
+ "rewards/rejected": -3.1192917823791504,
+ "step": 460
+ },
+ {
+ "epoch": 0.38,
+ "learning_rate": 3.92678391921108e-06,
+ "logits/chosen": -1.8820081949234009,
+ "logits/rejected": -1.7966502904891968,
+ "logps/chosen": -340.04119873046875,
+ "logps/rejected": -392.32769775390625,
+ "loss": 0.5122,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -1.7454599142074585,
+ "rewards/margins": 0.6181780099868774,
+ "rewards/rejected": -2.363638401031494,
+ "step": 470
+ },
+ {
+ "epoch": 0.38,
+ "learning_rate": 3.868908058731376e-06,
+ "logits/chosen": -1.760180115699768,
+ "logits/rejected": -1.6544349193572998,
+ "logps/chosen": -386.4324645996094,
+ "logps/rejected": -537.9710083007812,
+ "loss": 0.4543,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -2.187924861907959,
+ "rewards/margins": 1.2385300397872925,
+ "rewards/rejected": -3.426455020904541,
+ "step": 480
+ },
+ {
+ "epoch": 0.39,
+ "learning_rate": 3.8099647649251984e-06,
+ "logits/chosen": -1.8242204189300537,
+ "logits/rejected": -1.5543259382247925,
+ "logps/chosen": -496.9654235839844,
+ "logps/rejected": -564.4078979492188,
+ "loss": 0.4372,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -2.488828182220459,
+ "rewards/margins": 1.1287075281143188,
+ "rewards/rejected": -3.617535352706909,
+ "step": 490
+ },
+ {
+ "epoch": 0.4,
+ "learning_rate": 3.7500000000000005e-06,
+ "logits/chosen": -1.846986174583435,
+ "logits/rejected": -1.7779200077056885,
+ "logps/chosen": -481.5650329589844,
+ "logps/rejected": -603.2772216796875,
+ "loss": 0.4871,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -2.848694086074829,
+ "rewards/margins": 1.1198689937591553,
+ "rewards/rejected": -3.9685630798339844,
+ "step": 500
+ },
+ {
+ "epoch": 0.41,
+ "learning_rate": 3.689060522675689e-06,
+ "logits/chosen": -1.8344266414642334,
+ "logits/rejected": -1.6635892391204834,
+ "logps/chosen": -566.5777587890625,
+ "logps/rejected": -637.1046142578125,
+ "loss": 0.4406,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": -3.230578899383545,
+ "rewards/margins": 0.9766052961349487,
+ "rewards/rejected": -4.207183837890625,
+ "step": 510
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 3.627193851723577e-06,
+ "logits/chosen": -1.5923402309417725,
+ "logits/rejected": -1.473101258277893,
+ "logps/chosen": -652.4085083007812,
+ "logps/rejected": -711.7216796875,
+ "loss": 0.5059,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -4.209104061126709,
+ "rewards/margins": 0.7725512981414795,
+ "rewards/rejected": -4.981655120849609,
+ "step": 520
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 3.564448228912682e-06,
+ "logits/chosen": -1.795478105545044,
+ "logits/rejected": -1.7586778402328491,
+ "logps/chosen": -623.7203369140625,
+ "logps/rejected": -718.7762451171875,
+ "loss": 0.4676,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -3.7662811279296875,
+ "rewards/margins": 0.7186762094497681,
+ "rewards/rejected": -4.484957218170166,
+ "step": 530
+ },
+ {
+ "epoch": 0.43,
+ "learning_rate": 3.5008725813922383e-06,
+ "logits/chosen": -1.7972501516342163,
+ "logits/rejected": -1.5103098154067993,
+ "logps/chosen": -662.8839721679688,
+ "logps/rejected": -714.4471435546875,
+ "loss": 0.4381,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -3.9809927940368652,
+ "rewards/margins": 1.153225064277649,
+ "rewards/rejected": -5.134218215942383,
779
+ "step": 540
780
+ },
781
+ {
782
+ "epoch": 0.44,
783
+ "learning_rate": 3.436516483539781e-06,
784
+ "logits/chosen": -1.6798747777938843,
785
+ "logits/rejected": -1.525529384613037,
786
+ "logps/chosen": -635.1442260742188,
787
+ "logps/rejected": -741.7098999023438,
788
+ "loss": 0.4519,
789
+ "rewards/accuracies": 0.625,
790
+ "rewards/chosen": -4.326822757720947,
791
+ "rewards/margins": 1.160167932510376,
792
+ "rewards/rejected": -5.486990928649902,
793
+ "step": 550
794
+ },
795
+ {
796
+ "epoch": 0.45,
797
+ "learning_rate": 3.3714301183045382e-06,
798
+ "logits/chosen": -1.8586448431015015,
799
+ "logits/rejected": -1.954077959060669,
800
+ "logps/chosen": -697.0865478515625,
801
+ "logps/rejected": -801.1902465820312,
802
+ "loss": 0.4172,
803
+ "rewards/accuracies": 0.7250000238418579,
804
+ "rewards/chosen": -3.986204147338867,
805
+ "rewards/margins": 1.209644079208374,
806
+ "rewards/rejected": -5.195847511291504,
807
+ "step": 560
808
+ },
809
+ {
810
+ "epoch": 0.46,
811
+ "learning_rate": 3.3056642380762783e-06,
812
+ "logits/chosen": -1.7111326456069946,
813
+ "logits/rejected": -1.7340152263641357,
814
+ "logps/chosen": -598.3841552734375,
815
+ "logps/rejected": -705.2341918945312,
816
+ "loss": 0.4755,
817
+ "rewards/accuracies": 0.625,
818
+ "rewards/chosen": -3.439131259918213,
819
+ "rewards/margins": 1.249565601348877,
820
+ "rewards/rejected": -4.68869686126709,
821
+ "step": 570
822
+ },
823
+ {
824
+ "epoch": 0.46,
825
+ "learning_rate": 3.2392701251101172e-06,
826
+ "logits/chosen": -1.790801763534546,
827
+ "logits/rejected": -1.677916169166565,
828
+ "logps/chosen": -565.8692016601562,
829
+ "logps/rejected": -629.5975341796875,
830
+ "loss": 0.5383,
831
+ "rewards/accuracies": 0.574999988079071,
832
+ "rewards/chosen": -3.4626059532165527,
833
+ "rewards/margins": 1.0360872745513916,
834
+ "rewards/rejected": -4.498693466186523,
835
+ "step": 580
836
+ },
837
+ {
838
+ "epoch": 0.47,
839
+ "learning_rate": 3.1722995515381644e-06,
840
+ "logits/chosen": -1.763465166091919,
841
+ "logits/rejected": -1.6387125253677368,
842
+ "logps/chosen": -498.0137634277344,
843
+ "logps/rejected": -560.8174438476562,
844
+ "loss": 0.4456,
845
+ "rewards/accuracies": 0.625,
846
+ "rewards/chosen": -2.469872236251831,
847
+ "rewards/margins": 0.9421303868293762,
848
+ "rewards/rejected": -3.4120030403137207,
849
+ "step": 590
850
+ },
851
+ {
852
+ "epoch": 0.48,
853
+ "learning_rate": 3.1048047389991693e-06,
854
+ "logits/chosen": -1.873694658279419,
855
+ "logits/rejected": -1.7188327312469482,
856
+ "logps/chosen": -492.42742919921875,
857
+ "logps/rejected": -610.1829833984375,
858
+ "loss": 0.4947,
859
+ "rewards/accuracies": 0.675000011920929,
860
+ "rewards/chosen": -2.7357325553894043,
861
+ "rewards/margins": 1.0360163450241089,
862
+ "rewards/rejected": -3.7717490196228027,
863
+ "step": 600
864
+ },
865
+ {
866
+ "epoch": 0.49,
867
+ "learning_rate": 3.0368383179176584e-06,
868
+ "logits/chosen": -1.798766851425171,
869
+ "logits/rejected": -1.7281709909439087,
870
+ "logps/chosen": -465.31207275390625,
871
+ "logps/rejected": -569.0285034179688,
872
+ "loss": 0.4429,
873
+ "rewards/accuracies": 0.75,
874
+ "rewards/chosen": -2.2491648197174072,
875
+ "rewards/margins": 1.305397391319275,
876
+ "rewards/rejected": -3.55456280708313,
877
+ "step": 610
878
+ },
879
+ {
880
+ "epoch": 0.5,
881
+ "learning_rate": 2.9684532864643123e-06,
882
+ "logits/chosen": -1.898830771446228,
883
+ "logits/rejected": -1.7822704315185547,
884
+ "logps/chosen": -488.25653076171875,
885
+ "logps/rejected": -588.7685546875,
886
+ "loss": 0.4264,
887
+ "rewards/accuracies": 0.699999988079071,
888
+ "rewards/chosen": -2.347430944442749,
889
+ "rewards/margins": 1.2308118343353271,
890
+ "rewards/rejected": -3.578242540359497,
891
+ "step": 620
892
+ },
893
+ {
894
+ "epoch": 0.5,
895
+ "learning_rate": 2.8997029692295875e-06,
896
+ "logits/chosen": -2.0265390872955322,
897
+ "logits/rejected": -1.8765138387680054,
898
+ "logps/chosen": -493.7826232910156,
899
+ "logps/rejected": -595.54541015625,
900
+ "loss": 0.4438,
901
+ "rewards/accuracies": 0.6499999761581421,
902
+ "rewards/chosen": -2.2215516567230225,
903
+ "rewards/margins": 1.2061747312545776,
904
+ "rewards/rejected": -3.4277260303497314,
905
+ "step": 630
906
+ },
907
+ {
908
+ "epoch": 0.51,
909
+ "learning_rate": 2.8306409756428067e-06,
910
+ "logits/chosen": -1.8571748733520508,
911
+ "logits/rejected": -1.4381787776947021,
912
+ "logps/chosen": -562.1082153320312,
913
+ "logps/rejected": -625.988525390625,
914
+ "loss": 0.4561,
915
+ "rewards/accuracies": 0.824999988079071,
916
+ "rewards/chosen": -2.61163330078125,
917
+ "rewards/margins": 1.2634782791137695,
918
+ "rewards/rejected": -3.8751113414764404,
919
+ "step": 640
920
+ },
921
+ {
922
+ "epoch": 0.52,
923
+ "learning_rate": 2.761321158169134e-06,
924
+ "logits/chosen": -1.9520680904388428,
925
+ "logits/rejected": -1.6933047771453857,
926
+ "logps/chosen": -682.3081665039062,
927
+ "logps/rejected": -772.539794921875,
928
+ "loss": 0.4134,
929
+ "rewards/accuracies": 0.75,
930
+ "rewards/chosen": -4.052522659301758,
931
+ "rewards/margins": 1.4804089069366455,
932
+ "rewards/rejected": -5.532931327819824,
933
+ "step": 650
934
+ },
935
+ {
936
+ "epoch": 0.53,
937
+ "learning_rate": 2.6917975703170466e-06,
938
+ "logits/chosen": -1.5707025527954102,
939
+ "logits/rejected": -1.276710867881775,
940
+ "logps/chosen": -679.2574462890625,
941
+ "logps/rejected": -891.1485595703125,
942
+ "loss": 0.4658,
943
+ "rewards/accuracies": 0.7749999761581421,
944
+ "rewards/chosen": -4.277982711791992,
945
+ "rewards/margins": 1.9390618801116943,
946
+ "rewards/rejected": -6.217044830322266,
947
+ "step": 660
948
+ },
949
+ {
950
+ "epoch": 0.54,
951
+ "learning_rate": 2.6221244244890336e-06,
952
+ "logits/chosen": -1.6595405340194702,
953
+ "logits/rejected": -1.4634872674942017,
954
+ "logps/chosen": -665.2347412109375,
955
+ "logps/rejected": -793.9863891601562,
956
+ "loss": 0.381,
957
+ "rewards/accuracies": 0.875,
958
+ "rewards/chosen": -4.211177349090576,
959
+ "rewards/margins": 1.538806676864624,
960
+ "rewards/rejected": -5.749984264373779,
961
+ "step": 670
962
+ },
963
+ {
964
+ "epoch": 0.54,
965
+ "learning_rate": 2.5523560497083927e-06,
966
+ "logits/chosen": -1.8683850765228271,
967
+ "logits/rejected": -1.7012536525726318,
968
+ "logps/chosen": -486.9891052246094,
969
+ "logps/rejected": -554.7735595703125,
970
+ "loss": 0.5042,
971
+ "rewards/accuracies": 0.6000000238418579,
972
+ "rewards/chosen": -2.8961403369903564,
973
+ "rewards/margins": 0.7647749185562134,
974
+ "rewards/rejected": -3.660914897918701,
975
+ "step": 680
976
+ },
977
+ {
978
+ "epoch": 0.55,
979
+ "learning_rate": 2.482546849255096e-06,
980
+ "logits/chosen": -1.7561218738555908,
981
+ "logits/rejected": -1.4548377990722656,
982
+ "logps/chosen": -519.9765625,
983
+ "logps/rejected": -611.5833740234375,
984
+ "loss": 0.4507,
985
+ "rewards/accuracies": 0.625,
986
+ "rewards/chosen": -3.0487327575683594,
987
+ "rewards/margins": 1.145900845527649,
988
+ "rewards/rejected": -4.194633960723877,
989
+ "step": 690
990
+ },
991
+ {
992
+ "epoch": 0.56,
993
+ "learning_rate": 2.4127512582437486e-06,
994
+ "logits/chosen": -1.7805372476577759,
995
+ "logits/rejected": -1.7106291055679321,
996
+ "logps/chosen": -587.1712646484375,
997
+ "logps/rejected": -704.341064453125,
998
+ "loss": 0.4176,
999
+ "rewards/accuracies": 0.7250000238418579,
1000
+ "rewards/chosen": -3.0211739540100098,
1001
+ "rewards/margins": 1.2845187187194824,
1002
+ "rewards/rejected": -4.30569314956665,
1003
+ "step": 700
1004
+ },
1005
+ {
1006
+ "epoch": 0.57,
1007
+ "learning_rate": 2.3430237011767166e-06,
1008
+ "logits/chosen": -1.6074645519256592,
1009
+ "logits/rejected": -1.453162431716919,
1010
+ "logps/chosen": -528.9210205078125,
1011
+ "logps/rejected": -630.1168823242188,
1012
+ "loss": 0.4237,
1013
+ "rewards/accuracies": 0.800000011920929,
1014
+ "rewards/chosen": -3.0611021518707275,
1015
+ "rewards/margins": 1.255999207496643,
1016
+ "rewards/rejected": -4.31710147857666,
1017
+ "step": 710
1018
+ },
1019
+ {
1020
+ "epoch": 0.58,
1021
+ "learning_rate": 2.2734185495055503e-06,
1022
+ "logits/chosen": -1.4858500957489014,
1023
+ "logits/rejected": -1.2407737970352173,
1024
+ "logps/chosen": -591.7060546875,
1025
+ "logps/rejected": -691.8463745117188,
1026
+ "loss": 0.3379,
1027
+ "rewards/accuracies": 0.8500000238418579,
1028
+ "rewards/chosen": -3.651978015899658,
1029
+ "rewards/margins": 1.4824391603469849,
1030
+ "rewards/rejected": -5.134417533874512,
1031
+ "step": 720
1032
+ },
1033
+ {
1034
+ "epoch": 0.58,
1035
+ "learning_rate": 2.2039900792337477e-06,
1036
+ "logits/chosen": -1.4694669246673584,
1037
+ "logits/rejected": -1.2653883695602417,
1038
+ "logps/chosen": -646.7306518554688,
1039
+ "logps/rejected": -756.9968872070312,
1040
+ "loss": 0.4527,
1041
+ "rewards/accuracies": 0.675000011920929,
1042
+ "rewards/chosen": -3.894749164581299,
1043
+ "rewards/margins": 1.4129467010498047,
1044
+ "rewards/rejected": -5.3076958656311035,
1045
+ "step": 730
1046
+ },
1047
+ {
1048
+ "epoch": 0.59,
1049
+ "learning_rate": 2.134792428593971e-06,
1050
+ "logits/chosen": -1.6758044958114624,
1051
+ "logits/rejected": -1.3347949981689453,
1052
+ "logps/chosen": -642.6209716796875,
1053
+ "logps/rejected": -756.3058471679688,
1054
+ "loss": 0.4029,
1055
+ "rewards/accuracies": 0.675000011920929,
1056
+ "rewards/chosen": -3.9325084686279297,
1057
+ "rewards/margins": 1.526052474975586,
1058
+ "rewards/rejected": -5.458560943603516,
1059
+ "step": 740
1060
+ },
1061
+ {
1062
+ "epoch": 0.6,
1063
+ "learning_rate": 2.0658795558326745e-06,
1064
+ "logits/chosen": -1.530992031097412,
1065
+ "logits/rejected": -1.616612195968628,
1066
+ "logps/chosen": -558.5872802734375,
1067
+ "logps/rejected": -712.8720703125,
1068
+ "loss": 0.5007,
1069
+ "rewards/accuracies": 0.75,
1070
+ "rewards/chosen": -3.5670769214630127,
1071
+ "rewards/margins": 1.441688895225525,
1072
+ "rewards/rejected": -5.008765697479248,
1073
+ "step": 750
1074
+ },
1075
+ {
1076
+ "epoch": 0.61,
1077
+ "learning_rate": 1.997305197135089e-06,
1078
+ "logits/chosen": -1.5513006448745728,
1079
+ "logits/rejected": -1.3652799129486084,
1080
+ "logps/chosen": -569.5272216796875,
1081
+ "logps/rejected": -667.7091064453125,
1082
+ "loss": 0.4669,
1083
+ "rewards/accuracies": 0.675000011920929,
1084
+ "rewards/chosen": -3.5715885162353516,
1085
+ "rewards/margins": 1.0474005937576294,
1086
+ "rewards/rejected": -4.618988990783691,
1087
+ "step": 760
1088
+ },
1089
+ {
1090
+ "epoch": 0.62,
1091
+ "learning_rate": 1.9291228247233607e-06,
1092
+ "logits/chosen": -1.9496749639511108,
1093
+ "logits/rejected": -1.60714852809906,
1094
+ "logps/chosen": -613.51806640625,
1095
+ "logps/rejected": -627.2810668945312,
1096
+ "loss": 0.4715,
1097
+ "rewards/accuracies": 0.625,
1098
+ "rewards/chosen": -3.2026474475860596,
1099
+ "rewards/margins": 0.9100178480148315,
1100
+ "rewards/rejected": -4.112665176391602,
1101
+ "step": 770
1102
+ },
1103
+ {
1104
+ "epoch": 0.62,
1105
+ "learning_rate": 1.8613856051605242e-06,
1106
+ "logits/chosen": -1.639764428138733,
1107
+ "logits/rejected": -1.4028499126434326,
1108
+ "logps/chosen": -560.9642333984375,
1109
+ "logps/rejected": -718.7645263671875,
1110
+ "loss": 0.4383,
1111
+ "rewards/accuracies": 0.7749999761581421,
1112
+ "rewards/chosen": -3.006631374359131,
1113
+ "rewards/margins": 1.6886869668960571,
1114
+ "rewards/rejected": -4.69531774520874,
1115
+ "step": 780
1116
+ },
1117
+ {
1118
+ "epoch": 0.63,
1119
+ "learning_rate": 1.7941463578928088e-06,
1120
+ "logits/chosen": -1.7558552026748657,
1121
+ "logits/rejected": -1.731774091720581,
1122
+ "logps/chosen": -547.339599609375,
1123
+ "logps/rejected": -605.9044189453125,
1124
+ "loss": 0.5097,
1125
+ "rewards/accuracies": 0.550000011920929,
1126
+ "rewards/chosen": -3.284121036529541,
1127
+ "rewards/margins": 0.618303120136261,
1128
+ "rewards/rejected": -3.902423858642578,
1129
+ "step": 790
1130
+ },
1131
+ {
1132
+ "epoch": 0.64,
1133
+ "learning_rate": 1.7274575140626318e-06,
1134
+ "logits/chosen": -1.662502646446228,
1135
+ "logits/rejected": -1.8295265436172485,
1136
+ "logps/chosen": -503.92352294921875,
1137
+ "logps/rejected": -656.8177490234375,
1138
+ "loss": 0.3823,
1139
+ "rewards/accuracies": 0.75,
1140
+ "rewards/chosen": -2.5763401985168457,
1141
+ "rewards/margins": 1.3676784038543701,
1142
+ "rewards/rejected": -3.944018840789795,
1143
+ "step": 800
1144
+ },
1145
+ {
1146
+ "epoch": 0.65,
1147
+ "learning_rate": 1.661371075624363e-06,
1148
+ "logits/chosen": -1.6462104320526123,
1149
+ "logits/rejected": -1.3300979137420654,
1150
+ "logps/chosen": -448.4722595214844,
1151
+ "logps/rejected": -538.0807495117188,
1152
+ "loss": 0.436,
1153
+ "rewards/accuracies": 0.699999988079071,
1154
+ "rewards/chosen": -2.5369763374328613,
1155
+ "rewards/margins": 1.249807357788086,
1156
+ "rewards/rejected": -3.7867836952209473,
1157
+ "step": 810
1158
+ },
1159
+ {
1160
+ "epoch": 0.66,
1161
+ "learning_rate": 1.5959385747947697e-06,
1162
+ "logits/chosen": -1.6544301509857178,
1163
+ "logits/rejected": -1.6204363107681274,
1164
+ "logps/chosen": -540.1203002929688,
1165
+ "logps/rejected": -714.814697265625,
1166
+ "loss": 0.3821,
1167
+ "rewards/accuracies": 0.75,
1168
+ "rewards/chosen": -2.9797520637512207,
1169
+ "rewards/margins": 1.4802463054656982,
1170
+ "rewards/rejected": -4.45999813079834,
1171
+ "step": 820
1172
+ },
1173
+ {
1174
+ "epoch": 0.66,
1175
+ "learning_rate": 1.5312110338697427e-06,
1176
+ "logits/chosen": -1.8631584644317627,
1177
+ "logits/rejected": -1.7527086734771729,
1178
+ "logps/chosen": -472.2159118652344,
1179
+ "logps/rejected": -541.8966064453125,
1180
+ "loss": 0.5163,
1181
+ "rewards/accuracies": 0.5249999761581421,
1182
+ "rewards/chosen": -2.802929639816284,
1183
+ "rewards/margins": 0.781417727470398,
1184
+ "rewards/rejected": -3.58434796333313,
1185
+ "step": 830
1186
+ },
1187
+ {
1188
+ "epoch": 0.67,
1189
+ "learning_rate": 1.467238925438646e-06,
1190
+ "logits/chosen": -1.8967678546905518,
1191
+ "logits/rejected": -1.7050600051879883,
1192
+ "logps/chosen": -537.9852905273438,
1193
+ "logps/rejected": -641.7759399414062,
1194
+ "loss": 0.4672,
1195
+ "rewards/accuracies": 0.7250000238418579,
1196
+ "rewards/chosen": -2.674623489379883,
1197
+ "rewards/margins": 1.1100194454193115,
1198
+ "rewards/rejected": -3.7846426963806152,
1199
+ "step": 840
1200
+ },
1201
+ {
1202
+ "epoch": 0.68,
1203
+ "learning_rate": 1.4040721330273063e-06,
1204
+ "logits/chosen": -1.5284979343414307,
1205
+ "logits/rejected": -1.2263238430023193,
1206
+ "logps/chosen": -609.40283203125,
1207
+ "logps/rejected": -748.0750732421875,
1208
+ "loss": 0.3961,
1209
+ "rewards/accuracies": 0.824999988079071,
1210
+ "rewards/chosen": -3.1462016105651855,
1211
+ "rewards/margins": 1.6625381708145142,
1212
+ "rewards/rejected": -4.808740139007568,
1213
+ "step": 850
1214
+ },
1215
+ {
1216
+ "epoch": 0.69,
1217
+ "learning_rate": 1.3417599122003464e-06,
1218
+ "logits/chosen": -1.913924217224121,
1219
+ "logits/rejected": -1.4708651304244995,
1220
+ "logps/chosen": -531.4802856445312,
1221
+ "logps/rejected": -647.6087646484375,
1222
+ "loss": 0.3934,
1223
+ "rewards/accuracies": 0.675000011920929,
1224
+ "rewards/chosen": -2.661881923675537,
1225
+ "rewards/margins": 1.7113628387451172,
1226
+ "rewards/rejected": -4.373244285583496,
1227
+ "step": 860
1228
+ },
1229
+ {
1230
+ "epoch": 0.7,
1231
+ "learning_rate": 1.280350852153168e-06,
1232
+ "logits/chosen": -1.6116275787353516,
1233
+ "logits/rejected": -1.7562923431396484,
1234
+ "logps/chosen": -450.93243408203125,
1235
+ "logps/rejected": -635.6582641601562,
1236
+ "loss": 0.4188,
1237
+ "rewards/accuracies": 0.7749999761581421,
1238
+ "rewards/chosen": -2.6061110496520996,
1239
+ "rewards/margins": 1.2891266345977783,
1240
+ "rewards/rejected": -3.895237684249878,
1241
+ "step": 870
1242
+ },
1243
+ {
1244
+ "epoch": 0.7,
1245
+ "learning_rate": 1.2198928378235717e-06,
1246
+ "logits/chosen": -1.674541711807251,
1247
+ "logits/rejected": -1.5418341159820557,
1248
+ "logps/chosen": -596.4996337890625,
1249
+ "logps/rejected": -777.8062133789062,
1250
+ "loss": 0.4317,
1251
+ "rewards/accuracies": 0.7250000238418579,
1252
+ "rewards/chosen": -3.463611602783203,
1253
+ "rewards/margins": 1.5395991802215576,
1254
+ "rewards/rejected": -5.00321102142334,
1255
+ "step": 880
1256
+ },
1257
+ {
1258
+ "epoch": 0.71,
1259
+ "learning_rate": 1.160433012552508e-06,
1260
+ "logits/chosen": -1.8214279413223267,
1261
+ "logits/rejected": -1.766122579574585,
1262
+ "logps/chosen": -468.58197021484375,
1263
+ "logps/rejected": -573.4989624023438,
1264
+ "loss": 0.5022,
1265
+ "rewards/accuracies": 0.675000011920929,
1266
+ "rewards/chosen": -2.285228967666626,
1267
+ "rewards/margins": 1.5109031200408936,
1268
+ "rewards/rejected": -3.7961318492889404,
1269
+ "step": 890
1270
+ },
1271
+ {
1272
+ "epoch": 0.72,
1273
+ "learning_rate": 1.1020177413231334e-06,
1274
+ "logits/chosen": -1.842508316040039,
1275
+ "logits/rejected": -1.6384559869766235,
1276
+ "logps/chosen": -473.90576171875,
1277
+ "logps/rejected": -624.7535400390625,
1278
+ "loss": 0.4147,
1279
+ "rewards/accuracies": 0.699999988079071,
1280
+ "rewards/chosen": -2.3901591300964355,
1281
+ "rewards/margins": 1.7103700637817383,
1282
+ "rewards/rejected": -4.100529193878174,
1283
+ "step": 900
1284
+ },
1285
+ {
1286
+ "epoch": 0.73,
1287
+ "learning_rate": 1.0446925746067768e-06,
1288
+ "logits/chosen": -1.6257625818252563,
1289
+ "logits/rejected": -1.366236925125122,
1290
+ "logps/chosen": -425.0322265625,
1291
+ "logps/rejected": -510.67645263671875,
1292
+ "loss": 0.407,
1293
+ "rewards/accuracies": 0.675000011920929,
1294
+ "rewards/chosen": -2.007061004638672,
1295
+ "rewards/margins": 1.399216890335083,
1296
+ "rewards/rejected": -3.406277894973755,
1297
+ "step": 910
1298
+ },
1299
+ {
1300
+ "epoch": 0.74,
1301
+ "learning_rate": 9.88502212844063e-07,
1302
+ "logits/chosen": -1.7914766073226929,
1303
+ "logits/rejected": -1.665331482887268,
1304
+ "logps/chosen": -537.3778686523438,
1305
+ "logps/rejected": -753.0563354492188,
1306
+ "loss": 0.4276,
1307
+ "rewards/accuracies": 0.875,
1308
+ "rewards/chosen": -2.615752696990967,
1309
+ "rewards/margins": 2.0941243171691895,
1310
+ "rewards/rejected": -4.7098774909973145,
1311
+ "step": 920
1312
+ },
1313
+ {
1314
+ "epoch": 0.74,
1315
+ "learning_rate": 9.334904715888496e-07,
1316
+ "logits/chosen": -1.7023980617523193,
1317
+ "logits/rejected": -1.4218547344207764,
1318
+ "logps/chosen": -579.594970703125,
1319
+ "logps/rejected": -793.6122436523438,
1320
+ "loss": 0.4859,
1321
+ "rewards/accuracies": 0.7250000238418579,
1322
+ "rewards/chosen": -2.9950103759765625,
1323
+ "rewards/margins": 2.0560593605041504,
1324
+ "rewards/rejected": -5.051069736480713,
1325
+ "step": 930
1326
+ },
1327
+ {
1328
+ "epoch": 0.75,
1329
+ "learning_rate": 8.797002473421729e-07,
1330
+ "logits/chosen": -1.6579145193099976,
1331
+ "logits/rejected": -1.6787497997283936,
1332
+ "logps/chosen": -541.71826171875,
1333
+ "logps/rejected": -658.0726318359375,
1334
+ "loss": 0.4487,
1335
+ "rewards/accuracies": 0.7250000238418579,
1336
+ "rewards/chosen": -2.8812687397003174,
1337
+ "rewards/margins": 1.0891892910003662,
1338
+ "rewards/rejected": -3.9704577922821045,
1339
+ "step": 940
1340
+ },
1341
+ {
1342
+ "epoch": 0.76,
1343
+ "learning_rate": 8.271734841028553e-07,
1344
+ "logits/chosen": -1.638536810874939,
1345
+ "logits/rejected": -1.4352543354034424,
1346
+ "logps/chosen": -552.4362182617188,
1347
+ "logps/rejected": -678.9232788085938,
1348
+ "loss": 0.4098,
1349
+ "rewards/accuracies": 0.7749999761581421,
1350
+ "rewards/chosen": -2.9728922843933105,
1351
+ "rewards/margins": 1.4913972616195679,
1352
+ "rewards/rejected": -4.464289665222168,
1353
+ "step": 950
1354
+ },
1355
+ {
1356
+ "epoch": 0.77,
1357
+ "learning_rate": 7.759511406608255e-07,
1358
+ "logits/chosen": -1.7685962915420532,
1359
+ "logits/rejected": -1.6653659343719482,
1360
+ "logps/chosen": -514.057373046875,
1361
+ "logps/rejected": -638.7364501953125,
1362
+ "loss": 0.4683,
1363
+ "rewards/accuracies": 0.7250000238418579,
1364
+ "rewards/chosen": -2.7808051109313965,
1365
+ "rewards/margins": 1.2293360233306885,
1366
+ "rewards/rejected": -4.010141372680664,
1367
+ "step": 960
1368
+ },
1369
+ {
1370
+ "epoch": 0.78,
1371
+ "learning_rate": 7.260731586586983e-07,
1372
+ "logits/chosen": -1.5464670658111572,
1373
+ "logits/rejected": -1.6253650188446045,
1374
+ "logps/chosen": -365.79888916015625,
1375
+ "logps/rejected": -496.05535888671875,
1376
+ "loss": 0.4741,
1377
+ "rewards/accuracies": 0.625,
1378
+ "rewards/chosen": -2.3713974952697754,
1379
+ "rewards/margins": 1.2587835788726807,
1380
+ "rewards/rejected": -3.630180835723877,
1381
+ "step": 970
1382
+ },
1383
+ {
1384
+ "epoch": 0.78,
1385
+ "learning_rate": 6.775784314464717e-07,
1386
+ "logits/chosen": -1.8969453573226929,
1387
+ "logits/rejected": -1.5643236637115479,
1388
+ "logps/chosen": -483.40655517578125,
1389
+ "logps/rejected": -605.5526123046875,
1390
+ "loss": 0.4348,
1391
+ "rewards/accuracies": 0.625,
1392
+ "rewards/chosen": -2.432926893234253,
1393
+ "rewards/margins": 1.5910948514938354,
1394
+ "rewards/rejected": -4.024021625518799,
1395
+ "step": 980
1396
+ },
1397
+ {
1398
+ "epoch": 0.79,
1399
+ "learning_rate": 6.305047737536707e-07,
1400
+ "logits/chosen": -1.5974626541137695,
1401
+ "logits/rejected": -1.6372623443603516,
1402
+ "logps/chosen": -524.9561767578125,
1403
+ "logps/rejected": -677.9723510742188,
1404
+ "loss": 0.4174,
1405
+ "rewards/accuracies": 0.7250000238418579,
1406
+ "rewards/chosen": -3.2284367084503174,
1407
+ "rewards/margins": 1.2628570795059204,
1408
+ "rewards/rejected": -4.491293907165527,
1409
+ "step": 990
1410
+ },
1411
+ {
1412
+ "epoch": 0.8,
1413
+ "learning_rate": 5.848888922025553e-07,
1414
+ "logits/chosen": -1.6726499795913696,
1415
+ "logits/rejected": -1.8870794773101807,
1416
+ "logps/chosen": -580.0320434570312,
1417
+ "logps/rejected": -648.3935546875,
1418
+ "loss": 0.4433,
1419
+ "rewards/accuracies": 0.699999988079071,
1420
+ "rewards/chosen": -3.139599561691284,
1421
+ "rewards/margins": 0.8954163789749146,
1422
+ "rewards/rejected": -4.035016059875488,
1423
+ "step": 1000
1424
+ },
1425
+ {
1426
+ "epoch": 0.81,
1427
+ "learning_rate": 5.407663566854008e-07,
1428
+ "logits/chosen": -1.9880282878875732,
1429
+ "logits/rejected": -1.7486753463745117,
1430
+ "logps/chosen": -493.7062072753906,
1431
+ "logps/rejected": -590.9326171875,
1432
+ "loss": 0.5279,
1433
+ "rewards/accuracies": 0.675000011920929,
1434
+ "rewards/chosen": -2.3209338188171387,
1435
+ "rewards/margins": 1.3980019092559814,
1436
+ "rewards/rejected": -3.7189362049102783,
1437
+ "step": 1010
1438
+ },
1439
+ {
1440
+ "epoch": 0.82,
1441
+ "learning_rate": 4.981715726281666e-07,
1442
+ "logits/chosen": -1.8252193927764893,
1443
+ "logits/rejected": -1.7858692407608032,
1444
+ "logps/chosen": -575.9556884765625,
1445
+ "logps/rejected": -775.96630859375,
1446
+ "loss": 0.3673,
1447
+ "rewards/accuracies": 0.8500000238418579,
1448
+ "rewards/chosen": -2.9226129055023193,
1449
+ "rewards/margins": 1.7984931468963623,
1450
+ "rewards/rejected": -4.72110652923584,
1451
+ "step": 1020
1452
+ },
1453
+ {
1454
+ "epoch": 0.82,
1455
+ "learning_rate": 4.5713775416217884e-07,
1456
+ "logits/chosen": -1.4751231670379639,
1457
+ "logits/rejected": -1.5509759187698364,
1458
+ "logps/chosen": -474.5804748535156,
1459
+ "logps/rejected": -639.2667846679688,
1460
+ "loss": 0.4805,
1461
+ "rewards/accuracies": 0.7250000238418579,
1462
+ "rewards/chosen": -2.7624380588531494,
1463
+ "rewards/margins": 1.4808038473129272,
1464
+ "rewards/rejected": -4.243242263793945,
1465
+ "step": 1030
1466
+ },
1467
+ {
1468
+ "epoch": 0.83,
1469
+ "learning_rate": 4.1769689822475147e-07,
1470
+ "logits/chosen": -1.4878826141357422,
1471
+ "logits/rejected": -1.6433823108673096,
1472
+ "logps/chosen": -429.17535400390625,
1473
+ "logps/rejected": -613.1614379882812,
1474
+ "loss": 0.5111,
1475
+ "rewards/accuracies": 0.675000011920929,
1476
+ "rewards/chosen": -2.686619520187378,
1477
+ "rewards/margins": 1.2492364645004272,
1478
+ "rewards/rejected": -3.935856342315674,
1479
+ "step": 1040
1480
+ },
1481
+ {
1482
+ "epoch": 0.84,
1483
+ "learning_rate": 3.798797596089351e-07,
1484
+ "logits/chosen": -1.6832265853881836,
1485
+ "logits/rejected": -1.4579049348831177,
1486
+ "logps/chosen": -508.81622314453125,
1487
+ "logps/rejected": -643.6360473632812,
1488
+ "loss": 0.3685,
1489
+ "rewards/accuracies": 0.7749999761581421,
1490
+ "rewards/chosen": -2.7480552196502686,
1491
+ "rewards/margins": 1.529114007949829,
1492
+ "rewards/rejected": -4.277169227600098,
1493
+ "step": 1050
1494
+ },
1495
+ {
1496
+ "epoch": 0.85,
1497
+ "learning_rate": 3.4371582698185636e-07,
1498
+ "logits/chosen": -1.819790244102478,
1499
+ "logits/rejected": -1.6671082973480225,
1500
+ "logps/chosen": -602.0831298828125,
1501
+ "logps/rejected": -778.685546875,
1502
+ "loss": 0.3429,
1503
+ "rewards/accuracies": 0.824999988079071,
1504
+ "rewards/chosen": -3.222461223602295,
1505
+ "rewards/margins": 1.466620683670044,
1506
+ "rewards/rejected": -4.68908166885376,
1507
+ "step": 1060
1508
+ },
1509
+ {
1510
+ "epoch": 0.86,
1511
+ "learning_rate": 3.092332998903416e-07,
1512
+ "logits/chosen": -1.7987339496612549,
1513
+ "logits/rejected": -1.8623387813568115,
1514
+ "logps/chosen": -553.125244140625,
1515
+ "logps/rejected": -703.0887451171875,
1516
+ "loss": 0.4544,
1517
+ "rewards/accuracies": 0.7250000238418579,
1518
+ "rewards/chosen": -3.028597116470337,
1519
+ "rewards/margins": 1.3114429712295532,
1520
+ "rewards/rejected": -4.34004020690918,
1521
+ "step": 1070
1522
+ },
1523
+ {
1524
+ "epoch": 0.86,
1525
+ "learning_rate": 2.764590667717562e-07,
1526
+ "logits/chosen": -1.5797593593597412,
1527
+ "logits/rejected": -1.3366864919662476,
1528
+ "logps/chosen": -491.5816345214844,
1529
+ "logps/rejected": -525.8438110351562,
1530
+ "loss": 0.4606,
1531
+ "rewards/accuracies": 0.550000011920929,
1532
+ "rewards/chosen": -3.148127794265747,
1533
+ "rewards/margins": 0.6825836300849915,
1534
+ "rewards/rejected": -3.830711841583252,
1535
+ "step": 1080
1536
+ },
1537
+ {
1538
+ "epoch": 0.87,
1539
+ "learning_rate": 2.454186839872158e-07,
1540
+ "logits/chosen": -1.5686615705490112,
1541
+ "logits/rejected": -1.4363592863082886,
1542
+ "logps/chosen": -489.3838806152344,
1543
+ "logps/rejected": -668.1046752929688,
1544
+ "loss": 0.4054,
1545
+ "rewards/accuracies": 0.875,
1546
+ "rewards/chosen": -2.8657939434051514,
1547
+ "rewards/margins": 1.8218481540679932,
1548
+ "rewards/rejected": -4.6876420974731445,
1549
+ "step": 1090
1550
+ },
1551
+ {
1552
+ "epoch": 0.88,
1553
+ "learning_rate": 2.1613635589349756e-07,
1554
+ "logits/chosen": -1.7610828876495361,
1555
+ "logits/rejected": -1.5154088735580444,
1556
+ "logps/chosen": -484.9877014160156,
1557
+ "logps/rejected": -566.8745727539062,
1558
+ "loss": 0.5207,
1559
+ "rewards/accuracies": 0.5,
1560
+ "rewards/chosen": -2.580667018890381,
1561
+ "rewards/margins": 0.9732138514518738,
1562
+ "rewards/rejected": -3.5538806915283203,
1563
+ "step": 1100
1564
+ },
1565
+ {
1566
+ "epoch": 0.89,
1567
+ "learning_rate": 1.8863491596921745e-07,
1568
+ "logits/chosen": -1.7479203939437866,
1569
+ "logits/rejected": -1.4576488733291626,
1570
+ "logps/chosen": -534.5175170898438,
1571
+ "logps/rejected": -588.4047241210938,
1572
+ "loss": 0.4262,
1573
+ "rewards/accuracies": 0.6000000238418579,
1574
+ "rewards/chosen": -2.9962010383605957,
1575
+ "rewards/margins": 1.064734697341919,
1576
+ "rewards/rejected": -4.060935020446777,
1577
+ "step": 1110
1578
+ },
1579
+ {
1580
+ "epoch": 0.9,
1581
+ "learning_rate": 1.629358090099639e-07,
1582
+ "logits/chosen": -1.6448123455047607,
1583
+ "logits/rejected": -1.516871690750122,
1584
+ "logps/chosen": -528.2589721679688,
1585
+ "logps/rejected": -610.9138793945312,
1586
+ "loss": 0.4469,
1587
+ "rewards/accuracies": 0.800000011920929,
1588
+ "rewards/chosen": -3.179063558578491,
1589
+ "rewards/margins": 1.1216888427734375,
1590
+ "rewards/rejected": -4.30075216293335,
1591
+ "step": 1120
1592
+ },
1593
+ {
1594
+ "epoch": 0.9,
1595
+ "learning_rate": 1.3905907440629752e-07,
1596
+ "logits/chosen": -1.681919813156128,
1597
+ "logits/rejected": -1.4408434629440308,
1598
+ "logps/chosen": -494.0121154785156,
1599
+ "logps/rejected": -599.4526977539062,
1600
+ "loss": 0.4937,
1601
+ "rewards/accuracies": 0.625,
1602
+ "rewards/chosen": -2.8302178382873535,
1603
+ "rewards/margins": 1.3215891122817993,
1604
+ "rewards/rejected": -4.1518073081970215,
1605
+ "step": 1130
1606
+ },
1607
+ {
1608
+ "epoch": 0.91,
1609
+ "learning_rate": 1.1702333051763271e-07,
1610
+ "logits/chosen": -1.5896549224853516,
1611
+ "logits/rejected": -1.3458514213562012,
1612
+ "logps/chosen": -558.2330322265625,
1613
+ "logps/rejected": -696.0857543945312,
1614
+ "loss": 0.4117,
1615
+ "rewards/accuracies": 0.699999988079071,
1616
+ "rewards/chosen": -2.992506742477417,
1617
+ "rewards/margins": 1.719747543334961,
1618
+ "rewards/rejected": -4.712254047393799,
1619
+ "step": 1140
1620
+ },
1621
+ {
1622
+ "epoch": 0.92,
1623
+ "learning_rate": 9.684576015420277e-08,
1624
+ "logits/chosen": -1.533817172050476,
1625
+ "logits/rejected": -1.3470897674560547,
1626
+ "logps/chosen": -540.8046875,
1627
+ "logps/rejected": -716.5506591796875,
1628
+ "loss": 0.3407,
1629
+ "rewards/accuracies": 0.7250000238418579,
1630
+ "rewards/chosen": -2.967010021209717,
1631
+ "rewards/margins": 1.7264270782470703,
1632
+ "rewards/rejected": -4.693437099456787,
1633
+ "step": 1150
1634
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 7.854209717842231e-08,
+ "logits/chosen": -1.5058703422546387,
+ "logits/rejected": -1.4305099248886108,
+ "logps/chosen": -563.594970703125,
+ "logps/rejected": -608.3973388671875,
+ "loss": 0.4775,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": -3.313091278076172,
+ "rewards/margins": 0.6520703434944153,
+ "rewards/rejected": -3.9651618003845215,
+ "step": 1160
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 6.212661423609184e-08,
+ "logits/chosen": -1.4233185052871704,
+ "logits/rejected": -1.3441836833953857,
+ "logps/chosen": -597.6832275390625,
+ "logps/rejected": -758.8271484375,
+ "loss": 0.5072,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": -3.8257243633270264,
+ "rewards/margins": 1.319272756576538,
+ "rewards/rejected": -5.144996643066406,
+ "step": 1170
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 4.761211162702117e-08,
+ "logits/chosen": -1.603582739830017,
+ "logits/rejected": -1.5437158346176147,
+ "logps/chosen": -524.7338256835938,
+ "logps/rejected": -654.0867919921875,
+ "loss": 0.443,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -2.977849006652832,
+ "rewards/margins": 1.4028117656707764,
+ "rewards/rejected": -4.3806610107421875,
+ "step": 1180
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 3.5009907323737826e-08,
+ "logits/chosen": -1.5934550762176514,
+ "logits/rejected": -1.2932679653167725,
+ "logps/chosen": -560.14501953125,
+ "logps/rejected": -724.4017333984375,
+ "loss": 0.4103,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -3.257315158843994,
+ "rewards/margins": 1.7088110446929932,
+ "rewards/rejected": -4.966126441955566,
+ "step": 1190
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 2.4329828146074096e-08,
+ "logits/chosen": -1.9117376804351807,
+ "logits/rejected": -1.5988755226135254,
+ "logps/chosen": -599.6260986328125,
+ "logps/rejected": -691.9334716796875,
+ "loss": 0.4567,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -3.3284664154052734,
+ "rewards/margins": 1.461777925491333,
+ "rewards/rejected": -4.790244102478027,
+ "step": 1200
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.5580202098509078e-08,
+ "logits/chosen": -1.714948058128357,
+ "logits/rejected": -1.6837193965911865,
+ "logps/chosen": -673.5400390625,
+ "logps/rejected": -731.3297119140625,
+ "loss": 0.519,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -3.3261985778808594,
+ "rewards/margins": 1.0625519752502441,
+ "rewards/rejected": -4.388751029968262,
+ "step": 1210
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 8.767851876239075e-09,
+ "logits/chosen": -1.4761461019515991,
+ "logits/rejected": -1.4198424816131592,
+ "logps/chosen": -547.0154418945312,
+ "logps/rejected": -722.1144409179688,
+ "loss": 0.4213,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -3.2144112586975098,
+ "rewards/margins": 1.5727777481079102,
+ "rewards/rejected": -4.78718900680542,
+ "step": 1220
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 3.8980895450474455e-09,
+ "logits/chosen": -1.6909526586532593,
+ "logits/rejected": -1.5214656591415405,
+ "logps/chosen": -526.8455810546875,
+ "logps/rejected": -642.462646484375,
+ "loss": 0.4702,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -3.0600032806396484,
+ "rewards/margins": 1.274106502532959,
+ "rewards/rejected": -4.334109783172607,
+ "step": 1230
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 9.747123991141193e-10,
+ "logits/chosen": -1.7589871883392334,
+ "logits/rejected": -1.6617376804351807,
+ "logps/chosen": -495.7108459472656,
+ "logps/rejected": -648.486083984375,
+ "loss": 0.3952,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -2.6948044300079346,
+ "rewards/margins": 1.5261682271957397,
+ "rewards/rejected": -4.220972537994385,
+ "step": 1240
+ },
+ {
+ "epoch": 1.0,
+ "learning_rate": 0.0,
+ "logits/chosen": -1.6433188915252686,
+ "logits/rejected": -1.4641611576080322,
+ "logps/chosen": -550.0238037109375,
+ "logps/rejected": -689.4912109375,
+ "loss": 0.3973,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -3.04687237739563,
+ "rewards/margins": 1.4918220043182373,
+ "rewards/rejected": -4.538693904876709,
+ "step": 1250
+ },
+ {
+ "epoch": 1.0,
+ "step": 1250,
+ "total_flos": 0.0,
+ "train_loss": 0.47997802200317385,
+ "train_runtime": 13155.732,
+ "train_samples_per_second": 1.14,
+ "train_steps_per_second": 0.095
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 1250,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 20,
+ "total_flos": 0.0,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }