yyx123 committed on
Commit 62fb982 · verified · 1 parent: 52a0119

Model save

Files changed (5)
  1. README.md +78 -0
  2. all_results.json +13 -0
  3. eval_results.json +8 -0
  4. train_results.json +8 -0
  5. trainer_state.json +1907 -0
README.md ADDED
@@ -0,0 +1,78 @@
+ ---
+ license: other
+ library_name: peft
+ tags:
+ - trl
+ - sft
+ - generated_from_trainer
+ base_model: 01-ai/Yi-6B
+ model-index:
+ - name: Yi-6B-ruozhiba-1e-5
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # Yi-6B-ruozhiba-1e-5
+
+ This model is a fine-tuned version of [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.9852
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 1e-05
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 20
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 1.9439 | 2.0 | 110 | 2.0206 |
+ | 1.8731 | 3.0 | 165 | 1.9055 |
+ | 1.7574 | 4.0 | 220 | 1.8510 |
+ | 1.7266 | 5.0 | 275 | 1.8366 |
+ | 1.6036 | 7.0 | 385 | 1.8308 |
+ | 1.7214 | 8.0 | 440 | 1.8380 |
+ | 1.5245 | 9.0 | 495 | 1.8495 |
+ | 1.5239 | 10.0 | 550 | 1.8638 |
+ | 1.4286 | 11.0 | 605 | 1.8771 |
+ | 1.3534 | 12.0 | 660 | 1.9030 |
+ | 1.3895 | 14.0 | 770 | 1.9447 |
+ | 1.3721 | 15.0 | 825 | 1.9617 |
+ | 1.3598 | 16.0 | 880 | 1.9719 |
+ | 1.3015 | 17.0 | 935 | 1.9796 |
+ | 1.3456 | 18.0 | 990 | 1.9831 |
+ | 1.2136 | 19.0 | 1045 | 1.9848 |
+ | 1.302 | 20.0 | 1100 | 1.9852 |
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.2.2+cu118
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
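The `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.1` settings in the card can be checked against the learning rates logged in trainer_state.json. The sketch below is a standalone reimplementation, not the library code. It assumes the schedule has the linear-warmup-then-half-cosine shape of transformers' `get_cosine_schedule_with_warmup`, with 1,100 total steps (the `global_step` in trainer_state.json) and 110 warmup steps (10% of 1,100). The function name `lr_at_step` is illustrative.

```python
import math


def lr_at_step(step, base_lr=1e-5, total_steps=1100, warmup_ratio=0.1):
    """Learning rate after `step` scheduler steps: linear warmup, then a
    half-cosine decay from base_lr down to 0 at total_steps (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 110 for this run
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (1, 108, 220, 440, 660, 1100):
    print(step, lr_at_step(step))
```

Under these assumptions the sketch reproduces logged values such as 9.090909090909091e-08 at step 1, 9.698463103929542e-06 at step 220 and 7.500000000000001e-06 at step 440, up to floating-point rounding, and decays to 0 at step 1,100.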
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 20.0,
+ "eval_loss": 1.9851655960083008,
+ "eval_runtime": 4.9249,
+ "eval_samples": 23,
+ "eval_samples_per_second": 4.67,
+ "eval_steps_per_second": 1.218,
+ "train_loss": 0.4608629340475256,
+ "train_runtime": 2621.4029,
+ "train_samples": 217,
+ "train_samples_per_second": 1.656,
+ "train_steps_per_second": 0.42
+ }
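The counters in all_results.json are consistent with the hyperparameters in the model card. A minimal arithmetic check follows. It assumes a single process, no gradient accumulation, and that the last partial batch is kept, so that every batch of 4 is one optimizer step. None of these settings are stated in the card.

```python
import math

# Figures copied from all_results.json and the README hyperparameters.
train_samples = 217
eval_samples = 23
batch_size = 4              # train_batch_size == eval_batch_size == 4
num_epochs = 20
train_runtime = 2621.4029   # seconds
eval_runtime = 4.9249       # seconds

# Assumption: one process, no gradient accumulation, last partial batch kept,
# so every batch is one optimizer step.
steps_per_epoch = math.ceil(train_samples / batch_size)  # 55
total_steps = steps_per_epoch * num_epochs                # 1100
eval_steps = math.ceil(eval_samples / batch_size)         # 6

train_samples_per_second = train_samples * num_epochs / train_runtime
train_steps_per_second = total_steps / train_runtime
eval_samples_per_second = eval_samples / eval_runtime
eval_steps_per_second = eval_steps / eval_runtime

print("steps/epoch:", steps_per_epoch, "total steps:", total_steps)
print("train samples/s:", round(train_samples_per_second, 3))
print("train steps/s:", round(train_steps_per_second, 2))
print("eval samples/s:", round(eval_samples_per_second, 2))
print("eval steps/s:", round(eval_steps_per_second, 3))
```

After rounding, the results match the reported 1.656, 0.42, 4.67 and 1.218, and 1,100 total steps matches `global_step` in trainer_state.json. With 217 examples, each epoch ends on a batch containing a single example. The 55-step stride also appears in the Step column of the README table.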
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 20.0,
+ "eval_loss": 1.9851655960083008,
+ "eval_runtime": 4.9249,
+ "eval_samples": 23,
+ "eval_samples_per_second": 4.67,
+ "eval_steps_per_second": 1.218
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 20.0,
+ "train_loss": 0.4608629340475256,
+ "train_runtime": 2621.4029,
+ "train_samples": 217,
+ "train_samples_per_second": 1.656,
+ "train_steps_per_second": 0.42
+ }
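In trainer_state.json, `best_metric` and `best_model_checkpoint` are both null, so the best epoch has to be found by scanning `log_history`. Its entries mix three kinds of record: training logs (`loss`), evaluation logs (`eval_loss`), and a `gpt4_scores` metric. `gpt4_scores` is not a standard Trainer field and was presumably written by a custom callback. The sketch below runs over a hand-copied excerpt covering epochs 3, 4, 7 and 8, with the runtime fields omitted. The variable names are illustrative.

```python
import json

# Hand-copied excerpt of "log_history" (epochs 3, 4, 7, 8; runtime fields omitted)
# from trainer_state.json. For the full file you would instead use
# json.load(open("trainer_state.json"))["log_history"].
excerpt = """
[
  {"epoch": 3.0, "gpt4_scores": 0.7999999999999999, "step": 165},
  {"epoch": 3.0, "eval_loss": 1.9054837226867676, "step": 165},
  {"epoch": 4.0, "gpt4_scores": 0.7999999999999999, "step": 220},
  {"epoch": 4.0, "eval_loss": 1.8509678840637207, "step": 220},
  {"epoch": 7.0, "gpt4_scores": 0.75, "step": 385},
  {"epoch": 7.0, "eval_loss": 1.830824851989746, "step": 385},
  {"epoch": 8.0, "gpt4_scores": 0.75, "step": 440},
  {"epoch": 8.0, "eval_loss": 1.8379725217819214, "step": 440}
]
"""
log_history = json.loads(excerpt)

# Each entry only carries the metrics that were logged at that step.
eval_entries = [e for e in log_history if "eval_loss" in e]
gpt4_by_step = {e["step"]: e["gpt4_scores"] for e in log_history if "gpt4_scores" in e}

best_eval = min(eval_entries, key=lambda e: e["eval_loss"])
# max() keeps the first of tied keys, i.e. the earliest step with the top score.
best_score_step = max(gpt4_by_step, key=gpt4_by_step.get)

print("lowest eval_loss:", best_eval["eval_loss"], "at step", best_eval["step"])
print("highest gpt4_scores:", gpt4_by_step[best_score_step], "at step", best_score_step)
```

On this excerpt the two signals disagree. `eval_loss` is lowest at step 385 (epoch 7), while `gpt4_scores` peaks at step 165 (epoch 3). The epoch-7 row is also the lowest validation loss in the full README table.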
trainer_state.json ADDED
@@ -0,0 +1,1907 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 20.0,
+ "eval_steps": 500,
+ "global_step": 1100,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.02,
+ "learning_rate": 9.090909090909091e-08,
+ "loss": 2.3833,
+ "step": 1
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 3.6363636363636366e-07,
+ "loss": 2.4789,
+ "step": 4
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 7.272727272727273e-07,
+ "loss": 2.3195,
+ "step": 8
+ },
+ {
+ "epoch": 0.22,
+ "learning_rate": 1.090909090909091e-06,
+ "loss": 2.3366,
+ "step": 12
+ },
+ {
+ "epoch": 0.29,
+ "learning_rate": 1.4545454545454546e-06,
+ "loss": 2.3221,
+ "step": 16
+ },
+ {
+ "epoch": 0.36,
+ "learning_rate": 1.8181818181818183e-06,
+ "loss": 2.4036,
+ "step": 20
+ },
+ {
+ "epoch": 0.44,
+ "learning_rate": 2.181818181818182e-06,
+ "loss": 2.4224,
+ "step": 24
+ },
+ {
+ "epoch": 0.51,
+ "learning_rate": 2.5454545454545456e-06,
+ "loss": 2.6085,
+ "step": 28
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 2.9090909090909093e-06,
+ "loss": 2.5477,
+ "step": 32
+ },
+ {
+ "epoch": 0.65,
+ "learning_rate": 3.272727272727273e-06,
+ "loss": 2.4446,
+ "step": 36
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 3.6363636363636366e-06,
+ "loss": 2.3109,
+ "step": 40
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 4.000000000000001e-06,
+ "loss": 2.4149,
+ "step": 44
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 4.363636363636364e-06,
+ "loss": 2.5514,
+ "step": 48
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 4.727272727272728e-06,
+ "loss": 2.3816,
+ "step": 52
+ },
+ {
+ "epoch": 1.02,
+ "learning_rate": 5.090909090909091e-06,
+ "loss": 2.6293,
+ "step": 56
+ },
+ {
+ "epoch": 1.09,
+ "learning_rate": 5.4545454545454545e-06,
+ "loss": 2.2422,
+ "step": 60
+ },
+ {
+ "epoch": 1.16,
+ "learning_rate": 5.8181818181818185e-06,
+ "loss": 2.4031,
+ "step": 64
+ },
+ {
+ "epoch": 1.24,
+ "learning_rate": 6.181818181818182e-06,
+ "loss": 2.2303,
+ "step": 68
+ },
+ {
+ "epoch": 1.31,
+ "learning_rate": 6.545454545454546e-06,
+ "loss": 2.2847,
+ "step": 72
+ },
+ {
+ "epoch": 1.38,
+ "learning_rate": 6.90909090909091e-06,
+ "loss": 2.1578,
+ "step": 76
+ },
+ {
+ "epoch": 1.45,
+ "learning_rate": 7.272727272727273e-06,
+ "loss": 2.1774,
+ "step": 80
+ },
+ {
+ "epoch": 1.53,
+ "learning_rate": 7.636363636363638e-06,
+ "loss": 2.197,
+ "step": 84
+ },
+ {
+ "epoch": 1.6,
+ "learning_rate": 8.000000000000001e-06,
+ "loss": 2.2093,
+ "step": 88
+ },
+ {
+ "epoch": 1.67,
+ "learning_rate": 8.363636363636365e-06,
+ "loss": 2.1004,
+ "step": 92
+ },
+ {
+ "epoch": 1.75,
+ "learning_rate": 8.727272727272728e-06,
+ "loss": 2.0526,
+ "step": 96
+ },
+ {
+ "epoch": 1.82,
+ "learning_rate": 9.090909090909091e-06,
+ "loss": 2.0771,
+ "step": 100
+ },
+ {
+ "epoch": 1.89,
+ "learning_rate": 9.454545454545456e-06,
+ "loss": 2.0219,
+ "step": 104
+ },
+ {
+ "epoch": 1.96,
+ "learning_rate": 9.81818181818182e-06,
+ "loss": 1.9439,
+ "step": 108
+ },
+ {
+ "epoch": 2.0,
+ "gpt4_scores": 0.3833333333333333,
+ "step": 110
+ },
+ {
+ "epoch": 2.0,
+ "eval_loss": 2.0205769538879395,
+ "eval_runtime": 4.9749,
+ "eval_samples_per_second": 4.623,
+ "eval_steps_per_second": 1.206,
+ "step": 110
+ },
+ {
+ "epoch": 2.04,
+ "learning_rate": 9.999899300364534e-06,
+ "loss": 2.1647,
+ "step": 112
+ },
+ {
+ "epoch": 2.11,
+ "learning_rate": 9.99909372761763e-06,
+ "loss": 1.9397,
+ "step": 116
+ },
+ {
+ "epoch": 2.18,
+ "learning_rate": 9.997482711915926e-06,
+ "loss": 2.1249,
+ "step": 120
+ },
+ {
+ "epoch": 2.25,
+ "learning_rate": 9.99506651282272e-06,
+ "loss": 1.9123,
+ "step": 124
+ },
+ {
+ "epoch": 2.33,
+ "learning_rate": 9.991845519630679e-06,
+ "loss": 1.8704,
+ "step": 128
+ },
+ {
+ "epoch": 2.4,
+ "learning_rate": 9.987820251299121e-06,
+ "loss": 1.9337,
+ "step": 132
+ },
+ {
+ "epoch": 2.47,
+ "learning_rate": 9.982991356370404e-06,
+ "loss": 1.8947,
+ "step": 136
+ },
+ {
+ "epoch": 2.55,
+ "learning_rate": 9.977359612865424e-06,
+ "loss": 1.8811,
+ "step": 140
+ },
+ {
+ "epoch": 2.62,
+ "learning_rate": 9.970925928158275e-06,
+ "loss": 1.8535,
+ "step": 144
+ },
+ {
+ "epoch": 2.69,
+ "learning_rate": 9.963691338830045e-06,
+ "loss": 1.8775,
+ "step": 148
+ },
+ {
+ "epoch": 2.76,
+ "learning_rate": 9.955657010501807e-06,
+ "loss": 1.8215,
+ "step": 152
+ },
+ {
+ "epoch": 2.84,
+ "learning_rate": 9.946824237646823e-06,
+ "loss": 1.9721,
+ "step": 156
+ },
+ {
+ "epoch": 2.91,
+ "learning_rate": 9.937194443381972e-06,
+ "loss": 2.0102,
+ "step": 160
+ },
+ {
+ "epoch": 2.98,
+ "learning_rate": 9.926769179238467e-06,
+ "loss": 1.8731,
+ "step": 164
+ },
+ {
+ "epoch": 3.0,
+ "gpt4_scores": 0.7999999999999999,
+ "step": 165
+ },
+ {
+ "epoch": 3.0,
+ "eval_loss": 1.9054837226867676,
+ "eval_runtime": 4.9773,
+ "eval_samples_per_second": 4.621,
+ "eval_steps_per_second": 1.205,
+ "step": 165
+ },
+ {
+ "epoch": 3.05,
+ "learning_rate": 9.915550124911866e-06,
+ "loss": 1.7406,
+ "step": 168
+ },
+ {
+ "epoch": 3.13,
+ "learning_rate": 9.903539087991462e-06,
+ "loss": 1.9294,
+ "step": 172
+ },
+ {
+ "epoch": 3.2,
+ "learning_rate": 9.890738003669029e-06,
+ "loss": 1.8174,
+ "step": 176
+ },
+ {
+ "epoch": 3.27,
+ "learning_rate": 9.877148934427037e-06,
+ "loss": 1.8922,
+ "step": 180
+ },
+ {
+ "epoch": 3.35,
+ "learning_rate": 9.862774069706346e-06,
+ "loss": 1.823,
+ "step": 184
+ },
+ {
+ "epoch": 3.42,
+ "learning_rate": 9.847615725553457e-06,
+ "loss": 1.7867,
+ "step": 188
+ },
+ {
+ "epoch": 3.49,
+ "learning_rate": 9.831676344247343e-06,
+ "loss": 1.7675,
+ "step": 192
+ },
+ {
+ "epoch": 3.56,
+ "learning_rate": 9.814958493905962e-06,
+ "loss": 1.8052,
+ "step": 196
+ },
+ {
+ "epoch": 3.64,
+ "learning_rate": 9.797464868072489e-06,
+ "loss": 1.799,
+ "step": 200
+ },
+ {
+ "epoch": 3.71,
+ "learning_rate": 9.779198285281326e-06,
+ "loss": 1.8455,
+ "step": 204
+ },
+ {
+ "epoch": 3.78,
+ "learning_rate": 9.760161688604008e-06,
+ "loss": 1.8541,
+ "step": 208
+ },
+ {
+ "epoch": 3.85,
+ "learning_rate": 9.740358145174999e-06,
+ "loss": 1.7094,
+ "step": 212
+ },
+ {
+ "epoch": 3.93,
+ "learning_rate": 9.719790845697534e-06,
+ "loss": 1.8727,
+ "step": 216
+ },
+ {
+ "epoch": 4.0,
+ "learning_rate": 9.698463103929542e-06,
+ "loss": 1.7574,
+ "step": 220
+ },
+ {
+ "epoch": 4.0,
+ "gpt4_scores": 0.7999999999999999,
+ "step": 220
+ },
+ {
+ "epoch": 4.0,
+ "eval_loss": 1.8509678840637207,
+ "eval_runtime": 4.9667,
+ "eval_samples_per_second": 4.631,
+ "eval_steps_per_second": 1.208,
+ "step": 220
+ },
+ {
+ "epoch": 4.07,
+ "learning_rate": 9.676378356149733e-06,
+ "loss": 1.7719,
+ "step": 224
+ },
+ {
+ "epoch": 4.15,
+ "learning_rate": 9.653540160603956e-06,
+ "loss": 1.7947,
+ "step": 228
+ },
+ {
+ "epoch": 4.22,
+ "learning_rate": 9.629952196931902e-06,
+ "loss": 1.6527,
+ "step": 232
+ },
+ {
+ "epoch": 4.29,
+ "learning_rate": 9.60561826557425e-06,
+ "loss": 1.8207,
+ "step": 236
+ },
+ {
+ "epoch": 4.36,
+ "learning_rate": 9.580542287160348e-06,
+ "loss": 1.8435,
+ "step": 240
+ },
+ {
+ "epoch": 4.44,
+ "learning_rate": 9.554728301876525e-06,
+ "loss": 1.6849,
+ "step": 244
+ },
+ {
+ "epoch": 4.51,
+ "learning_rate": 9.528180468815155e-06,
+ "loss": 1.7372,
+ "step": 248
+ },
+ {
+ "epoch": 4.58,
+ "learning_rate": 9.50090306530454e-06,
+ "loss": 1.7699,
+ "step": 252
+ },
+ {
+ "epoch": 4.65,
+ "learning_rate": 9.47290048621977e-06,
+ "loss": 1.7707,
+ "step": 256
+ },
+ {
+ "epoch": 4.73,
+ "learning_rate": 9.444177243274619e-06,
+ "loss": 1.7384,
+ "step": 260
+ },
+ {
+ "epoch": 4.8,
+ "learning_rate": 9.414737964294636e-06,
+ "loss": 1.7769,
+ "step": 264
+ },
+ {
+ "epoch": 4.87,
+ "learning_rate": 9.384587392471516e-06,
+ "loss": 1.7427,
+ "step": 268
+ },
+ {
+ "epoch": 4.95,
+ "learning_rate": 9.353730385598887e-06,
+ "loss": 1.7266,
+ "step": 272
+ },
+ {
+ "epoch": 5.0,
+ "gpt4_scores": 0.7833333333333332,
+ "step": 275
+ },
+ {
+ "epoch": 5.0,
+ "eval_loss": 1.8366360664367676,
+ "eval_runtime": 4.9298,
+ "eval_samples_per_second": 4.665,
+ "eval_steps_per_second": 1.217,
+ "step": 275
+ },
+ {
+ "epoch": 5.02,
+ "learning_rate": 9.322171915289635e-06,
+ "loss": 1.7487,
+ "step": 276
+ },
+ {
+ "epoch": 5.09,
+ "learning_rate": 9.289917066174887e-06,
+ "loss": 1.765,
+ "step": 280
+ },
+ {
+ "epoch": 5.16,
+ "learning_rate": 9.256971035084786e-06,
+ "loss": 1.7863,
+ "step": 284
+ },
+ {
+ "epoch": 5.24,
+ "learning_rate": 9.223339130211194e-06,
+ "loss": 1.7161,
+ "step": 288
+ },
+ {
+ "epoch": 5.31,
+ "learning_rate": 9.189026770252437e-06,
+ "loss": 1.699,
+ "step": 292
+ },
+ {
+ "epoch": 5.38,
+ "learning_rate": 9.154039483540273e-06,
+ "loss": 1.8007,
+ "step": 296
+ },
+ {
+ "epoch": 5.45,
+ "learning_rate": 9.118382907149164e-06,
+ "loss": 1.7352,
+ "step": 300
+ },
+ {
+ "epoch": 5.53,
+ "learning_rate": 9.08206278598805e-06,
+ "loss": 1.6892,
+ "step": 304
+ },
+ {
+ "epoch": 5.6,
+ "learning_rate": 9.045084971874738e-06,
+ "loss": 1.7667,
+ "step": 308
+ },
+ {
+ "epoch": 5.67,
+ "learning_rate": 9.007455422593077e-06,
+ "loss": 1.6885,
+ "step": 312
+ },
+ {
+ "epoch": 5.75,
+ "learning_rate": 8.969180200933048e-06,
+ "loss": 1.6391,
+ "step": 316
+ },
+ {
+ "epoch": 5.82,
+ "learning_rate": 8.930265473713939e-06,
+ "loss": 1.7314,
+ "step": 320
+ },
+ {
+ "epoch": 5.89,
+ "learning_rate": 8.890717510790763e-06,
+ "loss": 1.6487,
+ "step": 324
+ },
+ {
+ "epoch": 5.96,
+ "learning_rate": 8.850542684044078e-06,
+ "loss": 1.6627,
+ "step": 328
+ },
+ {
+ "epoch": 6.04,
+ "learning_rate": 8.809747466353356e-06,
+ "loss": 1.7486,
+ "step": 332
+ },
+ {
+ "epoch": 6.11,
+ "learning_rate": 8.768338430554083e-06,
+ "loss": 1.7223,
+ "step": 336
+ },
+ {
+ "epoch": 6.18,
+ "learning_rate": 8.726322248378775e-06,
+ "loss": 1.675,
+ "step": 340
+ },
+ {
+ "epoch": 6.25,
+ "learning_rate": 8.683705689382025e-06,
+ "loss": 1.6991,
+ "step": 344
+ },
+ {
+ "epoch": 6.33,
+ "learning_rate": 8.640495619849821e-06,
+ "loss": 1.6633,
+ "step": 348
+ },
+ {
+ "epoch": 6.4,
+ "learning_rate": 8.596699001693257e-06,
+ "loss": 1.7376,
+ "step": 352
+ },
+ {
+ "epoch": 6.47,
+ "learning_rate": 8.552322891326846e-06,
+ "loss": 1.7234,
+ "step": 356
+ },
+ {
+ "epoch": 6.55,
+ "learning_rate": 8.507374438531606e-06,
+ "loss": 1.6112,
+ "step": 360
+ },
+ {
+ "epoch": 6.62,
+ "learning_rate": 8.461860885303116e-06,
+ "loss": 1.6037,
+ "step": 364
+ },
+ {
+ "epoch": 6.69,
+ "learning_rate": 8.415789564684673e-06,
+ "loss": 1.6574,
+ "step": 368
+ },
+ {
+ "epoch": 6.76,
+ "learning_rate": 8.36916789958584e-06,
+ "loss": 1.6418,
+ "step": 372
+ },
+ {
+ "epoch": 6.84,
+ "learning_rate": 8.322003401586463e-06,
+ "loss": 1.7189,
+ "step": 376
+ },
+ {
+ "epoch": 6.91,
+ "learning_rate": 8.274303669726427e-06,
+ "loss": 1.5976,
+ "step": 380
+ },
+ {
+ "epoch": 6.98,
+ "learning_rate": 8.226076389281316e-06,
+ "loss": 1.6036,
+ "step": 384
+ },
+ {
+ "epoch": 7.0,
+ "gpt4_scores": 0.75,
+ "step": 385
+ },
+ {
+ "epoch": 7.0,
+ "eval_loss": 1.830824851989746,
+ "eval_runtime": 4.9258,
+ "eval_samples_per_second": 4.669,
+ "eval_steps_per_second": 1.218,
+ "step": 385
+ },
+ {
+ "epoch": 7.05,
+ "learning_rate": 8.177329330524182e-06,
+ "loss": 1.852,
+ "step": 388
+ },
+ {
+ "epoch": 7.13,
+ "learning_rate": 8.128070347473609e-06,
+ "loss": 1.6013,
+ "step": 392
+ },
+ {
+ "epoch": 7.2,
+ "learning_rate": 8.078307376628292e-06,
+ "loss": 1.587,
+ "step": 396
+ },
+ {
+ "epoch": 7.27,
+ "learning_rate": 8.028048435688333e-06,
+ "loss": 1.6268,
+ "step": 400
+ },
+ {
+ "epoch": 7.35,
+ "learning_rate": 7.97730162226344e-06,
+ "loss": 1.5518,
+ "step": 404
+ },
+ {
+ "epoch": 7.42,
+ "learning_rate": 7.92607511256826e-06,
+ "loss": 1.59,
+ "step": 408
+ },
+ {
+ "epoch": 7.49,
+ "learning_rate": 7.874377160105037e-06,
+ "loss": 1.6115,
+ "step": 412
+ },
+ {
+ "epoch": 7.56,
+ "learning_rate": 7.822216094333847e-06,
+ "loss": 1.5037,
+ "step": 416
+ },
+ {
+ "epoch": 7.64,
+ "learning_rate": 7.769600319330553e-06,
+ "loss": 1.6263,
+ "step": 420
+ },
+ {
+ "epoch": 7.71,
+ "learning_rate": 7.716538312432767e-06,
+ "loss": 1.6295,
+ "step": 424
+ },
+ {
+ "epoch": 7.78,
+ "learning_rate": 7.663038622873999e-06,
+ "loss": 1.6025,
+ "step": 428
+ },
+ {
+ "epoch": 7.85,
+ "learning_rate": 7.60910987040623e-06,
+ "loss": 1.6275,
+ "step": 432
+ },
+ {
+ "epoch": 7.93,
+ "learning_rate": 7.554760743911104e-06,
+ "loss": 1.6367,
+ "step": 436
+ },
+ {
+ "epoch": 8.0,
+ "learning_rate": 7.500000000000001e-06,
+ "loss": 1.7214,
+ "step": 440
+ },
+ {
+ "epoch": 8.0,
+ "gpt4_scores": 0.75,
+ "step": 440
+ },
+ {
+ "epoch": 8.0,
+ "eval_loss": 1.8379725217819214,
+ "eval_runtime": 4.9284,
+ "eval_samples_per_second": 4.667,
+ "eval_steps_per_second": 1.217,
+ "step": 440
+ },
+ {
+ "epoch": 8.07,
+ "learning_rate": 7.444836461603195e-06,
+ "loss": 1.5514,
+ "step": 444
+ },
+ {
+ "epoch": 8.15,
+ "learning_rate": 7.3892790165483164e-06,
+ "loss": 1.5264,
+ "step": 448
+ },
+ {
+ "epoch": 8.22,
+ "learning_rate": 7.333336616128369e-06,
+ "loss": 1.5404,
+ "step": 452
+ },
+ {
+ "epoch": 8.29,
+ "learning_rate": 7.2770182736595164e-06,
+ "loss": 1.5687,
+ "step": 456
+ },
+ {
+ "epoch": 8.36,
+ "learning_rate": 7.2203330630288714e-06,
+ "loss": 1.5514,
+ "step": 460
+ },
+ {
+ "epoch": 8.44,
+ "learning_rate": 7.163290117232542e-06,
+ "loss": 1.5833,
+ "step": 464
+ },
+ {
+ "epoch": 8.51,
+ "learning_rate": 7.105898626904134e-06,
+ "loss": 1.5976,
+ "step": 468
+ },
+ {
+ "epoch": 8.58,
+ "learning_rate": 7.048167838833977e-06,
+ "loss": 1.5742,
+ "step": 472
+ },
+ {
+ "epoch": 8.65,
+ "learning_rate": 6.990107054479313e-06,
+ "loss": 1.6667,
+ "step": 476
+ },
+ {
+ "epoch": 8.73,
+ "learning_rate": 6.931725628465643e-06,
+ "loss": 1.5979,
+ "step": 480
+ },
+ {
+ "epoch": 8.8,
+ "learning_rate": 6.873032967079562e-06,
+ "loss": 1.5281,
+ "step": 484
+ },
+ {
+ "epoch": 8.87,
+ "learning_rate": 6.814038526753205e-06,
+ "loss": 1.5594,
+ "step": 488
+ },
+ {
+ "epoch": 8.95,
+ "learning_rate": 6.75475181254068e-06,
+ "loss": 1.5245,
+ "step": 492
+ },
+ {
+ "epoch": 9.0,
+ "gpt4_scores": 0.7666666666666666,
+ "step": 495
+ },
+ {
+ "epoch": 9.0,
+ "eval_loss": 1.8495147228240967,
+ "eval_runtime": 4.9697,
+ "eval_samples_per_second": 4.628,
+ "eval_steps_per_second": 1.207,
+ "step": 495
+ },
+ {
+ "epoch": 9.02,
+ "learning_rate": 6.695182376586603e-06,
+ "loss": 1.4976,
+ "step": 496
+ },
+ {
+ "epoch": 9.09,
+ "learning_rate": 6.635339816587109e-06,
+ "loss": 1.5139,
+ "step": 500
+ },
+ {
+ "epoch": 9.16,
+ "learning_rate": 6.5752337742434644e-06,
+ "loss": 1.5678,
+ "step": 504
+ },
+ {
+ "epoch": 9.24,
+ "learning_rate": 6.514873933708637e-06,
+ "loss": 1.5552,
+ "step": 508
+ },
+ {
+ "epoch": 9.31,
+ "learning_rate": 6.454270020026996e-06,
+ "loss": 1.5398,
+ "step": 512
+ },
+ {
+ "epoch": 9.38,
+ "learning_rate": 6.39343179756744e-06,
+ "loss": 1.4932,
+ "step": 516
+ },
+ {
+ "epoch": 9.45,
+ "learning_rate": 6.332369068450175e-06,
+ "loss": 1.5297,
+ "step": 520
+ },
+ {
+ "epoch": 9.53,
+ "learning_rate": 6.271091670967437e-06,
+ "loss": 1.5346,
+ "step": 524
+ },
+ {
+ "epoch": 9.6,
+ "learning_rate": 6.209609477998339e-06,
+ "loss": 1.4501,
+ "step": 528
+ },
+ {
+ "epoch": 9.67,
+ "learning_rate": 6.1479323954182055e-06,
+ "loss": 1.48,
+ "step": 532
+ },
+ {
+ "epoch": 9.75,
+ "learning_rate": 6.08607036050254e-06,
+ "loss": 1.5156,
+ "step": 536
+ },
+ {
+ "epoch": 9.82,
+ "learning_rate": 6.024033340325954e-06,
+ "loss": 1.55,
+ "step": 540
+ },
+ {
+ "epoch": 9.89,
+ "learning_rate": 5.961831330156306e-06,
+ "loss": 1.4848,
+ "step": 544
+ },
+ {
+ "epoch": 9.96,
+ "learning_rate": 5.89947435184427e-06,
+ "loss": 1.5239,
+ "step": 548
+ },
+ {
+ "epoch": 10.0,
+ "gpt4_scores": 0.7333333333333334,
+ "step": 550
+ },
+ {
+ "epoch": 10.0,
+ "eval_loss": 1.8637609481811523,
+ "eval_runtime": 4.95,
+ "eval_samples_per_second": 4.646,
+ "eval_steps_per_second": 1.212,
+ "step": 550
+ },
+ {
+ "epoch": 10.04,
+ "learning_rate": 5.8369724522086545e-06,
+ "loss": 1.4492,
+ "step": 552
+ },
+ {
+ "epoch": 10.11,
+ "learning_rate": 5.774335701417662e-06,
+ "loss": 1.5163,
+ "step": 556
+ },
+ {
+ "epoch": 10.18,
+ "learning_rate": 5.711574191366427e-06,
+ "loss": 1.3988,
+ "step": 560
+ },
+ {
+ "epoch": 10.25,
+ "learning_rate": 5.648698034051009e-06,
+ "loss": 1.4406,
+ "step": 564
+ },
+ {
+ "epoch": 10.33,
+ "learning_rate": 5.585717359939192e-06,
+ "loss": 1.4894,
+ "step": 568
+ },
+ {
+ "epoch": 10.4,
+ "learning_rate": 5.522642316338268e-06,
+ "loss": 1.5209,
+ "step": 572
+ },
+ {
+ "epoch": 10.47,
+ "learning_rate": 5.459483065760138e-06,
+ "loss": 1.4599,
+ "step": 576
+ },
+ {
+ "epoch": 10.55,
+ "learning_rate": 5.396249784283943e-06,
+ "loss": 1.44,
+ "step": 580
+ },
+ {
+ "epoch": 10.62,
+ "learning_rate": 5.33295265991652e-06,
+ "loss": 1.4532,
+ "step": 584
+ },
+ {
+ "epoch": 10.69,
+ "learning_rate": 5.26960189095093e-06,
+ "loss": 1.4861,
+ "step": 588
+ },
+ {
+ "epoch": 10.76,
+ "learning_rate": 5.206207684323337e-06,
+ "loss": 1.5065,
+ "step": 592
+ },
+ {
+ "epoch": 10.84,
+ "learning_rate": 5.142780253968481e-06,
+ "loss": 1.5798,
+ "step": 596
+ },
+ {
+ "epoch": 10.91,
+ "learning_rate": 5.07932981917404e-06,
+ "loss": 1.4649,
+ "step": 600
+ },
+ {
+ "epoch": 10.98,
+ "learning_rate": 5.015866602934112e-06,
+ "loss": 1.4286,
+ "step": 604
+ },
+ {
+ "epoch": 11.0,
+ "gpt4_scores": 0.7333333333333334,
+ "step": 605
+ },
1032
+ {
1033
+ "epoch": 11.0,
1034
+ "eval_loss": 1.8770924806594849,
1035
+ "eval_runtime": 4.9833,
1036
+ "eval_samples_per_second": 4.615,
1037
+ "eval_steps_per_second": 1.204,
1038
+ "step": 605
1039
+ },
1040
+ {
1041
+ "epoch": 11.05,
1042
+ "learning_rate": 4.952400830302117e-06,
1043
+ "loss": 1.437,
1044
+ "step": 608
1045
+ },
1046
+ {
1047
+ "epoch": 11.13,
1048
+ "learning_rate": 4.888942726743353e-06,
1049
+ "loss": 1.4056,
1050
+ "step": 612
1051
+ },
1052
+ {
1053
+ "epoch": 11.2,
1054
+ "learning_rate": 4.825502516487497e-06,
1055
+ "loss": 1.5407,
1056
+ "step": 616
1057
+ },
1058
+ {
1059
+ "epoch": 11.27,
1060
+ "learning_rate": 4.762090420881289e-06,
1061
+ "loss": 1.3875,
1062
+ "step": 620
1063
+ },
1064
+ {
1065
+ "epoch": 11.35,
1066
+ "learning_rate": 4.6987166567417085e-06,
1067
+ "loss": 1.3708,
1068
+ "step": 624
1069
+ },
1070
+ {
1071
+ "epoch": 11.42,
1072
+ "learning_rate": 4.635391434709847e-06,
1073
+ "loss": 1.4348,
1074
+ "step": 628
1075
+ },
1076
+ {
1077
+ "epoch": 11.49,
1078
+ "learning_rate": 4.572124957605803e-06,
1079
+ "loss": 1.4,
1080
+ "step": 632
1081
+ },
1082
+ {
1083
+ "epoch": 11.56,
1084
+ "learning_rate": 4.5089274187848144e-06,
1085
+ "loss": 1.5172,
1086
+ "step": 636
1087
+ },
1088
+ {
1089
+ "epoch": 11.64,
1090
+ "learning_rate": 4.445809000494945e-06,
1091
+ "loss": 1.505,
1092
+ "step": 640
1093
+ },
1094
+ {
1095
+ "epoch": 11.71,
1096
+ "learning_rate": 4.382779872236527e-06,
1097
+ "loss": 1.3405,
1098
+ "step": 644
1099
+ },
1100
+ {
1101
+ "epoch": 11.78,
1102
+ "learning_rate": 4.319850189123681e-06,
1103
+ "loss": 1.4869,
1104
+ "step": 648
1105
+ },
1106
+ {
1107
+ "epoch": 11.85,
1108
+ "learning_rate": 4.257030090248142e-06,
1109
+ "loss": 1.4366,
1110
+ "step": 652
1111
+ },
1112
+ {
1113
+ "epoch": 11.93,
1114
+ "learning_rate": 4.194329697045681e-06,
1115
+ "loss": 1.3966,
1116
+ "step": 656
1117
+ },
1118
+ {
1119
+ "epoch": 12.0,
1120
+ "learning_rate": 4.131759111665349e-06,
1121
+ "loss": 1.3534,
1122
+ "step": 660
1123
+ },
1124
+ {
1125
+ "epoch": 12.0,
1126
+ "gpt4_scores": 0.7666666666666666,
1127
+ "step": 660
1128
+ },
1129
+ {
1130
+ "epoch": 12.0,
1131
+ "eval_loss": 1.9029945135116577,
1132
+ "eval_runtime": 4.9424,
1133
+ "eval_samples_per_second": 4.654,
1134
+ "eval_steps_per_second": 1.214,
1135
+ "step": 660
1136
+ },
1137
+ {
+ "epoch": 12.07,
+ "learning_rate": 4.06932841534185e-06,
+ "loss": 1.294,
+ "step": 664
+ },
+ {
+ "epoch": 12.15,
+ "learning_rate": 4.007047666771274e-06,
+ "loss": 1.4497,
+ "step": 668
+ },
+ {
+ "epoch": 12.22,
+ "learning_rate": 3.944926900490452e-06,
+ "loss": 1.4023,
+ "step": 672
+ },
+ {
+ "epoch": 12.29,
+ "learning_rate": 3.882976125260229e-06,
+ "loss": 1.4619,
+ "step": 676
+ },
+ {
+ "epoch": 12.36,
+ "learning_rate": 3.821205322452863e-06,
+ "loss": 1.4371,
+ "step": 680
+ },
+ {
+ "epoch": 12.44,
+ "learning_rate": 3.7596244444438577e-06,
+ "loss": 1.4326,
+ "step": 684
+ },
+ {
+ "epoch": 12.51,
+ "learning_rate": 3.69824341300844e-06,
+ "loss": 1.3918,
+ "step": 688
+ },
+ {
+ "epoch": 12.58,
+ "learning_rate": 3.637072117723012e-06,
+ "loss": 1.3524,
+ "step": 692
+ },
+ {
+ "epoch": 12.65,
+ "learning_rate": 3.5761204143717387e-06,
+ "loss": 1.3517,
+ "step": 696
+ },
+ {
+ "epoch": 12.73,
+ "learning_rate": 3.5153981233586277e-06,
+ "loss": 1.3572,
+ "step": 700
+ },
+ {
+ "epoch": 12.8,
+ "learning_rate": 3.4549150281252635e-06,
+ "loss": 1.3718,
+ "step": 704
+ },
+ {
+ "epoch": 12.87,
+ "learning_rate": 3.394680873574546e-06,
+ "loss": 1.3663,
+ "step": 708
+ },
+ {
+ "epoch": 12.95,
+ "learning_rate": 3.3347053645005965e-06,
+ "loss": 1.386,
+ "step": 712
+ },
+ {
+ "epoch": 13.02,
+ "learning_rate": 3.274998164025148e-06,
+ "loss": 1.4448,
+ "step": 716
+ },
+ {
+ "epoch": 13.09,
+ "learning_rate": 3.2155688920406415e-06,
+ "loss": 1.438,
+ "step": 720
+ },
+ {
+ "epoch": 13.16,
+ "learning_rate": 3.156427123660297e-06,
+ "loss": 1.3065,
+ "step": 724
+ },
+ {
+ "epoch": 13.24,
+ "learning_rate": 3.097582387675385e-06,
+ "loss": 1.377,
+ "step": 728
+ },
+ {
+ "epoch": 13.31,
+ "learning_rate": 3.0390441650199727e-06,
+ "loss": 1.336,
+ "step": 732
+ },
+ {
+ "epoch": 13.38,
+ "learning_rate": 2.980821887243377e-06,
+ "loss": 1.4014,
+ "step": 736
+ },
+ {
+ "epoch": 13.45,
+ "learning_rate": 2.9229249349905686e-06,
+ "loss": 1.2758,
+ "step": 740
+ },
+ {
+ "epoch": 13.53,
+ "learning_rate": 2.8653626364907918e-06,
+ "loss": 1.3139,
+ "step": 744
+ },
+ {
+ "epoch": 13.6,
+ "learning_rate": 2.8081442660546126e-06,
+ "loss": 1.3392,
+ "step": 748
+ },
+ {
+ "epoch": 13.67,
+ "learning_rate": 2.751279042579672e-06,
+ "loss": 1.4052,
+ "step": 752
+ },
+ {
+ "epoch": 13.75,
+ "learning_rate": 2.694776128065345e-06,
+ "loss": 1.3885,
+ "step": 756
+ },
+ {
+ "epoch": 13.82,
+ "learning_rate": 2.6386446261365874e-06,
+ "loss": 1.2347,
+ "step": 760
+ },
+ {
+ "epoch": 13.89,
+ "learning_rate": 2.5828935805771804e-06,
+ "loss": 1.3792,
+ "step": 764
+ },
+ {
+ "epoch": 13.96,
+ "learning_rate": 2.527531973872617e-06,
+ "loss": 1.3895,
+ "step": 768
+ },
+ {
+ "epoch": 14.0,
+ "gpt4_scores": 0.7999999999999999,
+ "step": 770
+ },
+ {
+ "epoch": 14.0,
+ "eval_loss": 1.9447271823883057,
+ "eval_runtime": 4.942,
+ "eval_samples_per_second": 4.654,
+ "eval_steps_per_second": 1.214,
+ "step": 770
+ },
+ {
+ "epoch": 14.04,
+ "learning_rate": 2.4725687257628533e-06,
+ "loss": 1.5377,
+ "step": 772
+ },
+ {
+ "epoch": 14.11,
+ "learning_rate": 2.418012691805191e-06,
+ "loss": 1.3932,
+ "step": 776
+ },
+ {
+ "epoch": 14.18,
+ "learning_rate": 2.363872661947488e-06,
+ "loss": 1.3257,
+ "step": 780
+ },
+ {
+ "epoch": 14.25,
+ "learning_rate": 2.310157359111938e-06,
+ "loss": 1.3721,
+ "step": 784
+ },
+ {
+ "epoch": 14.33,
+ "learning_rate": 2.2568754377896516e-06,
+ "loss": 1.3564,
+ "step": 788
+ },
+ {
+ "epoch": 14.4,
+ "learning_rate": 2.204035482646267e-06,
+ "loss": 1.2608,
+ "step": 792
+ },
+ {
+ "epoch": 14.47,
+ "learning_rate": 2.1516460071388062e-06,
+ "loss": 1.3862,
+ "step": 796
+ },
+ {
+ "epoch": 14.55,
+ "learning_rate": 2.09971545214401e-06,
+ "loss": 1.2653,
+ "step": 800
+ },
+ {
+ "epoch": 14.62,
+ "learning_rate": 2.0482521845983522e-06,
+ "loss": 1.3125,
+ "step": 804
+ },
+ {
+ "epoch": 14.69,
+ "learning_rate": 1.9972644961499853e-06,
+ "loss": 1.31,
+ "step": 808
+ },
+ {
+ "epoch": 14.76,
+ "learning_rate": 1.946760601822809e-06,
+ "loss": 1.3196,
+ "step": 812
+ },
+ {
+ "epoch": 14.84,
+ "learning_rate": 1.8967486386928819e-06,
+ "loss": 1.3496,
+ "step": 816
+ },
+ {
+ "epoch": 14.91,
+ "learning_rate": 1.8472366645773892e-06,
+ "loss": 1.3419,
+ "step": 820
+ },
+ {
+ "epoch": 14.98,
+ "learning_rate": 1.798232656736389e-06,
+ "loss": 1.3721,
+ "step": 824
+ },
+ {
+ "epoch": 15.0,
+ "gpt4_scores": 0.7999999999999999,
+ "step": 825
+ },
+ {
+ "epoch": 15.0,
+ "eval_loss": 1.9617422819137573,
+ "eval_runtime": 4.9528,
+ "eval_samples_per_second": 4.644,
+ "eval_steps_per_second": 1.211,
+ "step": 825
+ },
+ {
+ "epoch": 15.05,
+ "learning_rate": 1.7497445105875377e-06,
+ "loss": 1.4496,
+ "step": 828
+ },
+ {
+ "epoch": 15.13,
+ "learning_rate": 1.7017800384339928e-06,
+ "loss": 1.3551,
+ "step": 832
+ },
+ {
+ "epoch": 15.2,
+ "learning_rate": 1.6543469682057105e-06,
+ "loss": 1.3492,
+ "step": 836
+ },
+ {
+ "epoch": 15.27,
+ "learning_rate": 1.6074529422143398e-06,
+ "loss": 1.2737,
+ "step": 840
+ },
+ {
+ "epoch": 15.35,
+ "learning_rate": 1.561105515921915e-06,
+ "loss": 1.2914,
+ "step": 844
+ },
+ {
+ "epoch": 15.42,
+ "learning_rate": 1.5153121567235334e-06,
+ "loss": 1.2787,
+ "step": 848
+ },
+ {
+ "epoch": 15.49,
+ "learning_rate": 1.470080242744218e-06,
+ "loss": 1.3057,
+ "step": 852
+ },
+ {
+ "epoch": 15.56,
+ "learning_rate": 1.4254170616501828e-06,
+ "loss": 1.3169,
+ "step": 856
+ },
+ {
+ "epoch": 15.64,
+ "learning_rate": 1.3813298094746491e-06,
+ "loss": 1.3178,
+ "step": 860
+ },
+ {
+ "epoch": 15.71,
+ "learning_rate": 1.3378255894584463e-06,
+ "loss": 1.2875,
+ "step": 864
+ },
+ {
+ "epoch": 15.78,
+ "learning_rate": 1.2949114109055417e-06,
+ "loss": 1.3542,
+ "step": 868
+ },
+ {
+ "epoch": 15.85,
+ "learning_rate": 1.2525941880537307e-06,
+ "loss": 1.3405,
+ "step": 872
+ },
+ {
+ "epoch": 15.93,
+ "learning_rate": 1.210880738960616e-06,
+ "loss": 1.2646,
+ "step": 876
+ },
+ {
+ "epoch": 16.0,
+ "learning_rate": 1.1697777844051105e-06,
+ "loss": 1.3598,
+ "step": 880
+ },
+ {
+ "epoch": 16.0,
+ "gpt4_scores": 0.6166666666666667,
+ "step": 880
+ },
+ {
+ "epoch": 16.0,
+ "eval_loss": 1.971917986869812,
+ "eval_runtime": 4.9531,
+ "eval_samples_per_second": 4.644,
+ "eval_steps_per_second": 1.211,
+ "step": 880
+ },
+ {
+ "epoch": 16.07,
+ "learning_rate": 1.1292919468045876e-06,
+ "loss": 1.2695,
+ "step": 884
+ },
+ {
+ "epoch": 16.15,
+ "learning_rate": 1.0894297491479044e-06,
+ "loss": 1.3693,
+ "step": 888
+ },
+ {
+ "epoch": 16.22,
+ "learning_rate": 1.0501976139444191e-06,
+ "loss": 1.2719,
+ "step": 892
+ },
+ {
+ "epoch": 16.29,
+ "learning_rate": 1.0116018621892237e-06,
+ "loss": 1.3008,
+ "step": 896
+ },
+ {
+ "epoch": 16.36,
+ "learning_rate": 9.73648712344707e-07,
+ "loss": 1.344,
+ "step": 900
+ },
+ {
+ "epoch": 16.44,
+ "learning_rate": 9.363442793386606e-07,
+ "loss": 1.2761,
+ "step": 904
+ },
+ {
+ "epoch": 16.51,
+ "learning_rate": 8.996945735790447e-07,
+ "loss": 1.3216,
+ "step": 908
+ },
+ {
+ "epoch": 16.58,
+ "learning_rate": 8.637054999856148e-07,
+ "loss": 1.3374,
+ "step": 912
+ },
+ {
+ "epoch": 16.65,
+ "learning_rate": 8.283828570385239e-07,
+ "loss": 1.269,
+ "step": 916
+ },
+ {
+ "epoch": 16.73,
+ "learning_rate": 7.937323358440935e-07,
+ "loss": 1.2961,
+ "step": 920
+ },
+ {
+ "epoch": 16.8,
+ "learning_rate": 7.597595192178702e-07,
+ "loss": 1.277,
+ "step": 924
+ },
+ {
+ "epoch": 16.87,
+ "learning_rate": 7.264698807851328e-07,
+ "loss": 1.3616,
+ "step": 928
+ },
+ {
+ "epoch": 16.95,
+ "learning_rate": 6.938687840989972e-07,
+ "loss": 1.3015,
+ "step": 932
+ },
+ {
+ "epoch": 17.0,
+ "gpt4_scores": 0.5666666666666667,
+ "step": 935
+ },
+ {
+ "epoch": 17.0,
+ "eval_loss": 1.9795942306518555,
+ "eval_runtime": 4.9559,
+ "eval_samples_per_second": 4.641,
+ "eval_steps_per_second": 1.211,
+ "step": 935
+ },
+ {
+ "epoch": 17.02,
+ "learning_rate": 6.619614817762537e-07,
+ "loss": 1.2956,
+ "step": 936
+ },
+ {
+ "epoch": 17.09,
+ "learning_rate": 6.307531146510754e-07,
+ "loss": 1.2213,
+ "step": 940
+ },
+ {
+ "epoch": 17.16,
+ "learning_rate": 6.002487109467347e-07,
+ "loss": 1.292,
+ "step": 944
+ },
+ {
+ "epoch": 17.24,
+ "learning_rate": 5.704531854654721e-07,
+ "loss": 1.3196,
+ "step": 948
+ },
+ {
+ "epoch": 17.31,
+ "learning_rate": 5.413713387966329e-07,
+ "loss": 1.3676,
+ "step": 952
+ },
+ {
+ "epoch": 17.38,
+ "learning_rate": 5.130078565432089e-07,
+ "loss": 1.3627,
+ "step": 956
+ },
+ {
+ "epoch": 17.45,
+ "learning_rate": 4.853673085668947e-07,
+ "loss": 1.2544,
+ "step": 960
+ },
+ {
+ "epoch": 17.53,
+ "learning_rate": 4.58454148251814e-07,
+ "loss": 1.3126,
+ "step": 964
+ },
+ {
+ "epoch": 17.6,
+ "learning_rate": 4.322727117869951e-07,
+ "loss": 1.3056,
+ "step": 968
+ },
+ {
+ "epoch": 17.67,
+ "learning_rate": 4.0682721746773346e-07,
+ "loss": 1.2876,
+ "step": 972
+ },
+ {
+ "epoch": 17.75,
+ "learning_rate": 3.821217650159453e-07,
+ "loss": 1.3168,
+ "step": 976
+ },
+ {
+ "epoch": 17.82,
+ "learning_rate": 3.581603349196372e-07,
+ "loss": 1.2738,
+ "step": 980
+ },
+ {
+ "epoch": 17.89,
+ "learning_rate": 3.3494678779157464e-07,
+ "loss": 1.2393,
+ "step": 984
+ },
+ {
+ "epoch": 17.96,
+ "learning_rate": 3.1248486374726884e-07,
+ "loss": 1.3456,
+ "step": 988
+ },
+ {
+ "epoch": 18.0,
+ "gpt4_scores": 0.45,
+ "step": 990
+ },
+ {
+ "epoch": 18.0,
+ "eval_loss": 1.983115315437317,
+ "eval_runtime": 4.9224,
+ "eval_samples_per_second": 4.673,
+ "eval_steps_per_second": 1.219,
+ "step": 990
+ },
+ {
+ "epoch": 18.04,
+ "learning_rate": 2.9077818180237693e-07,
+ "loss": 1.2949,
+ "step": 992
+ },
+ {
+ "epoch": 18.11,
+ "learning_rate": 2.6983023928961406e-07,
+ "loss": 1.3541,
+ "step": 996
+ },
+ {
+ "epoch": 18.18,
+ "learning_rate": 2.4964441129527337e-07,
+ "loss": 1.293,
+ "step": 1000
+ },
+ {
+ "epoch": 18.25,
+ "learning_rate": 2.3022395011543687e-07,
+ "loss": 1.2917,
+ "step": 1004
+ },
+ {
+ "epoch": 18.33,
+ "learning_rate": 2.1157198473197417e-07,
+ "loss": 1.318,
+ "step": 1008
+ },
+ {
+ "epoch": 18.4,
+ "learning_rate": 1.9369152030840553e-07,
+ "loss": 1.2806,
+ "step": 1012
+ },
+ {
+ "epoch": 18.47,
+ "learning_rate": 1.765854377057219e-07,
+ "loss": 1.2441,
+ "step": 1016
+ },
+ {
+ "epoch": 18.55,
+ "learning_rate": 1.6025649301821877e-07,
+ "loss": 1.3167,
+ "step": 1020
+ },
+ {
+ "epoch": 18.62,
+ "learning_rate": 1.4470731712944885e-07,
+ "loss": 1.3758,
+ "step": 1024
+ },
+ {
+ "epoch": 18.69,
+ "learning_rate": 1.2994041528833267e-07,
+ "loss": 1.2589,
+ "step": 1028
+ },
+ {
+ "epoch": 18.76,
+ "learning_rate": 1.1595816670552429e-07,
+ "loss": 1.3814,
+ "step": 1032
+ },
+ {
+ "epoch": 18.84,
+ "learning_rate": 1.0276282417007399e-07,
+ "loss": 1.254,
+ "step": 1036
+ },
+ {
+ "epoch": 18.91,
+ "learning_rate": 9.035651368646647e-08,
+ "loss": 1.2075,
+ "step": 1040
+ },
+ {
+ "epoch": 18.98,
+ "learning_rate": 7.874123413208145e-08,
+ "loss": 1.2136,
+ "step": 1044
+ },
+ {
+ "epoch": 19.0,
+ "gpt4_scores": 0.55,
+ "step": 1045
+ },
+ {
+ "epoch": 19.0,
+ "eval_loss": 1.9847663640975952,
+ "eval_runtime": 4.9193,
+ "eval_samples_per_second": 4.676,
+ "eval_steps_per_second": 1.22,
+ "step": 1045
+ },
+ {
+ "epoch": 19.05,
+ "learning_rate": 6.791885693514134e-08,
+ "loss": 1.2188,
+ "step": 1048
+ },
+ {
+ "epoch": 19.13,
+ "learning_rate": 5.7891125773187896e-08,
+ "loss": 1.3258,
+ "step": 1052
+ },
+ {
+ "epoch": 19.2,
+ "learning_rate": 4.865965629214819e-08,
+ "loss": 1.3501,
+ "step": 1056
+ },
+ {
+ "epoch": 19.27,
+ "learning_rate": 4.02259358460233e-08,
+ "loss": 1.2407,
+ "step": 1060
+ },
+ {
+ "epoch": 19.35,
+ "learning_rate": 3.25913232572489e-08,
+ "loss": 1.2737,
+ "step": 1064
+ },
+ {
+ "epoch": 19.42,
+ "learning_rate": 2.57570485977654e-08,
+ "loss": 1.2886,
+ "step": 1068
+ },
+ {
+ "epoch": 19.49,
+ "learning_rate": 1.9724212990830938e-08,
+ "loss": 1.331,
+ "step": 1072
+ },
+ {
+ "epoch": 19.56,
+ "learning_rate": 1.449378843361271e-08,
+ "loss": 1.2453,
+ "step": 1076
+ },
+ {
+ "epoch": 19.64,
+ "learning_rate": 1.006661764057837e-08,
+ "loss": 1.2749,
+ "step": 1080
+ },
+ {
+ "epoch": 19.71,
+ "learning_rate": 6.4434139077201865e-09,
+ "loss": 1.2996,
+ "step": 1084
+ },
+ {
+ "epoch": 19.78,
+ "learning_rate": 3.6247609976319818e-09,
+ "loss": 1.2314,
+ "step": 1088
+ },
+ {
+ "epoch": 19.85,
+ "learning_rate": 1.61111304545436e-09,
+ "loss": 1.3066,
+ "step": 1092
+ },
+ {
+ "epoch": 19.93,
+ "learning_rate": 4.027944857032395e-10,
+ "loss": 1.3807,
+ "step": 1096
+ },
+ {
+ "epoch": 20.0,
+ "learning_rate": 0.0,
+ "loss": 1.302,
+ "step": 1100
+ },
+ {
+ "epoch": 20.0,
+ "gpt4_scores": 0.5666666666666667,
+ "step": 1100
+ },
+ {
+ "epoch": 20.0,
+ "eval_loss": 1.9851655960083008,
+ "eval_runtime": 4.9615,
+ "eval_samples_per_second": 4.636,
+ "eval_steps_per_second": 1.209,
+ "step": 1100
+ },
+ {
+ "epoch": 20.0,
+ "step": 1100,
+ "total_flos": 3.76665795378217e+16,
+ "train_loss": 0.4608629340475256,
+ "train_runtime": 2621.4029,
+ "train_samples_per_second": 1.656,
+ "train_steps_per_second": 0.42
+ }
+ ],
+ "logging_steps": 4,
+ "max_steps": 1100,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 20,
+ "save_steps": 55,
+ "total_flos": 3.76665795378217e+16,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
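The `learning_rate` entries in `log_history` should follow the schedule listed in the model card: peak `1e-05`, `lr_scheduler_type: cosine`, and `lr_scheduler_warmup_ratio: 0.1`, with `max_steps: 1100` taken from this file. The sketch below recomputes a few logged values to check that reading. It assumes a linear warmup over ceil(0.1 × 1100) = 110 steps, then a half-cosine decay to zero, and it assumes the value logged at step s is the rate after s scheduler steps. This is our understanding of the standard cosine-with-warmup scheduler in `transformers`, not something stated in the commit. The helper name `cosine_lr` and the `LOGGED` sample are hypothetical.

```python
import math

# Values from the model card / trainer_state.json above.
PEAK_LR = 1e-5
MAX_STEPS = 1100
WARMUP_RATIO = 0.1
# Assumption: warmup length is ceil(ratio * total steps), i.e. 110 here.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * MAX_STEPS)


def cosine_lr(step: int) -> float:
    """Hypothetical helper: learning rate after `step` optimizer steps,
    assuming linear warmup followed by half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# (step, learning_rate) pairs copied from the log_history entries above.
LOGGED = {
    660: 4.131759111665349e-06,
    880: 1.1697777844051105e-06,
    1000: 2.4964441129527337e-07,
    1096: 4.027944857032395e-10,
    1100: 0.0,
}

for step, logged in LOGGED.items():
    predicted = cosine_lr(step)
    print(f"step {step:4d}: logged {logged:.6e}  recomputed {predicted:.6e}")
    # Relative tolerance for nonzero rates; absolute tolerance covers step 1100 (0.0).
    assert math.isclose(predicted, logged, rel_tol=1e-6, abs_tol=1e-15)
```

If the assertions pass, the logged rates are consistent with that assumed schedule. For example, step 660 is 550 of the 990 post-warmup steps. That gives 1e-5 × 0.5 × (1 + cos(100°)) ≈ 4.1318e-06, which matches the entry above.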