BrownEnergy committed
Commit 6e42e02 · verified · 1 Parent(s): 9a38b9a

Upload folder using huggingface_hub

Files changed (8)
  1. README.md +95 -0
  2. metadata.json +8 -0
  3. optimizer.pt +3 -0
  4. pytorch_model.bin +3 -0
  5. rng_state.pth +3 -0
  6. scheduler.pt +3 -0
  7. trainer_state.json +1117 -0
  8. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,95 @@
+ ---
+ license: apache-2.0
+ base_model: google/vit-base-patch16-224
+ tags:
+ - Image Regression
+ datasets:
+ - "BrownEnergy/secchi_depth"
+ metrics:
+ - accuracy
+ model-index:
+ - name: "sd_depth_regression_v2"
+   results: []
+ ---
+
+ # sd_depth_regression_v2
+ ## Image Regression Model
+
+ This model was trained with [Image Regression Model Trainer](https://github.com/TonyAssi/ImageRegression/tree/main). It takes an image as input and outputs a single float, the predicted `sd_depth` value.
+
+ ```python
+ from ImageRegression import predict
+ predict(repo_id='BrownEnergy/sd_depth_regression_v2', image_path='image.jpg')
+ ```
+
+ ---
+
+ ## Dataset
+ Dataset: BrownEnergy/secchi_depth\
+ Value Column: 'sd_depth'\
+ Test Split: 0.05 (fraction of the dataset held out for evaluation)
+
+ ---
+
+ ## Training
+ Base Model: [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)\
+ Epochs: 10\
+ Learning Rate: 0.0001
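
The `learning_rate` values logged in `trainer_state.json` in this commit drop by the same amount every step, from 1e-4 down to 0.0 at step 1700, which is consistent with a linear decay schedule without warmup. A minimal sketch that reproduces the logged values under that assumption (the function name `linear_lr` is hypothetical and not part of the ImageRegression package):

```python
import math

BASE_LR = 1e-4     # "Learning Rate" from this model card
MAX_STEPS = 1700   # "max_steps" in trainer_state.json


def linear_lr(step: int, base_lr: float = BASE_LR, max_steps: int = MAX_STEPS) -> float:
    """Learning rate after `step` optimizer steps, ASSUMING linear decay to zero."""
    return base_lr * max(0.0, (max_steps - step) / max_steps)


# Spot-check against values logged in trainer_state.json.
logged = {10: 9.941176470588236e-05, 170: 9e-05, 850: 5e-05, 1700: 0.0}
for step, lr in logged.items():
    assert math.isclose(linear_lr(step), lr, rel_tol=1e-9, abs_tol=1e-15)
```

Each logged interval of 10 steps therefore lowers the rate by 1e-4 × 10 / 1700, roughly 5.88e-7.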
+
+ ---
+
+ ## Usage
+
+ ### Download
+ ```bash
+ git clone https://github.com/TonyAssi/ImageRegression.git
+ cd ImageRegression
+ ```
+
+ ### Installation
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Import
+ ```python
+ from ImageRegression import train_model, upload_model, predict
+ ```
+
+ ### Inference (Prediction)
+ - **repo_id** 🤗 repo id of the model
+ - **image_path** path to image
+ ```python
+ predict(repo_id='BrownEnergy/sd_depth_regression_v2',
+         image_path='image.jpg')
+ ```
+ The first time this function is called, it downloads the safetensors model. Subsequent calls run faster.
+
+ ### Train Model
+ - **dataset_id** 🤗 dataset id
+ - **value_column_name** column name of prediction values in dataset
+ - **test_split** test split of the train/test split
+ - **output_dir** the directory where the checkpoints will be saved
+ - **num_train_epochs** training epochs
+ - **learning_rate** learning rate
+ ```python
+ train_model(dataset_id='BrownEnergy/secchi_depth',
+             value_column_name='sd_depth',
+             test_split=0.05,
+             output_dir='./results',
+             num_train_epochs=10,
+             learning_rate=0.0001)
+
+ ```
+ The trainer saves the checkpoints in the `output_dir` location. The `model.safetensors` file holds the trained weights you'll use for inference (prediction).
+
+ ### Upload Model
+ This function uploads your model to the 🤗 Hub.
+ - **model_id** the id (name) of the model on the 🤗 Hub
+ - **token** your 🤗 access token; go [here](https://huggingface.co/settings/tokens) to create a new one
+ - **checkpoint_dir** checkpoint folder that will be uploaded
+ ```python
+ upload_model(model_id='sd_depth_regression_v2',
+              token='YOUR_HF_TOKEN',
+              checkpoint_dir='./results/checkpoint-940')
+ ```
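
The upload example passes the token as a string literal. One way to keep the token out of source files is to read it from an environment variable and build the `upload_model` arguments from that. The sketch below is only an illustration: the helper `build_upload_kwargs` and the choice of the `HF_TOKEN` variable name are assumptions, not part of the ImageRegression package.

```python
import os


def build_upload_kwargs(model_id: str, checkpoint_dir: str, env_var: str = "HF_TOKEN") -> dict:
    """Collect the upload_model() arguments, reading the token from the environment."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Hugging Face token")
    return {"model_id": model_id, "token": token, "checkpoint_dir": checkpoint_dir}


# Typical use (requires the ImageRegression package from the Download step):
# from ImageRegression import upload_model
# upload_model(**build_upload_kwargs("sd_depth_regression_v2", "./results/checkpoint-940"))
```

Failing early when the variable is unset avoids starting an upload that would be rejected for a missing token.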
metadata.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "dataset_id": "BrownEnergy/secchi_depth",
+   "value_column_name": "sd_depth",
+   "test_split": 0.05,
+   "num_train_epochs": 10,
+   "learning_rate": 0.0001,
+   "max_value": 77.0
+ }
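
The README does not explain `max_value`. One plausible reading, given that the logged losses are far below 1, is that the trainer divides each `sd_depth` target by `max_value` so the network regresses a number in [0, 1], and that predictions are multiplied back at inference. This is an assumption; nothing in this commit confirms it. Under that assumption, an MSE in normalized units converts to an RMSE in the dataset's original units as sketched here (all function names are hypothetical):

```python
import math

MAX_VALUE = 77.0  # "max_value" from metadata.json


def normalize(value: float, max_value: float = MAX_VALUE) -> float:
    # ASSUMPTION: targets are scaled by dividing by max_value.
    return value / max_value


def denormalize(output: float, max_value: float = MAX_VALUE) -> float:
    # Inverse of normalize(): map a model output back to sd_depth units.
    return output * max_value


def rmse_in_original_units(mse_normalized: float, max_value: float = MAX_VALUE) -> float:
    # sqrt(MSE) is in normalized units; multiplying by max_value restores the units.
    return math.sqrt(mse_normalized) * max_value


# eval_mse logged at epoch 9 in trainer_state.json
print(round(rmse_in_original_units(0.0007979701040312648), 2))
```

If the trainer scales targets differently, only `normalize` and `denormalize` need to change; the RMSE conversion follows from whatever scale factor is actually used.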
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:695718a3ab09a6a02652f92086ff3230fce36027dc3302e18a885c42dd6fa230
+ size 686507205
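
The lines above are a Git LFS pointer, not the optimizer state itself: the repository stores this small text stub, and the real file is identified by its SHA-256 `oid` and byte `size`. A minimal sketch of reading such a pointer and checking downloaded bytes against it (the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical):

```python
import hashlib

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:695718a3ab09a6a02652f92086ff3230fce36027dc3302e18a885c42dd6fa230
size 686507205
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True if `data` has the size and SHA-256 digest recorded in the pointer."""
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['size'] / 1e6:.1f} MB")  # prints 686.5 MB
```

`matches_pointer` holds the whole file in memory, which is fine for a sketch; for a file of this size, hashing it in chunks would be the more practical choice.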
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d63672d26affd4e2c44c4ff17d3bf920e43f1f7f8337672d06e9c8ff9d25fa1
+ size 345639733
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e69db2ebd3dbe75c8467b788e5787cf93796fa78adc7a0c39fc13316b0348a38
+ size 13553
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1f889d3cf32396afaa82d48432f8d317cf7af7fa13660b25c16d00534377fc9
+ size 627
trainer_state.json ADDED
@@ -0,0 +1,1117 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 10.0,
+   "global_step": 1700,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.06,
+       "learning_rate": 9.941176470588236e-05,
+       "loss": 0.1482,
+       "step": 10
+     },
+     {
+       "epoch": 0.12,
+       "learning_rate": 9.882352941176471e-05,
+       "loss": 0.0992,
+       "step": 20
+     },
+     {
+       "epoch": 0.18,
+       "learning_rate": 9.823529411764706e-05,
+       "loss": 0.0313,
+       "step": 30
+     },
+     {
+       "epoch": 0.24,
+       "learning_rate": 9.764705882352942e-05,
+       "loss": 0.0365,
+       "step": 40
+     },
+     {
+       "epoch": 0.29,
+       "learning_rate": 9.705882352941177e-05,
+       "loss": 0.0326,
+       "step": 50
+     },
+     {
+       "epoch": 0.35,
+       "learning_rate": 9.647058823529412e-05,
+       "loss": 0.0122,
+       "step": 60
+     },
+     {
+       "epoch": 0.41,
+       "learning_rate": 9.588235294117648e-05,
+       "loss": 0.0045,
+       "step": 70
+     },
+     {
+       "epoch": 0.47,
+       "learning_rate": 9.529411764705883e-05,
+       "loss": 0.0044,
+       "step": 80
+     },
+     {
+       "epoch": 0.53,
+       "learning_rate": 9.470588235294118e-05,
+       "loss": 0.0016,
+       "step": 90
+     },
+     {
+       "epoch": 0.59,
+       "learning_rate": 9.411764705882353e-05,
+       "loss": 0.0036,
+       "step": 100
+     },
+     {
+       "epoch": 0.65,
+       "learning_rate": 9.352941176470589e-05,
+       "loss": 0.0158,
+       "step": 110
+     },
+     {
+       "epoch": 0.71,
+       "learning_rate": 9.294117647058824e-05,
+       "loss": 0.0019,
+       "step": 120
+     },
+     {
+       "epoch": 0.76,
+       "learning_rate": 9.23529411764706e-05,
+       "loss": 0.0132,
+       "step": 130
+     },
+     {
+       "epoch": 0.82,
+       "learning_rate": 9.176470588235295e-05,
+       "loss": 0.0025,
+       "step": 140
+     },
+     {
+       "epoch": 0.88,
+       "learning_rate": 9.11764705882353e-05,
+       "loss": 0.0049,
+       "step": 150
+     },
+     {
+       "epoch": 0.94,
+       "learning_rate": 9.058823529411765e-05,
+       "loss": 0.0181,
+       "step": 160
+     },
+     {
+       "epoch": 1.0,
+       "learning_rate": 9e-05,
+       "loss": 0.0153,
+       "step": 170
+     },
+     {
+       "epoch": 1.0,
+       "eval_loss": 0.005996049847453833,
+       "eval_mse": 0.005996049847453833,
+       "eval_runtime": 15.3139,
+       "eval_samples_per_second": 4.702,
+       "eval_steps_per_second": 0.588,
+       "step": 170
+     },
+     {
+       "epoch": 1.06,
+       "learning_rate": 8.941176470588236e-05,
+       "loss": 0.0079,
+       "step": 180
+     },
+     {
+       "epoch": 1.12,
+       "learning_rate": 8.882352941176471e-05,
+       "loss": 0.0238,
+       "step": 190
+     },
+     {
+       "epoch": 1.18,
+       "learning_rate": 8.823529411764706e-05,
+       "loss": 0.0274,
+       "step": 200
+     },
+     {
+       "epoch": 1.24,
+       "learning_rate": 8.764705882352942e-05,
+       "loss": 0.0078,
+       "step": 210
+     },
+     {
+       "epoch": 1.29,
+       "learning_rate": 8.705882352941177e-05,
+       "loss": 0.0144,
+       "step": 220
+     },
+     {
+       "epoch": 1.35,
+       "learning_rate": 8.647058823529412e-05,
+       "loss": 0.0046,
+       "step": 230
+     },
+     {
+       "epoch": 1.41,
+       "learning_rate": 8.588235294117646e-05,
+       "loss": 0.0078,
+       "step": 240
+     },
+     {
+       "epoch": 1.47,
+       "learning_rate": 8.529411764705883e-05,
+       "loss": 0.0012,
+       "step": 250
+     },
+     {
+       "epoch": 1.53,
+       "learning_rate": 8.470588235294118e-05,
+       "loss": 0.0124,
+       "step": 260
+     },
+     {
+       "epoch": 1.59,
+       "learning_rate": 8.411764705882354e-05,
+       "loss": 0.0105,
+       "step": 270
+     },
+     {
+       "epoch": 1.65,
+       "learning_rate": 8.352941176470589e-05,
+       "loss": 0.0089,
+       "step": 280
+     },
+     {
+       "epoch": 1.71,
+       "learning_rate": 8.294117647058824e-05,
+       "loss": 0.0017,
+       "step": 290
+     },
+     {
+       "epoch": 1.76,
+       "learning_rate": 8.23529411764706e-05,
+       "loss": 0.0095,
+       "step": 300
+     },
+     {
+       "epoch": 1.82,
+       "learning_rate": 8.176470588235295e-05,
+       "loss": 0.0016,
+       "step": 310
+     },
+     {
+       "epoch": 1.88,
+       "learning_rate": 8.11764705882353e-05,
+       "loss": 0.0128,
+       "step": 320
+     },
+     {
+       "epoch": 1.94,
+       "learning_rate": 8.058823529411765e-05,
+       "loss": 0.008,
+       "step": 330
+     },
+     {
+       "epoch": 2.0,
+       "learning_rate": 8e-05,
+       "loss": 0.001,
+       "step": 340
+     },
+     {
+       "epoch": 2.0,
+       "eval_loss": 0.0011689820094034076,
+       "eval_mse": 0.0011689820094034076,
+       "eval_runtime": 15.7171,
+       "eval_samples_per_second": 4.581,
+       "eval_steps_per_second": 0.573,
+       "step": 340
+     },
+     {
+       "epoch": 2.06,
+       "learning_rate": 7.941176470588235e-05,
+       "loss": 0.0005,
+       "step": 350
+     },
+     {
+       "epoch": 2.12,
+       "learning_rate": 7.882352941176471e-05,
+       "loss": 0.0007,
+       "step": 360
+     },
+     {
+       "epoch": 2.18,
+       "learning_rate": 7.823529411764707e-05,
+       "loss": 0.0183,
+       "step": 370
+     },
+     {
+       "epoch": 2.24,
+       "learning_rate": 7.764705882352942e-05,
+       "loss": 0.0021,
+       "step": 380
+     },
+     {
+       "epoch": 2.29,
+       "learning_rate": 7.705882352941177e-05,
+       "loss": 0.008,
+       "step": 390
+     },
+     {
+       "epoch": 2.35,
+       "learning_rate": 7.647058823529411e-05,
+       "loss": 0.0047,
+       "step": 400
+     },
+     {
+       "epoch": 2.41,
+       "learning_rate": 7.588235294117648e-05,
+       "loss": 0.011,
+       "step": 410
+     },
+     {
+       "epoch": 2.47,
+       "learning_rate": 7.529411764705883e-05,
+       "loss": 0.0024,
+       "step": 420
+     },
+     {
+       "epoch": 2.53,
+       "learning_rate": 7.470588235294118e-05,
+       "loss": 0.0019,
+       "step": 430
+     },
+     {
+       "epoch": 2.59,
+       "learning_rate": 7.411764705882354e-05,
+       "loss": 0.0074,
+       "step": 440
+     },
+     {
+       "epoch": 2.65,
+       "learning_rate": 7.352941176470589e-05,
+       "loss": 0.0017,
+       "step": 450
+     },
+     {
+       "epoch": 2.71,
+       "learning_rate": 7.294117647058823e-05,
+       "loss": 0.0081,
+       "step": 460
+     },
+     {
+       "epoch": 2.76,
+       "learning_rate": 7.23529411764706e-05,
+       "loss": 0.0195,
+       "step": 470
+     },
+     {
+       "epoch": 2.82,
+       "learning_rate": 7.176470588235295e-05,
+       "loss": 0.0072,
+       "step": 480
+     },
+     {
+       "epoch": 2.88,
+       "learning_rate": 7.11764705882353e-05,
+       "loss": 0.0048,
+       "step": 490
+     },
+     {
+       "epoch": 2.94,
+       "learning_rate": 7.058823529411765e-05,
+       "loss": 0.0012,
+       "step": 500
+     },
+     {
+       "epoch": 3.0,
+       "learning_rate": 7e-05,
+       "loss": 0.0036,
+       "step": 510
+     },
+     {
+       "epoch": 3.0,
+       "eval_loss": 0.0050306557677686214,
+       "eval_mse": 0.0050306557677686214,
+       "eval_runtime": 16.8544,
+       "eval_samples_per_second": 4.272,
+       "eval_steps_per_second": 0.534,
+       "step": 510
+     },
+     {
+       "epoch": 3.06,
+       "learning_rate": 6.941176470588236e-05,
+       "loss": 0.0055,
+       "step": 520
+     },
+     {
+       "epoch": 3.12,
+       "learning_rate": 6.882352941176471e-05,
+       "loss": 0.0112,
+       "step": 530
+     },
+     {
+       "epoch": 3.18,
+       "learning_rate": 6.823529411764707e-05,
+       "loss": 0.0038,
+       "step": 540
+     },
+     {
+       "epoch": 3.24,
+       "learning_rate": 6.764705882352942e-05,
+       "loss": 0.0017,
+       "step": 550
+     },
+     {
+       "epoch": 3.29,
+       "learning_rate": 6.705882352941176e-05,
+       "loss": 0.0015,
+       "step": 560
+     },
+     {
+       "epoch": 3.35,
+       "learning_rate": 6.647058823529411e-05,
+       "loss": 0.001,
+       "step": 570
+     },
+     {
+       "epoch": 3.41,
+       "learning_rate": 6.588235294117648e-05,
+       "loss": 0.0115,
+       "step": 580
+     },
+     {
+       "epoch": 3.47,
+       "learning_rate": 6.529411764705883e-05,
+       "loss": 0.0024,
+       "step": 590
+     },
+     {
+       "epoch": 3.53,
+       "learning_rate": 6.470588235294118e-05,
+       "loss": 0.0021,
+       "step": 600
+     },
+     {
+       "epoch": 3.59,
+       "learning_rate": 6.411764705882354e-05,
+       "loss": 0.0015,
+       "step": 610
+     },
+     {
+       "epoch": 3.65,
+       "learning_rate": 6.352941176470588e-05,
+       "loss": 0.0016,
+       "step": 620
+     },
+     {
+       "epoch": 3.71,
+       "learning_rate": 6.294117647058824e-05,
+       "loss": 0.0103,
+       "step": 630
+     },
+     {
+       "epoch": 3.76,
+       "learning_rate": 6.23529411764706e-05,
+       "loss": 0.0108,
+       "step": 640
+     },
+     {
+       "epoch": 3.82,
+       "learning_rate": 6.176470588235295e-05,
+       "loss": 0.0071,
+       "step": 650
+     },
+     {
+       "epoch": 3.88,
+       "learning_rate": 6.11764705882353e-05,
+       "loss": 0.0078,
+       "step": 660
+     },
+     {
+       "epoch": 3.94,
+       "learning_rate": 6.058823529411765e-05,
+       "loss": 0.0043,
+       "step": 670
+     },
+     {
+       "epoch": 4.0,
+       "learning_rate": 6e-05,
+       "loss": 0.0017,
+       "step": 680
+     },
+     {
+       "epoch": 4.0,
+       "eval_loss": 0.0019855075515806675,
+       "eval_mse": 0.001985507318750024,
+       "eval_runtime": 15.5377,
+       "eval_samples_per_second": 4.634,
+       "eval_steps_per_second": 0.579,
+       "step": 680
+     },
+     {
+       "epoch": 4.06,
+       "learning_rate": 5.9411764705882355e-05,
+       "loss": 0.0022,
+       "step": 690
+     },
+     {
+       "epoch": 4.12,
+       "learning_rate": 5.882352941176471e-05,
+       "loss": 0.0092,
+       "step": 700
+     },
+     {
+       "epoch": 4.18,
+       "learning_rate": 5.823529411764707e-05,
+       "loss": 0.0024,
+       "step": 710
+     },
+     {
+       "epoch": 4.24,
+       "learning_rate": 5.764705882352941e-05,
+       "loss": 0.002,
+       "step": 720
+     },
+     {
+       "epoch": 4.29,
+       "learning_rate": 5.7058823529411766e-05,
+       "loss": 0.0042,
+       "step": 730
+     },
+     {
+       "epoch": 4.35,
+       "learning_rate": 5.647058823529412e-05,
+       "loss": 0.0602,
+       "step": 740
+     },
+     {
+       "epoch": 4.41,
+       "learning_rate": 5.588235294117647e-05,
+       "loss": 0.0062,
+       "step": 750
+     },
+     {
+       "epoch": 4.47,
+       "learning_rate": 5.529411764705883e-05,
+       "loss": 0.003,
+       "step": 760
+     },
+     {
+       "epoch": 4.53,
+       "learning_rate": 5.4705882352941185e-05,
+       "loss": 0.0088,
+       "step": 770
+     },
+     {
+       "epoch": 4.59,
+       "learning_rate": 5.411764705882353e-05,
+       "loss": 0.0025,
+       "step": 780
+     },
+     {
+       "epoch": 4.65,
+       "learning_rate": 5.3529411764705884e-05,
+       "loss": 0.0109,
+       "step": 790
+     },
+     {
+       "epoch": 4.71,
+       "learning_rate": 5.294117647058824e-05,
+       "loss": 0.0015,
+       "step": 800
+     },
+     {
+       "epoch": 4.76,
+       "learning_rate": 5.235294117647059e-05,
+       "loss": 0.0011,
+       "step": 810
+     },
+     {
+       "epoch": 4.82,
+       "learning_rate": 5.176470588235295e-05,
+       "loss": 0.0101,
+       "step": 820
+     },
+     {
+       "epoch": 4.88,
+       "learning_rate": 5.117647058823529e-05,
+       "loss": 0.0011,
+       "step": 830
+     },
+     {
+       "epoch": 4.94,
+       "learning_rate": 5.058823529411765e-05,
+       "loss": 0.0017,
+       "step": 840
+     },
+     {
+       "epoch": 5.0,
+       "learning_rate": 5e-05,
+       "loss": 0.0014,
+       "step": 850
+     },
+     {
+       "epoch": 5.0,
+       "eval_loss": 0.001027416903525591,
+       "eval_mse": 0.001027416903525591,
+       "eval_runtime": 17.3556,
+       "eval_samples_per_second": 4.149,
+       "eval_steps_per_second": 0.519,
+       "step": 850
+     },
+     {
+       "epoch": 5.06,
+       "learning_rate": 4.9411764705882355e-05,
+       "loss": 0.0023,
+       "step": 860
+     },
+     {
+       "epoch": 5.12,
+       "learning_rate": 4.882352941176471e-05,
+       "loss": 0.0005,
+       "step": 870
+     },
+     {
+       "epoch": 5.18,
+       "learning_rate": 4.823529411764706e-05,
+       "loss": 0.0053,
+       "step": 880
+     },
+     {
+       "epoch": 5.24,
+       "learning_rate": 4.7647058823529414e-05,
+       "loss": 0.0026,
+       "step": 890
+     },
+     {
+       "epoch": 5.29,
+       "learning_rate": 4.705882352941177e-05,
+       "loss": 0.0045,
+       "step": 900
+     },
+     {
+       "epoch": 5.35,
+       "learning_rate": 4.647058823529412e-05,
+       "loss": 0.0017,
+       "step": 910
+     },
+     {
+       "epoch": 5.41,
+       "learning_rate": 4.588235294117647e-05,
+       "loss": 0.0012,
+       "step": 920
+     },
+     {
+       "epoch": 5.47,
+       "learning_rate": 4.5294117647058826e-05,
+       "loss": 0.0098,
+       "step": 930
+     },
+     {
+       "epoch": 5.53,
+       "learning_rate": 4.470588235294118e-05,
+       "loss": 0.0016,
+       "step": 940
+     },
+     {
+       "epoch": 5.59,
+       "learning_rate": 4.411764705882353e-05,
+       "loss": 0.0011,
+       "step": 950
+     },
+     {
+       "epoch": 5.65,
+       "learning_rate": 4.3529411764705885e-05,
+       "loss": 0.001,
+       "step": 960
+     },
+     {
+       "epoch": 5.71,
+       "learning_rate": 4.294117647058823e-05,
+       "loss": 0.0005,
+       "step": 970
+     },
+     {
+       "epoch": 5.76,
+       "learning_rate": 4.235294117647059e-05,
+       "loss": 0.0021,
+       "step": 980
+     },
+     {
+       "epoch": 5.82,
+       "learning_rate": 4.1764705882352944e-05,
+       "loss": 0.0107,
+       "step": 990
+     },
+     {
+       "epoch": 5.88,
+       "learning_rate": 4.11764705882353e-05,
+       "loss": 0.0086,
+       "step": 1000
+     },
+     {
+       "epoch": 5.94,
+       "learning_rate": 4.058823529411765e-05,
+       "loss": 0.0014,
+       "step": 1010
+     },
+     {
+       "epoch": 6.0,
+       "learning_rate": 4e-05,
+       "loss": 0.0008,
+       "step": 1020
+     },
+     {
+       "epoch": 6.0,
+       "eval_loss": 0.001083881827071309,
+       "eval_mse": 0.001083881827071309,
+       "eval_runtime": 15.3262,
+       "eval_samples_per_second": 4.698,
+       "eval_steps_per_second": 0.587,
+       "step": 1020
+     },
+     {
+       "epoch": 6.06,
+       "learning_rate": 3.9411764705882356e-05,
+       "loss": 0.0012,
+       "step": 1030
+     },
+     {
+       "epoch": 6.12,
+       "learning_rate": 3.882352941176471e-05,
+       "loss": 0.002,
+       "step": 1040
+     },
+     {
+       "epoch": 6.18,
+       "learning_rate": 3.8235294117647055e-05,
+       "loss": 0.0062,
+       "step": 1050
+     },
+     {
+       "epoch": 6.24,
+       "learning_rate": 3.7647058823529415e-05,
+       "loss": 0.0017,
+       "step": 1060
+     },
+     {
+       "epoch": 6.29,
+       "learning_rate": 3.705882352941177e-05,
+       "loss": 0.0008,
+       "step": 1070
+     },
+     {
+       "epoch": 6.35,
+       "learning_rate": 3.6470588235294114e-05,
+       "loss": 0.0013,
+       "step": 1080
+     },
+     {
+       "epoch": 6.41,
+       "learning_rate": 3.5882352941176474e-05,
+       "loss": 0.0078,
+       "step": 1090
+     },
+     {
+       "epoch": 6.47,
+       "learning_rate": 3.529411764705883e-05,
+       "loss": 0.0026,
+       "step": 1100
+     },
+     {
+       "epoch": 6.53,
+       "learning_rate": 3.470588235294118e-05,
+       "loss": 0.0084,
+       "step": 1110
+     },
+     {
+       "epoch": 6.59,
+       "learning_rate": 3.411764705882353e-05,
+       "loss": 0.0016,
+       "step": 1120
+     },
+     {
+       "epoch": 6.65,
+       "learning_rate": 3.352941176470588e-05,
+       "loss": 0.0026,
+       "step": 1130
+     },
+     {
+       "epoch": 6.71,
+       "learning_rate": 3.294117647058824e-05,
+       "loss": 0.0053,
+       "step": 1140
+     },
+     {
+       "epoch": 6.76,
+       "learning_rate": 3.235294117647059e-05,
+       "loss": 0.0058,
+       "step": 1150
+     },
+     {
+       "epoch": 6.82,
+       "learning_rate": 3.176470588235294e-05,
+       "loss": 0.001,
+       "step": 1160
+     },
+     {
+       "epoch": 6.88,
+       "learning_rate": 3.11764705882353e-05,
+       "loss": 0.0011,
+       "step": 1170
+     },
+     {
+       "epoch": 6.94,
+       "learning_rate": 3.058823529411765e-05,
+       "loss": 0.001,
+       "step": 1180
+     },
+     {
+       "epoch": 7.0,
+       "learning_rate": 3e-05,
+       "loss": 0.0008,
+       "step": 1190
+     },
+     {
+       "epoch": 7.0,
+       "eval_loss": 0.000933311355765909,
+       "eval_mse": 0.000933311355765909,
+       "eval_runtime": 15.2402,
+       "eval_samples_per_second": 4.724,
+       "eval_steps_per_second": 0.591,
+       "step": 1190
+     },
+     {
+       "epoch": 7.06,
+       "learning_rate": 2.9411764705882354e-05,
+       "loss": 0.0152,
+       "step": 1200
+     },
+     {
+       "epoch": 7.12,
+       "learning_rate": 2.8823529411764703e-05,
+       "loss": 0.0025,
+       "step": 1210
+     },
+     {
+       "epoch": 7.18,
+       "learning_rate": 2.823529411764706e-05,
+       "loss": 0.0018,
+       "step": 1220
+     },
+     {
+       "epoch": 7.24,
+       "learning_rate": 2.7647058823529416e-05,
+       "loss": 0.0011,
+       "step": 1230
+     },
+     {
+       "epoch": 7.29,
+       "learning_rate": 2.7058823529411766e-05,
+       "loss": 0.0076,
+       "step": 1240
+     },
+     {
+       "epoch": 7.35,
+       "learning_rate": 2.647058823529412e-05,
+       "loss": 0.0009,
+       "step": 1250
+     },
+     {
+       "epoch": 7.41,
+       "learning_rate": 2.5882352941176475e-05,
+       "loss": 0.0054,
+       "step": 1260
+     },
+     {
+       "epoch": 7.47,
+       "learning_rate": 2.5294117647058825e-05,
+       "loss": 0.0012,
+       "step": 1270
+     },
+     {
+       "epoch": 7.53,
+       "learning_rate": 2.4705882352941178e-05,
+       "loss": 0.0012,
+       "step": 1280
+     },
+     {
+       "epoch": 7.59,
+       "learning_rate": 2.411764705882353e-05,
+       "loss": 0.0005,
+       "step": 1290
+     },
+     {
+       "epoch": 7.65,
+       "learning_rate": 2.3529411764705884e-05,
+       "loss": 0.0011,
+       "step": 1300
+     },
+     {
+       "epoch": 7.71,
+       "learning_rate": 2.2941176470588237e-05,
+       "loss": 0.0022,
+       "step": 1310
+     },
+     {
+       "epoch": 7.76,
+       "learning_rate": 2.235294117647059e-05,
+       "loss": 0.0009,
+       "step": 1320
+     },
+     {
+       "epoch": 7.82,
+       "learning_rate": 2.1764705882352943e-05,
+       "loss": 0.0007,
+       "step": 1330
+     },
+     {
+       "epoch": 7.88,
+       "learning_rate": 2.1176470588235296e-05,
+       "loss": 0.0007,
+       "step": 1340
+     },
+     {
+       "epoch": 7.94,
+       "learning_rate": 2.058823529411765e-05,
+       "loss": 0.0007,
+       "step": 1350
+     },
+     {
+       "epoch": 8.0,
+       "learning_rate": 2e-05,
+       "loss": 0.0008,
+       "step": 1360
+     },
+     {
+       "epoch": 8.0,
+       "eval_loss": 0.001160233630798757,
+       "eval_mse": 0.001160233747214079,
+       "eval_runtime": 18.2978,
+       "eval_samples_per_second": 3.935,
+       "eval_steps_per_second": 0.492,
+       "step": 1360
+     },
+     {
+       "epoch": 8.06,
+       "learning_rate": 1.9411764705882355e-05,
+       "loss": 0.0072,
+       "step": 1370
+     },
+     {
+       "epoch": 8.12,
+       "learning_rate": 1.8823529411764708e-05,
+       "loss": 0.001,
+       "step": 1380
+     },
+     {
+       "epoch": 8.18,
+       "learning_rate": 1.8235294117647057e-05,
+       "loss": 0.0008,
+       "step": 1390
+     },
+     {
+       "epoch": 8.24,
+       "learning_rate": 1.7647058823529414e-05,
+       "loss": 0.0025,
+       "step": 1400
+     },
+     {
+       "epoch": 8.29,
+       "learning_rate": 1.7058823529411767e-05,
+       "loss": 0.001,
+       "step": 1410
+     },
+     {
+       "epoch": 8.35,
+       "learning_rate": 1.647058823529412e-05,
+       "loss": 0.0014,
+       "step": 1420
+     },
+     {
+       "epoch": 8.41,
+       "learning_rate": 1.588235294117647e-05,
+       "loss": 0.0015,
+       "step": 1430
+     },
+     {
+       "epoch": 8.47,
+       "learning_rate": 1.5294117647058826e-05,
+       "loss": 0.0041,
+       "step": 1440
+     },
+     {
+       "epoch": 8.53,
+       "learning_rate": 1.4705882352941177e-05,
+       "loss": 0.0012,
+       "step": 1450
+     },
+     {
+       "epoch": 8.59,
+       "learning_rate": 1.411764705882353e-05,
+       "loss": 0.0009,
+       "step": 1460
+     },
+     {
+       "epoch": 8.65,
+       "learning_rate": 1.3529411764705883e-05,
+       "loss": 0.0012,
+       "step": 1470
+     },
+     {
+       "epoch": 8.71,
+       "learning_rate": 1.2941176470588238e-05,
+       "loss": 0.0022,
+       "step": 1480
+     },
+     {
+       "epoch": 8.76,
+       "learning_rate": 1.2352941176470589e-05,
+       "loss": 0.0007,
+       "step": 1490
+     },
+     {
+       "epoch": 8.82,
+       "learning_rate": 1.1764705882352942e-05,
+       "loss": 0.0064,
+       "step": 1500
+     },
+     {
+       "epoch": 8.88,
+       "learning_rate": 1.1176470588235295e-05,
+       "loss": 0.0014,
+       "step": 1510
+     },
+     {
+       "epoch": 8.94,
+       "learning_rate": 1.0588235294117648e-05,
+       "loss": 0.0006,
+       "step": 1520
+     },
+     {
+       "epoch": 9.0,
+       "learning_rate": 1e-05,
+       "loss": 0.001,
+       "step": 1530
+     },
+     {
+       "epoch": 9.0,
+       "eval_loss": 0.0007979701040312648,
+       "eval_mse": 0.0007979701040312648,
+       "eval_runtime": 17.0378,
+       "eval_samples_per_second": 4.226,
+       "eval_steps_per_second": 0.528,
+       "step": 1530
+     },
+     {
+       "epoch": 9.06,
+       "learning_rate": 9.411764705882354e-06,
+       "loss": 0.0009,
+       "step": 1540
+     },
+     {
+       "epoch": 9.12,
+       "learning_rate": 8.823529411764707e-06,
+       "loss": 0.0008,
+       "step": 1550
+     },
+     {
+       "epoch": 9.18,
+       "learning_rate": 8.23529411764706e-06,
+       "loss": 0.0021,
+       "step": 1560
+     },
+     {
+       "epoch": 9.24,
+       "learning_rate": 7.647058823529413e-06,
+       "loss": 0.0009,
+       "step": 1570
+     },
+     {
+       "epoch": 9.29,
+       "learning_rate": 7.058823529411765e-06,
+       "loss": 0.0006,
+       "step": 1580
+     },
+     {
+       "epoch": 9.35,
+       "learning_rate": 6.470588235294119e-06,
+       "loss": 0.0005,
+       "step": 1590
+     },
+     {
+       "epoch": 9.41,
+       "learning_rate": 5.882352941176471e-06,
+       "loss": 0.0005,
+       "step": 1600
+     },
+     {
+       "epoch": 9.47,
+       "learning_rate": 5.294117647058824e-06,
+       "loss": 0.0007,
+       "step": 1610
+     },
+     {
+       "epoch": 9.53,
+       "learning_rate": 4.705882352941177e-06,
+       "loss": 0.0033,
+       "step": 1620
+     },
+     {
+       "epoch": 9.59,
+       "learning_rate": 4.11764705882353e-06,
+       "loss": 0.0009,
+       "step": 1630
+     },
+     {
+       "epoch": 9.65,
+       "learning_rate": 3.5294117647058825e-06,
+       "loss": 0.0008,
+       "step": 1640
+     },
+     {
+       "epoch": 9.71,
+       "learning_rate": 2.9411764705882355e-06,
+       "loss": 0.0007,
+       "step": 1650
+     },
+     {
+       "epoch": 9.76,
+       "learning_rate": 2.3529411764705885e-06,
+       "loss": 0.002,
+       "step": 1660
+     },
+     {
+       "epoch": 9.82,
+       "learning_rate": 1.7647058823529412e-06,
+       "loss": 0.0009,
+       "step": 1670
+     },
+     {
+       "epoch": 9.88,
+       "learning_rate": 1.1764705882352942e-06,
+       "loss": 0.0008,
+       "step": 1680
+     },
+     {
+       "epoch": 9.94,
+       "learning_rate": 5.882352941176471e-07,
+       "loss": 0.0063,
+       "step": 1690
+     },
+     {
+       "epoch": 10.0,
+       "learning_rate": 0.0,
+       "loss": 0.0004,
+       "step": 1700
+     }
+   ],
+   "max_steps": 1700,
+   "num_train_epochs": 10,
+   "total_flos": 0.0,
+   "trial_name": null,
+   "trial_params": null
+ }
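
Because `best_metric` and `best_model_checkpoint` are `null`, this state file does not record which evaluation was best, and no evaluation entry is logged for epoch 10. The per-epoch `eval_mse` values can be pulled out of `log_history` directly. A minimal sketch, using an inline excerpt with the nine evaluation entries from the log above plus one training entry (the helper name `eval_history` is hypothetical):

```python
import json

# Excerpt with the same structure as trainer_state.json; eval_mse values are
# copied from the log above, most training entries are omitted for brevity.
STATE_JSON = """
{
  "global_step": 1700,
  "log_history": [
    {"epoch": 1.0, "learning_rate": 9e-05, "loss": 0.0153, "step": 170},
    {"epoch": 1.0, "eval_mse": 0.005996049847453833, "step": 170},
    {"epoch": 2.0, "eval_mse": 0.0011689820094034076, "step": 340},
    {"epoch": 3.0, "eval_mse": 0.0050306557677686214, "step": 510},
    {"epoch": 4.0, "eval_mse": 0.001985507318750024, "step": 680},
    {"epoch": 5.0, "eval_mse": 0.001027416903525591, "step": 850},
    {"epoch": 6.0, "eval_mse": 0.001083881827071309, "step": 1020},
    {"epoch": 7.0, "eval_mse": 0.000933311355765909, "step": 1190},
    {"epoch": 8.0, "eval_mse": 0.001160233747214079, "step": 1360},
    {"epoch": 9.0, "eval_mse": 0.0007979701040312648, "step": 1530}
  ]
}
"""


def eval_history(state: dict) -> list[tuple[int, float]]:
    """Return (step, eval_mse) pairs, skipping training-loss entries."""
    return [(e["step"], e["eval_mse"]) for e in state["log_history"] if "eval_mse" in e]


state = json.loads(STATE_JSON)
history = eval_history(state)
best_step, best_mse = min(history, key=lambda pair: pair[1])
print(best_step)  # 1530: the lowest logged eval_mse is at epoch 9
```

With the real file, `json.load(open("trainer_state.json"))` would replace the inline excerpt; the filtering on the `eval_mse` key stays the same.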
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9129b253c4797bfda8d87a2ac803edfd36087b3175c2d38ef40e84ae3f06f6e
+ size 3899