Commit f2e976c by lombardata
1 parent: 849fe90

Evaluation on the test set completed on 2024_09_18.

README.md ADDED
@@ -0,0 +1,147 @@
+ ---
+ license: apache-2.0
+ base_model: facebook/dinov2-large
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: drone-DinoVdeau-large-2024_09_17-batch-size64_epochs100_freeze
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # drone-DinoVdeau-large-2024_09_17-batch-size64_epochs100_freeze
+
+ This model is a fine-tuned version of [facebook/dinov2-large](https://huggingface.co/facebook/dinov2-large) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.3578
+ - Mse: 0.0378
+ - Rmse: 0.1943
+ - Mae: 0.1288
+ - R2: 0.4008
+ - Explained Variance: 0.4014
+ - Learning Rate: 0.0000
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.001
+ - train_batch_size: 64
+ - eval_batch_size: 64
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 100
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Mse | Rmse | Mae | R2 | Explained Variance | Rate |
+ |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|:------:|:------:|:------------------:|:------:|
+ | No log | 1.0 | 181 | 0.3858 | 0.0464 | 0.2153 | 0.1571 | 0.2624 | 0.2805 | 0.001 |
+ | No log | 2.0 | 362 | 0.3764 | 0.0440 | 0.2097 | 0.1467 | 0.3121 | 0.3209 | 0.001 |
+ | 0.4473 | 3.0 | 543 | 0.3716 | 0.0425 | 0.2062 | 0.1450 | 0.3319 | 0.3394 | 0.001 |
+ | 0.4473 | 4.0 | 724 | 0.3673 | 0.0410 | 0.2024 | 0.1395 | 0.3548 | 0.3566 | 0.001 |
+ | 0.4473 | 5.0 | 905 | 0.3692 | 0.0419 | 0.2046 | 0.1393 | 0.3425 | 0.3494 | 0.001 |
+ | 0.3892 | 6.0 | 1086 | 0.3673 | 0.0409 | 0.2022 | 0.1412 | 0.3554 | 0.3590 | 0.001 |
+ | 0.3892 | 7.0 | 1267 | 0.3681 | 0.0415 | 0.2038 | 0.1408 | 0.3457 | 0.3499 | 0.001 |
+ | 0.3892 | 8.0 | 1448 | 0.3656 | 0.0406 | 0.2015 | 0.1389 | 0.3596 | 0.3642 | 0.001 |
+ | 0.3855 | 9.0 | 1629 | 0.3659 | 0.0408 | 0.2019 | 0.1344 | 0.3555 | 0.3613 | 0.001 |
+ | 0.3855 | 10.0 | 1810 | 0.3666 | 0.0409 | 0.2023 | 0.1384 | 0.3533 | 0.3562 | 0.001 |
+ | 0.3855 | 11.0 | 1991 | 0.3666 | 0.0409 | 0.2022 | 0.1366 | 0.3550 | 0.3574 | 0.001 |
+ | 0.3816 | 12.0 | 2172 | 0.3663 | 0.0409 | 0.2021 | 0.1396 | 0.3587 | 0.3598 | 0.001 |
+ | 0.3816 | 13.0 | 2353 | 0.3632 | 0.0398 | 0.1995 | 0.1361 | 0.3697 | 0.3705 | 0.001 |
+ | 0.381 | 14.0 | 2534 | 0.3669 | 0.0410 | 0.2024 | 0.1423 | 0.3562 | 0.3628 | 0.001 |
+ | 0.381 | 15.0 | 2715 | 0.3645 | 0.0404 | 0.2009 | 0.1395 | 0.3620 | 0.3645 | 0.001 |
+ | 0.381 | 16.0 | 2896 | 0.3639 | 0.0400 | 0.2000 | 0.1357 | 0.3695 | 0.3715 | 0.001 |
+ | 0.3811 | 17.0 | 3077 | 0.3667 | 0.0406 | 0.2016 | 0.1413 | 0.3622 | 0.3728 | 0.001 |
+ | 0.3811 | 18.0 | 3258 | 0.3632 | 0.0398 | 0.1995 | 0.1368 | 0.3695 | 0.3705 | 0.001 |
+ | 0.3811 | 19.0 | 3439 | 0.3630 | 0.0397 | 0.1994 | 0.1354 | 0.3719 | 0.3734 | 0.001 |
+ | 0.3792 | 20.0 | 3620 | 0.3649 | 0.0405 | 0.2013 | 0.1349 | 0.3587 | 0.3622 | 0.001 |
+ | 0.3792 | 21.0 | 3801 | 0.3665 | 0.0407 | 0.2017 | 0.1361 | 0.3585 | 0.3631 | 0.001 |
+ | 0.3792 | 22.0 | 3982 | 0.3648 | 0.0400 | 0.2000 | 0.1369 | 0.3678 | 0.3705 | 0.001 |
+ | 0.3808 | 23.0 | 4163 | 0.3633 | 0.0398 | 0.1996 | 0.1356 | 0.3705 | 0.3736 | 0.001 |
+ | 0.3808 | 24.0 | 4344 | 0.3632 | 0.0397 | 0.1991 | 0.1393 | 0.3725 | 0.3761 | 0.001 |
+ | 0.3796 | 25.0 | 4525 | 0.3638 | 0.0399 | 0.1997 | 0.1381 | 0.3698 | 0.3734 | 0.001 |
+ | 0.3796 | 26.0 | 4706 | 0.3607 | 0.0390 | 0.1975 | 0.1329 | 0.3818 | 0.3836 | 0.0001 |
+ | 0.3796 | 27.0 | 4887 | 0.3600 | 0.0387 | 0.1967 | 0.1353 | 0.3863 | 0.3878 | 0.0001 |
+ | 0.3765 | 28.0 | 5068 | 0.3592 | 0.0384 | 0.1961 | 0.1337 | 0.3894 | 0.3904 | 0.0001 |
+ | 0.3765 | 29.0 | 5249 | 0.3595 | 0.0385 | 0.1961 | 0.1350 | 0.3892 | 0.3915 | 0.0001 |
+ | 0.3765 | 30.0 | 5430 | 0.3598 | 0.0386 | 0.1965 | 0.1350 | 0.3876 | 0.3893 | 0.0001 |
+ | 0.373 | 31.0 | 5611 | 0.3587 | 0.0384 | 0.1959 | 0.1317 | 0.3907 | 0.3921 | 0.0001 |
+ | 0.373 | 32.0 | 5792 | 0.3584 | 0.0383 | 0.1956 | 0.1326 | 0.3928 | 0.3932 | 0.0001 |
+ | 0.373 | 33.0 | 5973 | 0.3581 | 0.0381 | 0.1953 | 0.1311 | 0.3945 | 0.3953 | 0.0001 |
+ | 0.3735 | 34.0 | 6154 | 0.3580 | 0.0381 | 0.1951 | 0.1323 | 0.3953 | 0.3967 | 0.0001 |
+ | 0.3735 | 35.0 | 6335 | 0.3579 | 0.0381 | 0.1951 | 0.1322 | 0.3949 | 0.3954 | 0.0001 |
+ | 0.3711 | 36.0 | 6516 | 0.3592 | 0.0385 | 0.1963 | 0.1345 | 0.3895 | 0.3899 | 0.0001 |
+ | 0.3711 | 37.0 | 6697 | 0.3575 | 0.0380 | 0.1949 | 0.1313 | 0.3966 | 0.3970 | 0.0001 |
+ | 0.3711 | 38.0 | 6878 | 0.3582 | 0.0383 | 0.1956 | 0.1326 | 0.3923 | 0.3936 | 0.0001 |
+ | 0.3705 | 39.0 | 7059 | 0.3576 | 0.0380 | 0.1948 | 0.1313 | 0.3963 | 0.3965 | 0.0001 |
+ | 0.3705 | 40.0 | 7240 | 0.3575 | 0.0379 | 0.1947 | 0.1333 | 0.3980 | 0.4000 | 0.0001 |
+ | 0.3705 | 41.0 | 7421 | 0.3580 | 0.0381 | 0.1952 | 0.1317 | 0.3956 | 0.3988 | 0.0001 |
+ | 0.3704 | 42.0 | 7602 | 0.3575 | 0.0380 | 0.1949 | 0.1330 | 0.3970 | 0.3986 | 0.0001 |
+ | 0.3704 | 43.0 | 7783 | 0.3569 | 0.0377 | 0.1942 | 0.1325 | 0.4008 | 0.4020 | 0.0001 |
+ | 0.3704 | 44.0 | 7964 | 0.3568 | 0.0377 | 0.1942 | 0.1305 | 0.4009 | 0.4026 | 0.0001 |
+ | 0.3695 | 45.0 | 8145 | 0.3567 | 0.0376 | 0.1940 | 0.1319 | 0.4021 | 0.4033 | 0.0001 |
+ | 0.3695 | 46.0 | 8326 | 0.3569 | 0.0377 | 0.1943 | 0.1298 | 0.3998 | 0.4015 | 0.0001 |
+ | 0.369 | 47.0 | 8507 | 0.3574 | 0.0380 | 0.1948 | 0.1292 | 0.3973 | 0.3996 | 0.0001 |
+ | 0.369 | 48.0 | 8688 | 0.3563 | 0.0376 | 0.1940 | 0.1302 | 0.4019 | 0.4041 | 0.0001 |
+ | 0.369 | 49.0 | 8869 | 0.3566 | 0.0377 | 0.1940 | 0.1306 | 0.4011 | 0.4024 | 0.0001 |
+ | 0.3691 | 50.0 | 9050 | 0.3571 | 0.0378 | 0.1944 | 0.1322 | 0.3998 | 0.4015 | 0.0001 |
+ | 0.3691 | 51.0 | 9231 | 0.3584 | 0.0381 | 0.1952 | 0.1335 | 0.3958 | 0.4021 | 0.0001 |
+ | 0.3691 | 52.0 | 9412 | 0.3561 | 0.0375 | 0.1936 | 0.1309 | 0.4042 | 0.4045 | 0.0001 |
+ | 0.3677 | 53.0 | 9593 | 0.3565 | 0.0376 | 0.1939 | 0.1315 | 0.4026 | 0.4053 | 0.0001 |
+ | 0.3677 | 54.0 | 9774 | 0.3567 | 0.0377 | 0.1943 | 0.1316 | 0.4011 | 0.4018 | 0.0001 |
+ | 0.3677 | 55.0 | 9955 | 0.3565 | 0.0376 | 0.1939 | 0.1292 | 0.4026 | 0.4052 | 0.0001 |
+ | 0.3684 | 56.0 | 10136 | 0.3567 | 0.0377 | 0.1941 | 0.1279 | 0.4017 | 0.4046 | 0.0001 |
+ | 0.3684 | 57.0 | 10317 | 0.3562 | 0.0376 | 0.1938 | 0.1294 | 0.4032 | 0.4049 | 0.0001 |
+ | 0.3684 | 58.0 | 10498 | 0.3565 | 0.0376 | 0.1938 | 0.1299 | 0.4036 | 0.4062 | 0.0001 |
+ | 0.368 | 59.0 | 10679 | 0.3559 | 0.0375 | 0.1936 | 0.1292 | 0.4047 | 0.4061 | 1e-05 |
+ | 0.368 | 60.0 | 10860 | 0.3559 | 0.0374 | 0.1934 | 0.1295 | 0.4060 | 0.4082 | 1e-05 |
+ | 0.3664 | 61.0 | 11041 | 0.3555 | 0.0373 | 0.1932 | 0.1304 | 0.4072 | 0.4075 | 1e-05 |
+ | 0.3664 | 62.0 | 11222 | 0.3565 | 0.0376 | 0.1939 | 0.1317 | 0.4036 | 0.4058 | 1e-05 |
+ | 0.3664 | 63.0 | 11403 | 0.3556 | 0.0373 | 0.1930 | 0.1293 | 0.4075 | 0.4087 | 1e-05 |
+ | 0.366 | 64.0 | 11584 | 0.3554 | 0.0373 | 0.1931 | 0.1296 | 0.4077 | 0.4089 | 1e-05 |
+ | 0.366 | 65.0 | 11765 | 0.3560 | 0.0375 | 0.1938 | 0.1307 | 0.4049 | 0.4059 | 1e-05 |
+ | 0.366 | 66.0 | 11946 | 0.3553 | 0.0372 | 0.1930 | 0.1300 | 0.4080 | 0.4085 | 1e-05 |
+ | 0.3654 | 67.0 | 12127 | 0.3554 | 0.0373 | 0.1930 | 0.1299 | 0.4078 | 0.4082 | 1e-05 |
+ | 0.3654 | 68.0 | 12308 | 0.3556 | 0.0374 | 0.1934 | 0.1302 | 0.4059 | 0.4074 | 1e-05 |
+ | 0.3654 | 69.0 | 12489 | 0.3554 | 0.0373 | 0.1930 | 0.1298 | 0.4083 | 0.4086 | 1e-05 |
+ | 0.3658 | 70.0 | 12670 | 0.3559 | 0.0374 | 0.1933 | 0.1307 | 0.4066 | 0.4094 | 1e-05 |
+ | 0.3658 | 71.0 | 12851 | 0.3557 | 0.0374 | 0.1933 | 0.1296 | 0.4070 | 0.4073 | 1e-05 |
+ | 0.366 | 72.0 | 13032 | 0.3557 | 0.0373 | 0.1932 | 0.1303 | 0.4070 | 0.4084 | 1e-05 |
+ | 0.366 | 73.0 | 13213 | 0.3552 | 0.0372 | 0.1929 | 0.1299 | 0.4082 | 0.4090 | 0.0000 |
+ | 0.366 | 74.0 | 13394 | 0.3552 | 0.0372 | 0.1929 | 0.1281 | 0.4087 | 0.4094 | 0.0000 |
+ | 0.3654 | 75.0 | 13575 | 0.3558 | 0.0375 | 0.1936 | 0.1303 | 0.4047 | 0.4057 | 0.0000 |
+ | 0.3654 | 76.0 | 13756 | 0.3555 | 0.0374 | 0.1933 | 0.1277 | 0.4061 | 0.4084 | 0.0000 |
+ | 0.3654 | 77.0 | 13937 | 0.3562 | 0.0376 | 0.1938 | 0.1321 | 0.4042 | 0.4046 | 0.0000 |
+ | 0.3663 | 78.0 | 14118 | 0.3553 | 0.0372 | 0.1929 | 0.1306 | 0.4087 | 0.4090 | 0.0000 |
+ | 0.3663 | 79.0 | 14299 | 0.3569 | 0.0379 | 0.1947 | 0.1310 | 0.3999 | 0.4020 | 0.0000 |
+ | 0.3663 | 80.0 | 14480 | 0.3563 | 0.0375 | 0.1936 | 0.1311 | 0.4052 | 0.4058 | 0.0000 |
+ | 0.3655 | 81.0 | 14661 | 0.3555 | 0.0373 | 0.1930 | 0.1308 | 0.4079 | 0.4092 | 0.0000 |
+ | 0.3655 | 82.0 | 14842 | 0.3556 | 0.0373 | 0.1932 | 0.1309 | 0.4072 | 0.4087 | 0.0000 |
+ | 0.3651 | 83.0 | 15023 | 0.3557 | 0.0373 | 0.1932 | 0.1304 | 0.4074 | 0.4102 | 0.0000 |
+ | 0.3651 | 84.0 | 15204 | 0.3558 | 0.0374 | 0.1934 | 0.1306 | 0.4063 | 0.4082 | 0.0000 |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.1
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
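The step counts in the training table can be mapped back to epochs, which helps locate the best checkpoint named in trainer_state.json (`checkpoint-13394`). The following is a minimal Python sketch, assuming the 181 steps per epoch implied by the first table row; the helper name `epoch_of_step` is hypothetical and not part of the commit.

```python
# Steps per epoch, read off the first table row (epoch 1.0 ends at step 181).
# This is an assumption drawn from the table, not a value stored in the card.
STEPS_PER_EPOCH = 181


def epoch_of_step(step: int) -> int:
    """Return the epoch that ends at `step` (hypothetical helper)."""
    epoch, remainder = divmod(step, STEPS_PER_EPOCH)
    if remainder:
        raise ValueError(f"step {step} is not an epoch boundary")
    return epoch


best_epoch = epoch_of_step(13394)  # step taken from ".../checkpoint-13394"
last_epoch = epoch_of_step(15204)  # step of the final table row
print(best_epoch, last_epoch, last_epoch - best_epoch)
```

Under that assumption the best checkpoint falls at epoch 74, and training ended 10 epochs later at epoch 84 of the configured 100. That gap would be consistent with early stopping, although the card does not say so.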
all_results.json ADDED
@@ -0,0 +1,18 @@
+ {
+     "epoch": 84.0,
+     "eval_explained_variance": 0.40141621002784145,
+     "eval_loss": 0.35779908299446106,
+     "eval_mae": 0.12878435850143433,
+     "eval_mse": 0.0377507321536541,
+     "eval_r2": 0.4007782891079936,
+     "eval_rmse": 0.1942954808473587,
+     "eval_runtime": 66.1225,
+     "eval_samples_per_second": 58.074,
+     "eval_steps_per_second": 0.907,
+     "learning_rate": 1.0000000000000002e-07,
+     "total_flos": 2.180798470217171e+19,
+     "train_loss": 0.37467605181350044,
+     "train_runtime": 24668.9414,
+     "train_samples_per_second": 46.707,
+     "train_steps_per_second": 0.734
+ }
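The evaluation metrics in all_results.json can be cross-checked against each other. The sketch below, in Python, verifies that the reported RMSE is the square root of the reported MSE and uses the throughput figures to estimate the size of the evaluation set. The variable names are illustrative, and the sample and batch counts are estimates derived from rounded throughput values, not numbers stated in the commit.

```python
import math

# Values copied from all_results.json above.
eval_mse = 0.0377507321536541
eval_rmse = 0.1942954808473587
eval_runtime = 66.1225            # seconds
eval_samples_per_second = 58.074
eval_steps_per_second = 0.907

# RMSE should equal sqrt(MSE), up to floating-point rounding.
rmse_from_mse = math.sqrt(eval_mse)
rmse_matches = math.isclose(rmse_from_mse, eval_rmse, abs_tol=1e-6)

# Throughput times runtime recovers approximate sample and batch counts.
approx_samples = round(eval_runtime * eval_samples_per_second)
approx_batches = round(eval_runtime * eval_steps_per_second)

print(rmse_matches, approx_samples, approx_batches)
```

With these numbers the RMSE check passes, and the estimate of roughly 3,840 samples in 60 batches agrees with the `eval_batch_size: 64` listed in the model card.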
logs/events.out.tfevents.1726594904.datavisu4 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:65a371da58318f87c91bd5c57ea3c9bd9383076e015403dd00c0b033c473b0af
- size 60082
+ oid sha256:3a34a7eb1d4d90ba5a31e1bc44c63418ed57e553753859abe96efcfba95c2474
+ size 61604
logs/events.out.tfevents.1726619814.datavisu4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:957c57852743b5787cc14f19648bf7fc3bd61872d536e9bbec1281ea7a97e054
+ size 135
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9fa68f2e0bc790aaf6d22a03251ddd26f951327483ad455757fbaba1c5b508ed
+ oid sha256:b6ecd9a2842a6f0f53514690399f6362e6e3313a6cce5dc7b8f077c1f575284b
  size 1222528676
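The model.safetensors and tfevents entries are Git LFS pointer files rather than the binaries themselves: three `key value` lines giving the spec version, the content hash, and the size in bytes. The following is a minimal Python sketch of reading such a pointer; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of Git LFS or of this repository.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by one space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# The new model.safetensors pointer from the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b6ecd9a2842a6f0f53514690399f6362e6e3313a6cce5dc7b8f077c1f575284b
size 1222528676
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], f"{info['size_bytes'] / 2**30:.2f} GiB")
```

The size is the same before and after the commit (about 1.14 GiB) while the hash differs, which suggests the weight values changed but the tensor layout did not.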
test_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+     "epoch": 84.0,
+     "eval_explained_variance": 0.40141621002784145,
+     "eval_loss": 0.35779908299446106,
+     "eval_mae": 0.12878435850143433,
+     "eval_mse": 0.0377507321536541,
+     "eval_r2": 0.4007782891079936,
+     "eval_rmse": 0.1942954808473587,
+     "eval_runtime": 66.1225,
+     "eval_samples_per_second": 58.074,
+     "eval_steps_per_second": 0.907,
+     "learning_rate": 1.0000000000000002e-07
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 84.0,
+     "learning_rate": 1.0000000000000002e-07,
+     "total_flos": 2.180798470217171e+19,
+     "train_loss": 0.37467605181350044,
+     "train_runtime": 24668.9414,
+     "train_samples_per_second": 46.707,
+     "train_steps_per_second": 0.734
+ }
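The training throughput figures can be put in context with a little arithmetic. This Python sketch converts the runtime to hours and compares the step count implied by `train_steps_per_second` with the steps actually run and the steps planned. The 181 steps per epoch is an assumption taken from the README table, and the variable names are illustrative.

```python
# Values copied from train_results.json above.
train_runtime = 24668.9414      # seconds
train_steps_per_second = 0.734

# Assumption: 181 optimizer steps per epoch, as in the README table.
STEPS_PER_EPOCH = 181
steps_run = STEPS_PER_EPOCH * 84       # epochs actually completed
steps_planned = STEPS_PER_EPOCH * 100  # num_epochs in the model card

hours = train_runtime / 3600
implied_steps = train_steps_per_second * train_runtime

print(f"{hours:.2f} h, implied steps ~ {implied_steps:.0f}")
print(f"run: {steps_run}, planned: {steps_planned}")
```

The implied figure (about 18,107) sits next to the 18,100 planned steps rather than the 15,204 actually run, so the throughput numbers appear to be normalised against the full 100-epoch schedule. Treat that reading as an inference from the arithmetic, not something the files state.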
trainer_state.json ADDED
@@ -0,0 +1,1438 @@
+ {
+   "best_metric": 0.35516515374183655,
+   "best_model_checkpoint": "/home/datawork-iot-nos/Seatizen/models/multilabel/drone/drone-DinoVdeau-large-2024_09_17-batch-size64_epochs100_freeze/checkpoint-13394",
+   "epoch": 84.0,
+   "eval_steps": 500,
+   "global_step": 15204,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 1.0,
+       "eval_explained_variance": 0.28046968350043666,
+       "eval_loss": 0.38582414388656616,
+       "eval_mae": 0.15708860754966736,
+       "eval_mse": 0.04635250195860863,
+       "eval_r2": 0.26238919671070565,
+       "eval_rmse": 0.21529631316661835,
+       "eval_runtime": 68.5532,
+       "eval_samples_per_second": 56.073,
+       "eval_steps_per_second": 0.89,
+       "learning_rate": 0.001,
+       "step": 181
+     },
+     {
+       "epoch": 2.0,
+       "eval_explained_variance": 0.32091750548436093,
+       "eval_loss": 0.37635815143585205,
+       "eval_mae": 0.1467229723930359,
+       "eval_mse": 0.04398971050977707,
+       "eval_r2": 0.3120521114085856,
+       "eval_rmse": 0.20973724126815796,
+       "eval_runtime": 65.3616,
+       "eval_samples_per_second": 58.811,
+       "eval_steps_per_second": 0.933,
+       "learning_rate": 0.001,
+       "step": 362
+     },
+     {
+       "epoch": 2.7624309392265194,
+       "grad_norm": 0.29469817876815796,
+       "learning_rate": 0.001,
+       "loss": 0.4473,
+       "step": 500
+     },
+     {
+       "epoch": 3.0,
+       "eval_explained_variance": 0.3393913645010728,
+       "eval_loss": 0.3715941309928894,
+       "eval_mae": 0.1450481116771698,
+       "eval_mse": 0.04250793904066086,
+       "eval_r2": 0.33185449725151883,
+       "eval_rmse": 0.20617453753948212,
+       "eval_runtime": 65.4448,
+       "eval_samples_per_second": 58.737,
+       "eval_steps_per_second": 0.932,
+       "learning_rate": 0.001,
+       "step": 543
+     },
+     {
+       "epoch": 4.0,
+       "eval_explained_variance": 0.35663692767803484,
+       "eval_loss": 0.3672849237918854,
+       "eval_mae": 0.1395464986562729,
+       "eval_mse": 0.0409623458981514,
+       "eval_r2": 0.35477505001012255,
+       "eval_rmse": 0.20239156484603882,
+       "eval_runtime": 65.8223,
+       "eval_samples_per_second": 58.4,
+       "eval_steps_per_second": 0.927,
+       "learning_rate": 0.001,
+       "step": 724
+     },
+     {
+       "epoch": 5.0,
+       "eval_explained_variance": 0.3493932577279898,
+       "eval_loss": 0.3692065477371216,
+       "eval_mae": 0.1393202394247055,
+       "eval_mse": 0.041857048869132996,
+       "eval_r2": 0.3425061497286567,
+       "eval_rmse": 0.20458994805812836,
+       "eval_runtime": 66.3389,
+       "eval_samples_per_second": 57.945,
+       "eval_steps_per_second": 0.92,
+       "learning_rate": 0.001,
+       "step": 905
+     },
+     {
+       "epoch": 5.524861878453039,
+       "grad_norm": 0.19042304158210754,
+       "learning_rate": 0.001,
+       "loss": 0.3892,
+       "step": 1000
+     },
+     {
+       "epoch": 6.0,
+       "eval_explained_variance": 0.35904277287996733,
+       "eval_loss": 0.3672534227371216,
+       "eval_mae": 0.14119164645671844,
+       "eval_mse": 0.040877003222703934,
+       "eval_r2": 0.3553590945142445,
+       "eval_rmse": 0.2021806240081787,
+       "eval_runtime": 65.5836,
+       "eval_samples_per_second": 58.612,
+       "eval_steps_per_second": 0.93,
+       "learning_rate": 0.001,
+       "step": 1086
+     },
+     {
+       "epoch": 7.0,
+       "eval_explained_variance": 0.34988729311869693,
+       "eval_loss": 0.3680865168571472,
+       "eval_mae": 0.14079739153385162,
+       "eval_mse": 0.04153257608413696,
+       "eval_r2": 0.3456613343062778,
+       "eval_rmse": 0.2037954330444336,
+       "eval_runtime": 64.4017,
+       "eval_samples_per_second": 59.688,
+       "eval_steps_per_second": 0.947,
+       "learning_rate": 0.001,
+       "step": 1267
+     },
+     {
+       "epoch": 8.0,
+       "eval_explained_variance": 0.36423414945602417,
+       "eval_loss": 0.365603506565094,
+       "eval_mae": 0.13892073929309845,
+       "eval_mse": 0.04058730974793434,
+       "eval_r2": 0.35962535995096967,
+       "eval_rmse": 0.20146292448043823,
+       "eval_runtime": 64.831,
+       "eval_samples_per_second": 59.293,
+       "eval_steps_per_second": 0.941,
+       "learning_rate": 0.001,
+       "step": 1448
+     },
+     {
+       "epoch": 8.287292817679559,
+       "grad_norm": 0.1760077178478241,
+       "learning_rate": 0.001,
+       "loss": 0.3855,
+       "step": 1500
+     },
+     {
+       "epoch": 9.0,
+       "eval_explained_variance": 0.3612723258825449,
+       "eval_loss": 0.36585840582847595,
+       "eval_mae": 0.13438531756401062,
+       "eval_mse": 0.04076695442199707,
+       "eval_r2": 0.3554776353070419,
+       "eval_rmse": 0.20190827548503876,
+       "eval_runtime": 64.4706,
+       "eval_samples_per_second": 59.624,
+       "eval_steps_per_second": 0.946,
+       "learning_rate": 0.001,
+       "step": 1629
+     },
+     {
+       "epoch": 10.0,
+       "eval_explained_variance": 0.3561701728747441,
+       "eval_loss": 0.366574227809906,
+       "eval_mae": 0.13837845623493195,
+       "eval_mse": 0.04093795642256737,
+       "eval_r2": 0.3533183127533612,
+       "eval_rmse": 0.2023313045501709,
+       "eval_runtime": 63.2978,
+       "eval_samples_per_second": 60.729,
+       "eval_steps_per_second": 0.964,
+       "learning_rate": 0.001,
+       "step": 1810
+     },
+     {
+       "epoch": 11.0,
+       "eval_explained_variance": 0.3574172487625709,
+       "eval_loss": 0.36660775542259216,
+       "eval_mae": 0.13663478195667267,
+       "eval_mse": 0.04090488329529762,
+       "eval_r2": 0.35496352056496683,
+       "eval_rmse": 0.20224955677986145,
+       "eval_runtime": 66.2827,
+       "eval_samples_per_second": 57.994,
+       "eval_steps_per_second": 0.92,
+       "learning_rate": 0.001,
+       "step": 1991
+     },
+     {
+       "epoch": 11.049723756906078,
+       "grad_norm": 0.14891982078552246,
+       "learning_rate": 0.001,
+       "loss": 0.3816,
+       "step": 2000
+     },
+     {
+       "epoch": 12.0,
+       "eval_explained_variance": 0.3598099580177894,
+       "eval_loss": 0.36626219749450684,
+       "eval_mae": 0.13958622515201569,
+       "eval_mse": 0.04085636883974075,
+       "eval_r2": 0.35871773520484396,
+       "eval_rmse": 0.20212958753108978,
+       "eval_runtime": 64.7339,
+       "eval_samples_per_second": 59.382,
+       "eval_steps_per_second": 0.942,
+       "learning_rate": 0.001,
+       "step": 2172
+     },
+     {
+       "epoch": 13.0,
+       "eval_explained_variance": 0.37047534722548264,
+       "eval_loss": 0.3631901741027832,
+       "eval_mae": 0.1360856592655182,
+       "eval_mse": 0.03979066386818886,
+       "eval_r2": 0.3696611807758026,
+       "eval_rmse": 0.1994759738445282,
+       "eval_runtime": 65.3689,
+       "eval_samples_per_second": 58.805,
+       "eval_steps_per_second": 0.933,
+       "learning_rate": 0.001,
+       "step": 2353
+     },
+     {
+       "epoch": 13.812154696132596,
+       "grad_norm": 0.14235170185565948,
+       "learning_rate": 0.001,
+       "loss": 0.381,
+       "step": 2500
+     },
+     {
+       "epoch": 14.0,
+       "eval_explained_variance": 0.36284926304450404,
+       "eval_loss": 0.36694806814193726,
+       "eval_mae": 0.14229656755924225,
+       "eval_mse": 0.04098258540034294,
+       "eval_r2": 0.356153731540797,
+       "eval_rmse": 0.20244155824184418,
+       "eval_runtime": 64.126,
+       "eval_samples_per_second": 59.945,
+       "eval_steps_per_second": 0.951,
+       "learning_rate": 0.001,
+       "step": 2534
+     },
+     {
+       "epoch": 15.0,
+       "eval_explained_variance": 0.36449302159822905,
+       "eval_loss": 0.3644973933696747,
+       "eval_mae": 0.1395292580127716,
+       "eval_mse": 0.04036581516265869,
+       "eval_r2": 0.36203359510531696,
+       "eval_rmse": 0.2009124606847763,
+       "eval_runtime": 64.0305,
+       "eval_samples_per_second": 60.034,
+       "eval_steps_per_second": 0.953,
+       "learning_rate": 0.001,
+       "step": 2715
+     },
+     {
+       "epoch": 16.0,
+       "eval_explained_variance": 0.37152041838719296,
+       "eval_loss": 0.36393943428993225,
+       "eval_mae": 0.13569381833076477,
+       "eval_mse": 0.039987124502658844,
+       "eval_r2": 0.36948082804864185,
+       "eval_rmse": 0.19996780157089233,
+       "eval_runtime": 63.9139,
+       "eval_samples_per_second": 60.143,
+       "eval_steps_per_second": 0.954,
+       "learning_rate": 0.001,
+       "step": 2896
+     },
+     {
+       "epoch": 16.574585635359117,
+       "grad_norm": 0.13048891723155975,
+       "learning_rate": 0.001,
+       "loss": 0.3811,
+       "step": 3000
+     },
+     {
+       "epoch": 17.0,
+       "eval_explained_variance": 0.37284482900912946,
+       "eval_loss": 0.36665406823158264,
+       "eval_mae": 0.14128881692886353,
+       "eval_mse": 0.04064851254224777,
+       "eval_r2": 0.3621847777710853,
+       "eval_rmse": 0.20161476731300354,
+       "eval_runtime": 66.0408,
+       "eval_samples_per_second": 58.206,
+       "eval_steps_per_second": 0.924,
+       "learning_rate": 0.001,
+       "step": 3077
+     },
+     {
+       "epoch": 18.0,
+       "eval_explained_variance": 0.3705295782822829,
+       "eval_loss": 0.36318618059158325,
+       "eval_mae": 0.13683417439460754,
+       "eval_mse": 0.03981361910700798,
+       "eval_r2": 0.369508628045091,
+       "eval_rmse": 0.19953350722789764,
+       "eval_runtime": 63.7575,
+       "eval_samples_per_second": 60.291,
+       "eval_steps_per_second": 0.957,
+       "learning_rate": 0.001,
+       "step": 3258
+     },
+     {
+       "epoch": 19.0,
+       "eval_explained_variance": 0.3733598177249615,
+       "eval_loss": 0.36302879452705383,
+       "eval_mae": 0.13539017736911774,
+       "eval_mse": 0.03974781930446625,
+       "eval_r2": 0.3718927441003872,
+       "eval_rmse": 0.19936855137348175,
+       "eval_runtime": 63.4414,
+       "eval_samples_per_second": 60.591,
+       "eval_steps_per_second": 0.962,
+       "learning_rate": 0.001,
+       "step": 3439
+     },
+     {
+       "epoch": 19.337016574585636,
+       "grad_norm": 0.13633792102336884,
+       "learning_rate": 0.001,
+       "loss": 0.3792,
+       "step": 3500
+     },
+     {
+       "epoch": 20.0,
+       "eval_explained_variance": 0.3622324833503136,
+       "eval_loss": 0.36489424109458923,
+       "eval_mae": 0.13486731052398682,
+       "eval_mse": 0.04052112251520157,
+       "eval_r2": 0.35869592759647334,
+       "eval_rmse": 0.20129859447479248,
+       "eval_runtime": 64.295,
+       "eval_samples_per_second": 59.787,
+       "eval_steps_per_second": 0.949,
+       "learning_rate": 0.001,
+       "step": 3620
+     },
+     {
+       "epoch": 21.0,
+       "eval_explained_variance": 0.3630923858055702,
+       "eval_loss": 0.3665030300617218,
+       "eval_mae": 0.13610774278640747,
+       "eval_mse": 0.040700096637010574,
+       "eval_r2": 0.3584834523421166,
+       "eval_rmse": 0.20174264907836914,
+       "eval_runtime": 64.1739,
+       "eval_samples_per_second": 59.9,
+       "eval_steps_per_second": 0.951,
+       "learning_rate": 0.001,
+       "step": 3801
+     },
+     {
+       "epoch": 22.0,
+       "eval_explained_variance": 0.3704591485170218,
+       "eval_loss": 0.3647814095020294,
+       "eval_mae": 0.1368531733751297,
+       "eval_mse": 0.03999844938516617,
+       "eval_r2": 0.3677615209740873,
+       "eval_rmse": 0.19999612867832184,
+       "eval_runtime": 63.5961,
+       "eval_samples_per_second": 60.444,
+       "eval_steps_per_second": 0.959,
+       "learning_rate": 0.001,
+       "step": 3982
+     },
+     {
+       "epoch": 22.099447513812155,
+       "grad_norm": 0.1797100454568863,
+       "learning_rate": 0.001,
+       "loss": 0.3808,
+       "step": 4000
+     },
+     {
+       "epoch": 23.0,
+       "eval_explained_variance": 0.3736427976534917,
+       "eval_loss": 0.3633384704589844,
+       "eval_mae": 0.1356455683708191,
+       "eval_mse": 0.039849139750003815,
+       "eval_r2": 0.37049292256309013,
+       "eval_rmse": 0.1996224969625473,
+       "eval_runtime": 63.5905,
+       "eval_samples_per_second": 60.449,
+       "eval_steps_per_second": 0.959,
+       "learning_rate": 0.001,
+       "step": 4163
+     },
+     {
+       "epoch": 24.0,
+       "eval_explained_variance": 0.3761314291220445,
+       "eval_loss": 0.3632254898548126,
+       "eval_mae": 0.13934393227100372,
+       "eval_mse": 0.03965350612998009,
+       "eval_r2": 0.3725149173190659,
+       "eval_rmse": 0.19913187623023987,
+       "eval_runtime": 63.5074,
+       "eval_samples_per_second": 60.528,
+       "eval_steps_per_second": 0.961,
+       "learning_rate": 0.001,
+       "step": 4344
+     },
+     {
+       "epoch": 24.861878453038674,
+       "grad_norm": 0.10225138068199158,
+       "learning_rate": 0.001,
+       "loss": 0.3796,
+       "step": 4500
+     },
+     {
+       "epoch": 25.0,
+       "eval_explained_variance": 0.37342833555661714,
+       "eval_loss": 0.3638208210468292,
+       "eval_mae": 0.13812901079654694,
+       "eval_mse": 0.03988226130604744,
+       "eval_r2": 0.3698107462777432,
+       "eval_rmse": 0.19970543682575226,
+       "eval_runtime": 64.082,
+       "eval_samples_per_second": 59.986,
+       "eval_steps_per_second": 0.952,
+       "learning_rate": 0.001,
+       "step": 4525
+     },
+     {
+       "epoch": 26.0,
+       "eval_explained_variance": 0.38356072627581084,
+       "eval_loss": 0.3607248365879059,
+       "eval_mae": 0.132920041680336,
+       "eval_mse": 0.03901772201061249,
+       "eval_r2": 0.3818014601715421,
+       "eval_rmse": 0.19752904772758484,
+       "eval_runtime": 63.8273,
+       "eval_samples_per_second": 60.225,
+       "eval_steps_per_second": 0.956,
+       "learning_rate": 0.0001,
+       "step": 4706
+     },
+     {
+       "epoch": 27.0,
+       "eval_explained_variance": 0.3877932016666119,
+       "eval_loss": 0.3599555194377899,
+       "eval_mae": 0.13530299067497253,
+       "eval_mse": 0.038680098950862885,
+       "eval_r2": 0.3862897971569748,
+       "eval_rmse": 0.19667257368564606,
+       "eval_runtime": 63.6171,
+       "eval_samples_per_second": 60.424,
+       "eval_steps_per_second": 0.959,
+       "learning_rate": 0.0001,
+       "step": 4887
+     },
+     {
+       "epoch": 27.624309392265193,
+       "grad_norm": 0.09920254349708557,
+       "learning_rate": 0.0001,
+       "loss": 0.3765,
+       "step": 5000
+     },
+     {
+       "epoch": 28.0,
+       "eval_explained_variance": 0.39040088195067185,
+       "eval_loss": 0.35923057794570923,
+       "eval_mae": 0.13371111452579498,
+       "eval_mse": 0.038444750010967255,
+       "eval_r2": 0.3893828319749203,
+       "eval_rmse": 0.19607332348823547,
+       "eval_runtime": 63.6463,
+       "eval_samples_per_second": 60.396,
+       "eval_steps_per_second": 0.958,
+       "learning_rate": 0.0001,
+       "step": 5068
+     },
+     {
+       "epoch": 29.0,
+       "eval_explained_variance": 0.39147963432165295,
+       "eval_loss": 0.3595493733882904,
+       "eval_mae": 0.13497120141983032,
+       "eval_mse": 0.03846590965986252,
+       "eval_r2": 0.3891551349793923,
+       "eval_rmse": 0.1961272805929184,
+       "eval_runtime": 63.7787,
+       "eval_samples_per_second": 60.271,
+       "eval_steps_per_second": 0.956,
+       "learning_rate": 0.0001,
+       "step": 5249
+     },
+     {
+       "epoch": 30.0,
+       "eval_explained_variance": 0.3893452011621915,
+       "eval_loss": 0.35978832840919495,
+       "eval_mae": 0.13498304784297943,
+       "eval_mse": 0.03862994909286499,
+       "eval_r2": 0.3876274861623127,
+       "eval_rmse": 0.19654503464698792,
+       "eval_runtime": 63.918,
+       "eval_samples_per_second": 60.14,
+       "eval_steps_per_second": 0.954,
+       "learning_rate": 0.0001,
+       "step": 5430
+     },
+     {
+       "epoch": 30.386740331491712,
+       "grad_norm": 0.09680859744548798,
+       "learning_rate": 0.0001,
+       "loss": 0.373,
+       "step": 5500
+     },
+     {
+       "epoch": 31.0,
+       "eval_explained_variance": 0.39206390655957735,
+       "eval_loss": 0.35871124267578125,
+       "eval_mae": 0.131711944937706,
+       "eval_mse": 0.03838532418012619,
+       "eval_r2": 0.39069464683386806,
+       "eval_rmse": 0.19592173397541046,
+       "eval_runtime": 63.6009,
+       "eval_samples_per_second": 60.439,
+       "eval_steps_per_second": 0.959,
+       "learning_rate": 0.0001,
+       "step": 5611
+     },
+     {
+       "epoch": 32.0,
+       "eval_explained_variance": 0.39324428943487316,
+       "eval_loss": 0.35840144753456116,
+       "eval_mae": 0.13263028860092163,
+       "eval_mse": 0.0382704883813858,
+       "eval_r2": 0.39277553504116497,
+       "eval_rmse": 0.19562844932079315,
+       "eval_runtime": 63.4174,
+       "eval_samples_per_second": 60.614,
+       "eval_steps_per_second": 0.962,
+       "learning_rate": 0.0001,
+       "step": 5792
+     },
+     {
+       "epoch": 33.0,
+       "eval_explained_variance": 0.3953018326025743,
+       "eval_loss": 0.35809990763664246,
+       "eval_mae": 0.13110357522964478,
+       "eval_mse": 0.03812328726053238,
+       "eval_r2": 0.39453575095056653,
+       "eval_rmse": 0.19525185227394104,
+       "eval_runtime": 62.9848,
+       "eval_samples_per_second": 61.031,
+       "eval_steps_per_second": 0.968,
+       "learning_rate": 0.0001,
+       "step": 5973
+     },
+     {
+       "epoch": 33.149171270718234,
+       "grad_norm": 0.10557221621274948,
+       "learning_rate": 0.0001,
+       "loss": 0.3735,
+       "step": 6000
+     },
+     {
+       "epoch": 34.0,
+       "eval_explained_variance": 0.3966822119859549,
+       "eval_loss": 0.3580343723297119,
+       "eval_mae": 0.13232208788394928,
+       "eval_mse": 0.038077060133218765,
+       "eval_r2": 0.3953078482977419,
+       "eval_rmse": 0.19513344764709473,
+       "eval_runtime": 63.9448,
+       "eval_samples_per_second": 60.114,
+       "eval_steps_per_second": 0.954,
+       "learning_rate": 0.0001,
+       "step": 6154
+     },
+     {
+       "epoch": 35.0,
+       "eval_explained_variance": 0.39542460441589355,
+       "eval_loss": 0.3578670918941498,
+       "eval_mae": 0.13223391771316528,
+       "eval_mse": 0.038055673241615295,
+       "eval_r2": 0.3949423136793632,
+       "eval_rmse": 0.19507862627506256,
+       "eval_runtime": 62.8884,
+       "eval_samples_per_second": 61.124,
+       "eval_steps_per_second": 0.97,
+       "learning_rate": 0.0001,
+       "step": 6335
+     },
+     {
+       "epoch": 35.91160220994475,
+       "grad_norm": 0.11413700878620148,
+       "learning_rate": 0.0001,
+       "loss": 0.3711,
+       "step": 6500
+     },
+     {
+       "epoch": 36.0,
+       "eval_explained_variance": 0.38986305548594546,
+       "eval_loss": 0.35921958088874817,
+       "eval_mae": 0.13451573252677917,
+       "eval_mse": 0.0385238379240036,
+       "eval_r2": 0.3895210802446932,
+       "eval_rmse": 0.19627490639686584,
+       "eval_runtime": 63.9244,
+       "eval_samples_per_second": 60.134,
+       "eval_steps_per_second": 0.954,
+       "learning_rate": 0.0001,
+       "step": 6516
+     },
+     {
+       "epoch": 37.0,
+       "eval_explained_variance": 0.39700071628277117,
+       "eval_loss": 0.35754600167274475,
610
+ "eval_mae": 0.13133254647254944,
611
+ "eval_mse": 0.037971220910549164,
612
+ "eval_r2": 0.3965857587136563,
613
+ "eval_rmse": 0.19486205279827118,
614
+ "eval_runtime": 63.3201,
615
+ "eval_samples_per_second": 60.707,
616
+ "eval_steps_per_second": 0.963,
617
+ "learning_rate": 0.0001,
618
+ "step": 6697
619
+ },
620
+ {
621
+ "epoch": 38.0,
622
+ "eval_explained_variance": 0.39355502220300526,
623
+ "eval_loss": 0.35816583037376404,
624
+ "eval_mae": 0.13258841633796692,
625
+ "eval_mse": 0.038258858025074005,
626
+ "eval_r2": 0.39226546341596713,
627
+ "eval_rmse": 0.19559872150421143,
628
+ "eval_runtime": 62.6934,
629
+ "eval_samples_per_second": 61.314,
630
+ "eval_steps_per_second": 0.973,
631
+ "learning_rate": 0.0001,
632
+ "step": 6878
633
+ },
634
+ {
635
+ "epoch": 38.67403314917127,
636
+ "grad_norm": 0.147694930434227,
637
+ "learning_rate": 0.0001,
638
+ "loss": 0.3705,
639
+ "step": 7000
640
+ },
641
+ {
642
+ "epoch": 39.0,
643
+ "eval_explained_variance": 0.3965281844139099,
644
+ "eval_loss": 0.3575587570667267,
645
+ "eval_mae": 0.1313440054655075,
646
+ "eval_mse": 0.03796360641717911,
647
+ "eval_r2": 0.39630358388937376,
648
+ "eval_rmse": 0.19484251737594604,
649
+ "eval_runtime": 62.5891,
650
+ "eval_samples_per_second": 61.416,
651
+ "eval_steps_per_second": 0.975,
652
+ "learning_rate": 0.0001,
653
+ "step": 7059
654
+ },
655
+ {
656
+ "epoch": 40.0,
657
+ "eval_explained_variance": 0.399988224873176,
658
+ "eval_loss": 0.3574675917625427,
659
+ "eval_mae": 0.13325949013233185,
660
+ "eval_mse": 0.03790339455008507,
661
+ "eval_r2": 0.3980004685467563,
662
+ "eval_rmse": 0.19468794763088226,
663
+ "eval_runtime": 63.1438,
664
+ "eval_samples_per_second": 60.877,
665
+ "eval_steps_per_second": 0.966,
666
+ "learning_rate": 0.0001,
667
+ "step": 7240
668
+ },
669
+ {
670
+ "epoch": 41.0,
671
+ "eval_explained_variance": 0.39883482914704543,
672
+ "eval_loss": 0.35797080397605896,
673
+ "eval_mae": 0.13172872364521027,
674
+ "eval_mse": 0.03810995817184448,
675
+ "eval_r2": 0.3955525420135218,
676
+ "eval_rmse": 0.19521771371364594,
677
+ "eval_runtime": 63.8253,
678
+ "eval_samples_per_second": 60.227,
679
+ "eval_steps_per_second": 0.956,
680
+ "learning_rate": 0.0001,
681
+ "step": 7421
682
+ },
683
+ {
684
+ "epoch": 41.43646408839779,
685
+ "grad_norm": 0.13456250727176666,
686
+ "learning_rate": 0.0001,
687
+ "loss": 0.3704,
688
+ "step": 7500
689
+ },
690
+ {
691
+ "epoch": 42.0,
692
+ "eval_explained_variance": 0.39858559003243077,
693
+ "eval_loss": 0.3574862778186798,
694
+ "eval_mae": 0.13303333520889282,
695
+ "eval_mse": 0.03798728436231613,
696
+ "eval_r2": 0.39695276377811434,
697
+ "eval_rmse": 0.19490326941013336,
698
+ "eval_runtime": 67.394,
699
+ "eval_samples_per_second": 57.038,
700
+ "eval_steps_per_second": 0.905,
701
+ "learning_rate": 0.0001,
702
+ "step": 7602
703
+ },
704
+ {
705
+ "epoch": 43.0,
706
+ "eval_explained_variance": 0.40196093229147106,
707
+ "eval_loss": 0.3568632900714874,
708
+ "eval_mae": 0.13252291083335876,
709
+ "eval_mse": 0.03772151470184326,
710
+ "eval_r2": 0.4008098201061217,
711
+ "eval_rmse": 0.19422027468681335,
712
+ "eval_runtime": 64.6291,
713
+ "eval_samples_per_second": 59.478,
714
+ "eval_steps_per_second": 0.944,
715
+ "learning_rate": 0.0001,
716
+ "step": 7783
717
+ },
718
+ {
719
+ "epoch": 44.0,
720
+ "eval_explained_variance": 0.4026290269998404,
721
+ "eval_loss": 0.35680440068244934,
722
+ "eval_mae": 0.13054220378398895,
723
+ "eval_mse": 0.03770707920193672,
724
+ "eval_r2": 0.4009435040202465,
725
+ "eval_rmse": 0.1941831111907959,
726
+ "eval_runtime": 64.3612,
727
+ "eval_samples_per_second": 59.725,
728
+ "eval_steps_per_second": 0.948,
729
+ "learning_rate": 0.0001,
730
+ "step": 7964
731
+ },
732
+ {
733
+ "epoch": 44.19889502762431,
734
+ "grad_norm": 0.12347038835287094,
735
+ "learning_rate": 0.0001,
736
+ "loss": 0.3695,
737
+ "step": 8000
738
+ },
739
+ {
740
+ "epoch": 45.0,
741
+ "eval_explained_variance": 0.40327414182516247,
742
+ "eval_loss": 0.35672253370285034,
743
+ "eval_mae": 0.13190330564975739,
744
+ "eval_mse": 0.03762032091617584,
745
+ "eval_r2": 0.40209036636711937,
746
+ "eval_rmse": 0.193959578871727,
747
+ "eval_runtime": 63.5564,
748
+ "eval_samples_per_second": 60.482,
749
+ "eval_steps_per_second": 0.96,
750
+ "learning_rate": 0.0001,
751
+ "step": 8145
752
+ },
753
+ {
754
+ "epoch": 46.0,
755
+ "eval_explained_variance": 0.4014772314291734,
756
+ "eval_loss": 0.35691043734550476,
757
+ "eval_mae": 0.1298011690378189,
758
+ "eval_mse": 0.03774061053991318,
759
+ "eval_r2": 0.39979262898816803,
760
+ "eval_rmse": 0.19426943361759186,
761
+ "eval_runtime": 63.5835,
762
+ "eval_samples_per_second": 60.456,
763
+ "eval_steps_per_second": 0.959,
764
+ "learning_rate": 0.0001,
765
+ "step": 8326
766
+ },
767
+ {
768
+ "epoch": 46.96132596685083,
769
+ "grad_norm": 0.1476801335811615,
770
+ "learning_rate": 0.0001,
771
+ "loss": 0.369,
772
+ "step": 8500
773
+ },
774
+ {
775
+ "epoch": 47.0,
776
+ "eval_explained_variance": 0.39959606299033534,
777
+ "eval_loss": 0.3573501706123352,
778
+ "eval_mae": 0.12922033667564392,
779
+ "eval_mse": 0.03795965388417244,
780
+ "eval_r2": 0.39734844502667516,
781
+ "eval_rmse": 0.19483236968517303,
782
+ "eval_runtime": 64.1983,
783
+ "eval_samples_per_second": 59.877,
784
+ "eval_steps_per_second": 0.95,
785
+ "learning_rate": 0.0001,
786
+ "step": 8507
787
+ },
788
+ {
789
+ "epoch": 48.0,
790
+ "eval_explained_variance": 0.404104429941911,
791
+ "eval_loss": 0.35634738206863403,
792
+ "eval_mae": 0.13015295565128326,
793
+ "eval_mse": 0.03764864429831505,
794
+ "eval_r2": 0.4019054071784941,
795
+ "eval_rmse": 0.19403257966041565,
796
+ "eval_runtime": 63.8043,
797
+ "eval_samples_per_second": 60.247,
798
+ "eval_steps_per_second": 0.956,
799
+ "learning_rate": 0.0001,
800
+ "step": 8688
801
+ },
802
+ {
803
+ "epoch": 49.0,
804
+ "eval_explained_variance": 0.4024105530518752,
805
+ "eval_loss": 0.3566192090511322,
806
+ "eval_mae": 0.1305515021085739,
807
+ "eval_mse": 0.03765449672937393,
808
+ "eval_r2": 0.40112185894390806,
809
+ "eval_rmse": 0.19404765963554382,
810
+ "eval_runtime": 65.5486,
811
+ "eval_samples_per_second": 58.644,
812
+ "eval_steps_per_second": 0.931,
813
+ "learning_rate": 0.0001,
814
+ "step": 8869
815
+ },
816
+ {
817
+ "epoch": 49.72375690607735,
818
+ "grad_norm": 0.17585940659046173,
819
+ "learning_rate": 0.0001,
820
+ "loss": 0.3691,
821
+ "step": 9000
822
+ },
823
+ {
824
+ "epoch": 50.0,
825
+ "eval_explained_variance": 0.40147255475704485,
826
+ "eval_loss": 0.3571104109287262,
827
+ "eval_mae": 0.13218748569488525,
828
+ "eval_mse": 0.0377979539334774,
829
+ "eval_r2": 0.39978904676068683,
830
+ "eval_rmse": 0.19441695511341095,
831
+ "eval_runtime": 64.6683,
832
+ "eval_samples_per_second": 59.442,
833
+ "eval_steps_per_second": 0.943,
834
+ "learning_rate": 0.0001,
835
+ "step": 9050
836
+ },
837
+ {
838
+ "epoch": 51.0,
839
+ "eval_explained_variance": 0.4020539063673753,
840
+ "eval_loss": 0.3584417402744293,
841
+ "eval_mae": 0.13350461423397064,
842
+ "eval_mse": 0.03811892494559288,
843
+ "eval_r2": 0.39583579837070054,
844
+ "eval_rmse": 0.19524069130420685,
845
+ "eval_runtime": 64.7621,
846
+ "eval_samples_per_second": 59.356,
847
+ "eval_steps_per_second": 0.942,
848
+ "learning_rate": 0.0001,
849
+ "step": 9231
850
+ },
851
+ {
852
+ "epoch": 52.0,
853
+ "eval_explained_variance": 0.4045378336539635,
854
+ "eval_loss": 0.3561328649520874,
855
+ "eval_mae": 0.1308905929327011,
856
+ "eval_mse": 0.03748491033911705,
857
+ "eval_r2": 0.4042346756357482,
858
+ "eval_rmse": 0.19361020624637604,
859
+ "eval_runtime": 64.77,
860
+ "eval_samples_per_second": 59.349,
861
+ "eval_steps_per_second": 0.942,
862
+ "learning_rate": 0.0001,
863
+ "step": 9412
864
+ },
865
+ {
866
+ "epoch": 52.48618784530387,
867
+ "grad_norm": 0.15689648687839508,
868
+ "learning_rate": 0.0001,
869
+ "loss": 0.3677,
870
+ "step": 9500
871
+ },
872
+ {
873
+ "epoch": 53.0,
874
+ "eval_explained_variance": 0.4053275997822101,
875
+ "eval_loss": 0.35652926564216614,
876
+ "eval_mae": 0.13147617876529694,
877
+ "eval_mse": 0.03759394586086273,
878
+ "eval_r2": 0.4026062075156075,
879
+ "eval_rmse": 0.19389158487319946,
880
+ "eval_runtime": 64.8021,
881
+ "eval_samples_per_second": 59.319,
882
+ "eval_steps_per_second": 0.941,
883
+ "learning_rate": 0.0001,
884
+ "step": 9593
885
+ },
886
+ {
887
+ "epoch": 54.0,
888
+ "eval_explained_variance": 0.401798074062054,
889
+ "eval_loss": 0.3567388355731964,
890
+ "eval_mae": 0.13164331018924713,
891
+ "eval_mse": 0.03773793205618858,
892
+ "eval_r2": 0.40105556644385676,
893
+ "eval_rmse": 0.1942625343799591,
894
+ "eval_runtime": 65.4024,
895
+ "eval_samples_per_second": 58.775,
896
+ "eval_steps_per_second": 0.933,
897
+ "learning_rate": 0.0001,
898
+ "step": 9774
899
+ },
900
+ {
901
+ "epoch": 55.0,
902
+ "eval_explained_variance": 0.40524112719755906,
903
+ "eval_loss": 0.35645580291748047,
904
+ "eval_mae": 0.1291799694299698,
905
+ "eval_mse": 0.03761202096939087,
906
+ "eval_r2": 0.40258003846192925,
907
+ "eval_rmse": 0.19393819570541382,
908
+ "eval_runtime": 65.1148,
909
+ "eval_samples_per_second": 59.034,
910
+ "eval_steps_per_second": 0.937,
911
+ "learning_rate": 0.0001,
912
+ "step": 9955
913
+ },
914
+ {
915
+ "epoch": 55.248618784530386,
916
+ "grad_norm": 0.14432880282402039,
917
+ "learning_rate": 0.0001,
918
+ "loss": 0.3684,
919
+ "step": 10000
920
+ },
921
+ {
922
+ "epoch": 56.0,
923
+ "eval_explained_variance": 0.40458508179737973,
924
+ "eval_loss": 0.35665351152420044,
925
+ "eval_mae": 0.12790292501449585,
926
+ "eval_mse": 0.03767779469490051,
927
+ "eval_r2": 0.40173746899832624,
928
+ "eval_rmse": 0.19410768151283264,
929
+ "eval_runtime": 64.7859,
930
+ "eval_samples_per_second": 59.334,
931
+ "eval_steps_per_second": 0.942,
932
+ "learning_rate": 0.0001,
933
+ "step": 10136
934
+ },
935
+ {
936
+ "epoch": 57.0,
937
+ "eval_explained_variance": 0.40489131670731765,
938
+ "eval_loss": 0.35622259974479675,
939
+ "eval_mae": 0.12940338253974915,
940
+ "eval_mse": 0.03757502883672714,
941
+ "eval_r2": 0.40317412718530543,
942
+ "eval_rmse": 0.1938427984714508,
943
+ "eval_runtime": 64.354,
944
+ "eval_samples_per_second": 59.732,
945
+ "eval_steps_per_second": 0.948,
946
+ "learning_rate": 0.0001,
947
+ "step": 10317
948
+ },
949
+ {
950
+ "epoch": 58.0,
951
+ "eval_explained_variance": 0.40618401765823364,
952
+ "eval_loss": 0.35649776458740234,
953
+ "eval_mae": 0.12992320954799652,
954
+ "eval_mse": 0.03755363076925278,
955
+ "eval_r2": 0.40359610267984325,
956
+ "eval_rmse": 0.1937875896692276,
957
+ "eval_runtime": 63.5875,
958
+ "eval_samples_per_second": 60.452,
959
+ "eval_steps_per_second": 0.959,
960
+ "learning_rate": 0.0001,
961
+ "step": 10498
962
+ },
963
+ {
964
+ "epoch": 58.011049723756905,
965
+ "grad_norm": 0.17977654933929443,
966
+ "learning_rate": 1e-05,
967
+ "loss": 0.368,
968
+ "step": 10500
969
+ },
970
+ {
971
+ "epoch": 59.0,
972
+ "eval_explained_variance": 0.40612818186099714,
973
+ "eval_loss": 0.3559414744377136,
974
+ "eval_mae": 0.1292232871055603,
975
+ "eval_mse": 0.037484604865312576,
976
+ "eval_r2": 0.404684355302516,
977
+ "eval_rmse": 0.19360941648483276,
978
+ "eval_runtime": 63.0292,
979
+ "eval_samples_per_second": 60.988,
980
+ "eval_steps_per_second": 0.968,
981
+ "learning_rate": 1e-05,
982
+ "step": 10679
983
+ },
984
+ {
985
+ "epoch": 60.0,
986
+ "eval_explained_variance": 0.4082453021636376,
987
+ "eval_loss": 0.35587525367736816,
988
+ "eval_mae": 0.1295480728149414,
989
+ "eval_mse": 0.03739844262599945,
990
+ "eval_r2": 0.40598491760734956,
991
+ "eval_rmse": 0.1933867633342743,
992
+ "eval_runtime": 67.1089,
993
+ "eval_samples_per_second": 57.28,
994
+ "eval_steps_per_second": 0.909,
995
+ "learning_rate": 1e-05,
996
+ "step": 10860
997
+ },
998
+ {
999
+ "epoch": 60.773480662983424,
1000
+ "grad_norm": 0.1965423822402954,
1001
+ "learning_rate": 1e-05,
1002
+ "loss": 0.3664,
1003
+ "step": 11000
1004
+ },
1005
+ {
1006
+ "epoch": 61.0,
1007
+ "eval_explained_variance": 0.4074813173367427,
1008
+ "eval_loss": 0.35549554228782654,
1009
+ "eval_mae": 0.13036619126796722,
1010
+ "eval_mse": 0.03731352090835571,
1011
+ "eval_r2": 0.40719759569271147,
1012
+ "eval_rmse": 0.1931670755147934,
1013
+ "eval_runtime": 62.4919,
1014
+ "eval_samples_per_second": 61.512,
1015
+ "eval_steps_per_second": 0.976,
1016
+ "learning_rate": 1e-05,
1017
+ "step": 11041
1018
+ },
1019
+ {
1020
+ "epoch": 62.0,
1021
+ "eval_explained_variance": 0.4057550017650311,
1022
+ "eval_loss": 0.3564907908439636,
1023
+ "eval_mae": 0.13166674971580505,
1024
+ "eval_mse": 0.03761378303170204,
1025
+ "eval_r2": 0.4036480162510964,
1026
+ "eval_rmse": 0.19394272565841675,
1027
+ "eval_runtime": 64.0633,
1028
+ "eval_samples_per_second": 60.003,
1029
+ "eval_steps_per_second": 0.952,
1030
+ "learning_rate": 1e-05,
1031
+ "step": 11222
1032
+ },
1033
+ {
1034
+ "epoch": 63.0,
1035
+ "eval_explained_variance": 0.4086620624248798,
1036
+ "eval_loss": 0.35556313395500183,
1037
+ "eval_mae": 0.12934741377830505,
1038
+ "eval_mse": 0.03726600110530853,
1039
+ "eval_r2": 0.40751167332410276,
1040
+ "eval_rmse": 0.1930440366268158,
1041
+ "eval_runtime": 63.2366,
1042
+ "eval_samples_per_second": 60.788,
1043
+ "eval_steps_per_second": 0.965,
1044
+ "learning_rate": 1e-05,
1045
+ "step": 11403
1046
+ },
1047
+ {
1048
+ "epoch": 63.53591160220994,
1049
+ "grad_norm": 0.1525866687297821,
1050
+ "learning_rate": 1e-05,
1051
+ "loss": 0.366,
1052
+ "step": 11500
1053
+ },
1054
+ {
1055
+ "epoch": 64.0,
1056
+ "eval_explained_variance": 0.40886356280400205,
1057
+ "eval_loss": 0.35541364550590515,
1058
+ "eval_mae": 0.1295996755361557,
1059
+ "eval_mse": 0.03727412968873978,
1060
+ "eval_r2": 0.40770017250386054,
1061
+ "eval_rmse": 0.1930650919675827,
1062
+ "eval_runtime": 63.8539,
1063
+ "eval_samples_per_second": 60.2,
1064
+ "eval_steps_per_second": 0.955,
1065
+ "learning_rate": 1e-05,
1066
+ "step": 11584
1067
+ },
1068
+ {
1069
+ "epoch": 65.0,
1070
+ "eval_explained_variance": 0.40589494430101836,
1071
+ "eval_loss": 0.35602322220802307,
1072
+ "eval_mae": 0.13072702288627625,
1073
+ "eval_mse": 0.03753972053527832,
1074
+ "eval_r2": 0.4048648390836954,
1075
+ "eval_rmse": 0.19375169277191162,
1076
+ "eval_runtime": 63.5254,
1077
+ "eval_samples_per_second": 60.511,
1078
+ "eval_steps_per_second": 0.96,
1079
+ "learning_rate": 1e-05,
1080
+ "step": 11765
1081
+ },
1082
+ {
1083
+ "epoch": 66.0,
1084
+ "eval_explained_variance": 0.4085214688227727,
1085
+ "eval_loss": 0.35534363985061646,
1086
+ "eval_mae": 0.13003438711166382,
1087
+ "eval_mse": 0.03723596781492233,
1088
+ "eval_r2": 0.40801214840672984,
1089
+ "eval_rmse": 0.19296623766422272,
1090
+ "eval_runtime": 66.0061,
1091
+ "eval_samples_per_second": 58.237,
1092
+ "eval_steps_per_second": 0.924,
1093
+ "learning_rate": 1e-05,
1094
+ "step": 11946
1095
+ },
1096
+ {
1097
+ "epoch": 66.29834254143647,
1098
+ "grad_norm": 0.18801870942115784,
1099
+ "learning_rate": 1e-05,
1100
+ "loss": 0.3654,
1101
+ "step": 12000
1102
+ },
1103
+ {
1104
+ "epoch": 67.0,
1105
+ "eval_explained_variance": 0.4081741479726938,
1106
+ "eval_loss": 0.3554227948188782,
1107
+ "eval_mae": 0.12988974153995514,
1108
+ "eval_mse": 0.03726029023528099,
1109
+ "eval_r2": 0.4077790726698564,
1110
+ "eval_rmse": 0.1930292397737503,
1111
+ "eval_runtime": 65.2859,
1112
+ "eval_samples_per_second": 58.879,
1113
+ "eval_steps_per_second": 0.934,
1114
+ "learning_rate": 1e-05,
1115
+ "step": 12127
1116
+ },
1117
+ {
1118
+ "epoch": 68.0,
1119
+ "eval_explained_variance": 0.4073961698091947,
1120
+ "eval_loss": 0.35557952523231506,
1121
+ "eval_mae": 0.13015064597129822,
1122
+ "eval_mse": 0.03740492835640907,
1123
+ "eval_r2": 0.4058588832439236,
1124
+ "eval_rmse": 0.19340354204177856,
1125
+ "eval_runtime": 65.1267,
1126
+ "eval_samples_per_second": 59.023,
1127
+ "eval_steps_per_second": 0.937,
1128
+ "learning_rate": 1e-05,
1129
+ "step": 12308
1130
+ },
1131
+ {
1132
+ "epoch": 69.0,
1133
+ "eval_explained_variance": 0.4085943423784696,
1134
+ "eval_loss": 0.3553701937198639,
1135
+ "eval_mae": 0.12976409494876862,
1136
+ "eval_mse": 0.03725024312734604,
1137
+ "eval_r2": 0.40825238050595736,
1138
+ "eval_rmse": 0.19300322234630585,
1139
+ "eval_runtime": 64.7301,
1140
+ "eval_samples_per_second": 59.385,
1141
+ "eval_steps_per_second": 0.942,
1142
+ "learning_rate": 1e-05,
1143
+ "step": 12489
1144
+ },
1145
+ {
1146
+ "epoch": 69.06077348066299,
1147
+ "grad_norm": 0.15430860221385956,
1148
+ "learning_rate": 1e-05,
1149
+ "loss": 0.3658,
1150
+ "step": 12500
1151
+ },
1152
+ {
1153
+ "epoch": 70.0,
1154
+ "eval_explained_variance": 0.4094207286834717,
1155
+ "eval_loss": 0.35594871640205383,
1156
+ "eval_mae": 0.13069316744804382,
1157
+ "eval_mse": 0.03737233206629753,
1158
+ "eval_r2": 0.40659481251933094,
1159
+ "eval_rmse": 0.19331924617290497,
1160
+ "eval_runtime": 66.4386,
1161
+ "eval_samples_per_second": 57.858,
1162
+ "eval_steps_per_second": 0.918,
1163
+ "learning_rate": 1e-05,
1164
+ "step": 12670
1165
+ },
1166
+ {
1167
+ "epoch": 71.0,
1168
+ "eval_explained_variance": 0.40725430158468395,
1169
+ "eval_loss": 0.35573798418045044,
1170
+ "eval_mae": 0.1295761913061142,
1171
+ "eval_mse": 0.037380401045084,
1172
+ "eval_r2": 0.40697699949296745,
1173
+ "eval_rmse": 0.19334012269973755,
1174
+ "eval_runtime": 65.624,
1175
+ "eval_samples_per_second": 58.576,
1176
+ "eval_steps_per_second": 0.93,
1177
+ "learning_rate": 1e-05,
1178
+ "step": 12851
1179
+ },
1180
+ {
1181
+ "epoch": 71.8232044198895,
1182
+ "grad_norm": 0.35482099652290344,
1183
+ "learning_rate": 1e-05,
1184
+ "loss": 0.366,
1185
+ "step": 13000
1186
+ },
1187
+ {
1188
+ "epoch": 72.0,
1189
+ "eval_explained_variance": 0.40842239214823794,
1190
+ "eval_loss": 0.35571375489234924,
1191
+ "eval_mae": 0.13028408586978912,
1192
+ "eval_mse": 0.03734128177165985,
1193
+ "eval_r2": 0.40698361470433536,
1194
+ "eval_rmse": 0.19323892891407013,
1195
+ "eval_runtime": 64.0529,
1196
+ "eval_samples_per_second": 60.013,
1197
+ "eval_steps_per_second": 0.952,
1198
+ "learning_rate": 1e-05,
1199
+ "step": 13032
1200
+ },
1201
+ {
1202
+ "epoch": 73.0,
1203
+ "eval_explained_variance": 0.4089708603345431,
1204
+ "eval_loss": 0.3552262485027313,
1205
+ "eval_mae": 0.12985268235206604,
1206
+ "eval_mse": 0.037223465740680695,
1207
+ "eval_r2": 0.408222457948687,
1208
+ "eval_rmse": 0.1929338425397873,
1209
+ "eval_runtime": 65.5971,
1210
+ "eval_samples_per_second": 58.6,
1211
+ "eval_steps_per_second": 0.93,
1212
+ "learning_rate": 1.0000000000000002e-06,
1213
+ "step": 13213
1214
+ },
1215
+ {
1216
+ "epoch": 74.0,
1217
+ "eval_explained_variance": 0.40937405824661255,
1218
+ "eval_loss": 0.35516515374183655,
1219
+ "eval_mae": 0.1281428188085556,
1220
+ "eval_mse": 0.03721009939908981,
1221
+ "eval_r2": 0.4087432799234766,
1222
+ "eval_rmse": 0.1928991973400116,
1223
+ "eval_runtime": 63.5094,
1224
+ "eval_samples_per_second": 60.526,
1225
+ "eval_steps_per_second": 0.96,
1226
+ "learning_rate": 1.0000000000000002e-06,
1227
+ "step": 13394
1228
+ },
1229
+ {
1230
+ "epoch": 74.58563535911603,
1231
+ "grad_norm": 0.20831693708896637,
1232
+ "learning_rate": 1.0000000000000002e-06,
1233
+ "loss": 0.3654,
1234
+ "step": 13500
1235
+ },
1236
+ {
1237
+ "epoch": 75.0,
1238
+ "eval_explained_variance": 0.40568819871315587,
1239
+ "eval_loss": 0.3558255434036255,
1240
+ "eval_mae": 0.13025221228599548,
1241
+ "eval_mse": 0.037474822252988815,
1242
+ "eval_r2": 0.40474793306670837,
1243
+ "eval_rmse": 0.193584144115448,
1244
+ "eval_runtime": 63.853,
1245
+ "eval_samples_per_second": 60.201,
1246
+ "eval_steps_per_second": 0.955,
1247
+ "learning_rate": 1.0000000000000002e-06,
1248
+ "step": 13575
1249
+ },
1250
+ {
1251
+ "epoch": 76.0,
1252
+ "eval_explained_variance": 0.408390985085414,
1253
+ "eval_loss": 0.3555220663547516,
1254
+ "eval_mae": 0.12769028544425964,
1255
+ "eval_mse": 0.03735670447349548,
1256
+ "eval_r2": 0.40610327648301114,
1257
+ "eval_rmse": 0.19327881932258606,
1258
+ "eval_runtime": 66.3493,
1259
+ "eval_samples_per_second": 57.936,
1260
+ "eval_steps_per_second": 0.919,
1261
+ "learning_rate": 1.0000000000000002e-06,
1262
+ "step": 13756
1263
+ },
1264
+ {
1265
+ "epoch": 77.0,
1266
+ "eval_explained_variance": 0.4046147374006418,
1267
+ "eval_loss": 0.35615718364715576,
1268
+ "eval_mae": 0.13205072283744812,
1269
+ "eval_mse": 0.037551261484622955,
1270
+ "eval_r2": 0.4042150129069256,
1271
+ "eval_rmse": 0.19378148019313812,
1272
+ "eval_runtime": 65.1729,
1273
+ "eval_samples_per_second": 58.982,
1274
+ "eval_steps_per_second": 0.936,
1275
+ "learning_rate": 1.0000000000000002e-06,
1276
+ "step": 13937
1277
+ },
1278
+ {
1279
+ "epoch": 77.34806629834254,
1280
+ "grad_norm": 0.20255261659622192,
1281
+ "learning_rate": 1.0000000000000002e-06,
1282
+ "loss": 0.3663,
1283
+ "step": 14000
1284
+ },
1285
+ {
1286
+ "epoch": 78.0,
1287
+ "eval_explained_variance": 0.4090478007610028,
1288
+ "eval_loss": 0.35527750849723816,
1289
+ "eval_mae": 0.13062655925750732,
1290
+ "eval_mse": 0.037214502692222595,
1291
+ "eval_r2": 0.4086604768416133,
1292
+ "eval_rmse": 0.19291061162948608,
1293
+ "eval_runtime": 66.5281,
1294
+ "eval_samples_per_second": 57.78,
1295
+ "eval_steps_per_second": 0.917,
1296
+ "learning_rate": 1.0000000000000002e-06,
1297
+ "step": 14118
1298
+ },
1299
+ {
1300
+ "epoch": 79.0,
1301
+ "eval_explained_variance": 0.4019758334526649,
1302
+ "eval_loss": 0.3569395840167999,
1303
+ "eval_mae": 0.13103225827217102,
1304
+ "eval_mse": 0.037889137864112854,
1305
+ "eval_r2": 0.3999096598660514,
1306
+ "eval_rmse": 0.19465132057666779,
1307
+ "eval_runtime": 65.9236,
1308
+ "eval_samples_per_second": 58.31,
1309
+ "eval_steps_per_second": 0.925,
1310
+ "learning_rate": 1.0000000000000002e-06,
1311
+ "step": 14299
1312
+ },
1313
+ {
1314
+ "epoch": 80.0,
1315
+ "eval_explained_variance": 0.4057845427439763,
1316
+ "eval_loss": 0.35627198219299316,
1317
+ "eval_mae": 0.13107524812221527,
1318
+ "eval_mse": 0.037464920431375504,
1319
+ "eval_r2": 0.40523034358958093,
1320
+ "eval_rmse": 0.19355857372283936,
1321
+ "eval_runtime": 66.5566,
1322
+ "eval_samples_per_second": 57.755,
1323
+ "eval_steps_per_second": 0.917,
1324
+ "learning_rate": 1.0000000000000002e-06,
1325
+ "step": 14480
1326
+ },
1327
+ {
1328
+ "epoch": 80.11049723756906,
1329
+ "grad_norm": 0.18743179738521576,
1330
+ "learning_rate": 1.0000000000000002e-07,
1331
+ "loss": 0.3655,
1332
+ "step": 14500
1333
+ },
1334
+ {
1335
+ "epoch": 81.0,
1336
+ "eval_explained_variance": 0.4091951067631061,
1337
+ "eval_loss": 0.3555302619934082,
1338
+ "eval_mae": 0.13077440857887268,
1339
+ "eval_mse": 0.037267763167619705,
1340
+ "eval_r2": 0.4078657020062894,
1341
+ "eval_rmse": 0.1930485963821411,
1342
+ "eval_runtime": 67.6736,
1343
+ "eval_samples_per_second": 56.802,
1344
+ "eval_steps_per_second": 0.901,
1345
+ "learning_rate": 1.0000000000000002e-07,
1346
+ "step": 14661
1347
+ },
1348
+ {
1349
+ "epoch": 82.0,
1350
+ "eval_explained_variance": 0.408656867650839,
1351
+ "eval_loss": 0.35563620924949646,
1352
+ "eval_mae": 0.13087815046310425,
1353
+ "eval_mse": 0.03731405362486839,
1354
+ "eval_r2": 0.4071799006076709,
1355
+ "eval_rmse": 0.19316846132278442,
1356
+ "eval_runtime": 68.4549,
1357
+ "eval_samples_per_second": 56.154,
1358
+ "eval_steps_per_second": 0.891,
1359
+ "learning_rate": 1.0000000000000002e-07,
1360
+ "step": 14842
1361
+ },
1362
+ {
1363
+ "epoch": 82.87292817679558,
1364
+ "grad_norm": 0.20405510067939758,
1365
+ "learning_rate": 1.0000000000000002e-07,
1366
+ "loss": 0.3651,
1367
+ "step": 15000
1368
+ },
1369
+ {
1370
+ "epoch": 83.0,
1371
+ "eval_explained_variance": 0.41021374555734486,
1372
+ "eval_loss": 0.35571029782295227,
1373
+ "eval_mae": 0.13036301732063293,
1374
+ "eval_mse": 0.03731907904148102,
1375
+ "eval_r2": 0.4073602568430592,
1376
+ "eval_rmse": 0.19318147003650665,
1377
+ "eval_runtime": 68.1325,
1378
+ "eval_samples_per_second": 56.419,
1379
+ "eval_steps_per_second": 0.895,
1380
+ "learning_rate": 1.0000000000000002e-07,
1381
+ "step": 15023
1382
+ },
1383
+ {
1384
+ "epoch": 84.0,
1385
+ "eval_explained_variance": 0.4082063390658452,
1386
+ "eval_loss": 0.35581377148628235,
1387
+ "eval_mae": 0.1305844783782959,
1388
+ "eval_mse": 0.037393905222415924,
1389
+ "eval_r2": 0.4062799764902456,
1390
+ "eval_rmse": 0.19337503612041473,
1391
+ "eval_runtime": 66.3433,
1392
+ "eval_samples_per_second": 57.941,
1393
+ "eval_steps_per_second": 0.919,
1394
+ "learning_rate": 1.0000000000000002e-07,
1395
+ "step": 15204
1396
+ },
1397
+ {
1398
+ "epoch": 84.0,
1399
+ "learning_rate": 1.0000000000000002e-07,
1400
+ "step": 15204,
1401
+ "total_flos": 2.180798470217171e+19,
1402
+ "train_loss": 0.37467605181350044,
1403
+ "train_runtime": 24668.9414,
1404
+ "train_samples_per_second": 46.707,
1405
+ "train_steps_per_second": 0.734
1406
+ }
1407
+ ],
1408
+ "logging_steps": 500,
1409
+ "max_steps": 18100,
1410
+ "num_input_tokens_seen": 0,
1411
+ "num_train_epochs": 100,
1412
+ "save_steps": 500,
1413
+ "stateful_callbacks": {
1414
+ "EarlyStoppingCallback": {
1415
+ "args": {
1416
+ "early_stopping_patience": 10,
1417
+ "early_stopping_threshold": 0.0
1418
+ },
1419
+ "attributes": {
1420
+ "early_stopping_patience_counter": 0
1421
+ }
1422
+ },
1423
+ "TrainerControl": {
1424
+ "args": {
1425
+ "should_epoch_stop": false,
1426
+ "should_evaluate": false,
1427
+ "should_log": false,
1428
+ "should_save": true,
1429
+ "should_training_stop": true
1430
+ },
1431
+ "attributes": {}
1432
+ }
1433
+ },
1434
+ "total_flos": 2.180798470217171e+19,
1435
+ "train_batch_size": 64,
1436
+ "trial_name": null,
1437
+ "trial_params": null
1438
+ }