rshrott committed on
Commit
e9867c0
1 Parent(s): 9847c3c

🍻 cheers

README.md CHANGED
@@ -2,6 +2,7 @@
  license: apache-2.0
  base_model: google/vit-base-patch16-224-in21k
  tags:
+ - image-classification
  - generated_from_trainer
  model-index:
  - name: ryan03302024
@@ -13,12 +14,12 @@ should probably proofread and complete it, then remove this comment. -->
 
  # ryan03302024
 
- This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the None dataset.
+ This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the properties dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.2520
- - Ordinal Mae: 0.3265
- - Ordinal Accuracy: 0.7167
- - Na Accuracy: 0.8144
+ - Loss: 0.2044
+ - Ordinal Mae: 0.4324
+ - Ordinal Accuracy: 0.6648
+ - Na Accuracy: 0.8333
 
  ## Model description
 
all_results.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "epoch": 3.0,
+ "eval_loss": 0.20436730980873108,
+ "eval_na_accuracy": 0.8333333134651184,
+ "eval_ordinal_accuracy": 0.6647829413414001,
+ "eval_ordinal_mae": 0.4324062764644623,
+ "eval_runtime": 186.5896,
+ "eval_samples_per_second": 23.983,
+ "eval_steps_per_second": 3.001,
+ "train_loss": 0.13915026724546314,
+ "train_runtime": 27077.9759,
+ "train_samples_per_second": 4.916,
+ "train_steps_per_second": 0.307
+ }
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "epoch": 3.0,
+ "eval_loss": 0.20436730980873108,
+ "eval_na_accuracy": 0.8333333134651184,
+ "eval_ordinal_accuracy": 0.6647829413414001,
+ "eval_ordinal_mae": 0.4324062764644623,
+ "eval_runtime": 186.5896,
+ "eval_samples_per_second": 23.983,
+ "eval_steps_per_second": 3.001
+ }
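
Reviewer note: the throughput fields in eval_results.json can be used to sanity-check the size of the evaluation run. The sketch below is a rough cross-check, not part of the commit. It assumes that `eval_samples_per_second` and `eval_steps_per_second` are simply counts divided by `eval_runtime`, which this diff does not confirm; `estimate_eval_size` is a hypothetical helper name.

```python
import json

# Values copied from eval_results.json in this commit.
EVAL_RESULTS = json.loads("""
{
  "eval_runtime": 186.5896,
  "eval_samples_per_second": 23.983,
  "eval_steps_per_second": 3.001
}
""")


def estimate_eval_size(results):
    """Estimate (sample count, step count) from the reported throughput.

    Assumption: each *_per_second field equals count / eval_runtime,
    so multiplying back by the runtime recovers the approximate count.
    """
    runtime = results["eval_runtime"]
    samples = round(runtime * results["eval_samples_per_second"])
    steps = round(runtime * results["eval_steps_per_second"])
    return samples, steps


samples, steps = estimate_eval_size(EVAL_RESULTS)
# The ratio samples / steps approximates the evaluation batch size.
print(samples, steps, round(samples / steps, 2))
```

Under that assumption the ratio of samples to steps suggests an evaluation batch size of about 8, but the reported rates are rounded, so treat the counts as estimates only.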
runs/Mar30_19-49-55_ryanserver/events.out.tfevents.1711869903.ryanserver.6566.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ccc3d7fc01fcf87fcb8169c4396f44c003495bee72670fa00d8911172e0535e3
+ size 529
train_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "epoch": 3.0,
+ "train_loss": 0.13915026724546314,
+ "train_runtime": 27077.9759,
+ "train_samples_per_second": 4.916,
+ "train_steps_per_second": 0.307
+ }
trainer_state.json ADDED
@@ -0,0 +1,3267 @@
+ {
+ "best_metric": 0.20436730980873108,
+ "best_model_checkpoint": "./ryan03302024/checkpoint-2700",
+ "epoch": 3.0,
+ "eval_steps": 100,
+ "global_step": 8319,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.01,
+ "grad_norm": 0.9509561657905579,
+ "learning_rate": 9.969948311095084e-05,
+ "loss": 0.5096,
+ "step": 25
+ },
+ {
+ "epoch": 0.02,
+ "grad_norm": 0.5529997944831848,
+ "learning_rate": 9.939896622190168e-05,
+ "loss": 0.3519,
+ "step": 50
+ },
+ {
+ "epoch": 0.03,
+ "grad_norm": 1.6374471187591553,
+ "learning_rate": 9.909844933285252e-05,
+ "loss": 0.3431,
+ "step": 75
+ },
+ {
+ "epoch": 0.04,
+ "grad_norm": 1.235526204109192,
+ "learning_rate": 9.879793244380335e-05,
+ "loss": 0.3617,
+ "step": 100
+ },
+ {
+ "epoch": 0.04,
+ "eval_loss": 0.3133379817008972,
+ "eval_na_accuracy": 0.876288652420044,
+ "eval_ordinal_accuracy": 0.41741588711738586,
+ "eval_ordinal_mae": 0.8465587496757507,
+ "eval_runtime": 338.0185,
+ "eval_samples_per_second": 13.239,
+ "eval_steps_per_second": 1.657,
+ "step": 100
+ },
+ {
+ "epoch": 0.05,
+ "grad_norm": 1.0370614528656006,
+ "learning_rate": 9.849741555475418e-05,
+ "loss": 0.3721,
+ "step": 125
+ },
+ {
+ "epoch": 0.05,
+ "grad_norm": 1.619168996810913,
+ "learning_rate": 9.819689866570502e-05,
+ "loss": 0.3498,
+ "step": 150
+ },
+ {
+ "epoch": 0.06,
+ "grad_norm": 0.6790041923522949,
+ "learning_rate": 9.789638177665586e-05,
+ "loss": 0.3633,
+ "step": 175
+ },
+ {
+ "epoch": 0.07,
+ "grad_norm": 0.47009700536727905,
+ "learning_rate": 9.759586488760669e-05,
+ "loss": 0.2891,
+ "step": 200
+ },
+ {
+ "epoch": 0.07,
+ "eval_loss": 0.3022407293319702,
+ "eval_na_accuracy": 0.699312686920166,
+ "eval_ordinal_accuracy": 0.5044952630996704,
+ "eval_ordinal_mae": 0.7738743424415588,
+ "eval_runtime": 195.8973,
+ "eval_samples_per_second": 22.844,
+ "eval_steps_per_second": 2.859,
+ "step": 200
+ },
+ {
+ "epoch": 0.08,
+ "grad_norm": 1.6545381546020508,
+ "learning_rate": 9.73073686741195e-05,
+ "loss": 0.3277,
+ "step": 225
+ },
+ {
+ "epoch": 0.09,
+ "grad_norm": 0.2690508961677551,
+ "learning_rate": 9.700685178507033e-05,
+ "loss": 0.3505,
+ "step": 250
+ },
+ {
+ "epoch": 0.1,
+ "grad_norm": 2.571977138519287,
+ "learning_rate": 9.670633489602116e-05,
+ "loss": 0.2819,
+ "step": 275
+ },
+ {
+ "epoch": 0.11,
+ "grad_norm": 1.7339119911193848,
+ "learning_rate": 9.6405818006972e-05,
+ "loss": 0.3163,
+ "step": 300
+ },
+ {
+ "epoch": 0.11,
+ "eval_loss": 0.26153865456581116,
+ "eval_na_accuracy": 0.7903780341148376,
+ "eval_ordinal_accuracy": 0.5597226023674011,
+ "eval_ordinal_mae": 0.6166509985923767,
+ "eval_runtime": 200.5643,
+ "eval_samples_per_second": 22.312,
+ "eval_steps_per_second": 2.792,
+ "step": 300
+ },
+ {
+ "epoch": 0.12,
+ "grad_norm": 0.86951744556427,
+ "learning_rate": 9.610530111792283e-05,
+ "loss": 0.2707,
+ "step": 325
+ },
+ {
+ "epoch": 0.13,
+ "grad_norm": 0.3769865036010742,
+ "learning_rate": 9.580478422887367e-05,
+ "loss": 0.2917,
+ "step": 350
+ },
+ {
+ "epoch": 0.14,
+ "grad_norm": 0.39000704884529114,
+ "learning_rate": 9.55042673398245e-05,
+ "loss": 0.2765,
+ "step": 375
+ },
+ {
+ "epoch": 0.14,
+ "grad_norm": 0.7430490255355835,
+ "learning_rate": 9.520375045077534e-05,
+ "loss": 0.2781,
+ "step": 400
+ },
+ {
+ "epoch": 0.14,
+ "eval_loss": 0.2597866952419281,
+ "eval_na_accuracy": 0.8298969268798828,
+ "eval_ordinal_accuracy": 0.5936295986175537,
+ "eval_ordinal_mae": 0.5432500839233398,
+ "eval_runtime": 199.7937,
+ "eval_samples_per_second": 22.398,
+ "eval_steps_per_second": 2.803,
+ "step": 400
+ },
+ {
+ "epoch": 0.15,
+ "grad_norm": 1.8847980499267578,
+ "learning_rate": 9.490323356172616e-05,
+ "loss": 0.2643,
+ "step": 425
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 1.0712331533432007,
+ "learning_rate": 9.461473734823898e-05,
+ "loss": 0.2747,
+ "step": 450
+ },
+ {
+ "epoch": 0.17,
+ "grad_norm": 0.9433643221855164,
+ "learning_rate": 9.431422045918981e-05,
+ "loss": 0.295,
+ "step": 475
+ },
+ {
+ "epoch": 0.18,
+ "grad_norm": 1.2171131372451782,
+ "learning_rate": 9.401370357014065e-05,
+ "loss": 0.2731,
+ "step": 500
+ },
+ {
+ "epoch": 0.18,
+ "eval_loss": 0.26127612590789795,
+ "eval_na_accuracy": 0.8453608155250549,
+ "eval_ordinal_accuracy": 0.5566401481628418,
+ "eval_ordinal_mae": 0.5651282668113708,
+ "eval_runtime": 199.6257,
+ "eval_samples_per_second": 22.417,
+ "eval_steps_per_second": 2.805,
+ "step": 500
+ },
+ {
+ "epoch": 0.19,
+ "grad_norm": 1.1272056102752686,
+ "learning_rate": 9.371318668109149e-05,
+ "loss": 0.2878,
+ "step": 525
+ },
+ {
+ "epoch": 0.2,
+ "grad_norm": 0.4902811348438263,
+ "learning_rate": 9.341266979204232e-05,
+ "loss": 0.2426,
+ "step": 550
+ },
+ {
+ "epoch": 0.21,
+ "grad_norm": 0.5631658434867859,
+ "learning_rate": 9.311215290299315e-05,
+ "loss": 0.2487,
+ "step": 575
+ },
+ {
+ "epoch": 0.22,
+ "grad_norm": 0.6327888369560242,
+ "learning_rate": 9.281163601394399e-05,
+ "loss": 0.2926,
+ "step": 600
+ },
+ {
+ "epoch": 0.22,
+ "eval_loss": 0.2734401226043701,
+ "eval_na_accuracy": 0.9089347124099731,
+ "eval_ordinal_accuracy": 0.5494477152824402,
+ "eval_ordinal_mae": 0.5305272340774536,
+ "eval_runtime": 196.452,
+ "eval_samples_per_second": 22.779,
+ "eval_steps_per_second": 2.851,
+ "step": 600
+ },
+ {
+ "epoch": 0.23,
+ "grad_norm": 0.6612924337387085,
+ "learning_rate": 9.251111912489481e-05,
+ "loss": 0.2871,
+ "step": 625
+ },
+ {
+ "epoch": 0.23,
+ "grad_norm": 1.513719081878662,
+ "learning_rate": 9.221060223584565e-05,
+ "loss": 0.2538,
+ "step": 650
+ },
+ {
+ "epoch": 0.24,
+ "grad_norm": 1.1069471836090088,
+ "learning_rate": 9.191008534679649e-05,
+ "loss": 0.21,
+ "step": 675
+ },
+ {
+ "epoch": 0.25,
+ "grad_norm": 2.458164691925049,
+ "learning_rate": 9.160956845774733e-05,
+ "loss": 0.2686,
+ "step": 700
+ },
+ {
+ "epoch": 0.25,
+ "eval_loss": 0.23620270192623138,
+ "eval_na_accuracy": 0.7886598110198975,
+ "eval_ordinal_accuracy": 0.6249678730964661,
+ "eval_ordinal_mae": 0.485275000333786,
+ "eval_runtime": 190.0791,
+ "eval_samples_per_second": 23.543,
+ "eval_steps_per_second": 2.946,
+ "step": 700
+ },
+ {
+ "epoch": 0.26,
+ "grad_norm": 0.5598509907722473,
+ "learning_rate": 9.130905156869817e-05,
+ "loss": 0.2525,
+ "step": 725
+ },
+ {
+ "epoch": 0.27,
+ "grad_norm": 1.2617449760437012,
+ "learning_rate": 9.1008534679649e-05,
+ "loss": 0.2453,
+ "step": 750
+ },
+ {
+ "epoch": 0.28,
+ "grad_norm": 1.5153591632843018,
+ "learning_rate": 9.070801779059983e-05,
+ "loss": 0.225,
+ "step": 775
+ },
+ {
+ "epoch": 0.29,
+ "grad_norm": 1.562745451927185,
+ "learning_rate": 9.040750090155067e-05,
+ "loss": 0.2715,
+ "step": 800
+ },
+ {
+ "epoch": 0.29,
+ "eval_loss": 0.24541285634040833,
+ "eval_na_accuracy": 0.7714776396751404,
+ "eval_ordinal_accuracy": 0.6254816055297852,
+ "eval_ordinal_mae": 0.49135151505470276,
+ "eval_runtime": 193.3675,
+ "eval_samples_per_second": 23.142,
+ "eval_steps_per_second": 2.896,
+ "step": 800
+ },
+ {
+ "epoch": 0.3,
+ "grad_norm": 1.5192123651504517,
+ "learning_rate": 9.010698401250151e-05,
+ "loss": 0.2424,
+ "step": 825
+ },
+ {
+ "epoch": 0.31,
+ "grad_norm": 0.8706183433532715,
+ "learning_rate": 8.980646712345235e-05,
+ "loss": 0.2341,
+ "step": 850
+ },
+ {
+ "epoch": 0.32,
+ "grad_norm": 1.0854986906051636,
+ "learning_rate": 8.950595023440318e-05,
+ "loss": 0.2694,
+ "step": 875
+ },
+ {
+ "epoch": 0.32,
+ "grad_norm": 2.208933115005493,
+ "learning_rate": 8.920543334535402e-05,
+ "loss": 0.2459,
+ "step": 900
+ },
+ {
+ "epoch": 0.32,
+ "eval_loss": 0.2451752871274948,
+ "eval_na_accuracy": 0.7800687551498413,
+ "eval_ordinal_accuracy": 0.6072437763214111,
+ "eval_ordinal_mae": 0.47625866532325745,
+ "eval_runtime": 192.5483,
+ "eval_samples_per_second": 23.241,
+ "eval_steps_per_second": 2.908,
+ "step": 900
+ },
+ {
+ "epoch": 0.33,
+ "grad_norm": 0.8433920741081238,
+ "learning_rate": 8.890491645630485e-05,
+ "loss": 0.2375,
+ "step": 925
+ },
+ {
+ "epoch": 0.34,
+ "grad_norm": 2.9415366649627686,
+ "learning_rate": 8.860439956725569e-05,
+ "loss": 0.2918,
+ "step": 950
+ },
+ {
+ "epoch": 0.35,
+ "grad_norm": 1.6653732061386108,
+ "learning_rate": 8.830388267820651e-05,
+ "loss": 0.2615,
+ "step": 975
+ },
+ {
+ "epoch": 0.36,
+ "grad_norm": 0.47592782974243164,
+ "learning_rate": 8.800336578915735e-05,
+ "loss": 0.2033,
+ "step": 1000
+ },
+ {
+ "epoch": 0.36,
+ "eval_loss": 0.23654678463935852,
+ "eval_na_accuracy": 0.7663230299949646,
+ "eval_ordinal_accuracy": 0.6105831265449524,
+ "eval_ordinal_mae": 0.4967080354690552,
+ "eval_runtime": 192.911,
+ "eval_samples_per_second": 23.197,
+ "eval_steps_per_second": 2.903,
+ "step": 1000
+ },
+ {
+ "epoch": 0.37,
+ "grad_norm": 0.8534409403800964,
+ "learning_rate": 8.770284890010819e-05,
+ "loss": 0.2082,
+ "step": 1025
+ },
+ {
+ "epoch": 0.38,
+ "grad_norm": 0.6388253569602966,
+ "learning_rate": 8.740233201105903e-05,
+ "loss": 0.2238,
+ "step": 1050
+ },
+ {
+ "epoch": 0.39,
+ "grad_norm": 1.5685051679611206,
+ "learning_rate": 8.710181512200987e-05,
+ "loss": 0.2775,
+ "step": 1075
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 0.9940025806427002,
+ "learning_rate": 8.680129823296069e-05,
+ "loss": 0.2234,
+ "step": 1100
+ },
+ {
+ "epoch": 0.4,
+ "eval_loss": 0.22986453771591187,
+ "eval_na_accuracy": 0.8676975965499878,
+ "eval_ordinal_accuracy": 0.6180323362350464,
+ "eval_ordinal_mae": 0.49474066495895386,
+ "eval_runtime": 191.9691,
+ "eval_samples_per_second": 23.311,
+ "eval_steps_per_second": 2.917,
+ "step": 1100
+ },
+ {
+ "epoch": 0.41,
+ "grad_norm": 0.6706956624984741,
+ "learning_rate": 8.650078134391153e-05,
+ "loss": 0.2027,
+ "step": 1125
+ },
+ {
+ "epoch": 0.41,
+ "grad_norm": 0.4717120826244354,
+ "learning_rate": 8.620026445486237e-05,
+ "loss": 0.2072,
+ "step": 1150
+ },
+ {
+ "epoch": 0.42,
+ "grad_norm": 1.5387229919433594,
+ "learning_rate": 8.58997475658132e-05,
+ "loss": 0.2328,
+ "step": 1175
+ },
+ {
+ "epoch": 0.43,
+ "grad_norm": 1.0794016122817993,
+ "learning_rate": 8.559923067676405e-05,
+ "loss": 0.2035,
+ "step": 1200
+ },
+ {
+ "epoch": 0.43,
+ "eval_loss": 0.2313634753227234,
+ "eval_na_accuracy": 0.7835051417350769,
+ "eval_ordinal_accuracy": 0.6308759450912476,
+ "eval_ordinal_mae": 0.4743907153606415,
+ "eval_runtime": 191.4648,
+ "eval_samples_per_second": 23.372,
+ "eval_steps_per_second": 2.925,
+ "step": 1200
+ },
+ {
+ "epoch": 0.44,
+ "grad_norm": 1.2199459075927734,
+ "learning_rate": 8.529871378771488e-05,
+ "loss": 0.2769,
+ "step": 1225
+ },
+ {
+ "epoch": 0.45,
+ "grad_norm": 0.5869888067245483,
+ "learning_rate": 8.499819689866571e-05,
+ "loss": 0.2194,
+ "step": 1250
+ },
+ {
+ "epoch": 0.46,
+ "grad_norm": 1.1209378242492676,
+ "learning_rate": 8.469768000961655e-05,
+ "loss": 0.2536,
+ "step": 1275
+ },
+ {
+ "epoch": 0.47,
+ "grad_norm": 0.4376941919326782,
+ "learning_rate": 8.439716312056739e-05,
+ "loss": 0.2277,
+ "step": 1300
+ },
+ {
+ "epoch": 0.47,
+ "eval_loss": 0.23891955614089966,
+ "eval_na_accuracy": 0.730240523815155,
+ "eval_ordinal_accuracy": 0.643462598323822,
+ "eval_ordinal_mae": 0.46490678191185,
+ "eval_runtime": 193.3761,
+ "eval_samples_per_second": 23.141,
+ "eval_steps_per_second": 2.896,
+ "step": 1300
+ },
+ {
+ "epoch": 0.48,
+ "grad_norm": 1.8051934242248535,
+ "learning_rate": 8.409664623151821e-05,
+ "loss": 0.2007,
+ "step": 1325
+ },
+ {
+ "epoch": 0.49,
+ "grad_norm": 1.5033048391342163,
+ "learning_rate": 8.379612934246905e-05,
+ "loss": 0.1993,
+ "step": 1350
+ },
+ {
+ "epoch": 0.5,
+ "grad_norm": 0.577111542224884,
+ "learning_rate": 8.349561245341989e-05,
+ "loss": 0.2337,
+ "step": 1375
+ },
+ {
+ "epoch": 0.5,
+ "grad_norm": 0.8847935795783997,
+ "learning_rate": 8.319509556437071e-05,
+ "loss": 0.2535,
+ "step": 1400
+ },
+ {
+ "epoch": 0.5,
+ "eval_loss": 0.22593823075294495,
+ "eval_na_accuracy": 0.8247422575950623,
+ "eval_ordinal_accuracy": 0.6021063327789307,
+ "eval_ordinal_mae": 0.45086583495140076,
+ "eval_runtime": 192.5485,
+ "eval_samples_per_second": 23.241,
+ "eval_steps_per_second": 2.908,
+ "step": 1400
+ },
+ {
+ "epoch": 0.51,
+ "grad_norm": 1.888088345527649,
+ "learning_rate": 8.289457867532155e-05,
+ "loss": 0.2162,
+ "step": 1425
+ },
+ {
+ "epoch": 0.52,
+ "grad_norm": 0.9532464146614075,
+ "learning_rate": 8.259406178627239e-05,
+ "loss": 0.2632,
+ "step": 1450
+ },
+ {
+ "epoch": 0.53,
+ "grad_norm": 2.7865819931030273,
+ "learning_rate": 8.229354489722323e-05,
+ "loss": 0.2005,
+ "step": 1475
+ },
+ {
+ "epoch": 0.54,
+ "grad_norm": 1.0594526529312134,
+ "learning_rate": 8.199302800817407e-05,
+ "loss": 0.2209,
+ "step": 1500
+ },
+ {
+ "epoch": 0.54,
+ "eval_loss": 0.23685050010681152,
+ "eval_na_accuracy": 0.7577319741249084,
+ "eval_ordinal_accuracy": 0.6362702250480652,
+ "eval_ordinal_mae": 0.45073509216308594,
+ "eval_runtime": 192.1913,
+ "eval_samples_per_second": 23.284,
+ "eval_steps_per_second": 2.914,
+ "step": 1500
+ },
+ {
+ "epoch": 0.55,
+ "grad_norm": 0.9557788372039795,
+ "learning_rate": 8.16925111191249e-05,
+ "loss": 0.2009,
+ "step": 1525
+ },
+ {
+ "epoch": 0.56,
+ "grad_norm": 0.7707318067550659,
+ "learning_rate": 8.139199423007573e-05,
+ "loss": 0.2172,
+ "step": 1550
+ },
+ {
+ "epoch": 0.57,
+ "grad_norm": 1.569989562034607,
+ "learning_rate": 8.109147734102657e-05,
+ "loss": 0.2229,
+ "step": 1575
+ },
+ {
+ "epoch": 0.58,
+ "grad_norm": 1.5089187622070312,
+ "learning_rate": 8.079096045197741e-05,
+ "loss": 0.2007,
+ "step": 1600
+ },
+ {
+ "epoch": 0.58,
+ "eval_loss": 0.21611952781677246,
+ "eval_na_accuracy": 0.831615149974823,
+ "eval_ordinal_accuracy": 0.6539943218231201,
+ "eval_ordinal_mae": 0.427180677652359,
+ "eval_runtime": 190.3474,
+ "eval_samples_per_second": 23.51,
+ "eval_steps_per_second": 2.942,
+ "step": 1600
+ },
+ {
+ "epoch": 0.59,
+ "grad_norm": 1.9716092348098755,
+ "learning_rate": 8.049044356292825e-05,
+ "loss": 0.2375,
+ "step": 1625
+ },
+ {
+ "epoch": 0.6,
+ "grad_norm": 2.04927659034729,
+ "learning_rate": 8.018992667387908e-05,
+ "loss": 0.2548,
+ "step": 1650
+ },
+ {
+ "epoch": 0.6,
+ "grad_norm": 1.1613515615463257,
+ "learning_rate": 7.988940978482991e-05,
+ "loss": 0.3238,
+ "step": 1675
+ },
+ {
+ "epoch": 0.61,
+ "grad_norm": 1.4977881908416748,
+ "learning_rate": 7.958889289578075e-05,
+ "loss": 0.2013,
+ "step": 1700
+ },
+ {
+ "epoch": 0.61,
+ "eval_loss": 0.24333100020885468,
+ "eval_na_accuracy": 0.7731958627700806,
+ "eval_ordinal_accuracy": 0.6128949522972107,
+ "eval_ordinal_mae": 0.43257907032966614,
+ "eval_runtime": 193.154,
+ "eval_samples_per_second": 23.168,
+ "eval_steps_per_second": 2.899,
+ "step": 1700
+ },
+ {
+ "epoch": 0.62,
+ "grad_norm": 0.6059219837188721,
+ "learning_rate": 7.928837600673157e-05,
+ "loss": 0.273,
+ "step": 1725
+ },
+ {
+ "epoch": 0.63,
+ "grad_norm": 1.3245080709457397,
+ "learning_rate": 7.898785911768241e-05,
+ "loss": 0.2148,
+ "step": 1750
+ },
+ {
+ "epoch": 0.64,
+ "grad_norm": 0.5151876211166382,
+ "learning_rate": 7.868734222863325e-05,
+ "loss": 0.1901,
+ "step": 1775
+ },
+ {
+ "epoch": 0.65,
+ "grad_norm": 0.37926343083381653,
+ "learning_rate": 7.838682533958409e-05,
+ "loss": 0.1999,
+ "step": 1800
+ },
+ {
+ "epoch": 0.65,
+ "eval_loss": 0.22266778349876404,
+ "eval_na_accuracy": 0.8247422575950623,
+ "eval_ordinal_accuracy": 0.6552786827087402,
+ "eval_ordinal_mae": 0.4460422396659851,
+ "eval_runtime": 193.6404,
+ "eval_samples_per_second": 23.11,
+ "eval_steps_per_second": 2.892,
+ "step": 1800
+ },
+ {
+ "epoch": 0.66,
+ "grad_norm": 4.746518611907959,
+ "learning_rate": 7.808630845053493e-05,
+ "loss": 0.2169,
+ "step": 1825
+ },
+ {
+ "epoch": 0.67,
+ "grad_norm": 0.3521455228328705,
+ "learning_rate": 7.778579156148575e-05,
+ "loss": 0.228,
+ "step": 1850
+ },
+ {
+ "epoch": 0.68,
+ "grad_norm": 1.3322985172271729,
+ "learning_rate": 7.748527467243659e-05,
+ "loss": 0.2047,
+ "step": 1875
+ },
+ {
+ "epoch": 0.69,
+ "grad_norm": 0.6390167474746704,
+ "learning_rate": 7.718475778338743e-05,
+ "loss": 0.2157,
+ "step": 1900
+ },
+ {
+ "epoch": 0.69,
+ "eval_loss": 0.2134324014186859,
+ "eval_na_accuracy": 0.8161512017250061,
+ "eval_ordinal_accuracy": 0.6362702250480652,
+ "eval_ordinal_mae": 0.4728001058101654,
+ "eval_runtime": 191.9038,
+ "eval_samples_per_second": 23.319,
+ "eval_steps_per_second": 2.918,
+ "step": 1900
+ },
+ {
+ "epoch": 0.69,
+ "grad_norm": 2.581801176071167,
+ "learning_rate": 7.688424089433827e-05,
+ "loss": 0.2335,
+ "step": 1925
+ },
+ {
+ "epoch": 0.7,
+ "grad_norm": 1.3418828248977661,
+ "learning_rate": 7.65837240052891e-05,
+ "loss": 0.2446,
+ "step": 1950
+ },
+ {
+ "epoch": 0.71,
+ "grad_norm": 0.605707585811615,
+ "learning_rate": 7.628320711623994e-05,
+ "loss": 0.2144,
+ "step": 1975
+ },
+ {
+ "epoch": 0.72,
+ "grad_norm": 0.6443113088607788,
+ "learning_rate": 7.598269022719077e-05,
+ "loss": 0.2154,
+ "step": 2000
+ },
+ {
+ "epoch": 0.72,
+ "eval_loss": 0.2238776683807373,
+ "eval_na_accuracy": 0.8573883175849915,
+ "eval_ordinal_accuracy": 0.5787310600280762,
+ "eval_ordinal_mae": 0.4734483063220978,
+ "eval_runtime": 192.57,
+ "eval_samples_per_second": 23.238,
+ "eval_steps_per_second": 2.908,
+ "step": 2000
+ },
+ {
+ "epoch": 0.73,
+ "grad_norm": 1.4812798500061035,
+ "learning_rate": 7.568217333814161e-05,
+ "loss": 0.2172,
+ "step": 2025
+ },
+ {
+ "epoch": 0.74,
+ "grad_norm": 1.5400125980377197,
+ "learning_rate": 7.538165644909245e-05,
+ "loss": 0.1907,
+ "step": 2050
+ },
+ {
+ "epoch": 0.75,
+ "grad_norm": 0.6313005685806274,
+ "learning_rate": 7.508113956004327e-05,
+ "loss": 0.2078,
+ "step": 2075
+ },
+ {
+ "epoch": 0.76,
+ "grad_norm": 1.3446621894836426,
+ "learning_rate": 7.478062267099411e-05,
+ "loss": 0.2169,
+ "step": 2100
+ },
+ {
+ "epoch": 0.76,
+ "eval_loss": 0.23939213156700134,
+ "eval_na_accuracy": 0.8848797082901001,
+ "eval_ordinal_accuracy": 0.6254816055297852,
+ "eval_ordinal_mae": 0.4392252564430237,
+ "eval_runtime": 194.2506,
+ "eval_samples_per_second": 23.037,
+ "eval_steps_per_second": 2.883,
+ "step": 2100
+ },
+ {
+ "epoch": 0.77,
+ "grad_norm": 0.5709229111671448,
+ "learning_rate": 7.448010578194495e-05,
+ "loss": 0.2155,
+ "step": 2125
+ },
+ {
+ "epoch": 0.78,
+ "grad_norm": 2.9002838134765625,
+ "learning_rate": 7.417958889289577e-05,
+ "loss": 0.2534,
+ "step": 2150
+ },
+ {
+ "epoch": 0.78,
+ "grad_norm": 1.3746689558029175,
+ "learning_rate": 7.387907200384661e-05,
+ "loss": 0.2489,
+ "step": 2175
+ },
+ {
+ "epoch": 0.79,
+ "grad_norm": 1.1106956005096436,
+ "learning_rate": 7.357855511479745e-05,
+ "loss": 0.2719,
+ "step": 2200
+ },
+ {
+ "epoch": 0.79,
+ "eval_loss": 0.2283402532339096,
+ "eval_na_accuracy": 0.8780068755149841,
+ "eval_ordinal_accuracy": 0.6229129433631897,
+ "eval_ordinal_mae": 0.4323791265487671,
+ "eval_runtime": 193.3183,
+ "eval_samples_per_second": 23.148,
+ "eval_steps_per_second": 2.897,
+ "step": 2200
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 1.5689876079559326,
+ "learning_rate": 7.327803822574829e-05,
+ "loss": 0.2054,
+ "step": 2225
+ },
+ {
+ "epoch": 0.81,
+ "grad_norm": 2.14363694190979,
+ "learning_rate": 7.297752133669913e-05,
+ "loss": 0.213,
+ "step": 2250
+ },
+ {
+ "epoch": 0.82,
+ "grad_norm": 1.4198272228240967,
+ "learning_rate": 7.267700444764997e-05,
+ "loss": 0.1729,
+ "step": 2275
+ },
+ {
+ "epoch": 0.83,
+ "grad_norm": 0.5297597646713257,
+ "learning_rate": 7.237648755860079e-05,
+ "loss": 0.2244,
+ "step": 2300
+ },
+ {
+ "epoch": 0.83,
+ "eval_loss": 0.2139568328857422,
+ "eval_na_accuracy": 0.8728522062301636,
+ "eval_ordinal_accuracy": 0.6313896775245667,
+ "eval_ordinal_mae": 0.4482755959033966,
+ "eval_runtime": 193.8188,
+ "eval_samples_per_second": 23.089,
+ "eval_steps_per_second": 2.889,
+ "step": 2300
+ },
+ {
+ "epoch": 0.84,
+ "grad_norm": 0.4800972044467926,
+ "learning_rate": 7.207597066955163e-05,
+ "loss": 0.231,
+ "step": 2325
+ },
+ {
+ "epoch": 0.85,
+ "grad_norm": 2.125603199005127,
+ "learning_rate": 7.177545378050247e-05,
+ "loss": 0.2371,
+ "step": 2350
+ },
+ {
+ "epoch": 0.86,
+ "grad_norm": 1.553396224975586,
+ "learning_rate": 7.147493689145331e-05,
+ "loss": 0.217,
+ "step": 2375
+ },
+ {
+ "epoch": 0.87,
+ "grad_norm": 1.5980515480041504,
+ "learning_rate": 7.117442000240415e-05,
+ "loss": 0.2072,
+ "step": 2400
+ },
+ {
+ "epoch": 0.87,
+ "eval_loss": 0.2198052555322647,
+ "eval_na_accuracy": 0.8213058710098267,
+ "eval_ordinal_accuracy": 0.6439763903617859,
+ "eval_ordinal_mae": 0.4330386817455292,
+ "eval_runtime": 193.8423,
+ "eval_samples_per_second": 23.086,
+ "eval_steps_per_second": 2.889,
+ "step": 2400
+ },
947
+ {
948
+ "epoch": 0.87,
949
+ "grad_norm": 1.0013697147369385,
950
+ "learning_rate": 7.087390311335498e-05,
951
+ "loss": 0.2141,
952
+ "step": 2425
953
+ },
954
+ {
955
+ "epoch": 0.88,
956
+ "grad_norm": 2.5208072662353516,
957
+ "learning_rate": 7.057338622430581e-05,
958
+ "loss": 0.2109,
959
+ "step": 2450
960
+ },
961
+ {
962
+ "epoch": 0.89,
963
+ "grad_norm": 0.3612011969089508,
964
+ "learning_rate": 7.027286933525665e-05,
965
+ "loss": 0.1701,
966
+ "step": 2475
967
+ },
968
+ {
969
+ "epoch": 0.9,
970
+ "grad_norm": 0.6919033527374268,
971
+ "learning_rate": 6.997235244620747e-05,
972
+ "loss": 0.1754,
973
+ "step": 2500
974
+ },
975
+ {
976
+ "epoch": 0.9,
977
+ "eval_loss": 0.20990537106990814,
978
+ "eval_na_accuracy": 0.8419243693351746,
979
+ "eval_ordinal_accuracy": 0.6712047457695007,
980
+ "eval_ordinal_mae": 0.41983720660209656,
981
+ "eval_runtime": 193.4285,
982
+ "eval_samples_per_second": 23.135,
983
+ "eval_steps_per_second": 2.895,
984
+ "step": 2500
985
+ },
986
+ {
987
+ "epoch": 0.91,
988
+ "grad_norm": 2.494234323501587,
989
+ "learning_rate": 6.967183555715831e-05,
990
+ "loss": 0.2547,
991
+ "step": 2525
992
+ },
993
+ {
994
+ "epoch": 0.92,
995
+ "grad_norm": 1.2194478511810303,
996
+ "learning_rate": 6.937131866810915e-05,
997
+ "loss": 0.2105,
998
+ "step": 2550
999
+ },
1000
+ {
1001
+ "epoch": 0.93,
1002
+ "grad_norm": 0.8242117166519165,
1003
+ "learning_rate": 6.907080177905999e-05,
1004
+ "loss": 0.185,
1005
+ "step": 2575
1006
+ },
1007
+ {
1008
+ "epoch": 0.94,
1009
+ "grad_norm": 2.185267210006714,
1010
+ "learning_rate": 6.877028489001081e-05,
1011
+ "loss": 0.1773,
1012
+ "step": 2600
1013
+ },
1014
+ {
1015
+ "epoch": 0.94,
1016
+ "eval_loss": 0.20526131987571716,
1017
+ "eval_na_accuracy": 0.8642611503601074,
1018
+ "eval_ordinal_accuracy": 0.6586180329322815,
1019
+ "eval_ordinal_mae": 0.4104856550693512,
1020
+ "eval_runtime": 191.4771,
1021
+ "eval_samples_per_second": 23.371,
1022
+ "eval_steps_per_second": 2.925,
1023
+ "step": 2600
1024
+ },
1025
+ {
1026
+ "epoch": 0.95,
1027
+ "grad_norm": 1.5727041959762573,
1028
+ "learning_rate": 6.846976800096165e-05,
1029
+ "loss": 0.1994,
1030
+ "step": 2625
1031
+ },
1032
+ {
1033
+ "epoch": 0.96,
1034
+ "grad_norm": 1.3567266464233398,
1035
+ "learning_rate": 6.816925111191249e-05,
1036
+ "loss": 0.2087,
1037
+ "step": 2650
1038
+ },
1039
+ {
1040
+ "epoch": 0.96,
1041
+ "grad_norm": 0.43862801790237427,
1042
+ "learning_rate": 6.786873422286333e-05,
1043
+ "loss": 0.2394,
1044
+ "step": 2675
1045
+ },
1046
+ {
1047
+ "epoch": 0.97,
1048
+ "grad_norm": 0.7140023708343506,
1049
+ "learning_rate": 6.756821733381417e-05,
1050
+ "loss": 0.2378,
1051
+ "step": 2700
1052
+ },
1053
+ {
1054
+ "epoch": 0.97,
1055
+ "eval_loss": 0.20436730980873108,
1056
+ "eval_na_accuracy": 0.8333333134651184,
1057
+ "eval_ordinal_accuracy": 0.6647829413414001,
1058
+ "eval_ordinal_mae": 0.4324062764644623,
1059
+ "eval_runtime": 193.6819,
1060
+ "eval_samples_per_second": 23.105,
1061
+ "eval_steps_per_second": 2.891,
1062
+ "step": 2700
1063
+ },
1064
+ {
1065
+ "epoch": 0.98,
1066
+ "grad_norm": 0.44048529863357544,
1067
+ "learning_rate": 6.7267700444765e-05,
1068
+ "loss": 0.172,
1069
+ "step": 2725
1070
+ },
1071
+ {
1072
+ "epoch": 0.99,
1073
+ "grad_norm": 2.4433631896972656,
1074
+ "learning_rate": 6.696718355571583e-05,
1075
+ "loss": 0.1666,
1076
+ "step": 2750
1077
+ },
1078
+ {
1079
+ "epoch": 1.0,
1080
+ "grad_norm": 0.6148251891136169,
1081
+ "learning_rate": 6.666666666666667e-05,
1082
+ "loss": 0.2,
1083
+ "step": 2775
1084
+ },
1085
+ {
1086
+ "epoch": 1.01,
1087
+ "grad_norm": 0.5114859342575073,
1088
+ "learning_rate": 6.636614977761751e-05,
1089
+ "loss": 0.1295,
1090
+ "step": 2800
1091
+ },
1092
+ {
1093
+ "epoch": 1.01,
1094
+ "eval_loss": 0.20437294244766235,
1095
+ "eval_na_accuracy": 0.8247422575950623,
1096
+ "eval_ordinal_accuracy": 0.6843051910400391,
1097
+ "eval_ordinal_mae": 0.4015503525733948,
1098
+ "eval_runtime": 191.7797,
1099
+ "eval_samples_per_second": 23.334,
1100
+ "eval_steps_per_second": 2.92,
1101
+ "step": 2800
1102
+ },
1103
+ {
1104
+ "epoch": 1.02,
1105
+ "grad_norm": 0.5778698921203613,
1106
+ "learning_rate": 6.606563288856835e-05,
1107
+ "loss": 0.1506,
1108
+ "step": 2825
1109
+ },
1110
+ {
1111
+ "epoch": 1.03,
1112
+ "grad_norm": 0.3983612358570099,
1113
+ "learning_rate": 6.576511599951917e-05,
1114
+ "loss": 0.1358,
1115
+ "step": 2850
1116
+ },
1117
+ {
1118
+ "epoch": 1.04,
1119
+ "grad_norm": 0.8497461080551147,
1120
+ "learning_rate": 6.546459911047001e-05,
1121
+ "loss": 0.1288,
1122
+ "step": 2875
1123
+ },
1124
+ {
1125
+ "epoch": 1.05,
1126
+ "grad_norm": 1.2482887506484985,
1127
+ "learning_rate": 6.516408222142085e-05,
1128
+ "loss": 0.1126,
1129
+ "step": 2900
1130
+ },
1131
+ {
1132
+ "epoch": 1.05,
1133
+ "eval_loss": 0.2301819771528244,
1134
+ "eval_na_accuracy": 0.7577319741249084,
1135
+ "eval_ordinal_accuracy": 0.6804521083831787,
1136
+ "eval_ordinal_mae": 0.402468740940094,
1137
+ "eval_runtime": 195.9356,
1138
+ "eval_samples_per_second": 22.839,
1139
+ "eval_steps_per_second": 2.858,
1140
+ "step": 2900
1141
+ },
1142
+ {
1143
+ "epoch": 1.05,
1144
+ "grad_norm": 12.926443099975586,
1145
+ "learning_rate": 6.486356533237167e-05,
1146
+ "loss": 0.1244,
1147
+ "step": 2925
1148
+ },
1149
+ {
1150
+ "epoch": 1.06,
1151
+ "grad_norm": 0.5321182012557983,
1152
+ "learning_rate": 6.456304844332251e-05,
1153
+ "loss": 0.1368,
1154
+ "step": 2950
1155
+ },
1156
+ {
1157
+ "epoch": 1.07,
1158
+ "grad_norm": 0.4846901297569275,
1159
+ "learning_rate": 6.426253155427335e-05,
1160
+ "loss": 0.1837,
1161
+ "step": 2975
1162
+ },
1163
+ {
1164
+ "epoch": 1.08,
1165
+ "grad_norm": 0.6585441827774048,
1166
+ "learning_rate": 6.396201466522419e-05,
1167
+ "loss": 0.1262,
1168
+ "step": 3000
1169
+ },
1170
+ {
1171
+ "epoch": 1.08,
1172
+ "eval_loss": 0.22050338983535767,
1173
+ "eval_na_accuracy": 0.8092783689498901,
1174
+ "eval_ordinal_accuracy": 0.6516824960708618,
1175
+ "eval_ordinal_mae": 0.4016842246055603,
1176
+ "eval_runtime": 191.162,
1177
+ "eval_samples_per_second": 23.409,
1178
+ "eval_steps_per_second": 2.929,
1179
+ "step": 3000
1180
+ },
1181
+ {
1182
+ "epoch": 1.09,
1183
+ "grad_norm": 0.30136457085609436,
1184
+ "learning_rate": 6.366149777617503e-05,
1185
+ "loss": 0.1417,
1186
+ "step": 3025
1187
+ },
1188
+ {
1189
+ "epoch": 1.1,
1190
+ "grad_norm": 2.017369508743286,
1191
+ "learning_rate": 6.336098088712587e-05,
1192
+ "loss": 0.1155,
1193
+ "step": 3050
1194
+ },
1195
+ {
1196
+ "epoch": 1.11,
1197
+ "grad_norm": 0.6489114165306091,
1198
+ "learning_rate": 6.306046399807669e-05,
1199
+ "loss": 0.1074,
1200
+ "step": 3075
1201
+ },
1202
+ {
1203
+ "epoch": 1.12,
1204
+ "grad_norm": 0.5059970617294312,
1205
+ "learning_rate": 6.275994710902753e-05,
1206
+ "loss": 0.1104,
1207
+ "step": 3100
1208
+ },
1209
+ {
1210
+ "epoch": 1.12,
1211
+ "eval_loss": 0.21169866621494293,
1212
+ "eval_na_accuracy": 0.8453608155250549,
1213
+ "eval_ordinal_accuracy": 0.6778833866119385,
1214
+ "eval_ordinal_mae": 0.39305010437965393,
1215
+ "eval_runtime": 189.5391,
1216
+ "eval_samples_per_second": 23.61,
1217
+ "eval_steps_per_second": 2.955,
1218
+ "step": 3100
1219
+ },
1220
+ {
1221
+ "epoch": 1.13,
1222
+ "grad_norm": 1.127930998802185,
1223
+ "learning_rate": 6.245943021997837e-05,
1224
+ "loss": 0.1353,
1225
+ "step": 3125
1226
+ },
1227
+ {
1228
+ "epoch": 1.14,
1229
+ "grad_norm": 1.9614742994308472,
1230
+ "learning_rate": 6.215891333092921e-05,
1231
+ "loss": 0.1069,
1232
+ "step": 3150
1233
+ },
1234
+ {
1235
+ "epoch": 1.14,
1236
+ "grad_norm": 0.5779272317886353,
1237
+ "learning_rate": 6.185839644188005e-05,
1238
+ "loss": 0.1161,
1239
+ "step": 3175
1240
+ },
1241
+ {
1242
+ "epoch": 1.15,
1243
+ "grad_norm": 0.6223546266555786,
1244
+ "learning_rate": 6.155787955283087e-05,
1245
+ "loss": 0.1657,
1246
+ "step": 3200
1247
+ },
1248
+ {
1249
+ "epoch": 1.15,
1250
+ "eval_loss": 0.21738822758197784,
1251
+ "eval_na_accuracy": 0.8591065406799316,
1252
+ "eval_ordinal_accuracy": 0.6665810346603394,
1253
+ "eval_ordinal_mae": 0.38904985785484314,
1254
+ "eval_runtime": 193.8286,
1255
+ "eval_samples_per_second": 23.087,
1256
+ "eval_steps_per_second": 2.889,
1257
+ "step": 3200
1258
+ },
1259
+ {
1260
+ "epoch": 1.16,
1261
+ "grad_norm": 1.1987367868423462,
1262
+ "learning_rate": 6.125736266378171e-05,
1263
+ "loss": 0.1139,
1264
+ "step": 3225
1265
+ },
1266
+ {
1267
+ "epoch": 1.17,
1268
+ "grad_norm": 0.510863184928894,
1269
+ "learning_rate": 6.095684577473254e-05,
1270
+ "loss": 0.1286,
1271
+ "step": 3250
1272
+ },
1273
+ {
1274
+ "epoch": 1.18,
1275
+ "grad_norm": 0.7094324827194214,
1276
+ "learning_rate": 6.065632888568338e-05,
1277
+ "loss": 0.1231,
1278
+ "step": 3275
1279
+ },
1280
+ {
1281
+ "epoch": 1.19,
1282
+ "grad_norm": 2.9405245780944824,
1283
+ "learning_rate": 6.035581199663422e-05,
1284
+ "loss": 0.1186,
1285
+ "step": 3300
1286
+ },
1287
+ {
1288
+ "epoch": 1.19,
1289
+ "eval_loss": 0.22988717257976532,
1290
+ "eval_na_accuracy": 0.8058419227600098,
1291
+ "eval_ordinal_accuracy": 0.6622142195701599,
1292
+ "eval_ordinal_mae": 0.40129175782203674,
1293
+ "eval_runtime": 191.4373,
1294
+ "eval_samples_per_second": 23.376,
1295
+ "eval_steps_per_second": 2.925,
1296
+ "step": 3300
1297
+ },
1298
+ {
1299
+ "epoch": 1.2,
1300
+ "grad_norm": 1.9970686435699463,
1301
+ "learning_rate": 6.005529510758505e-05,
1302
+ "loss": 0.1492,
1303
+ "step": 3325
1304
+ },
1305
+ {
1306
+ "epoch": 1.21,
1307
+ "grad_norm": 0.4853156805038452,
1308
+ "learning_rate": 5.975477821853589e-05,
1309
+ "loss": 0.0923,
1310
+ "step": 3350
1311
+ },
1312
+ {
1313
+ "epoch": 1.22,
1314
+ "grad_norm": 3.004967451095581,
1315
+ "learning_rate": 5.9454261329486714e-05,
1316
+ "loss": 0.1395,
1317
+ "step": 3375
1318
+ },
1319
+ {
1320
+ "epoch": 1.23,
1321
+ "grad_norm": 2.5411417484283447,
1322
+ "learning_rate": 5.915374444043755e-05,
1323
+ "loss": 0.1304,
1324
+ "step": 3400
1325
+ },
1326
+ {
1327
+ "epoch": 1.23,
1328
+ "eval_loss": 0.21757462620735168,
1329
+ "eval_na_accuracy": 0.8109965920448303,
1330
+ "eval_ordinal_accuracy": 0.6902132034301758,
1331
+ "eval_ordinal_mae": 0.3801174759864807,
1332
+ "eval_runtime": 191.9938,
1333
+ "eval_samples_per_second": 23.308,
1334
+ "eval_steps_per_second": 2.917,
1335
+ "step": 3400
1336
+ },
1337
+ {
1338
+ "epoch": 1.24,
1339
+ "grad_norm": 2.031360626220703,
1340
+ "learning_rate": 5.885322755138839e-05,
1341
+ "loss": 0.161,
1342
+ "step": 3425
1343
+ },
1344
+ {
1345
+ "epoch": 1.24,
1346
+ "grad_norm": 0.9166708588600159,
1347
+ "learning_rate": 5.855271066233923e-05,
1348
+ "loss": 0.1062,
1349
+ "step": 3450
1350
+ },
1351
+ {
1352
+ "epoch": 1.25,
1353
+ "grad_norm": 2.4028913974761963,
1354
+ "learning_rate": 5.825219377329007e-05,
1355
+ "loss": 0.1085,
1356
+ "step": 3475
1357
+ },
1358
+ {
1359
+ "epoch": 1.26,
1360
+ "grad_norm": 0.99100261926651,
1361
+ "learning_rate": 5.79516768842409e-05,
1362
+ "loss": 0.1081,
1363
+ "step": 3500
1364
+ },
1365
+ {
1366
+ "epoch": 1.26,
1367
+ "eval_loss": 0.23295216262340546,
1368
+ "eval_na_accuracy": 0.831615149974823,
1369
+ "eval_ordinal_accuracy": 0.664269208908081,
1370
+ "eval_ordinal_mae": 0.3867188096046448,
1371
+ "eval_runtime": 194.3732,
1372
+ "eval_samples_per_second": 23.023,
1373
+ "eval_steps_per_second": 2.881,
1374
+ "step": 3500
1375
+ },
1376
+ {
1377
+ "epoch": 1.27,
1378
+ "grad_norm": 0.6745060682296753,
1379
+ "learning_rate": 5.765115999519173e-05,
1380
+ "loss": 0.1079,
1381
+ "step": 3525
1382
+ },
1383
+ {
1384
+ "epoch": 1.28,
1385
+ "grad_norm": 0.5718048810958862,
1386
+ "learning_rate": 5.735064310614256e-05,
1387
+ "loss": 0.1353,
1388
+ "step": 3550
1389
+ },
1390
+ {
1391
+ "epoch": 1.29,
1392
+ "grad_norm": 0.4868764579296112,
1393
+ "learning_rate": 5.70501262170934e-05,
1394
+ "loss": 0.1872,
1395
+ "step": 3575
1396
+ },
1397
+ {
1398
+ "epoch": 1.3,
1399
+ "grad_norm": 1.2596338987350464,
1400
+ "learning_rate": 5.674960932804424e-05,
1401
+ "loss": 0.1281,
1402
+ "step": 3600
1403
+ },
1404
+ {
1405
+ "epoch": 1.3,
1406
+ "eval_loss": 0.23198525607585907,
1407
+ "eval_na_accuracy": 0.7680412530899048,
1408
+ "eval_ordinal_accuracy": 0.6902132034301758,
1409
+ "eval_ordinal_mae": 0.3953803777694702,
1410
+ "eval_runtime": 193.0669,
1411
+ "eval_samples_per_second": 23.178,
1412
+ "eval_steps_per_second": 2.901,
1413
+ "step": 3600
1414
+ },
1415
+ {
1416
+ "epoch": 1.31,
1417
+ "grad_norm": 0.5509974360466003,
1418
+ "learning_rate": 5.644909243899508e-05,
1419
+ "loss": 0.1085,
1420
+ "step": 3625
1421
+ },
1422
+ {
1423
+ "epoch": 1.32,
1424
+ "grad_norm": 1.8018646240234375,
1425
+ "learning_rate": 5.614857554994592e-05,
1426
+ "loss": 0.1134,
1427
+ "step": 3650
1428
+ },
1429
+ {
1430
+ "epoch": 1.33,
1431
+ "grad_norm": 0.46525055170059204,
1432
+ "learning_rate": 5.584805866089674e-05,
1433
+ "loss": 0.1789,
1434
+ "step": 3675
1435
+ },
1436
+ {
1437
+ "epoch": 1.33,
1438
+ "grad_norm": 0.4697776734828949,
1439
+ "learning_rate": 5.554754177184758e-05,
1440
+ "loss": 0.1192,
1441
+ "step": 3700
1442
+ },
1443
+ {
1444
+ "epoch": 1.33,
1445
+ "eval_loss": 0.231234610080719,
1446
+ "eval_na_accuracy": 0.7989690899848938,
1447
+ "eval_ordinal_accuracy": 0.6768559217453003,
1448
+ "eval_ordinal_mae": 0.4108929932117462,
1449
+ "eval_runtime": 192.9099,
1450
+ "eval_samples_per_second": 23.197,
1451
+ "eval_steps_per_second": 2.903,
1452
+ "step": 3700
1453
+ },
1454
+ {
1455
+ "epoch": 1.34,
1456
+ "grad_norm": 1.9536067247390747,
1457
+ "learning_rate": 5.524702488279841e-05,
1458
+ "loss": 0.1531,
1459
+ "step": 3725
1460
+ },
1461
+ {
1462
+ "epoch": 1.35,
1463
+ "grad_norm": 0.6636119484901428,
1464
+ "learning_rate": 5.494650799374925e-05,
1465
+ "loss": 0.1152,
1466
+ "step": 3750
1467
+ },
1468
+ {
1469
+ "epoch": 1.36,
1470
+ "grad_norm": 3.2403175830841064,
1471
+ "learning_rate": 5.464599110470009e-05,
1472
+ "loss": 0.133,
1473
+ "step": 3775
1474
+ },
1475
+ {
1476
+ "epoch": 1.37,
1477
+ "grad_norm": 0.7575627565383911,
1478
+ "learning_rate": 5.434547421565093e-05,
1479
+ "loss": 0.1029,
1480
+ "step": 3800
1481
+ },
1482
+ {
1483
+ "epoch": 1.37,
1484
+ "eval_loss": 0.21953611075878143,
1485
+ "eval_na_accuracy": 0.8024054765701294,
1486
+ "eval_ordinal_accuracy": 0.681993305683136,
1487
+ "eval_ordinal_mae": 0.38697296380996704,
1488
+ "eval_runtime": 192.4936,
1489
+ "eval_samples_per_second": 23.248,
1490
+ "eval_steps_per_second": 2.909,
1491
+ "step": 3800
1492
+ },
1493
+ {
1494
+ "epoch": 1.38,
1495
+ "grad_norm": 0.6479007601737976,
1496
+ "learning_rate": 5.404495732660175e-05,
1497
+ "loss": 0.1174,
1498
+ "step": 3825
1499
+ },
1500
+ {
1501
+ "epoch": 1.39,
1502
+ "grad_norm": 2.8630564212799072,
1503
+ "learning_rate": 5.374444043755259e-05,
1504
+ "loss": 0.1398,
1505
+ "step": 3850
1506
+ },
1507
+ {
1508
+ "epoch": 1.4,
1509
+ "grad_norm": 0.6482840180397034,
1510
+ "learning_rate": 5.344392354850343e-05,
1511
+ "loss": 0.1144,
1512
+ "step": 3875
1513
+ },
1514
+ {
1515
+ "epoch": 1.41,
1516
+ "grad_norm": 0.39722609519958496,
1517
+ "learning_rate": 5.314340665945426e-05,
1518
+ "loss": 0.1159,
1519
+ "step": 3900
1520
+ },
1521
+ {
1522
+ "epoch": 1.41,
1523
+ "eval_loss": 0.22003380954265594,
1524
+ "eval_na_accuracy": 0.7903780341148376,
1525
+ "eval_ordinal_accuracy": 0.6812227368354797,
1526
+ "eval_ordinal_mae": 0.3859783113002777,
1527
+ "eval_runtime": 193.9688,
1528
+ "eval_samples_per_second": 23.071,
1529
+ "eval_steps_per_second": 2.887,
1530
+ "step": 3900
1531
+ },
1532
+ {
1533
+ "epoch": 1.42,
1534
+ "grad_norm": 0.7128223776817322,
1535
+ "learning_rate": 5.28428897704051e-05,
1536
+ "loss": 0.1437,
1537
+ "step": 3925
1538
+ },
1539
+ {
1540
+ "epoch": 1.42,
1541
+ "grad_norm": 2.3041298389434814,
1542
+ "learning_rate": 5.254237288135594e-05,
1543
+ "loss": 0.1413,
1544
+ "step": 3950
1545
+ },
1546
+ {
1547
+ "epoch": 1.43,
1548
+ "grad_norm": 0.9336991310119629,
1549
+ "learning_rate": 5.2241855992306764e-05,
1550
+ "loss": 0.1112,
1551
+ "step": 3975
1552
+ },
1553
+ {
1554
+ "epoch": 1.44,
1555
+ "grad_norm": 3.5301501750946045,
1556
+ "learning_rate": 5.19413391032576e-05,
1557
+ "loss": 0.1159,
1558
+ "step": 4000
1559
+ },
1560
+ {
1561
+ "epoch": 1.44,
1562
+ "eval_loss": 0.21585588157176971,
1563
+ "eval_na_accuracy": 0.7989690899848938,
1564
+ "eval_ordinal_accuracy": 0.6981762051582336,
1565
+ "eval_ordinal_mae": 0.3712264895439148,
1566
+ "eval_runtime": 191.7494,
1567
+ "eval_samples_per_second": 23.338,
1568
+ "eval_steps_per_second": 2.92,
1569
+ "step": 4000
1570
+ },
1571
+ {
1572
+ "epoch": 1.45,
1573
+ "grad_norm": 0.8215810060501099,
1574
+ "learning_rate": 5.164082221420844e-05,
1575
+ "loss": 0.0948,
1576
+ "step": 4025
1577
+ },
1578
+ {
1579
+ "epoch": 1.46,
1580
+ "grad_norm": 0.8609411716461182,
1581
+ "learning_rate": 5.134030532515928e-05,
1582
+ "loss": 0.0936,
1583
+ "step": 4050
1584
+ },
1585
+ {
1586
+ "epoch": 1.47,
1587
+ "grad_norm": 0.7595614790916443,
1588
+ "learning_rate": 5.103978843611011e-05,
1589
+ "loss": 0.1181,
1590
+ "step": 4075
1591
+ },
1592
+ {
1593
+ "epoch": 1.48,
1594
+ "grad_norm": 0.7951629757881165,
1595
+ "learning_rate": 5.073927154706095e-05,
1596
+ "loss": 0.107,
1597
+ "step": 4100
1598
+ },
1599
+ {
1600
+ "epoch": 1.48,
1601
+ "eval_loss": 0.22618594765663147,
1602
+ "eval_na_accuracy": 0.8213058710098267,
1603
+ "eval_ordinal_accuracy": 0.6904700994491577,
1604
+ "eval_ordinal_mae": 0.37572237849235535,
1605
+ "eval_runtime": 193.8218,
1606
+ "eval_samples_per_second": 23.088,
1607
+ "eval_steps_per_second": 2.889,
1608
+ "step": 4100
1609
+ },
1610
+ {
1611
+ "epoch": 1.49,
1612
+ "grad_norm": 1.809657096862793,
1613
+ "learning_rate": 5.0438754658011775e-05,
1614
+ "loss": 0.1275,
1615
+ "step": 4125
1616
+ },
1617
+ {
1618
+ "epoch": 1.5,
1619
+ "grad_norm": 0.6124809384346008,
1620
+ "learning_rate": 5.0138237768962613e-05,
1621
+ "loss": 0.0975,
1622
+ "step": 4150
1623
+ },
1624
+ {
1625
+ "epoch": 1.51,
1626
+ "grad_norm": 0.5735268592834473,
1627
+ "learning_rate": 4.983772087991345e-05,
1628
+ "loss": 0.1267,
1629
+ "step": 4175
1630
+ },
1631
+ {
1632
+ "epoch": 1.51,
1633
+ "grad_norm": 0.2053624987602234,
1634
+ "learning_rate": 4.953720399086429e-05,
1635
+ "loss": 0.1262,
1636
+ "step": 4200
1637
+ },
1638
+ {
1639
+ "epoch": 1.51,
1640
+ "eval_loss": 0.22913989424705505,
1641
+ "eval_na_accuracy": 0.8247422575950623,
1642
+ "eval_ordinal_accuracy": 0.683534562587738,
1643
+ "eval_ordinal_mae": 0.3841015100479126,
1644
+ "eval_runtime": 192.0713,
1645
+ "eval_samples_per_second": 23.299,
1646
+ "eval_steps_per_second": 2.916,
1647
+ "step": 4200
1648
+ },
1649
+ {
1650
+ "epoch": 1.52,
1651
+ "grad_norm": 1.163071632385254,
1652
+ "learning_rate": 4.923668710181513e-05,
1653
+ "loss": 0.112,
1654
+ "step": 4225
1655
+ },
1656
+ {
1657
+ "epoch": 1.53,
1658
+ "grad_norm": 0.48347094655036926,
1659
+ "learning_rate": 4.893617021276596e-05,
1660
+ "loss": 0.1312,
1661
+ "step": 4250
1662
+ },
1663
+ {
1664
+ "epoch": 1.54,
1665
+ "grad_norm": 2.0154635906219482,
1666
+ "learning_rate": 4.863565332371679e-05,
1667
+ "loss": 0.1057,
1668
+ "step": 4275
1669
+ },
1670
+ {
1671
+ "epoch": 1.55,
1672
+ "grad_norm": 2.7999942302703857,
1673
+ "learning_rate": 4.833513643466763e-05,
1674
+ "loss": 0.1437,
1675
+ "step": 4300
1676
+ },
1677
+ {
1678
+ "epoch": 1.55,
1679
+ "eval_loss": 0.23114901781082153,
1680
+ "eval_na_accuracy": 0.800687313079834,
1681
+ "eval_ordinal_accuracy": 0.6922681927680969,
1682
+ "eval_ordinal_mae": 0.3750612139701843,
1683
+ "eval_runtime": 193.0261,
1684
+ "eval_samples_per_second": 23.183,
1685
+ "eval_steps_per_second": 2.901,
1686
+ "step": 4300
1687
+ },
1688
+ {
1689
+ "epoch": 1.56,
1690
+ "grad_norm": 0.7880312204360962,
1691
+ "learning_rate": 4.803461954561846e-05,
1692
+ "loss": 0.1099,
1693
+ "step": 4325
1694
+ },
1695
+ {
1696
+ "epoch": 1.57,
1697
+ "grad_norm": 0.5259920954704285,
1698
+ "learning_rate": 4.77341026565693e-05,
1699
+ "loss": 0.1355,
1700
+ "step": 4350
1701
+ },
1702
+ {
1703
+ "epoch": 1.58,
1704
+ "grad_norm": 0.49388599395751953,
1705
+ "learning_rate": 4.743358576752014e-05,
1706
+ "loss": 0.1331,
1707
+ "step": 4375
1708
+ },
1709
+ {
1710
+ "epoch": 1.59,
1711
+ "grad_norm": 0.7215922474861145,
1712
+ "learning_rate": 4.714508955403294e-05,
1713
+ "loss": 0.0916,
1714
+ "step": 4400
1715
+ },
1716
+ {
1717
+ "epoch": 1.59,
1718
+ "eval_loss": 0.23432399332523346,
1719
+ "eval_na_accuracy": 0.8659793734550476,
1720
+ "eval_ordinal_accuracy": 0.6791677474975586,
1721
+ "eval_ordinal_mae": 0.37433725595474243,
1722
+ "eval_runtime": 191.7082,
1723
+ "eval_samples_per_second": 23.343,
1724
+ "eval_steps_per_second": 2.921,
1725
+ "step": 4400
1726
+ },
1727
+ {
1728
+ "epoch": 1.6,
1729
+ "grad_norm": 0.4619504511356354,
1730
+ "learning_rate": 4.684457266498378e-05,
1731
+ "loss": 0.11,
1732
+ "step": 4425
1733
+ },
1734
+ {
1735
+ "epoch": 1.6,
1736
+ "grad_norm": 2.470409393310547,
1737
+ "learning_rate": 4.654405577593461e-05,
1738
+ "loss": 0.138,
1739
+ "step": 4450
1740
+ },
1741
+ {
1742
+ "epoch": 1.61,
1743
+ "grad_norm": 0.46536511182785034,
1744
+ "learning_rate": 4.6243538886885443e-05,
1745
+ "loss": 0.1233,
1746
+ "step": 4475
1747
+ },
1748
+ {
1749
+ "epoch": 1.62,
1750
+ "grad_norm": 0.4539566934108734,
1751
+ "learning_rate": 4.594302199783628e-05,
1752
+ "loss": 0.1266,
1753
+ "step": 4500
1754
+ },
1755
+ {
1756
+ "epoch": 1.62,
1757
+ "eval_loss": 0.22511447966098785,
1758
+ "eval_na_accuracy": 0.8505154848098755,
1759
+ "eval_ordinal_accuracy": 0.6861032843589783,
1760
+ "eval_ordinal_mae": 0.37239742279052734,
1761
+ "eval_runtime": 185.8881,
1762
+ "eval_samples_per_second": 24.074,
1763
+ "eval_steps_per_second": 3.013,
1764
+ "step": 4500
1765
+ },
1766
+ {
1767
+ "epoch": 1.63,
1768
+ "grad_norm": 0.497732013463974,
1769
+ "learning_rate": 4.5642505108787114e-05,
1770
+ "loss": 0.109,
1771
+ "step": 4525
1772
+ },
1773
+ {
1774
+ "epoch": 1.64,
1775
+ "grad_norm": 2.833667278289795,
1776
+ "learning_rate": 4.534198821973795e-05,
1777
+ "loss": 0.1342,
1778
+ "step": 4550
1779
+ },
1780
+ {
1781
+ "epoch": 1.65,
1782
+ "grad_norm": 0.6105310320854187,
1783
+ "learning_rate": 4.504147133068879e-05,
1784
+ "loss": 0.125,
1785
+ "step": 4575
1786
+ },
1787
+ {
1788
+ "epoch": 1.66,
1789
+ "grad_norm": 0.5035513639450073,
1790
+ "learning_rate": 4.474095444163962e-05,
1791
+ "loss": 0.1185,
1792
+ "step": 4600
1793
+ },
1794
+ {
1795
+ "epoch": 1.66,
1796
+ "eval_loss": 0.22424614429473877,
1797
+ "eval_na_accuracy": 0.8264604806900024,
1798
+ "eval_ordinal_accuracy": 0.6902132034301758,
1799
+ "eval_ordinal_mae": 0.36660563945770264,
1800
+ "eval_runtime": 188.1827,
1801
+ "eval_samples_per_second": 23.78,
1802
+ "eval_steps_per_second": 2.976,
1803
+ "step": 4600
1804
+ },
1805
+ {
1806
+ "epoch": 1.67,
1807
+ "grad_norm": 0.6165774464607239,
1808
+ "learning_rate": 4.444043755259046e-05,
1809
+ "loss": 0.1273,
1810
+ "step": 4625
1811
+ },
1812
+ {
1813
+ "epoch": 1.68,
1814
+ "grad_norm": 2.2787024974823,
1815
+ "learning_rate": 4.413992066354129e-05,
1816
+ "loss": 0.1537,
1817
+ "step": 4650
1818
+ },
1819
+ {
1820
+ "epoch": 1.69,
1821
+ "grad_norm": 1.2412303686141968,
1822
+ "learning_rate": 4.3839403774492125e-05,
1823
+ "loss": 0.1197,
1824
+ "step": 4675
1825
+ },
1826
+ {
1827
+ "epoch": 1.69,
1828
+ "grad_norm": 2.671398162841797,
1829
+ "learning_rate": 4.353888688544296e-05,
1830
+ "loss": 0.1037,
1831
+ "step": 4700
1832
+ },
1833
+ {
1834
+ "epoch": 1.69,
1835
+ "eval_loss": 0.22189220786094666,
1836
+ "eval_na_accuracy": 0.8522336483001709,
1837
+ "eval_ordinal_accuracy": 0.6845620274543762,
1838
+ "eval_ordinal_mae": 0.3699798583984375,
1839
+ "eval_runtime": 185.1462,
1840
+ "eval_samples_per_second": 24.17,
1841
+ "eval_steps_per_second": 3.025,
1842
+ "step": 4700
1843
+ },
1844
+ {
1845
+ "epoch": 1.7,
1846
+ "grad_norm": 0.7951823472976685,
1847
+ "learning_rate": 4.32383699963938e-05,
1848
+ "loss": 0.1335,
1849
+ "step": 4725
1850
+ },
1851
+ {
1852
+ "epoch": 1.71,
1853
+ "grad_norm": 0.5750038027763367,
1854
+ "learning_rate": 4.2937853107344634e-05,
1855
+ "loss": 0.1225,
1856
+ "step": 4750
1857
+ },
1858
+ {
1859
+ "epoch": 1.72,
1860
+ "grad_norm": 0.599660336971283,
1861
+ "learning_rate": 4.263733621829547e-05,
1862
+ "loss": 0.1153,
1863
+ "step": 4775
1864
+ },
1865
+ {
1866
+ "epoch": 1.73,
1867
+ "grad_norm": 0.5971084237098694,
1868
+ "learning_rate": 4.233681932924631e-05,
1869
+ "loss": 0.1264,
1870
+ "step": 4800
1871
+ },
1872
+ {
1873
+ "epoch": 1.73,
1874
+ "eval_loss": 0.22111621499061584,
1875
+ "eval_na_accuracy": 0.8350515365600586,
1876
+ "eval_ordinal_accuracy": 0.6891857385635376,
1877
+ "eval_ordinal_mae": 0.36766940355300903,
1878
+ "eval_runtime": 187.791,
1879
+ "eval_samples_per_second": 23.83,
1880
+ "eval_steps_per_second": 2.982,
1881
+ "step": 4800
1882
+ },
1883
+ {
1884
+ "epoch": 1.74,
1885
+ "grad_norm": 0.6335272789001465,
1886
+ "learning_rate": 4.203630244019714e-05,
1887
+ "loss": 0.1309,
1888
+ "step": 4825
1889
+ },
1890
+ {
1891
+ "epoch": 1.75,
1892
+ "grad_norm": 1.4906469583511353,
1893
+ "learning_rate": 4.1735785551147974e-05,
1894
+ "loss": 0.1481,
1895
+ "step": 4850
1896
+ },
1897
+ {
1898
+ "epoch": 1.76,
1899
+ "grad_norm": 0.5834776163101196,
1900
+ "learning_rate": 4.143526866209881e-05,
1901
+ "loss": 0.1082,
1902
+ "step": 4875
1903
+ },
1904
+ {
1905
+ "epoch": 1.77,
1906
+ "grad_norm": 0.3094954788684845,
1907
+ "learning_rate": 4.1134751773049644e-05,
1908
+ "loss": 0.1404,
1909
+ "step": 4900
1910
+ },
1911
+ {
1912
+ "epoch": 1.77,
1913
+ "eval_loss": 0.22064529359340668,
1914
+ "eval_na_accuracy": 0.7938144207000732,
1915
+ "eval_ordinal_accuracy": 0.6945800185203552,
1916
+ "eval_ordinal_mae": 0.371782511472702,
1917
+ "eval_runtime": 187.3419,
1918
+ "eval_samples_per_second": 23.887,
1919
+ "eval_steps_per_second": 2.989,
1920
+ "step": 4900
1921
+ },
1922
+ {
1923
+ "epoch": 1.78,
1924
+ "grad_norm": 0.43263766169548035,
1925
+ "learning_rate": 4.083423488400048e-05,
1926
+ "loss": 0.0901,
1927
+ "step": 4925
1928
+ },
1929
+ {
1930
+ "epoch": 1.79,
1931
+ "grad_norm": 1.0660347938537598,
1932
+ "learning_rate": 4.053371799495132e-05,
1933
+ "loss": 0.1197,
1934
+ "step": 4950
1935
+ },
1936
+ {
1937
+ "epoch": 1.79,
1938
+ "grad_norm": 0.7330833077430725,
1939
+ "learning_rate": 4.023320110590215e-05,
1940
+ "loss": 0.1052,
1941
+ "step": 4975
1942
+ },
1943
+ {
1944
+ "epoch": 1.8,
1945
+ "grad_norm": 2.152076005935669,
1946
+ "learning_rate": 3.993268421685299e-05,
1947
+ "loss": 0.1238,
1948
+ "step": 5000
1949
+ },
1950
+ {
1951
+ "epoch": 1.8,
1952
+ "eval_loss": 0.20976723730564117,
1953
+ "eval_na_accuracy": 0.8264604806900024,
1954
+ "eval_ordinal_accuracy": 0.6948369145393372,
1955
+ "eval_ordinal_mae": 0.37225744128227234,
1956
+ "eval_runtime": 186.1293,
1957
+ "eval_samples_per_second": 24.042,
1958
+ "eval_steps_per_second": 3.009,
1959
+ "step": 5000
1960
+ },
1961
+ {
1962
+ "epoch": 1.81,
1963
+ "grad_norm": 0.28938254714012146,
1964
+ "learning_rate": 3.9632167327803824e-05,
1965
+ "loss": 0.1089,
1966
+ "step": 5025
1967
+ },
1968
+ {
1969
+ "epoch": 1.82,
1970
+ "grad_norm": 0.5542977452278137,
1971
+ "learning_rate": 3.9331650438754655e-05,
1972
+ "loss": 0.1034,
1973
+ "step": 5050
1974
+ },
1975
+ {
1976
+ "epoch": 1.83,
1977
+ "grad_norm": 3.0670604705810547,
1978
+ "learning_rate": 3.9031133549705494e-05,
1979
+ "loss": 0.1207,
1980
+ "step": 5075
1981
+ },
1982
+ {
1983
+ "epoch": 1.84,
1984
+ "grad_norm": 1.1476399898529053,
1985
+ "learning_rate": 3.873061666065633e-05,
1986
+ "loss": 0.0868,
1987
+ "step": 5100
1988
+ },
1989
+ {
1990
+ "epoch": 1.84,
1991
+ "eval_loss": 0.2088630199432373,
1992
+ "eval_na_accuracy": 0.8144329786300659,
1993
+ "eval_ordinal_accuracy": 0.7025430202484131,
1994
+ "eval_ordinal_mae": 0.3573513329029083,
1995
+ "eval_runtime": 186.6283,
1996
+ "eval_samples_per_second": 23.978,
1997
+ "eval_steps_per_second": 3.001,
1998
+ "step": 5100
1999
+ },
2000
+ {
2001
+ "epoch": 1.85,
2002
+ "grad_norm": 1.155088186264038,
2003
+ "learning_rate": 3.8430099771607164e-05,
2004
+ "loss": 0.1285,
2005
+ "step": 5125
2006
+ },
2007
+ {
2008
+ "epoch": 1.86,
2009
+ "grad_norm": 5.079723358154297,
2010
+ "learning_rate": 3.8129582882558e-05,
2011
+ "loss": 0.1503,
2012
+ "step": 5150
2013
+ },
2014
+ {
2015
+ "epoch": 1.87,
2016
+ "grad_norm": 1.3054969310760498,
2017
+ "learning_rate": 3.782906599350884e-05,
2018
+ "loss": 0.1117,
2019
+ "step": 5175
2020
+ },
2021
+ {
2022
+ "epoch": 1.88,
2023
+ "grad_norm": 0.676068902015686,
2024
+ "learning_rate": 3.752854910445967e-05,
2025
+ "loss": 0.0828,
2026
+ "step": 5200
2027
+ },
2028
+ {
2029
+ "epoch": 1.88,
2030
+ "eval_loss": 0.2203822135925293,
2031
+ "eval_na_accuracy": 0.7817869186401367,
2032
+ "eval_ordinal_accuracy": 0.7030567526817322,
2033
+ "eval_ordinal_mae": 0.3679908215999603,
2034
+ "eval_runtime": 186.9501,
2035
+ "eval_samples_per_second": 23.937,
2036
+ "eval_steps_per_second": 2.995,
2037
+ "step": 5200
2038
+ },
2039
+ {
2040
+ "epoch": 1.88,
2041
+ "grad_norm": 0.5627509355545044,
2042
+ "learning_rate": 3.7228032215410505e-05,
2043
+ "loss": 0.1013,
2044
+ "step": 5225
2045
+ },
2046
+ {
2047
+ "epoch": 1.89,
2048
+ "grad_norm": 0.74530029296875,
2049
+ "learning_rate": 3.6927515326361343e-05,
2050
+ "loss": 0.1163,
2051
+ "step": 5250
2052
+ },
2053
+ {
2054
+ "epoch": 1.9,
2055
+ "grad_norm": 0.4432946741580963,
2056
+ "learning_rate": 3.6626998437312175e-05,
2057
+ "loss": 0.1076,
2058
+ "step": 5275
2059
+ },
2060
+ {
2061
+ "epoch": 1.91,
2062
+ "grad_norm": 0.8598417639732361,
2063
+ "learning_rate": 3.6326481548263014e-05,
2064
+ "loss": 0.0986,
2065
+ "step": 5300
2066
+ },
2067
+ {
2068
+ "epoch": 1.91,
2069
+ "eval_loss": 0.21255970001220703,
2070
+ "eval_na_accuracy": 0.8127147555351257,
2071
+ "eval_ordinal_accuracy": 0.6981762051582336,
2072
+ "eval_ordinal_mae": 0.35426145792007446,
2073
+ "eval_runtime": 188.2634,
2074
+ "eval_samples_per_second": 23.77,
2075
+ "eval_steps_per_second": 2.975,
2076
+ "step": 5300
2077
+ },
2078
+ {
2079
+ "epoch": 1.92,
2080
+ "grad_norm": 0.5436437726020813,
2081
+ "learning_rate": 3.602596465921385e-05,
2082
+ "loss": 0.1368,
2083
+ "step": 5325
2084
+ },
2085
+ {
2086
+ "epoch": 1.93,
2087
+ "grad_norm": 0.3918437063694,
2088
+ "learning_rate": 3.5725447770164684e-05,
2089
+ "loss": 0.1062,
2090
+ "step": 5350
2091
+ },
2092
+ {
2093
+ "epoch": 1.94,
2094
+ "grad_norm": 3.4165215492248535,
2095
+ "learning_rate": 3.542493088111552e-05,
2096
+ "loss": 0.1394,
2097
+ "step": 5375
2098
+ },
2099
+ {
2100
+ "epoch": 1.95,
2101
+ "grad_norm": 0.5507489442825317,
2102
+ "learning_rate": 3.5124413992066354e-05,
2103
+ "loss": 0.0869,
2104
+ "step": 5400
2105
+ },
2106
+ {
2107
+ "epoch": 1.95,
2108
+ "eval_loss": 0.2247212827205658,
2109
+ "eval_na_accuracy": 0.80756014585495,
2110
+ "eval_ordinal_accuracy": 0.7107629179954529,
2111
+ "eval_ordinal_mae": 0.35320159792900085,
2112
+ "eval_runtime": 186.4594,
2113
+ "eval_samples_per_second": 24.0,
2114
+ "eval_steps_per_second": 3.003,
2115
+ "step": 5400
2116
+ },
2117
+ {
2118
+ "epoch": 1.96,
2119
+ "grad_norm": 0.36401164531707764,
2120
+ "learning_rate": 3.4823897103017186e-05,
2121
+ "loss": 0.1216,
2122
+ "step": 5425
2123
+ },
2124
+ {
2125
+ "epoch": 1.97,
2126
+ "grad_norm": 0.5114885568618774,
2127
+ "learning_rate": 3.4523380213968025e-05,
2128
+ "loss": 0.1034,
2129
+ "step": 5450
2130
+ },
2131
+ {
2132
+ "epoch": 1.97,
2133
+ "grad_norm": 0.5193465352058411,
2134
+ "learning_rate": 3.422286332491886e-05,
2135
+ "loss": 0.095,
2136
+ "step": 5475
2137
+ },
2138
+ {
2139
+ "epoch": 1.98,
2140
+ "grad_norm": 0.3319372236728668,
2141
+ "learning_rate": 3.3922346435869695e-05,
2142
+ "loss": 0.1006,
2143
+ "step": 5500
2144
+ },
2145
+ {
2146
+ "epoch": 1.98,
2147
+ "eval_loss": 0.22681304812431335,
2148
+ "eval_na_accuracy": 0.8161512017250061,
2149
+ "eval_ordinal_accuracy": 0.702799916267395,
2150
+ "eval_ordinal_mae": 0.3637341260910034,
2151
+ "eval_runtime": 187.993,
2152
+ "eval_samples_per_second": 23.804,
2153
+ "eval_steps_per_second": 2.979,
2154
+ "step": 5500
2155
+ },
2156
+ {
2157
+ "epoch": 1.99,
2158
+ "grad_norm": 0.6313589215278625,
2159
+ "learning_rate": 3.3621829546820533e-05,
2160
+ "loss": 0.1265,
2161
+ "step": 5525
2162
+ },
2163
+ {
2164
+ "epoch": 2.0,
2165
+ "grad_norm": 0.3126126825809479,
2166
+ "learning_rate": 3.332131265777137e-05,
2167
+ "loss": 0.1002,
2168
+ "step": 5550
2169
+ },
2170
+ {
2171
+ "epoch": 2.01,
2172
+ "grad_norm": 0.7703510522842407,
2173
+ "learning_rate": 3.3020795768722204e-05,
2174
+ "loss": 0.065,
2175
+ "step": 5575
2176
+ },
2177
+ {
2178
+ "epoch": 2.02,
2179
+ "grad_norm": 0.5371888875961304,
2180
+ "learning_rate": 3.2720278879673036e-05,
2181
+ "loss": 0.0639,
2182
+ "step": 5600
2183
+ },
2184
+ {
2185
+ "epoch": 2.02,
2186
+ "eval_loss": 0.22521378099918365,
2187
+ "eval_na_accuracy": 0.8109965920448303,
2188
+ "eval_ordinal_accuracy": 0.7069098353385925,
2189
+ "eval_ordinal_mae": 0.3478536903858185,
2190
+ "eval_runtime": 186.1883,
2191
+ "eval_samples_per_second": 24.035,
2192
+ "eval_steps_per_second": 3.008,
2193
+ "step": 5600
2194
+ },
2195
+ {
2196
+ "epoch": 2.03,
2197
+ "grad_norm": 0.3779759109020233,
2198
+ "learning_rate": 3.2419761990623874e-05,
2199
+ "loss": 0.0491,
2200
+ "step": 5625
2201
+ },
2202
+ {
2203
+ "epoch": 2.04,
2204
+ "grad_norm": 0.3881787657737732,
2205
+ "learning_rate": 3.2119245101574706e-05,
2206
+ "loss": 0.0475,
2207
+ "step": 5650
2208
+ },
2209
+ {
2210
+ "epoch": 2.05,
2211
+ "grad_norm": 0.2288794368505478,
2212
+ "learning_rate": 3.1818728212525544e-05,
2213
+ "loss": 0.062,
2214
+ "step": 5675
2215
+ },
2216
+ {
2217
+ "epoch": 2.06,
2218
+ "grad_norm": 0.4537404477596283,
2219
+ "learning_rate": 3.151821132347638e-05,
2220
+ "loss": 0.0569,
2221
+ "step": 5700
2222
+ },
2223
+ {
2224
+ "epoch": 2.06,
2225
+ "eval_loss": 0.23154324293136597,
2226
+ "eval_na_accuracy": 0.80756014585495,
2227
+ "eval_ordinal_accuracy": 0.7166709303855896,
2228
+ "eval_ordinal_mae": 0.3399028480052948,
2229
+ "eval_runtime": 188.0319,
2230
+ "eval_samples_per_second": 23.799,
2231
+ "eval_steps_per_second": 2.978,
2232
+ "step": 5700
2233
+ },
2234
+ {
2235
+ "epoch": 2.06,
2236
+ "grad_norm": 0.4112221300601959,
2237
+ "learning_rate": 3.1217694434427215e-05,
2238
+ "loss": 0.0513,
2239
+ "step": 5725
2240
+ },
2241
+ {
2242
+ "epoch": 2.07,
2243
+ "grad_norm": 0.4015844166278839,
2244
+ "learning_rate": 3.091717754537805e-05,
2245
+ "loss": 0.0608,
2246
+ "step": 5750
2247
+ },
2248
+ {
2249
+ "epoch": 2.08,
2250
+ "grad_norm": 0.43563222885131836,
2251
+ "learning_rate": 3.0616660656328885e-05,
2252
+ "loss": 0.0463,
2253
+ "step": 5775
2254
+ },
2255
+ {
2256
+ "epoch": 2.09,
2257
+ "grad_norm": 2.328296422958374,
2258
+ "learning_rate": 3.031614376727972e-05,
2259
+ "loss": 0.0626,
2260
+ "step": 5800
2261
+ },
2262
+ {
2263
+ "epoch": 2.09,
2264
+ "eval_loss": 0.2304299920797348,
2265
+ "eval_na_accuracy": 0.8127147555351257,
2266
+ "eval_ordinal_accuracy": 0.702799916267395,
2267
+ "eval_ordinal_mae": 0.34806403517723083,
2268
+ "eval_runtime": 186.3734,
2269
+ "eval_samples_per_second": 24.011,
2270
+ "eval_steps_per_second": 3.005,
2271
+ "step": 5800
2272
+ },
2273
+ {
2274
+ "epoch": 2.1,
2275
+ "grad_norm": 0.5192309021949768,
2276
+ "learning_rate": 3.001562687823056e-05,
2277
+ "loss": 0.0506,
2278
+ "step": 5825
2279
+ },
2280
+ {
2281
+ "epoch": 2.11,
2282
+ "grad_norm": 0.7943512201309204,
2283
+ "learning_rate": 2.9715109989181394e-05,
2284
+ "loss": 0.0612,
2285
+ "step": 5850
2286
+ },
2287
+ {
2288
+ "epoch": 2.12,
2289
+ "grad_norm": 0.5543213486671448,
2290
+ "learning_rate": 2.9414593100132226e-05,
2291
+ "loss": 0.0571,
2292
+ "step": 5875
2293
+ },
2294
+ {
2295
+ "epoch": 2.13,
2296
+ "grad_norm": 0.32637158036231995,
2297
+ "learning_rate": 2.9114076211083064e-05,
2298
+ "loss": 0.0502,
2299
+ "step": 5900
2300
+ },
2301
+ {
2302
+ "epoch": 2.13,
2303
+ "eval_loss": 0.23814311623573303,
2304
+ "eval_na_accuracy": 0.8092783689498901,
2305
+ "eval_ordinal_accuracy": 0.6953506469726562,
2306
+ "eval_ordinal_mae": 0.3623509407043457,
2307
+ "eval_runtime": 185.1929,
2308
+ "eval_samples_per_second": 24.164,
2309
+ "eval_steps_per_second": 3.024,
2310
+ "step": 5900
2311
+ },
2312
+ {
2313
+ "epoch": 2.14,
2314
+ "grad_norm": 0.4265578091144562,
2315
+ "learning_rate": 2.88135593220339e-05,
2316
+ "loss": 0.0529,
2317
+ "step": 5925
2318
+ },
2319
+ {
2320
+ "epoch": 2.15,
2321
+ "grad_norm": 0.5854742527008057,
2322
+ "learning_rate": 2.8513042432984738e-05,
2323
+ "loss": 0.0458,
2324
+ "step": 5950
2325
+ },
2326
+ {
2327
+ "epoch": 2.15,
2328
+ "grad_norm": 0.6325811147689819,
2329
+ "learning_rate": 2.821252554393557e-05,
2330
+ "loss": 0.0422,
2331
+ "step": 5975
2332
+ },
2333
+ {
2334
+ "epoch": 2.16,
2335
+ "grad_norm": 0.3043919503688812,
2336
+ "learning_rate": 2.7912008654886408e-05,
2337
+ "loss": 0.0541,
2338
+ "step": 6000
2339
+ },
2340
+ {
2341
+ "epoch": 2.16,
2342
+ "eval_loss": 0.2298140674829483,
2343
+ "eval_na_accuracy": 0.8109965920448303,
2344
+ "eval_ordinal_accuracy": 0.7159003615379333,
2345
+ "eval_ordinal_mae": 0.34048062562942505,
2346
+ "eval_runtime": 188.0938,
2347
+ "eval_samples_per_second": 23.791,
2348
+ "eval_steps_per_second": 2.977,
2349
+ "step": 6000
2350
+ },
2351
+ {
2352
+ "epoch": 2.17,
2353
+ "grad_norm": 0.5921896696090698,
2354
+ "learning_rate": 2.7611491765837243e-05,
2355
+ "loss": 0.0441,
2356
+ "step": 6025
2357
+ },
2358
+ {
2359
+ "epoch": 2.18,
2360
+ "grad_norm": 0.5739880204200745,
2361
+ "learning_rate": 2.7310974876788075e-05,
2362
+ "loss": 0.0399,
2363
+ "step": 6050
2364
+ },
2365
+ {
2366
+ "epoch": 2.19,
2367
+ "grad_norm": 0.6747081279754639,
2368
+ "learning_rate": 2.7010457987738914e-05,
2369
+ "loss": 0.0467,
2370
+ "step": 6075
2371
+ },
2372
+ {
2373
+ "epoch": 2.2,
2374
+ "grad_norm": 0.5710461139678955,
2375
+ "learning_rate": 2.670994109868975e-05,
2376
+ "loss": 0.0671,
2377
+ "step": 6100
2378
+ },
2379
+ {
2380
+ "epoch": 2.2,
2381
+ "eval_loss": 0.24321676790714264,
2382
+ "eval_na_accuracy": 0.7989690899848938,
2383
+ "eval_ordinal_accuracy": 0.7030567526817322,
2384
+ "eval_ordinal_mae": 0.3529086709022522,
2385
+ "eval_runtime": 188.2916,
2386
+ "eval_samples_per_second": 23.766,
2387
+ "eval_steps_per_second": 2.974,
2388
+ "step": 6100
2389
+ },
2390
+ {
2391
+ "epoch": 2.21,
2392
+ "grad_norm": 0.3675624430179596,
2393
+ "learning_rate": 2.640942420964058e-05,
2394
+ "loss": 0.0466,
2395
+ "step": 6125
2396
+ },
2397
+ {
2398
+ "epoch": 2.22,
2399
+ "grad_norm": 0.31548529863357544,
2400
+ "learning_rate": 2.610890732059142e-05,
2401
+ "loss": 0.0764,
2402
+ "step": 6150
2403
+ },
2404
+ {
2405
+ "epoch": 2.23,
2406
+ "grad_norm": 0.43977534770965576,
2407
+ "learning_rate": 2.5808390431542258e-05,
2408
+ "loss": 0.0475,
2409
+ "step": 6175
2410
+ },
2411
+ {
2412
+ "epoch": 2.24,
2413
+ "grad_norm": 0.6533438563346863,
2414
+ "learning_rate": 2.550787354249309e-05,
2415
+ "loss": 0.0672,
2416
+ "step": 6200
2417
+ },
2418
+ {
2419
+ "epoch": 2.24,
2420
+ "eval_loss": 0.2430579513311386,
2421
+ "eval_na_accuracy": 0.7714776396751404,
2422
+ "eval_ordinal_accuracy": 0.7194965481758118,
2423
+ "eval_ordinal_mae": 0.3360508680343628,
2424
+ "eval_runtime": 186.7478,
2425
+ "eval_samples_per_second": 23.963,
2426
+ "eval_steps_per_second": 2.999,
2427
+ "step": 6200
2428
+ },
2429
+ {
2430
+ "epoch": 2.24,
2431
+ "grad_norm": 0.2441914677619934,
2432
+ "learning_rate": 2.5207356653443925e-05,
2433
+ "loss": 0.0503,
2434
+ "step": 6225
2435
+ },
2436
+ {
2437
+ "epoch": 2.25,
2438
+ "grad_norm": 0.3184777796268463,
2439
+ "learning_rate": 2.490683976439476e-05,
2440
+ "loss": 0.0641,
2441
+ "step": 6250
2442
+ },
2443
+ {
2444
+ "epoch": 2.26,
2445
+ "grad_norm": 0.5295473337173462,
2446
+ "learning_rate": 2.4606322875345595e-05,
2447
+ "loss": 0.0453,
2448
+ "step": 6275
2449
+ },
2450
+ {
2451
+ "epoch": 2.27,
2452
+ "grad_norm": 0.6061444282531738,
2453
+ "learning_rate": 2.430580598629643e-05,
2454
+ "loss": 0.0446,
2455
+ "step": 6300
2456
+ },
2457
+ {
2458
+ "epoch": 2.27,
2459
+ "eval_loss": 0.24466152489185333,
2460
+ "eval_na_accuracy": 0.7938144207000732,
2461
+ "eval_ordinal_accuracy": 0.7141022086143494,
2462
+ "eval_ordinal_mae": 0.34008586406707764,
2463
+ "eval_runtime": 184.0098,
2464
+ "eval_samples_per_second": 24.319,
2465
+ "eval_steps_per_second": 3.043,
2466
+ "step": 6300
2467
+ },
2468
+ {
2469
+ "epoch": 2.28,
2470
+ "grad_norm": 0.7771974802017212,
2471
+ "learning_rate": 2.4005289097247265e-05,
2472
+ "loss": 0.0506,
2473
+ "step": 6325
2474
+ },
2475
+ {
2476
+ "epoch": 2.29,
2477
+ "grad_norm": 0.48779726028442383,
2478
+ "learning_rate": 2.37047722081981e-05,
2479
+ "loss": 0.0625,
2480
+ "step": 6350
2481
+ },
2482
+ {
2483
+ "epoch": 2.3,
2484
+ "grad_norm": 0.5689972639083862,
2485
+ "learning_rate": 2.340425531914894e-05,
2486
+ "loss": 0.0504,
2487
+ "step": 6375
2488
+ },
2489
+ {
2490
+ "epoch": 2.31,
2491
+ "grad_norm": 0.35768938064575195,
2492
+ "learning_rate": 2.310373843009977e-05,
2493
+ "loss": 0.0424,
2494
+ "step": 6400
2495
+ },
2496
+ {
2497
+ "epoch": 2.31,
2498
+ "eval_loss": 0.24263423681259155,
2499
+ "eval_na_accuracy": 0.8161512017250061,
2500
+ "eval_ordinal_accuracy": 0.7017723917961121,
2501
+ "eval_ordinal_mae": 0.34848281741142273,
2502
+ "eval_runtime": 184.8006,
2503
+ "eval_samples_per_second": 24.215,
2504
+ "eval_steps_per_second": 3.03,
2505
+ "step": 6400
2506
+ },
2507
+ {
2508
+ "epoch": 2.32,
2509
+ "grad_norm": 0.28792133927345276,
2510
+ "learning_rate": 2.280322154105061e-05,
2511
+ "loss": 0.0444,
2512
+ "step": 6425
2513
+ },
2514
+ {
2515
+ "epoch": 2.33,
2516
+ "grad_norm": 0.5131263136863708,
2517
+ "learning_rate": 2.2502704652001444e-05,
2518
+ "loss": 0.0494,
2519
+ "step": 6450
2520
+ },
2521
+ {
2522
+ "epoch": 2.34,
2523
+ "grad_norm": 0.4703335464000702,
2524
+ "learning_rate": 2.220218776295228e-05,
2525
+ "loss": 0.0568,
2526
+ "step": 6475
2527
+ },
2528
+ {
2529
+ "epoch": 2.34,
2530
+ "grad_norm": 0.48926159739494324,
2531
+ "learning_rate": 2.1901670873903115e-05,
2532
+ "loss": 0.0386,
2533
+ "step": 6500
2534
+ },
2535
+ {
2536
+ "epoch": 2.34,
2537
+ "eval_loss": 0.24884438514709473,
2538
+ "eval_na_accuracy": 0.8127147555351257,
2539
+ "eval_ordinal_accuracy": 0.7123041152954102,
2540
+ "eval_ordinal_mae": 0.33866390585899353,
2541
+ "eval_runtime": 182.7756,
2542
+ "eval_samples_per_second": 24.484,
2543
+ "eval_steps_per_second": 3.064,
2544
+ "step": 6500
2545
+ },
2546
+ {
2547
+ "epoch": 2.35,
2548
+ "grad_norm": 0.7936327457427979,
2549
+ "learning_rate": 2.160115398485395e-05,
2550
+ "loss": 0.0553,
2551
+ "step": 6525
2552
+ },
2553
+ {
2554
+ "epoch": 2.36,
2555
+ "grad_norm": 0.41186073422431946,
2556
+ "learning_rate": 2.1300637095804785e-05,
2557
+ "loss": 0.0441,
2558
+ "step": 6550
2559
+ },
2560
+ {
2561
+ "epoch": 2.37,
2562
+ "grad_norm": 0.7716944813728333,
2563
+ "learning_rate": 2.1000120206755623e-05,
2564
+ "loss": 0.0562,
2565
+ "step": 6575
2566
+ },
2567
+ {
2568
+ "epoch": 2.38,
2569
+ "grad_norm": 0.40576568245887756,
2570
+ "learning_rate": 2.0699603317706455e-05,
2571
+ "loss": 0.0736,
2572
+ "step": 6600
2573
+ },
2574
+ {
2575
+ "epoch": 2.38,
2576
+ "eval_loss": 0.24542821943759918,
2577
+ "eval_na_accuracy": 0.831615149974823,
2578
+ "eval_ordinal_accuracy": 0.7053686380386353,
2579
+ "eval_ordinal_mae": 0.3381500244140625,
2580
+ "eval_runtime": 188.0054,
2581
+ "eval_samples_per_second": 23.803,
2582
+ "eval_steps_per_second": 2.979,
2583
+ "step": 6600
2584
+ },
2585
+ {
2586
+ "epoch": 2.39,
2587
+ "grad_norm": 2.147202968597412,
2588
+ "learning_rate": 2.039908642865729e-05,
2589
+ "loss": 0.0552,
2590
+ "step": 6625
2591
+ },
2592
+ {
2593
+ "epoch": 2.4,
2594
+ "grad_norm": 1.0888266563415527,
2595
+ "learning_rate": 2.009856953960813e-05,
2596
+ "loss": 0.0598,
2597
+ "step": 6650
2598
+ },
2599
+ {
2600
+ "epoch": 2.41,
2601
+ "grad_norm": 0.6044988036155701,
2602
+ "learning_rate": 1.9798052650558964e-05,
2603
+ "loss": 0.039,
2604
+ "step": 6675
2605
+ },
2606
+ {
2607
+ "epoch": 2.42,
2608
+ "grad_norm": 0.5873610973358154,
2609
+ "learning_rate": 1.9497535761509796e-05,
2610
+ "loss": 0.0421,
2611
+ "step": 6700
2612
+ },
2613
+ {
2614
+ "epoch": 2.42,
2615
+ "eval_loss": 0.25130367279052734,
2616
+ "eval_na_accuracy": 0.831615149974823,
2617
+ "eval_ordinal_accuracy": 0.712047278881073,
2618
+ "eval_ordinal_mae": 0.3393591046333313,
2619
+ "eval_runtime": 185.2317,
2620
+ "eval_samples_per_second": 24.159,
2621
+ "eval_steps_per_second": 3.023,
2622
+ "step": 6700
2623
+ },
2624
+ {
2625
+ "epoch": 2.43,
2626
+ "grad_norm": 0.5890676975250244,
2627
+ "learning_rate": 1.9197018872460634e-05,
2628
+ "loss": 0.0717,
2629
+ "step": 6725
2630
+ },
2631
+ {
2632
+ "epoch": 2.43,
2633
+ "grad_norm": 0.3779636025428772,
2634
+ "learning_rate": 1.889650198341147e-05,
2635
+ "loss": 0.0559,
2636
+ "step": 6750
2637
+ },
2638
+ {
2639
+ "epoch": 2.44,
2640
+ "grad_norm": 0.38733378052711487,
2641
+ "learning_rate": 1.8595985094362305e-05,
2642
+ "loss": 0.0458,
2643
+ "step": 6775
2644
+ },
2645
+ {
2646
+ "epoch": 2.45,
2647
+ "grad_norm": 0.6435078978538513,
2648
+ "learning_rate": 1.829546820531314e-05,
2649
+ "loss": 0.0607,
2650
+ "step": 6800
2651
+ },
2652
+ {
2653
+ "epoch": 2.45,
2654
+ "eval_loss": 0.2546432614326477,
2655
+ "eval_na_accuracy": 0.8264604806900024,
2656
+ "eval_ordinal_accuracy": 0.7092216610908508,
2657
+ "eval_ordinal_mae": 0.33696386218070984,
2658
+ "eval_runtime": 185.2223,
2659
+ "eval_samples_per_second": 24.16,
2660
+ "eval_steps_per_second": 3.023,
2661
+ "step": 6800
2662
+ },
2663
+ {
2664
+ "epoch": 2.46,
2665
+ "grad_norm": 0.34649285674095154,
2666
+ "learning_rate": 1.7994951316263975e-05,
2667
+ "loss": 0.0459,
2668
+ "step": 6825
2669
+ },
2670
+ {
2671
+ "epoch": 2.47,
2672
+ "grad_norm": 0.7860547304153442,
2673
+ "learning_rate": 1.769443442721481e-05,
2674
+ "loss": 0.0483,
2675
+ "step": 6850
2676
+ },
2677
+ {
2678
+ "epoch": 2.48,
2679
+ "grad_norm": 0.4681091904640198,
2680
+ "learning_rate": 1.7393917538165645e-05,
2681
+ "loss": 0.0481,
2682
+ "step": 6875
2683
+ },
2684
+ {
2685
+ "epoch": 2.49,
2686
+ "grad_norm": 1.788452386856079,
2687
+ "learning_rate": 1.709340064911648e-05,
2688
+ "loss": 0.0517,
2689
+ "step": 6900
2690
+ },
2691
+ {
2692
+ "epoch": 2.49,
2693
+ "eval_loss": 0.25944724678993225,
2694
+ "eval_na_accuracy": 0.8298969268798828,
2695
+ "eval_ordinal_accuracy": 0.7081941962242126,
2696
+ "eval_ordinal_mae": 0.3375920057296753,
2697
+ "eval_runtime": 187.8946,
2698
+ "eval_samples_per_second": 23.817,
2699
+ "eval_steps_per_second": 2.98,
2700
+ "step": 6900
2701
+ },
2702
+ {
2703
+ "epoch": 2.5,
2704
+ "grad_norm": 0.35276201367378235,
2705
+ "learning_rate": 1.6792883760067316e-05,
2706
+ "loss": 0.0602,
2707
+ "step": 6925
2708
+ },
2709
+ {
2710
+ "epoch": 2.51,
2711
+ "grad_norm": 0.607291042804718,
2712
+ "learning_rate": 1.6492366871018154e-05,
2713
+ "loss": 0.0426,
2714
+ "step": 6950
2715
+ },
2716
+ {
2717
+ "epoch": 2.52,
2718
+ "grad_norm": 0.29660192131996155,
2719
+ "learning_rate": 1.6191849981968986e-05,
2720
+ "loss": 0.0499,
2721
+ "step": 6975
2722
+ },
2723
+ {
2724
+ "epoch": 2.52,
2725
+ "grad_norm": 0.6227554082870483,
2726
+ "learning_rate": 1.589133309291982e-05,
2727
+ "loss": 0.062,
2728
+ "step": 7000
2729
+ },
2730
+ {
2731
+ "epoch": 2.52,
2732
+ "eval_loss": 0.2532944679260254,
2733
+ "eval_na_accuracy": 0.8109965920448303,
2734
+ "eval_ordinal_accuracy": 0.710506021976471,
2735
+ "eval_ordinal_mae": 0.33686304092407227,
2736
+ "eval_runtime": 184.4841,
2737
+ "eval_samples_per_second": 24.257,
2738
+ "eval_steps_per_second": 3.035,
2739
+ "step": 7000
2740
+ },
2741
+ {
2742
+ "epoch": 2.53,
2743
+ "grad_norm": 0.37592121958732605,
2744
+ "learning_rate": 1.559081620387066e-05,
2745
+ "loss": 0.0529,
2746
+ "step": 7025
2747
+ },
2748
+ {
2749
+ "epoch": 2.54,
2750
+ "grad_norm": 0.7044575810432434,
2751
+ "learning_rate": 1.5290299314821495e-05,
2752
+ "loss": 0.0555,
2753
+ "step": 7050
2754
+ },
2755
+ {
2756
+ "epoch": 2.55,
2757
+ "grad_norm": 0.47716042399406433,
2758
+ "learning_rate": 1.4989782425772328e-05,
2759
+ "loss": 0.0424,
2760
+ "step": 7075
2761
+ },
2762
+ {
2763
+ "epoch": 2.56,
2764
+ "grad_norm": 0.6846262812614441,
2765
+ "learning_rate": 1.4689265536723165e-05,
2766
+ "loss": 0.0664,
2767
+ "step": 7100
2768
+ },
2769
+ {
2770
+ "epoch": 2.56,
2771
+ "eval_loss": 0.2534230649471283,
2772
+ "eval_na_accuracy": 0.8024054765701294,
2773
+ "eval_ordinal_accuracy": 0.7184690237045288,
2774
+ "eval_ordinal_mae": 0.33289140462875366,
2775
+ "eval_runtime": 185.2999,
2776
+ "eval_samples_per_second": 24.15,
2777
+ "eval_steps_per_second": 3.022,
2778
+ "step": 7100
2779
+ },
2780
+ {
2781
+ "epoch": 2.57,
2782
+ "grad_norm": 0.8216082453727722,
2783
+ "learning_rate": 1.4388748647674e-05,
2784
+ "loss": 0.0481,
2785
+ "step": 7125
2786
+ },
2787
+ {
2788
+ "epoch": 2.58,
2789
+ "grad_norm": 0.8088984489440918,
2790
+ "learning_rate": 1.4088231758624834e-05,
2791
+ "loss": 0.0575,
2792
+ "step": 7150
2793
+ },
2794
+ {
2795
+ "epoch": 2.59,
2796
+ "grad_norm": 0.7712150812149048,
2797
+ "learning_rate": 1.3787714869575672e-05,
2798
+ "loss": 0.0464,
2799
+ "step": 7175
2800
+ },
2801
+ {
2802
+ "epoch": 2.6,
2803
+ "grad_norm": 0.2501738369464874,
2804
+ "learning_rate": 1.3487197980526506e-05,
2805
+ "loss": 0.0389,
2806
+ "step": 7200
2807
+ },
2808
+ {
2809
+ "epoch": 2.6,
2810
+ "eval_loss": 0.246970534324646,
2811
+ "eval_na_accuracy": 0.8092783689498901,
2812
+ "eval_ordinal_accuracy": 0.7259182929992676,
2813
+ "eval_ordinal_mae": 0.328827440738678,
2814
+ "eval_runtime": 185.0735,
2815
+ "eval_samples_per_second": 24.18,
2816
+ "eval_steps_per_second": 3.026,
2817
+ "step": 7200
2818
+ },
2819
+ {
2820
+ "epoch": 2.61,
2821
+ "grad_norm": 0.6835588216781616,
2822
+ "learning_rate": 1.318668109147734e-05,
2823
+ "loss": 0.0475,
2824
+ "step": 7225
2825
+ },
2826
+ {
2827
+ "epoch": 2.61,
2828
+ "grad_norm": 0.5996441841125488,
2829
+ "learning_rate": 1.2886164202428178e-05,
2830
+ "loss": 0.0443,
2831
+ "step": 7250
2832
+ },
2833
+ {
2834
+ "epoch": 2.62,
2835
+ "grad_norm": 2.0018677711486816,
2836
+ "learning_rate": 1.2585647313379013e-05,
2837
+ "loss": 0.0425,
2838
+ "step": 7275
2839
+ },
2840
+ {
2841
+ "epoch": 2.63,
2842
+ "grad_norm": 0.300843209028244,
2843
+ "learning_rate": 1.2285130424329848e-05,
2844
+ "loss": 0.0671,
2845
+ "step": 7300
2846
+ },
2847
+ {
2848
+ "epoch": 2.63,
2849
+ "eval_loss": 0.25160375237464905,
2850
+ "eval_na_accuracy": 0.8041236996650696,
2851
+ "eval_ordinal_accuracy": 0.7159003615379333,
2852
+ "eval_ordinal_mae": 0.3293640613555908,
2853
+ "eval_runtime": 186.9929,
2854
+ "eval_samples_per_second": 23.931,
2855
+ "eval_steps_per_second": 2.995,
2856
+ "step": 7300
2857
+ },
2858
+ {
2859
+ "epoch": 2.64,
2860
+ "grad_norm": 0.677975058555603,
2861
+ "learning_rate": 1.1984613535280683e-05,
2862
+ "loss": 0.0438,
2863
+ "step": 7325
2864
+ },
2865
+ {
2866
+ "epoch": 2.65,
2867
+ "grad_norm": 0.45172086358070374,
2868
+ "learning_rate": 1.1684096646231518e-05,
2869
+ "loss": 0.0503,
2870
+ "step": 7350
2871
+ },
2872
+ {
2873
+ "epoch": 2.66,
2874
+ "grad_norm": 0.8617995381355286,
2875
+ "learning_rate": 1.1383579757182353e-05,
2876
+ "loss": 0.0426,
2877
+ "step": 7375
2878
+ },
2879
+ {
2880
+ "epoch": 2.67,
2881
+ "grad_norm": 0.47193852066993713,
2882
+ "learning_rate": 1.108306286813319e-05,
2883
+ "loss": 0.0416,
2884
+ "step": 7400
2885
+ },
2886
+ {
2887
+ "epoch": 2.67,
2888
+ "eval_loss": 0.25071558356285095,
2889
+ "eval_na_accuracy": 0.8058419227600098,
2890
+ "eval_ordinal_accuracy": 0.7133316397666931,
2891
+ "eval_ordinal_mae": 0.33067333698272705,
2892
+ "eval_runtime": 187.7549,
2893
+ "eval_samples_per_second": 23.834,
2894
+ "eval_steps_per_second": 2.983,
2895
+ "step": 7400
2896
+ },
2897
+ {
2898
+ "epoch": 2.68,
2899
+ "grad_norm": 1.0205371379852295,
2900
+ "learning_rate": 1.0782545979084024e-05,
2901
+ "loss": 0.0495,
2902
+ "step": 7425
2903
+ },
2904
+ {
2905
+ "epoch": 2.69,
2906
+ "grad_norm": 0.8885800838470459,
2907
+ "learning_rate": 1.048202909003486e-05,
2908
+ "loss": 0.0407,
2909
+ "step": 7450
2910
+ },
2911
+ {
2912
+ "epoch": 2.7,
2913
+ "grad_norm": 0.6127232313156128,
2914
+ "learning_rate": 1.0181512200985696e-05,
2915
+ "loss": 0.0611,
2916
+ "step": 7475
2917
+ },
2918
+ {
2919
+ "epoch": 2.7,
2920
+ "grad_norm": 0.7351526021957397,
2921
+ "learning_rate": 9.880995311936533e-06,
2922
+ "loss": 0.0541,
2923
+ "step": 7500
2924
+ },
2925
+ {
2926
+ "epoch": 2.7,
2927
+ "eval_loss": 0.25292566418647766,
2928
+ "eval_na_accuracy": 0.8058419227600098,
2929
+ "eval_ordinal_accuracy": 0.71101975440979,
2930
+ "eval_ordinal_mae": 0.335502564907074,
2931
+ "eval_runtime": 185.7438,
2932
+ "eval_samples_per_second": 24.092,
2933
+ "eval_steps_per_second": 3.015,
2934
+ "step": 7500
2935
+ },
2936
+ {
2937
+ "epoch": 2.71,
2938
+ "grad_norm": 0.715017557144165,
2939
+ "learning_rate": 9.580478422887366e-06,
2940
+ "loss": 0.0581,
2941
+ "step": 7525
2942
+ },
2943
+ {
2944
+ "epoch": 2.72,
2945
+ "grad_norm": 0.2852751910686493,
2946
+ "learning_rate": 9.279961533838203e-06,
2947
+ "loss": 0.0507,
2948
+ "step": 7550
2949
+ },
2950
+ {
2951
+ "epoch": 2.73,
2952
+ "grad_norm": 0.5301045775413513,
2953
+ "learning_rate": 8.979444644789038e-06,
2954
+ "loss": 0.0427,
2955
+ "step": 7575
2956
+ },
2957
+ {
2958
+ "epoch": 2.74,
2959
+ "grad_norm": 0.3962666094303131,
2960
+ "learning_rate": 8.678927755739873e-06,
2961
+ "loss": 0.0374,
2962
+ "step": 7600
2963
+ },
2964
+ {
2965
+ "epoch": 2.74,
2966
+ "eval_loss": 0.2529982626438141,
2967
+ "eval_na_accuracy": 0.8109965920448303,
2968
+ "eval_ordinal_accuracy": 0.7148728370666504,
2969
+ "eval_ordinal_mae": 0.33152100443840027,
2970
+ "eval_runtime": 184.9088,
2971
+ "eval_samples_per_second": 24.201,
2972
+ "eval_steps_per_second": 3.029,
2973
+ "step": 7600
2974
+ },
2975
+ {
2976
+ "epoch": 2.75,
2977
+ "grad_norm": 0.5724599957466125,
2978
+ "learning_rate": 8.378410866690708e-06,
2979
+ "loss": 0.0383,
2980
+ "step": 7625
2981
+ },
2982
+ {
2983
+ "epoch": 2.76,
2984
+ "grad_norm": 0.6337906718254089,
2985
+ "learning_rate": 8.077893977641545e-06,
2986
+ "loss": 0.0339,
2987
+ "step": 7650
2988
+ },
2989
+ {
2990
+ "epoch": 2.77,
2991
+ "grad_norm": 0.645459771156311,
2992
+ "learning_rate": 7.777377088592379e-06,
2993
+ "loss": 0.0485,
2994
+ "step": 7675
2995
+ },
2996
+ {
2997
+ "epoch": 2.78,
2998
+ "grad_norm": 5.039384841918945,
2999
+ "learning_rate": 7.476860199543215e-06,
3000
+ "loss": 0.04,
3001
+ "step": 7700
3002
+ },
3003
+ {
3004
+ "epoch": 2.78,
3005
+ "eval_loss": 0.2519625723361969,
3006
+ "eval_na_accuracy": 0.80756014585495,
3007
+ "eval_ordinal_accuracy": 0.7166709303855896,
3008
+ "eval_ordinal_mae": 0.32904064655303955,
3009
+ "eval_runtime": 188.802,
3010
+ "eval_samples_per_second": 23.702,
3011
+ "eval_steps_per_second": 2.966,
3012
+ "step": 7700
3013
+ },
3014
+ {
3015
+ "epoch": 2.79,
3016
+ "grad_norm": 0.30601269006729126,
3017
+ "learning_rate": 7.176343310494051e-06,
3018
+ "loss": 0.053,
3019
+ "step": 7725
3020
+ },
3021
+ {
3022
+ "epoch": 2.79,
3023
+ "grad_norm": 0.6143732070922852,
3024
+ "learning_rate": 6.875826421444885e-06,
3025
+ "loss": 0.0433,
3026
+ "step": 7750
3027
+ },
3028
+ {
3029
+ "epoch": 2.8,
3030
+ "grad_norm": 0.5583890080451965,
3031
+ "learning_rate": 6.575309532395721e-06,
3032
+ "loss": 0.0534,
3033
+ "step": 7775
3034
+ },
3035
+ {
3036
+ "epoch": 2.81,
3037
+ "grad_norm": 0.5204710960388184,
3038
+ "learning_rate": 6.274792643346557e-06,
3039
+ "loss": 0.0507,
3040
+ "step": 7800
3041
+ },
3042
+ {
3043
+ "epoch": 2.81,
3044
+ "eval_loss": 0.2555212676525116,
3045
+ "eval_na_accuracy": 0.8127147555351257,
3046
+ "eval_ordinal_accuracy": 0.710506021976471,
3047
+ "eval_ordinal_mae": 0.3297020196914673,
3048
+ "eval_runtime": 185.0982,
3049
+ "eval_samples_per_second": 24.176,
3050
+ "eval_steps_per_second": 3.025,
3051
+ "step": 7800
3052
+ },
3053
+ {
3054
+ "epoch": 2.82,
3055
+ "grad_norm": 0.11941999197006226,
3056
+ "learning_rate": 5.974275754297392e-06,
3057
+ "loss": 0.0392,
3058
+ "step": 7825
3059
+ },
3060
+ {
3061
+ "epoch": 2.83,
3062
+ "grad_norm": 0.20734691619873047,
3063
+ "learning_rate": 5.673758865248227e-06,
3064
+ "loss": 0.06,
3065
+ "step": 7850
3066
+ },
3067
+ {
3068
+ "epoch": 2.84,
3069
+ "grad_norm": 0.48085981607437134,
3070
+ "learning_rate": 5.373241976199063e-06,
3071
+ "loss": 0.0502,
3072
+ "step": 7875
3073
+ },
3074
+ {
3075
+ "epoch": 2.85,
3076
+ "grad_norm": 0.5097972750663757,
3077
+ "learning_rate": 5.072725087149898e-06,
3078
+ "loss": 0.0379,
3079
+ "step": 7900
3080
+ },
3081
+ {
3082
+ "epoch": 2.85,
3083
+ "eval_loss": 0.2531285285949707,
3084
+ "eval_na_accuracy": 0.8127147555351257,
3085
+ "eval_ordinal_accuracy": 0.7161571979522705,
3086
+ "eval_ordinal_mae": 0.3273853659629822,
3087
+ "eval_runtime": 183.7697,
3088
+ "eval_samples_per_second": 24.351,
3089
+ "eval_steps_per_second": 3.047,
3090
+ "step": 7900
3091
+ },
3092
+ {
3093
+ "epoch": 2.86,
3094
+ "grad_norm": 0.1783066987991333,
3095
+ "learning_rate": 4.7722081981007335e-06,
3096
+ "loss": 0.0455,
3097
+ "step": 7925
3098
+ },
3099
+ {
3100
+ "epoch": 2.87,
3101
+ "grad_norm": 0.6303412318229675,
3102
+ "learning_rate": 4.4716913090515695e-06,
3103
+ "loss": 0.0639,
3104
+ "step": 7950
3105
+ },
3106
+ {
3107
+ "epoch": 2.88,
3108
+ "grad_norm": 0.7201049327850342,
3109
+ "learning_rate": 4.171174420002405e-06,
3110
+ "loss": 0.0373,
3111
+ "step": 7975
3112
+ },
3113
+ {
3114
+ "epoch": 2.88,
3115
+ "grad_norm": 0.31524857878685,
3116
+ "learning_rate": 3.87065753095324e-06,
3117
+ "loss": 0.0736,
3118
+ "step": 8000
3119
+ },
3120
+ {
3121
+ "epoch": 2.88,
3122
+ "eval_loss": 0.25256654620170593,
3123
+ "eval_na_accuracy": 0.8195876479148865,
3124
+ "eval_ordinal_accuracy": 0.7164140939712524,
3125
+ "eval_ordinal_mae": 0.3279343843460083,
3126
+ "eval_runtime": 187.2067,
3127
+ "eval_samples_per_second": 23.904,
3128
+ "eval_steps_per_second": 2.991,
3129
+ "step": 8000
3130
+ },
3131
+ {
3132
+ "epoch": 2.89,
3133
+ "grad_norm": 0.1482267528772354,
3134
+ "learning_rate": 3.5701406419040754e-06,
3135
+ "loss": 0.0363,
3136
+ "step": 8025
3137
+ },
3138
+ {
3139
+ "epoch": 2.9,
3140
+ "grad_norm": 0.6438891887664795,
3141
+ "learning_rate": 3.2696237528549105e-06,
3142
+ "loss": 0.043,
3143
+ "step": 8050
3144
+ },
3145
+ {
3146
+ "epoch": 2.91,
3147
+ "grad_norm": 0.5657501816749573,
3148
+ "learning_rate": 2.969106863805746e-06,
3149
+ "loss": 0.0351,
3150
+ "step": 8075
3151
+ },
3152
+ {
3153
+ "epoch": 2.92,
3154
+ "grad_norm": 0.35640329122543335,
3155
+ "learning_rate": 2.6685899747565817e-06,
3156
+ "loss": 0.0589,
3157
+ "step": 8100
3158
+ },
3159
+ {
3160
+ "epoch": 2.92,
3161
+ "eval_loss": 0.2521790862083435,
3162
+ "eval_na_accuracy": 0.8161512017250061,
3163
+ "eval_ordinal_accuracy": 0.7143591046333313,
3164
+ "eval_ordinal_mae": 0.3267035186290741,
3165
+ "eval_runtime": 185.6368,
3166
+ "eval_samples_per_second": 24.106,
3167
+ "eval_steps_per_second": 3.017,
3168
+ "step": 8100
3169
+ },
3170
+ {
3171
+ "epoch": 2.93,
3172
+ "grad_norm": 0.6037762761116028,
3173
+ "learning_rate": 2.368073085707417e-06,
3174
+ "loss": 0.052,
3175
+ "step": 8125
3176
+ },
3177
+ {
3178
+ "epoch": 2.94,
3179
+ "grad_norm": 0.5858399868011475,
3180
+ "learning_rate": 2.067556196658252e-06,
3181
+ "loss": 0.043,
3182
+ "step": 8150
3183
+ },
3184
+ {
3185
+ "epoch": 2.95,
3186
+ "grad_norm": 0.9954311847686768,
3187
+ "learning_rate": 1.7670393076090878e-06,
3188
+ "loss": 0.0708,
3189
+ "step": 8175
3190
+ },
3191
+ {
3192
+ "epoch": 2.96,
3193
+ "grad_norm": 0.448538601398468,
3194
+ "learning_rate": 1.4665224185599232e-06,
3195
+ "loss": 0.0449,
3196
+ "step": 8200
3197
+ },
3198
+ {
3199
+ "epoch": 2.96,
3200
+ "eval_loss": 0.252143532037735,
3201
+ "eval_na_accuracy": 0.8161512017250061,
3202
+ "eval_ordinal_accuracy": 0.7148728370666504,
3203
+ "eval_ordinal_mae": 0.3271646499633789,
3204
+ "eval_runtime": 188.3236,
3205
+ "eval_samples_per_second": 23.762,
3206
+ "eval_steps_per_second": 2.974,
3207
+ "step": 8200
3208
+ },
3209
+ {
3210
+ "epoch": 2.97,
3211
+ "grad_norm": 0.3916318416595459,
3212
+ "learning_rate": 1.1660055295107585e-06,
3213
+ "loss": 0.0519,
3214
+ "step": 8225
3215
+ },
3216
+ {
3217
+ "epoch": 2.98,
3218
+ "grad_norm": 0.7591487765312195,
3219
+ "learning_rate": 8.654886404615941e-07,
3220
+ "loss": 0.0447,
3221
+ "step": 8250
3222
+ },
3223
+ {
3224
+ "epoch": 2.98,
3225
+ "grad_norm": 0.33720019459724426,
3226
+ "learning_rate": 5.649717514124295e-07,
3227
+ "loss": 0.0486,
3228
+ "step": 8275
3229
+ },
3230
+ {
3231
+ "epoch": 2.99,
3232
+ "grad_norm": 0.5342977046966553,
3233
+ "learning_rate": 2.644548623632648e-07,
3234
+ "loss": 0.0498,
3235
+ "step": 8300
3236
+ },
3237
+ {
3238
+ "epoch": 2.99,
3239
+ "eval_loss": 0.2520281970500946,
3240
+ "eval_na_accuracy": 0.8144329786300659,
3241
+ "eval_ordinal_accuracy": 0.7166709303855896,
3242
+ "eval_ordinal_mae": 0.3264659643173218,
3243
+ "eval_runtime": 186.96,
3244
+ "eval_samples_per_second": 23.936,
3245
+ "eval_steps_per_second": 2.995,
3246
+ "step": 8300
3247
+ },
3248
+ {
3249
+ "epoch": 3.0,
3250
+ "step": 8319,
3251
+ "total_flos": 1.0314863567841853e+19,
3252
+ "train_loss": 0.13915026724546314,
3253
+ "train_runtime": 27077.9759,
3254
+ "train_samples_per_second": 4.916,
3255
+ "train_steps_per_second": 0.307
3256
+ }
3257
+ ],
3258
+ "logging_steps": 25,
3259
+ "max_steps": 8319,
3260
+ "num_input_tokens_seen": 0,
3261
+ "num_train_epochs": 3,
3262
+ "save_steps": 100,
3263
+ "total_flos": 1.0314863567841853e+19,
3264
+ "train_batch_size": 16,
3265
+ "trial_name": null,
3266
+ "trial_params": null
3267
+ }