jarod0411 committed on
Commit
e8cb2e6
·
verified ·
1 Parent(s): eae97bf

End of training

Files changed (5)
  1. README.md +14 -2
  2. all_results.json +15 -0
  3. eval_results.json +10 -0
  4. train_results.json +8 -0
  5. trainer_state.json +2988 -0
README.md CHANGED
@@ -2,11 +2,23 @@
  base_model: jarod0411/zinc10M_gpt2_SMILES_bpe_combined_step1
  tags:
  - generated_from_trainer
+ datasets:
+ - jarod0411/linker_v2
  metrics:
  - accuracy
  model-index:
  - name: stage1
-   results: []
+   results:
+   - task:
+       name: Causal Language Modeling
+       type: text-generation
+     dataset:
+       name: jarod0411/linker_v2
+       type: jarod0411/linker_v2
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.8936249984035948
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -14,7 +26,7 @@ should probably proofread and complete it, then remove this comment. -->

  # stage1

- This model is a fine-tuned version of [jarod0411/zinc10M_gpt2_SMILES_bpe_combined_step1](https://huggingface.co/jarod0411/zinc10M_gpt2_SMILES_bpe_combined_step1) on an unknown dataset.
+ This model is a fine-tuned version of [jarod0411/zinc10M_gpt2_SMILES_bpe_combined_step1](https://huggingface.co/jarod0411/zinc10M_gpt2_SMILES_bpe_combined_step1) on the jarod0411/linker_v2 dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.3311
  - Accuracy: 0.8936
all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+     "epoch": 10.0,
+     "eval_accuracy": 0.8936249984035948,
+     "eval_loss": 0.3311329483985901,
+     "eval_runtime": 1388.3157,
+     "eval_samples": 382464,
+     "eval_samples_per_second": 275.488,
+     "eval_steps_per_second": 1.913,
+     "perplexity": 1.3925449166185604,
+     "train_loss": 0.35271068150736834,
+     "train_runtime": 202889.1723,
+     "train_samples": 3445995,
+     "train_samples_per_second": 169.846,
+     "train_steps_per_second": 1.18
+ }
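The `perplexity` field above looks like the exponential of `eval_loss`, which is how causal language modeling scripts commonly derive it. The commit does not include the training script, so that relationship is an assumption; the sketch below only checks it against the two numbers copied from this file.

```python
import math

# Values copied from all_results.json in this commit.
eval_loss = 0.3311329483985901
reported_perplexity = 1.3925449166185604

# Assumption: perplexity = exp(mean cross-entropy loss, in nats).
perplexity = math.exp(eval_loss)

# Compare with a loose relative tolerance rather than exact float equality.
matches = math.isclose(perplexity, reported_perplexity, rel_tol=1e-6)
print(f"exp(eval_loss) = {perplexity:.6f}, matches reported value: {matches}")
```

If the assumption holds, a perplexity near 1.39 means the model is, on average, about as uncertain as a choice between roughly 1.4 equally likely tokens at each position.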
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "epoch": 10.0,
+     "eval_accuracy": 0.8936249984035948,
+     "eval_loss": 0.3311329483985901,
+     "eval_runtime": 1388.3157,
+     "eval_samples": 382464,
+     "eval_samples_per_second": 275.488,
+     "eval_steps_per_second": 1.913,
+     "perplexity": 1.3925449166185604
+ }
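The throughput fields in this file can be cross-checked against each other. The sketch below recomputes samples per second from `eval_samples` and `eval_runtime`, then estimates the total evaluation batch size from the ratio of the two reported rates. The batch size is not stated anywhere in the commit, so the figure it produces is an inference, and it assumes every evaluation step processes the same number of samples.

```python
# Values copied from eval_results.json in this commit.
eval_samples = 382464
eval_runtime = 1388.3157  # seconds
reported_samples_per_second = 275.488
reported_steps_per_second = 1.913

# Recompute the sample throughput from the raw count and runtime.
samples_per_second = eval_samples / eval_runtime

# Assumption: a constant number of samples per evaluation step, so the
# ratio of the two reported rates estimates the total eval batch size.
implied_batch_size = round(reported_samples_per_second / reported_steps_per_second)

# Number of evaluation steps that batch size would imply.
implied_eval_steps = eval_samples / implied_batch_size

print(f"samples/s: {samples_per_second:.3f}")
print(f"implied batch size: {implied_batch_size}")
print(f"implied eval steps: {implied_eval_steps}")
```

The implied batch size divides `eval_samples` evenly, which is consistent with the estimate but does not prove it.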
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 10.0,
+     "train_loss": 0.35271068150736834,
+     "train_runtime": 202889.1723,
+     "train_samples": 3445995,
+     "train_samples_per_second": 169.846,
+     "train_steps_per_second": 1.18
+ }
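The training figures can be sanity-checked the same way. The sketch below combines this file with the `global_step` value (239310) recorded in `trainer_state.json` from the same commit. It assumes the reported rates count samples and optimizer steps over all 10 epochs, and the effective batch size it derives (per-device batch size times devices times gradient accumulation) is an inference, not something the commit states.

```python
import math

# Values copied from train_results.json in this commit.
train_samples = 3445995
train_runtime = 202889.1723  # seconds
epochs = 10.0
reported_samples_per_second = 169.846
reported_steps_per_second = 1.18

# Copied from trainer_state.json in the same commit.
global_step = 239310

# Assumption: throughput counts every sample seen across all epochs.
samples_per_second = train_samples * epochs / train_runtime
steps_per_second = global_step / train_runtime

# Optimizer steps per epoch, and the samples-per-step figure that implies.
steps_per_epoch = global_step / epochs
implied_batch_size = round(train_samples / steps_per_epoch)

# With that batch size, the last step of each epoch would be a partial batch.
expected_steps_per_epoch = math.ceil(train_samples / implied_batch_size)

print(f"samples/s: {samples_per_second:.3f}, steps/s: {steps_per_second:.3f}")
print(f"steps per epoch: {steps_per_epoch:.0f}, implied batch size: {implied_batch_size}")
```

The wall-clock figure is also worth converting: 202889 seconds is roughly 56 hours of training.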
trainer_state.json ADDED
@@ -0,0 +1,2988 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 10.0,
5
+ "eval_steps": 500,
6
+ "global_step": 239310,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.02,
13
+ "learning_rate": 4.989553299068154e-05,
14
+ "loss": 0.5821,
15
+ "step": 500
16
+ },
17
+ {
18
+ "epoch": 0.04,
19
+ "learning_rate": 4.979106598136309e-05,
20
+ "loss": 0.497,
21
+ "step": 1000
22
+ },
23
+ {
24
+ "epoch": 0.06,
25
+ "learning_rate": 4.968659897204463e-05,
26
+ "loss": 0.4743,
27
+ "step": 1500
28
+ },
29
+ {
30
+ "epoch": 0.08,
31
+ "learning_rate": 4.9582131962726175e-05,
32
+ "loss": 0.4581,
33
+ "step": 2000
34
+ },
35
+ {
36
+ "epoch": 0.1,
37
+ "learning_rate": 4.947766495340771e-05,
38
+ "loss": 0.4468,
39
+ "step": 2500
40
+ },
41
+ {
42
+ "epoch": 0.13,
43
+ "learning_rate": 4.9373197944089264e-05,
44
+ "loss": 0.4379,
45
+ "step": 3000
46
+ },
47
+ {
48
+ "epoch": 0.15,
49
+ "learning_rate": 4.92687309347708e-05,
50
+ "loss": 0.4312,
51
+ "step": 3500
52
+ },
53
+ {
54
+ "epoch": 0.17,
55
+ "learning_rate": 4.916426392545234e-05,
56
+ "loss": 0.4258,
57
+ "step": 4000
58
+ },
59
+ {
60
+ "epoch": 0.19,
61
+ "learning_rate": 4.9059796916133885e-05,
62
+ "loss": 0.4214,
63
+ "step": 4500
64
+ },
65
+ {
66
+ "epoch": 0.21,
67
+ "learning_rate": 4.895532990681543e-05,
68
+ "loss": 0.4177,
69
+ "step": 5000
70
+ },
71
+ {
72
+ "epoch": 0.23,
73
+ "learning_rate": 4.8850862897496974e-05,
74
+ "loss": 0.414,
75
+ "step": 5500
76
+ },
77
+ {
78
+ "epoch": 0.25,
79
+ "learning_rate": 4.874639588817851e-05,
80
+ "loss": 0.4112,
81
+ "step": 6000
82
+ },
83
+ {
84
+ "epoch": 0.27,
85
+ "learning_rate": 4.864192887886006e-05,
86
+ "loss": 0.4084,
87
+ "step": 6500
88
+ },
89
+ {
90
+ "epoch": 0.29,
91
+ "learning_rate": 4.85374618695416e-05,
92
+ "loss": 0.4065,
93
+ "step": 7000
94
+ },
95
+ {
96
+ "epoch": 0.31,
97
+ "learning_rate": 4.8432994860223146e-05,
98
+ "loss": 0.4041,
99
+ "step": 7500
100
+ },
101
+ {
102
+ "epoch": 0.33,
103
+ "learning_rate": 4.8328527850904684e-05,
104
+ "loss": 0.4022,
105
+ "step": 8000
106
+ },
107
+ {
108
+ "epoch": 0.36,
109
+ "learning_rate": 4.822406084158623e-05,
110
+ "loss": 0.4003,
111
+ "step": 8500
112
+ },
113
+ {
114
+ "epoch": 0.38,
115
+ "learning_rate": 4.8119593832267774e-05,
116
+ "loss": 0.399,
117
+ "step": 9000
118
+ },
119
+ {
120
+ "epoch": 0.4,
121
+ "learning_rate": 4.801512682294931e-05,
122
+ "loss": 0.3971,
123
+ "step": 9500
124
+ },
125
+ {
126
+ "epoch": 0.42,
127
+ "learning_rate": 4.791065981363086e-05,
128
+ "loss": 0.3958,
129
+ "step": 10000
130
+ },
131
+ {
132
+ "epoch": 0.44,
133
+ "learning_rate": 4.78061928043124e-05,
134
+ "loss": 0.3945,
135
+ "step": 10500
136
+ },
137
+ {
138
+ "epoch": 0.46,
139
+ "learning_rate": 4.7701725794993946e-05,
140
+ "loss": 0.3935,
141
+ "step": 11000
142
+ },
143
+ {
144
+ "epoch": 0.48,
145
+ "learning_rate": 4.7597258785675484e-05,
146
+ "loss": 0.3924,
147
+ "step": 11500
148
+ },
149
+ {
150
+ "epoch": 0.5,
151
+ "learning_rate": 4.749279177635703e-05,
152
+ "loss": 0.3909,
153
+ "step": 12000
154
+ },
155
+ {
156
+ "epoch": 0.52,
157
+ "learning_rate": 4.7388324767038574e-05,
158
+ "loss": 0.3898,
159
+ "step": 12500
160
+ },
161
+ {
162
+ "epoch": 0.54,
163
+ "learning_rate": 4.728385775772011e-05,
164
+ "loss": 0.3891,
165
+ "step": 13000
166
+ },
167
+ {
168
+ "epoch": 0.56,
169
+ "learning_rate": 4.7179390748401656e-05,
170
+ "loss": 0.388,
171
+ "step": 13500
172
+ },
173
+ {
174
+ "epoch": 0.59,
175
+ "learning_rate": 4.70749237390832e-05,
176
+ "loss": 0.387,
177
+ "step": 14000
178
+ },
179
+ {
180
+ "epoch": 0.61,
181
+ "learning_rate": 4.6970456729764746e-05,
182
+ "loss": 0.3867,
183
+ "step": 14500
184
+ },
185
+ {
186
+ "epoch": 0.63,
187
+ "learning_rate": 4.6865989720446284e-05,
188
+ "loss": 0.3856,
189
+ "step": 15000
190
+ },
191
+ {
192
+ "epoch": 0.65,
193
+ "learning_rate": 4.676152271112783e-05,
194
+ "loss": 0.3846,
195
+ "step": 15500
196
+ },
197
+ {
198
+ "epoch": 0.67,
199
+ "learning_rate": 4.665705570180937e-05,
200
+ "loss": 0.3838,
201
+ "step": 16000
202
+ },
203
+ {
204
+ "epoch": 0.69,
205
+ "learning_rate": 4.655258869249091e-05,
206
+ "loss": 0.3833,
207
+ "step": 16500
208
+ },
209
+ {
210
+ "epoch": 0.71,
211
+ "learning_rate": 4.6448121683172456e-05,
212
+ "loss": 0.3823,
213
+ "step": 17000
214
+ },
215
+ {
216
+ "epoch": 0.73,
217
+ "learning_rate": 4.6343654673854e-05,
218
+ "loss": 0.3818,
219
+ "step": 17500
220
+ },
221
+ {
222
+ "epoch": 0.75,
223
+ "learning_rate": 4.6239187664535545e-05,
224
+ "loss": 0.3809,
225
+ "step": 18000
226
+ },
227
+ {
228
+ "epoch": 0.77,
229
+ "learning_rate": 4.613472065521708e-05,
230
+ "loss": 0.3806,
231
+ "step": 18500
232
+ },
233
+ {
234
+ "epoch": 0.79,
235
+ "learning_rate": 4.603025364589863e-05,
236
+ "loss": 0.3795,
237
+ "step": 19000
238
+ },
239
+ {
240
+ "epoch": 0.81,
241
+ "learning_rate": 4.592578663658017e-05,
242
+ "loss": 0.3796,
243
+ "step": 19500
244
+ },
245
+ {
246
+ "epoch": 0.84,
247
+ "learning_rate": 4.582131962726171e-05,
248
+ "loss": 0.3787,
249
+ "step": 20000
250
+ },
251
+ {
252
+ "epoch": 0.86,
253
+ "learning_rate": 4.5716852617943256e-05,
254
+ "loss": 0.3782,
255
+ "step": 20500
256
+ },
257
+ {
258
+ "epoch": 0.88,
259
+ "learning_rate": 4.5612385608624794e-05,
260
+ "loss": 0.3777,
261
+ "step": 21000
262
+ },
263
+ {
264
+ "epoch": 0.9,
265
+ "learning_rate": 4.5507918599306345e-05,
266
+ "loss": 0.3772,
267
+ "step": 21500
268
+ },
269
+ {
270
+ "epoch": 0.92,
271
+ "learning_rate": 4.540345158998788e-05,
272
+ "loss": 0.3766,
273
+ "step": 22000
274
+ },
275
+ {
276
+ "epoch": 0.94,
277
+ "learning_rate": 4.529898458066943e-05,
278
+ "loss": 0.3761,
279
+ "step": 22500
280
+ },
281
+ {
282
+ "epoch": 0.96,
283
+ "learning_rate": 4.519451757135097e-05,
284
+ "loss": 0.3755,
285
+ "step": 23000
286
+ },
287
+ {
288
+ "epoch": 0.98,
289
+ "learning_rate": 4.509005056203251e-05,
290
+ "loss": 0.375,
291
+ "step": 23500
292
+ },
293
+ {
294
+ "epoch": 1.0,
295
+ "eval_accuracy": 0.8853351016181491,
296
+ "eval_loss": 0.36145269870758057,
297
+ "eval_runtime": 1392.8737,
298
+ "eval_samples_per_second": 274.586,
299
+ "eval_steps_per_second": 1.907,
300
+ "step": 23931
301
+ },
302
+ {
303
+ "epoch": 1.0,
304
+ "learning_rate": 4.4985583552714055e-05,
305
+ "loss": 0.3746,
306
+ "step": 24000
307
+ },
308
+ {
309
+ "epoch": 1.02,
310
+ "learning_rate": 4.488111654339559e-05,
311
+ "loss": 0.3734,
312
+ "step": 24500
313
+ },
314
+ {
315
+ "epoch": 1.04,
316
+ "learning_rate": 4.4776649534077145e-05,
317
+ "loss": 0.3734,
318
+ "step": 25000
319
+ },
320
+ {
321
+ "epoch": 1.07,
322
+ "learning_rate": 4.467218252475868e-05,
323
+ "loss": 0.3732,
324
+ "step": 25500
325
+ },
326
+ {
327
+ "epoch": 1.09,
328
+ "learning_rate": 4.456771551544023e-05,
329
+ "loss": 0.3723,
330
+ "step": 26000
331
+ },
332
+ {
333
+ "epoch": 1.11,
334
+ "learning_rate": 4.4463248506121765e-05,
335
+ "loss": 0.372,
336
+ "step": 26500
337
+ },
338
+ {
339
+ "epoch": 1.13,
340
+ "learning_rate": 4.435878149680332e-05,
341
+ "loss": 0.3717,
342
+ "step": 27000
343
+ },
344
+ {
345
+ "epoch": 1.15,
346
+ "learning_rate": 4.4254314487484855e-05,
347
+ "loss": 0.3716,
348
+ "step": 27500
349
+ },
350
+ {
351
+ "epoch": 1.17,
352
+ "learning_rate": 4.414984747816639e-05,
353
+ "loss": 0.3709,
354
+ "step": 28000
355
+ },
356
+ {
357
+ "epoch": 1.19,
358
+ "learning_rate": 4.4045380468847944e-05,
359
+ "loss": 0.3707,
360
+ "step": 28500
361
+ },
362
+ {
363
+ "epoch": 1.21,
364
+ "learning_rate": 4.394091345952948e-05,
365
+ "loss": 0.3702,
366
+ "step": 29000
367
+ },
368
+ {
369
+ "epoch": 1.23,
370
+ "learning_rate": 4.383644645021103e-05,
371
+ "loss": 0.37,
372
+ "step": 29500
373
+ },
374
+ {
375
+ "epoch": 1.25,
376
+ "learning_rate": 4.3731979440892565e-05,
377
+ "loss": 0.37,
378
+ "step": 30000
379
+ },
380
+ {
381
+ "epoch": 1.27,
382
+ "learning_rate": 4.3627512431574117e-05,
383
+ "loss": 0.3693,
384
+ "step": 30500
385
+ },
386
+ {
387
+ "epoch": 1.3,
388
+ "learning_rate": 4.3523045422255655e-05,
389
+ "loss": 0.3691,
390
+ "step": 31000
391
+ },
392
+ {
393
+ "epoch": 1.32,
394
+ "learning_rate": 4.341857841293719e-05,
395
+ "loss": 0.3685,
396
+ "step": 31500
397
+ },
398
+ {
399
+ "epoch": 1.34,
400
+ "learning_rate": 4.331411140361874e-05,
401
+ "loss": 0.368,
402
+ "step": 32000
403
+ },
404
+ {
405
+ "epoch": 1.36,
406
+ "learning_rate": 4.320964439430028e-05,
407
+ "loss": 0.368,
408
+ "step": 32500
409
+ },
410
+ {
411
+ "epoch": 1.38,
412
+ "learning_rate": 4.310517738498183e-05,
413
+ "loss": 0.3678,
414
+ "step": 33000
415
+ },
416
+ {
417
+ "epoch": 1.4,
418
+ "learning_rate": 4.3000710375663365e-05,
419
+ "loss": 0.3675,
420
+ "step": 33500
421
+ },
422
+ {
423
+ "epoch": 1.42,
424
+ "learning_rate": 4.289624336634491e-05,
425
+ "loss": 0.3672,
426
+ "step": 34000
427
+ },
428
+ {
429
+ "epoch": 1.44,
430
+ "learning_rate": 4.2791776357026454e-05,
431
+ "loss": 0.3667,
432
+ "step": 34500
433
+ },
434
+ {
435
+ "epoch": 1.46,
436
+ "learning_rate": 4.268730934770799e-05,
437
+ "loss": 0.3667,
438
+ "step": 35000
439
+ },
440
+ {
441
+ "epoch": 1.48,
442
+ "learning_rate": 4.258284233838954e-05,
443
+ "loss": 0.3667,
444
+ "step": 35500
445
+ },
446
+ {
447
+ "epoch": 1.5,
448
+ "learning_rate": 4.247837532907108e-05,
449
+ "loss": 0.3663,
450
+ "step": 36000
451
+ },
452
+ {
453
+ "epoch": 1.53,
454
+ "learning_rate": 4.2373908319752626e-05,
455
+ "loss": 0.3661,
456
+ "step": 36500
457
+ },
458
+ {
459
+ "epoch": 1.55,
460
+ "learning_rate": 4.2269441310434164e-05,
461
+ "loss": 0.3655,
462
+ "step": 37000
463
+ },
464
+ {
465
+ "epoch": 1.57,
466
+ "learning_rate": 4.216497430111571e-05,
467
+ "loss": 0.3655,
468
+ "step": 37500
469
+ },
470
+ {
471
+ "epoch": 1.59,
472
+ "learning_rate": 4.2060507291797254e-05,
473
+ "loss": 0.3654,
474
+ "step": 38000
475
+ },
476
+ {
477
+ "epoch": 1.61,
478
+ "learning_rate": 4.19560402824788e-05,
479
+ "loss": 0.3645,
480
+ "step": 38500
481
+ },
482
+ {
483
+ "epoch": 1.63,
484
+ "learning_rate": 4.1851573273160337e-05,
485
+ "loss": 0.3643,
486
+ "step": 39000
487
+ },
488
+ {
489
+ "epoch": 1.65,
490
+ "learning_rate": 4.174710626384188e-05,
491
+ "loss": 0.3642,
492
+ "step": 39500
493
+ },
494
+ {
495
+ "epoch": 1.67,
496
+ "learning_rate": 4.1642639254523426e-05,
497
+ "loss": 0.3642,
498
+ "step": 40000
499
+ },
500
+ {
501
+ "epoch": 1.69,
502
+ "learning_rate": 4.1538172245204964e-05,
503
+ "loss": 0.364,
504
+ "step": 40500
505
+ },
506
+ {
507
+ "epoch": 1.71,
508
+ "learning_rate": 4.143370523588651e-05,
509
+ "loss": 0.3636,
510
+ "step": 41000
511
+ },
512
+ {
513
+ "epoch": 1.73,
514
+ "learning_rate": 4.1329238226568054e-05,
515
+ "loss": 0.363,
516
+ "step": 41500
517
+ },
518
+ {
519
+ "epoch": 1.76,
520
+ "learning_rate": 4.12247712172496e-05,
521
+ "loss": 0.3632,
522
+ "step": 42000
523
+ },
524
+ {
525
+ "epoch": 1.78,
526
+ "learning_rate": 4.1120304207931136e-05,
527
+ "loss": 0.363,
528
+ "step": 42500
529
+ },
530
+ {
531
+ "epoch": 1.8,
532
+ "learning_rate": 4.101583719861268e-05,
533
+ "loss": 0.3626,
534
+ "step": 43000
535
+ },
536
+ {
537
+ "epoch": 1.82,
538
+ "learning_rate": 4.0911370189294226e-05,
539
+ "loss": 0.3626,
540
+ "step": 43500
541
+ },
542
+ {
543
+ "epoch": 1.84,
544
+ "learning_rate": 4.0806903179975764e-05,
545
+ "loss": 0.3623,
546
+ "step": 44000
547
+ },
548
+ {
549
+ "epoch": 1.86,
550
+ "learning_rate": 4.070243617065731e-05,
551
+ "loss": 0.3621,
552
+ "step": 44500
553
+ },
554
+ {
555
+ "epoch": 1.88,
556
+ "learning_rate": 4.059796916133885e-05,
557
+ "loss": 0.3617,
558
+ "step": 45000
559
+ },
560
+ {
561
+ "epoch": 1.9,
562
+ "learning_rate": 4.04935021520204e-05,
563
+ "loss": 0.3619,
564
+ "step": 45500
565
+ },
566
+ {
567
+ "epoch": 1.92,
568
+ "learning_rate": 4.0389035142701936e-05,
569
+ "loss": 0.3615,
570
+ "step": 46000
571
+ },
572
+ {
573
+ "epoch": 1.94,
574
+ "learning_rate": 4.0284568133383474e-05,
575
+ "loss": 0.361,
576
+ "step": 46500
577
+ },
578
+ {
579
+ "epoch": 1.96,
580
+ "learning_rate": 4.0180101124065025e-05,
581
+ "loss": 0.3611,
582
+ "step": 47000
583
+ },
584
+ {
585
+ "epoch": 1.98,
586
+ "learning_rate": 4.007563411474656e-05,
587
+ "loss": 0.3609,
588
+ "step": 47500
589
+ },
590
+ {
591
+ "epoch": 2.0,
592
+ "eval_accuracy": 0.8886839759560093,
593
+ "eval_loss": 0.34936127066612244,
594
+ "eval_runtime": 1403.4535,
595
+ "eval_samples_per_second": 272.516,
596
+ "eval_steps_per_second": 1.892,
597
+ "step": 47862
598
+ },
599
+ {
600
+ "epoch": 2.01,
601
+ "learning_rate": 3.997116710542811e-05,
602
+ "loss": 0.3604,
603
+ "step": 48000
604
+ },
605
+ {
606
+ "epoch": 2.03,
607
+ "learning_rate": 3.9866700096109646e-05,
608
+ "loss": 0.3599,
609
+ "step": 48500
610
+ },
611
+ {
612
+ "epoch": 2.05,
613
+ "learning_rate": 3.97622330867912e-05,
614
+ "loss": 0.36,
615
+ "step": 49000
616
+ },
617
+ {
618
+ "epoch": 2.07,
619
+ "learning_rate": 3.9657766077472736e-05,
620
+ "loss": 0.3593,
621
+ "step": 49500
622
+ },
623
+ {
624
+ "epoch": 2.09,
625
+ "learning_rate": 3.955329906815428e-05,
626
+ "loss": 0.3594,
627
+ "step": 50000
628
+ },
629
+ {
630
+ "epoch": 2.11,
631
+ "learning_rate": 3.9448832058835825e-05,
632
+ "loss": 0.3591,
633
+ "step": 50500
634
+ },
635
+ {
636
+ "epoch": 2.13,
637
+ "learning_rate": 3.934436504951736e-05,
638
+ "loss": 0.3591,
639
+ "step": 51000
640
+ },
641
+ {
642
+ "epoch": 2.15,
643
+ "learning_rate": 3.923989804019891e-05,
644
+ "loss": 0.3587,
645
+ "step": 51500
646
+ },
647
+ {
648
+ "epoch": 2.17,
649
+ "learning_rate": 3.9135431030880446e-05,
650
+ "loss": 0.3588,
651
+ "step": 52000
652
+ },
653
+ {
654
+ "epoch": 2.19,
655
+ "learning_rate": 3.9030964021562e-05,
656
+ "loss": 0.3584,
657
+ "step": 52500
658
+ },
659
+ {
660
+ "epoch": 2.21,
661
+ "learning_rate": 3.8926497012243535e-05,
662
+ "loss": 0.3586,
663
+ "step": 53000
664
+ },
665
+ {
666
+ "epoch": 2.24,
667
+ "learning_rate": 3.882203000292508e-05,
668
+ "loss": 0.3582,
669
+ "step": 53500
670
+ },
671
+ {
672
+ "epoch": 2.26,
673
+ "learning_rate": 3.871756299360662e-05,
674
+ "loss": 0.3581,
675
+ "step": 54000
676
+ },
677
+ {
678
+ "epoch": 2.28,
679
+ "learning_rate": 3.861309598428816e-05,
680
+ "loss": 0.3582,
681
+ "step": 54500
682
+ },
683
+ {
684
+ "epoch": 2.3,
685
+ "learning_rate": 3.850862897496971e-05,
686
+ "loss": 0.3579,
687
+ "step": 55000
688
+ },
689
+ {
690
+ "epoch": 2.32,
691
+ "learning_rate": 3.8404161965651245e-05,
692
+ "loss": 0.3577,
693
+ "step": 55500
694
+ },
695
+ {
696
+ "epoch": 2.34,
697
+ "learning_rate": 3.829969495633279e-05,
698
+ "loss": 0.3575,
699
+ "step": 56000
700
+ },
701
+ {
702
+ "epoch": 2.36,
703
+ "learning_rate": 3.8195227947014335e-05,
704
+ "loss": 0.357,
705
+ "step": 56500
706
+ },
707
+ {
708
+ "epoch": 2.38,
709
+ "learning_rate": 3.809076093769588e-05,
710
+ "loss": 0.357,
711
+ "step": 57000
712
+ },
713
+ {
714
+ "epoch": 2.4,
715
+ "learning_rate": 3.798629392837742e-05,
716
+ "loss": 0.357,
717
+ "step": 57500
718
+ },
719
+ {
720
+ "epoch": 2.42,
721
+ "learning_rate": 3.788182691905896e-05,
722
+ "loss": 0.3571,
723
+ "step": 58000
724
+ },
725
+ {
726
+ "epoch": 2.44,
727
+ "learning_rate": 3.777735990974051e-05,
728
+ "loss": 0.3568,
729
+ "step": 58500
730
+ },
731
+ {
732
+ "epoch": 2.47,
733
+ "learning_rate": 3.7672892900422045e-05,
734
+ "loss": 0.3565,
735
+ "step": 59000
736
+ },
737
+ {
738
+ "epoch": 2.49,
739
+ "learning_rate": 3.756842589110359e-05,
740
+ "loss": 0.3564,
741
+ "step": 59500
742
+ },
743
+ {
744
+ "epoch": 2.51,
745
+ "learning_rate": 3.7463958881785134e-05,
746
+ "loss": 0.3567,
747
+ "step": 60000
748
+ },
749
+ {
750
+ "epoch": 2.53,
751
+ "learning_rate": 3.735949187246668e-05,
752
+ "loss": 0.3561,
753
+ "step": 60500
754
+ },
755
+ {
756
+ "epoch": 2.55,
757
+ "learning_rate": 3.725502486314822e-05,
758
+ "loss": 0.3562,
759
+ "step": 61000
760
+ },
761
+ {
762
+ "epoch": 2.57,
763
+ "learning_rate": 3.715055785382976e-05,
764
+ "loss": 0.3558,
765
+ "step": 61500
766
+ },
767
+ {
768
+ "epoch": 2.59,
769
+ "learning_rate": 3.704609084451131e-05,
770
+ "loss": 0.3559,
771
+ "step": 62000
772
+ },
773
+ {
774
+ "epoch": 2.61,
775
+ "learning_rate": 3.6941623835192845e-05,
776
+ "loss": 0.3559,
777
+ "step": 62500
778
+ },
779
+ {
780
+ "epoch": 2.63,
781
+ "learning_rate": 3.683715682587439e-05,
782
+ "loss": 0.3558,
783
+ "step": 63000
784
+ },
785
+ {
786
+ "epoch": 2.65,
787
+ "learning_rate": 3.6732689816555934e-05,
788
+ "loss": 0.3553,
789
+ "step": 63500
790
+ },
791
+ {
792
+ "epoch": 2.67,
793
+ "learning_rate": 3.662822280723748e-05,
794
+ "loss": 0.3554,
795
+ "step": 64000
796
+ },
797
+ {
798
+ "epoch": 2.7,
799
+ "learning_rate": 3.652375579791902e-05,
800
+ "loss": 0.3553,
801
+ "step": 64500
802
+ },
803
+ {
804
+ "epoch": 2.72,
805
+ "learning_rate": 3.641928878860056e-05,
806
+ "loss": 0.3547,
807
+ "step": 65000
808
+ },
809
+ {
810
+ "epoch": 2.74,
811
+ "learning_rate": 3.6314821779282106e-05,
812
+ "loss": 0.3547,
813
+ "step": 65500
814
+ },
815
+ {
816
+ "epoch": 2.76,
817
+ "learning_rate": 3.6210354769963644e-05,
818
+ "loss": 0.3551,
819
+ "step": 66000
820
+ },
821
+ {
822
+ "epoch": 2.78,
823
+ "learning_rate": 3.610588776064519e-05,
824
+ "loss": 0.3547,
825
+ "step": 66500
826
+ },
827
+ {
828
+ "epoch": 2.8,
829
+ "learning_rate": 3.6001420751326734e-05,
830
+ "loss": 0.3546,
831
+ "step": 67000
832
+ },
833
+ {
834
+ "epoch": 2.82,
835
+ "learning_rate": 3.589695374200828e-05,
836
+ "loss": 0.3545,
837
+ "step": 67500
838
+ },
839
+ {
840
+ "epoch": 2.84,
841
+ "learning_rate": 3.5792486732689817e-05,
842
+ "loss": 0.3544,
843
+ "step": 68000
844
+ },
845
+ {
846
+ "epoch": 2.86,
847
+ "learning_rate": 3.568801972337136e-05,
848
+ "loss": 0.3544,
849
+ "step": 68500
850
+ },
851
+ {
852
+ "epoch": 2.88,
853
+ "learning_rate": 3.5583552714052906e-05,
854
+ "loss": 0.354,
855
+ "step": 69000
856
+ },
857
+ {
858
+ "epoch": 2.9,
859
+ "learning_rate": 3.5479085704734444e-05,
860
+ "loss": 0.354,
861
+ "step": 69500
862
+ },
863
+ {
864
+ "epoch": 2.93,
865
+ "learning_rate": 3.537461869541599e-05,
866
+ "loss": 0.3539,
867
+ "step": 70000
868
+ },
869
+ {
870
+ "epoch": 2.95,
871
+ "learning_rate": 3.527015168609753e-05,
872
+ "loss": 0.3539,
873
+ "step": 70500
874
+ },
875
+ {
876
+ "epoch": 2.97,
877
+ "learning_rate": 3.516568467677908e-05,
878
+ "loss": 0.3536,
879
+ "step": 71000
880
+ },
881
+ {
882
+ "epoch": 2.99,
883
+ "learning_rate": 3.5061217667460616e-05,
884
+ "loss": 0.3533,
885
+ "step": 71500
886
+ },
887
+ {
888
+ "epoch": 3.0,
889
+ "eval_accuracy": 0.8903601451222372,
890
+ "eval_loss": 0.3432493805885315,
891
+ "eval_runtime": 1415.4457,
892
+ "eval_samples_per_second": 270.207,
893
+ "eval_steps_per_second": 1.876,
894
+ "step": 71793
895
+ },
896
+ {
897
+ "epoch": 3.01,
898
+ "learning_rate": 3.495675065814216e-05,
899
+ "loss": 0.3533,
900
+ "step": 72000
901
+ },
902
+ {
903
+ "epoch": 3.03,
904
+ "learning_rate": 3.4852283648823706e-05,
905
+ "loss": 0.3527,
906
+ "step": 72500
907
+ },
908
+ {
909
+ "epoch": 3.05,
910
+ "learning_rate": 3.474781663950525e-05,
911
+ "loss": 0.3527,
912
+ "step": 73000
913
+ },
914
+ {
915
+ "epoch": 3.07,
916
+ "learning_rate": 3.464334963018679e-05,
917
+ "loss": 0.3526,
918
+ "step": 73500
919
+ },
920
+ {
921
+ "epoch": 3.09,
922
+ "learning_rate": 3.4538882620868326e-05,
923
+ "loss": 0.3524,
924
+ "step": 74000
925
+ },
926
+ {
927
+ "epoch": 3.11,
928
+ "learning_rate": 3.443441561154988e-05,
929
+ "loss": 0.3526,
930
+ "step": 74500
931
+ },
932
+ {
933
+ "epoch": 3.13,
934
+ "learning_rate": 3.4329948602231416e-05,
935
+ "loss": 0.3524,
936
+ "step": 75000
937
+ },
938
+ {
939
+ "epoch": 3.15,
940
+ "learning_rate": 3.422548159291296e-05,
941
+ "loss": 0.3522,
942
+ "step": 75500
943
+ },
944
+ {
945
+ "epoch": 3.18,
946
+ "learning_rate": 3.41210145835945e-05,
947
+ "loss": 0.3521,
948
+ "step": 76000
949
+ },
950
+ {
951
+ "epoch": 3.2,
952
+ "learning_rate": 3.401654757427605e-05,
953
+ "loss": 0.3519,
954
+ "step": 76500
955
+ },
956
+ {
957
+ "epoch": 3.22,
958
+ "learning_rate": 3.391208056495759e-05,
959
+ "loss": 0.3518,
960
+ "step": 77000
961
+ },
962
+ {
963
+ "epoch": 3.24,
964
+ "learning_rate": 3.3807613555639126e-05,
965
+ "loss": 0.3519,
966
+ "step": 77500
967
+ },
968
+ {
969
+ "epoch": 3.26,
970
+ "learning_rate": 3.370314654632068e-05,
971
+ "loss": 0.3518,
972
+ "step": 78000
973
+ },
974
+ {
975
+ "epoch": 3.28,
976
+ "learning_rate": 3.3598679537002215e-05,
977
+ "loss": 0.3516,
978
+ "step": 78500
979
+ },
980
+ {
981
+ "epoch": 3.3,
982
+ "learning_rate": 3.349421252768376e-05,
983
+ "loss": 0.3513,
984
+ "step": 79000
985
+ },
986
+ {
987
+ "epoch": 3.32,
988
+ "learning_rate": 3.33897455183653e-05,
+ "loss": 0.3514,
+ "step": 79500
+ },
+ {
+ "epoch": 3.34,
+ "learning_rate": 3.328527850904685e-05,
+ "loss": 0.3512,
+ "step": 80000
+ },
+ {
+ "epoch": 3.36,
+ "learning_rate": 3.318081149972839e-05,
+ "loss": 0.3512,
+ "step": 80500
+ },
+ {
+ "epoch": 3.38,
+ "learning_rate": 3.307634449040993e-05,
+ "loss": 0.3512,
+ "step": 81000
+ },
+ {
+ "epoch": 3.41,
+ "learning_rate": 3.297187748109147e-05,
+ "loss": 0.3515,
+ "step": 81500
+ },
+ {
+ "epoch": 3.43,
+ "learning_rate": 3.2867410471773015e-05,
+ "loss": 0.351,
+ "step": 82000
+ },
+ {
+ "epoch": 3.45,
+ "learning_rate": 3.276294346245456e-05,
+ "loss": 0.3512,
+ "step": 82500
+ },
+ {
+ "epoch": 3.47,
+ "learning_rate": 3.26584764531361e-05,
+ "loss": 0.3506,
+ "step": 83000
+ },
+ {
+ "epoch": 3.49,
+ "learning_rate": 3.255400944381764e-05,
+ "loss": 0.3508,
+ "step": 83500
+ },
+ {
+ "epoch": 3.51,
+ "learning_rate": 3.244954243449919e-05,
+ "loss": 0.3509,
+ "step": 84000
+ },
+ {
+ "epoch": 3.53,
+ "learning_rate": 3.234507542518073e-05,
+ "loss": 0.3505,
+ "step": 84500
+ },
+ {
+ "epoch": 3.55,
+ "learning_rate": 3.224060841586227e-05,
+ "loss": 0.3505,
+ "step": 85000
+ },
+ {
+ "epoch": 3.57,
+ "learning_rate": 3.2136141406543815e-05,
+ "loss": 0.3502,
+ "step": 85500
+ },
+ {
+ "epoch": 3.59,
+ "learning_rate": 3.203167439722536e-05,
+ "loss": 0.3502,
+ "step": 86000
+ },
+ {
+ "epoch": 3.61,
+ "learning_rate": 3.19272073879069e-05,
+ "loss": 0.3499,
+ "step": 86500
+ },
+ {
+ "epoch": 3.64,
+ "learning_rate": 3.182274037858844e-05,
+ "loss": 0.3502,
+ "step": 87000
+ },
+ {
+ "epoch": 3.66,
+ "learning_rate": 3.171827336926999e-05,
+ "loss": 0.3499,
+ "step": 87500
+ },
+ {
+ "epoch": 3.68,
+ "learning_rate": 3.161380635995153e-05,
+ "loss": 0.3496,
+ "step": 88000
+ },
+ {
+ "epoch": 3.7,
+ "learning_rate": 3.150933935063307e-05,
+ "loss": 0.3499,
+ "step": 88500
+ },
+ {
+ "epoch": 3.72,
+ "learning_rate": 3.1404872341314614e-05,
+ "loss": 0.3499,
+ "step": 89000
+ },
+ {
+ "epoch": 3.74,
+ "learning_rate": 3.130040533199616e-05,
+ "loss": 0.3497,
+ "step": 89500
+ },
+ {
+ "epoch": 3.76,
+ "learning_rate": 3.11959383226777e-05,
+ "loss": 0.3496,
+ "step": 90000
+ },
+ {
+ "epoch": 3.78,
+ "learning_rate": 3.109147131335924e-05,
+ "loss": 0.3494,
+ "step": 90500
+ },
+ {
+ "epoch": 3.8,
+ "learning_rate": 3.098700430404079e-05,
+ "loss": 0.3496,
+ "step": 91000
+ },
+ {
+ "epoch": 3.82,
+ "learning_rate": 3.088253729472233e-05,
+ "loss": 0.3494,
+ "step": 91500
+ },
+ {
+ "epoch": 3.84,
+ "learning_rate": 3.077807028540387e-05,
+ "loss": 0.3492,
+ "step": 92000
+ },
+ {
+ "epoch": 3.87,
+ "learning_rate": 3.0673603276085414e-05,
+ "loss": 0.3494,
+ "step": 92500
+ },
+ {
+ "epoch": 3.89,
+ "learning_rate": 3.056913626676696e-05,
+ "loss": 0.3494,
+ "step": 93000
+ },
+ {
+ "epoch": 3.91,
+ "learning_rate": 3.0464669257448497e-05,
+ "loss": 0.349,
+ "step": 93500
+ },
+ {
+ "epoch": 3.93,
+ "learning_rate": 3.036020224813004e-05,
+ "loss": 0.3491,
+ "step": 94000
+ },
+ {
+ "epoch": 3.95,
+ "learning_rate": 3.0255735238811583e-05,
+ "loss": 0.3486,
+ "step": 94500
+ },
+ {
+ "epoch": 3.97,
+ "learning_rate": 3.0151268229493128e-05,
+ "loss": 0.3489,
+ "step": 95000
+ },
+ {
+ "epoch": 3.99,
+ "learning_rate": 3.004680122017467e-05,
+ "loss": 0.3486,
+ "step": 95500
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.8913953473712201,
+ "eval_loss": 0.3394201993942261,
+ "eval_runtime": 1409.3785,
+ "eval_samples_per_second": 271.371,
+ "eval_steps_per_second": 1.885,
+ "step": 95724
+ },
+ {
+ "epoch": 4.01,
+ "learning_rate": 2.9942334210856217e-05,
+ "loss": 0.3483,
+ "step": 96000
+ },
+ {
+ "epoch": 4.03,
+ "learning_rate": 2.9837867201537755e-05,
+ "loss": 0.3479,
+ "step": 96500
+ },
+ {
+ "epoch": 4.05,
+ "learning_rate": 2.9733400192219296e-05,
+ "loss": 0.348,
+ "step": 97000
+ },
+ {
+ "epoch": 4.07,
+ "learning_rate": 2.962893318290084e-05,
+ "loss": 0.3479,
+ "step": 97500
+ },
+ {
+ "epoch": 4.1,
+ "learning_rate": 2.9524466173582383e-05,
+ "loss": 0.348,
+ "step": 98000
+ },
+ {
+ "epoch": 4.12,
+ "learning_rate": 2.9419999164263927e-05,
+ "loss": 0.3478,
+ "step": 98500
+ },
+ {
+ "epoch": 4.14,
+ "learning_rate": 2.931553215494547e-05,
+ "loss": 0.3477,
+ "step": 99000
+ },
+ {
+ "epoch": 4.16,
+ "learning_rate": 2.9211065145627013e-05,
+ "loss": 0.3477,
+ "step": 99500
+ },
+ {
+ "epoch": 4.18,
+ "learning_rate": 2.9106598136308555e-05,
+ "loss": 0.3479,
+ "step": 100000
+ },
+ {
+ "epoch": 4.2,
+ "learning_rate": 2.9002131126990096e-05,
+ "loss": 0.3479,
+ "step": 100500
+ },
+ {
+ "epoch": 4.22,
+ "learning_rate": 2.889766411767164e-05,
+ "loss": 0.3478,
+ "step": 101000
+ },
+ {
+ "epoch": 4.24,
+ "learning_rate": 2.8793197108353182e-05,
+ "loss": 0.3475,
+ "step": 101500
+ },
+ {
+ "epoch": 4.26,
+ "learning_rate": 2.8688730099034727e-05,
+ "loss": 0.3476,
+ "step": 102000
+ },
+ {
+ "epoch": 4.28,
+ "learning_rate": 2.858426308971627e-05,
+ "loss": 0.3472,
+ "step": 102500
+ },
+ {
+ "epoch": 4.3,
+ "learning_rate": 2.8479796080397813e-05,
+ "loss": 0.3473,
+ "step": 103000
+ },
+ {
+ "epoch": 4.32,
+ "learning_rate": 2.8375329071079354e-05,
+ "loss": 0.3472,
+ "step": 103500
+ },
+ {
+ "epoch": 4.35,
+ "learning_rate": 2.82708620617609e-05,
+ "loss": 0.347,
+ "step": 104000
+ },
+ {
+ "epoch": 4.37,
+ "learning_rate": 2.816639505244244e-05,
+ "loss": 0.3471,
+ "step": 104500
+ },
+ {
+ "epoch": 4.39,
+ "learning_rate": 2.8061928043123982e-05,
+ "loss": 0.3467,
+ "step": 105000
+ },
+ {
+ "epoch": 4.41,
+ "learning_rate": 2.7957461033805527e-05,
+ "loss": 0.3469,
+ "step": 105500
+ },
+ {
+ "epoch": 4.43,
+ "learning_rate": 2.7852994024487068e-05,
+ "loss": 0.3469,
+ "step": 106000
+ },
+ {
+ "epoch": 4.45,
+ "learning_rate": 2.7748527015168613e-05,
+ "loss": 0.3469,
+ "step": 106500
+ },
+ {
+ "epoch": 4.47,
+ "learning_rate": 2.7644060005850154e-05,
+ "loss": 0.3469,
+ "step": 107000
+ },
+ {
+ "epoch": 4.49,
+ "learning_rate": 2.75395929965317e-05,
+ "loss": 0.3468,
+ "step": 107500
+ },
+ {
+ "epoch": 4.51,
+ "learning_rate": 2.743512598721324e-05,
+ "loss": 0.3466,
+ "step": 108000
+ },
+ {
+ "epoch": 4.53,
+ "learning_rate": 2.7330658977894778e-05,
+ "loss": 0.3468,
+ "step": 108500
+ },
+ {
+ "epoch": 4.55,
+ "learning_rate": 2.7226191968576326e-05,
+ "loss": 0.3464,
+ "step": 109000
+ },
+ {
+ "epoch": 4.58,
+ "learning_rate": 2.7121724959257868e-05,
+ "loss": 0.3466,
+ "step": 109500
+ },
+ {
+ "epoch": 4.6,
+ "learning_rate": 2.7017257949939412e-05,
+ "loss": 0.3463,
+ "step": 110000
+ },
+ {
+ "epoch": 4.62,
+ "learning_rate": 2.6912790940620954e-05,
+ "loss": 0.3466,
+ "step": 110500
+ },
+ {
+ "epoch": 4.64,
+ "learning_rate": 2.68083239313025e-05,
+ "loss": 0.3462,
+ "step": 111000
+ },
+ {
+ "epoch": 4.66,
+ "learning_rate": 2.670385692198404e-05,
+ "loss": 0.3462,
+ "step": 111500
+ },
+ {
+ "epoch": 4.68,
+ "learning_rate": 2.6599389912665578e-05,
+ "loss": 0.3462,
+ "step": 112000
+ },
+ {
+ "epoch": 4.7,
+ "learning_rate": 2.6494922903347126e-05,
+ "loss": 0.3463,
+ "step": 112500
+ },
+ {
+ "epoch": 4.72,
+ "learning_rate": 2.6390455894028664e-05,
+ "loss": 0.3463,
+ "step": 113000
+ },
+ {
+ "epoch": 4.74,
+ "learning_rate": 2.6285988884710212e-05,
+ "loss": 0.3461,
+ "step": 113500
+ },
+ {
+ "epoch": 4.76,
+ "learning_rate": 2.618152187539175e-05,
+ "loss": 0.346,
+ "step": 114000
+ },
+ {
+ "epoch": 4.78,
+ "learning_rate": 2.6077054866073298e-05,
+ "loss": 0.3461,
+ "step": 114500
+ },
+ {
+ "epoch": 4.81,
+ "learning_rate": 2.5972587856754836e-05,
+ "loss": 0.3463,
+ "step": 115000
+ },
+ {
+ "epoch": 4.83,
+ "learning_rate": 2.5868120847436384e-05,
+ "loss": 0.3458,
+ "step": 115500
+ },
+ {
+ "epoch": 4.85,
+ "learning_rate": 2.5763653838117922e-05,
+ "loss": 0.3459,
+ "step": 116000
+ },
+ {
+ "epoch": 4.87,
+ "learning_rate": 2.5659186828799464e-05,
+ "loss": 0.3459,
+ "step": 116500
+ },
+ {
+ "epoch": 4.89,
+ "learning_rate": 2.555471981948101e-05,
+ "loss": 0.3455,
+ "step": 117000
+ },
+ {
+ "epoch": 4.91,
+ "learning_rate": 2.545025281016255e-05,
+ "loss": 0.3459,
+ "step": 117500
+ },
+ {
+ "epoch": 4.93,
+ "learning_rate": 2.5345785800844098e-05,
+ "loss": 0.3456,
+ "step": 118000
+ },
+ {
+ "epoch": 4.95,
+ "learning_rate": 2.5241318791525636e-05,
+ "loss": 0.3455,
+ "step": 118500
+ },
+ {
+ "epoch": 4.97,
+ "learning_rate": 2.5136851782207184e-05,
+ "loss": 0.3454,
+ "step": 119000
+ },
+ {
+ "epoch": 4.99,
+ "learning_rate": 2.5032384772888722e-05,
+ "loss": 0.3456,
+ "step": 119500
+ },
+ {
+ "epoch": 5.0,
+ "eval_accuracy": 0.8921225304021042,
+ "eval_loss": 0.33668240904808044,
+ "eval_runtime": 1399.0502,
+ "eval_samples_per_second": 273.374,
+ "eval_steps_per_second": 1.898,
+ "step": 119655
+ },
+ {
+ "epoch": 5.01,
+ "learning_rate": 2.4927917763570267e-05,
+ "loss": 0.3447,
+ "step": 120000
+ },
+ {
+ "epoch": 5.04,
+ "learning_rate": 2.4823450754251808e-05,
+ "loss": 0.345,
+ "step": 120500
+ },
+ {
+ "epoch": 5.06,
+ "learning_rate": 2.4718983744933353e-05,
+ "loss": 0.345,
+ "step": 121000
+ },
+ {
+ "epoch": 5.08,
+ "learning_rate": 2.4614516735614894e-05,
+ "loss": 0.3447,
+ "step": 121500
+ },
+ {
+ "epoch": 5.1,
+ "learning_rate": 2.4510049726296435e-05,
+ "loss": 0.3448,
+ "step": 122000
+ },
+ {
+ "epoch": 5.12,
+ "learning_rate": 2.440558271697798e-05,
+ "loss": 0.3448,
+ "step": 122500
+ },
+ {
+ "epoch": 5.14,
+ "learning_rate": 2.430111570765952e-05,
+ "loss": 0.3446,
+ "step": 123000
+ },
+ {
+ "epoch": 5.16,
+ "learning_rate": 2.4196648698341066e-05,
+ "loss": 0.3446,
+ "step": 123500
+ },
+ {
+ "epoch": 5.18,
+ "learning_rate": 2.4092181689022608e-05,
+ "loss": 0.3443,
+ "step": 124000
+ },
+ {
+ "epoch": 5.2,
+ "learning_rate": 2.3987714679704152e-05,
+ "loss": 0.3442,
+ "step": 124500
+ },
+ {
+ "epoch": 5.22,
+ "learning_rate": 2.3883247670385694e-05,
+ "loss": 0.3444,
+ "step": 125000
+ },
+ {
+ "epoch": 5.24,
+ "learning_rate": 2.3778780661067235e-05,
+ "loss": 0.3446,
+ "step": 125500
+ },
+ {
+ "epoch": 5.27,
+ "learning_rate": 2.3674313651748776e-05,
+ "loss": 0.3444,
+ "step": 126000
+ },
+ {
+ "epoch": 5.29,
+ "learning_rate": 2.356984664243032e-05,
+ "loss": 0.3443,
+ "step": 126500
+ },
+ {
+ "epoch": 5.31,
+ "learning_rate": 2.3465379633111863e-05,
+ "loss": 0.3443,
+ "step": 127000
+ },
+ {
+ "epoch": 5.33,
+ "learning_rate": 2.3360912623793407e-05,
+ "loss": 0.3442,
+ "step": 127500
+ },
+ {
+ "epoch": 5.35,
+ "learning_rate": 2.3256445614474952e-05,
+ "loss": 0.3442,
+ "step": 128000
+ },
+ {
+ "epoch": 5.37,
+ "learning_rate": 2.3151978605156493e-05,
+ "loss": 0.3442,
+ "step": 128500
+ },
+ {
+ "epoch": 5.39,
+ "learning_rate": 2.3047511595838038e-05,
+ "loss": 0.3444,
+ "step": 129000
+ },
+ {
+ "epoch": 5.41,
+ "learning_rate": 2.2943044586519576e-05,
+ "loss": 0.3441,
+ "step": 129500
+ },
+ {
+ "epoch": 5.43,
+ "learning_rate": 2.283857757720112e-05,
+ "loss": 0.3442,
+ "step": 130000
+ },
+ {
+ "epoch": 5.45,
+ "learning_rate": 2.2734110567882662e-05,
+ "loss": 0.3438,
+ "step": 130500
+ },
+ {
+ "epoch": 5.47,
+ "learning_rate": 2.2629643558564207e-05,
+ "loss": 0.344,
+ "step": 131000
+ },
+ {
+ "epoch": 5.49,
+ "learning_rate": 2.2525176549245748e-05,
+ "loss": 0.3439,
+ "step": 131500
+ },
+ {
+ "epoch": 5.52,
+ "learning_rate": 2.2420709539927293e-05,
+ "loss": 0.3437,
+ "step": 132000
+ },
+ {
+ "epoch": 5.54,
+ "learning_rate": 2.2316242530608834e-05,
+ "loss": 0.3435,
+ "step": 132500
+ },
+ {
+ "epoch": 5.56,
+ "learning_rate": 2.221177552129038e-05,
+ "loss": 0.3435,
+ "step": 133000
+ },
+ {
+ "epoch": 5.58,
+ "learning_rate": 2.210730851197192e-05,
+ "loss": 0.3438,
+ "step": 133500
+ },
+ {
+ "epoch": 5.6,
+ "learning_rate": 2.2002841502653462e-05,
+ "loss": 0.3436,
+ "step": 134000
+ },
+ {
+ "epoch": 5.62,
+ "learning_rate": 2.1898374493335007e-05,
+ "loss": 0.3437,
+ "step": 134500
+ },
+ {
+ "epoch": 5.64,
+ "learning_rate": 2.1793907484016548e-05,
+ "loss": 0.3436,
+ "step": 135000
+ },
+ {
+ "epoch": 5.66,
+ "learning_rate": 2.1689440474698093e-05,
+ "loss": 0.3436,
+ "step": 135500
+ },
+ {
+ "epoch": 5.68,
+ "learning_rate": 2.1584973465379634e-05,
+ "loss": 0.3434,
+ "step": 136000
+ },
+ {
+ "epoch": 5.7,
+ "learning_rate": 2.148050645606118e-05,
+ "loss": 0.3435,
+ "step": 136500
+ },
+ {
+ "epoch": 5.72,
+ "learning_rate": 2.1376039446742717e-05,
+ "loss": 0.3435,
+ "step": 137000
+ },
+ {
+ "epoch": 5.75,
+ "learning_rate": 2.127157243742426e-05,
+ "loss": 0.3437,
+ "step": 137500
+ },
+ {
+ "epoch": 5.77,
+ "learning_rate": 2.1167105428105806e-05,
+ "loss": 0.3434,
+ "step": 138000
+ },
+ {
+ "epoch": 5.79,
+ "learning_rate": 2.1062638418787348e-05,
+ "loss": 0.3434,
+ "step": 138500
+ },
+ {
+ "epoch": 5.81,
+ "learning_rate": 2.0958171409468892e-05,
+ "loss": 0.3433,
+ "step": 139000
+ },
+ {
+ "epoch": 5.83,
+ "learning_rate": 2.0853704400150434e-05,
+ "loss": 0.3434,
+ "step": 139500
+ },
+ {
+ "epoch": 5.85,
+ "learning_rate": 2.074923739083198e-05,
+ "loss": 0.3434,
+ "step": 140000
+ },
+ {
+ "epoch": 5.87,
+ "learning_rate": 2.064477038151352e-05,
+ "loss": 0.3431,
+ "step": 140500
+ },
+ {
+ "epoch": 5.89,
+ "learning_rate": 2.054030337219506e-05,
+ "loss": 0.3432,
+ "step": 141000
+ },
+ {
+ "epoch": 5.91,
+ "learning_rate": 2.0435836362876602e-05,
+ "loss": 0.3432,
+ "step": 141500
+ },
+ {
+ "epoch": 5.93,
+ "learning_rate": 2.0331369353558147e-05,
+ "loss": 0.3432,
+ "step": 142000
+ },
+ {
+ "epoch": 5.95,
+ "learning_rate": 2.022690234423969e-05,
+ "loss": 0.3433,
+ "step": 142500
+ },
+ {
+ "epoch": 5.98,
+ "learning_rate": 2.0122435334921233e-05,
+ "loss": 0.3431,
+ "step": 143000
+ },
+ {
+ "epoch": 6.0,
+ "learning_rate": 2.0017968325602775e-05,
+ "loss": 0.3432,
+ "step": 143500
+ },
+ {
+ "epoch": 6.0,
+ "eval_accuracy": 0.8926879801904946,
+ "eval_loss": 0.33457934856414795,
+ "eval_runtime": 1401.0979,
+ "eval_samples_per_second": 272.975,
+ "eval_steps_per_second": 1.896,
+ "step": 143586
+ },
+ {
+ "epoch": 6.02,
+ "learning_rate": 1.991350131628432e-05,
+ "loss": 0.3427,
+ "step": 144000
+ },
+ {
+ "epoch": 6.04,
+ "learning_rate": 1.980903430696586e-05,
+ "loss": 0.3424,
+ "step": 144500
+ },
+ {
+ "epoch": 6.06,
+ "learning_rate": 1.9704567297647402e-05,
+ "loss": 0.3425,
+ "step": 145000
+ },
+ {
+ "epoch": 6.08,
+ "learning_rate": 1.9600100288328947e-05,
+ "loss": 0.3421,
+ "step": 145500
+ },
+ {
+ "epoch": 6.1,
+ "learning_rate": 1.9495633279010488e-05,
+ "loss": 0.3423,
+ "step": 146000
+ },
+ {
+ "epoch": 6.12,
+ "learning_rate": 1.9391166269692033e-05,
+ "loss": 0.3425,
+ "step": 146500
+ },
+ {
+ "epoch": 6.14,
+ "learning_rate": 1.9286699260373574e-05,
+ "loss": 0.3425,
+ "step": 147000
+ },
+ {
+ "epoch": 6.16,
+ "learning_rate": 1.918223225105512e-05,
+ "loss": 0.342,
+ "step": 147500
+ },
+ {
+ "epoch": 6.18,
+ "learning_rate": 1.907776524173666e-05,
+ "loss": 0.3423,
+ "step": 148000
+ },
+ {
+ "epoch": 6.21,
+ "learning_rate": 1.8973298232418205e-05,
+ "loss": 0.3424,
+ "step": 148500
+ },
+ {
+ "epoch": 6.23,
+ "learning_rate": 1.8868831223099747e-05,
+ "loss": 0.3421,
+ "step": 149000
+ },
+ {
+ "epoch": 6.25,
+ "learning_rate": 1.8764364213781288e-05,
+ "loss": 0.342,
+ "step": 149500
+ },
+ {
+ "epoch": 6.27,
+ "learning_rate": 1.8659897204462833e-05,
+ "loss": 0.3423,
+ "step": 150000
+ },
+ {
+ "epoch": 6.29,
+ "learning_rate": 1.8555430195144374e-05,
+ "loss": 0.3423,
+ "step": 150500
+ },
+ {
+ "epoch": 6.31,
+ "learning_rate": 1.845096318582592e-05,
+ "loss": 0.342,
+ "step": 151000
+ },
+ {
+ "epoch": 6.33,
+ "learning_rate": 1.834649617650746e-05,
+ "loss": 0.3421,
+ "step": 151500
+ },
+ {
+ "epoch": 6.35,
+ "learning_rate": 1.8242029167189005e-05,
+ "loss": 0.3421,
+ "step": 152000
+ },
+ {
+ "epoch": 6.37,
+ "learning_rate": 1.8137562157870543e-05,
+ "loss": 0.342,
+ "step": 152500
+ },
+ {
+ "epoch": 6.39,
+ "learning_rate": 1.8033095148552088e-05,
+ "loss": 0.3421,
+ "step": 153000
+ },
+ {
+ "epoch": 6.41,
+ "learning_rate": 1.792862813923363e-05,
+ "loss": 0.342,
+ "step": 153500
+ },
+ {
+ "epoch": 6.44,
+ "learning_rate": 1.7824161129915174e-05,
+ "loss": 0.3418,
+ "step": 154000
+ },
+ {
+ "epoch": 6.46,
+ "learning_rate": 1.7719694120596715e-05,
+ "loss": 0.342,
+ "step": 154500
+ },
+ {
+ "epoch": 6.48,
+ "learning_rate": 1.761522711127826e-05,
+ "loss": 0.3419,
+ "step": 155000
+ },
+ {
+ "epoch": 6.5,
+ "learning_rate": 1.75107601019598e-05,
+ "loss": 0.3419,
+ "step": 155500
+ },
+ {
+ "epoch": 6.52,
+ "learning_rate": 1.7406293092641346e-05,
+ "loss": 0.3415,
+ "step": 156000
+ },
+ {
+ "epoch": 6.54,
+ "learning_rate": 1.7301826083322887e-05,
+ "loss": 0.3417,
+ "step": 156500
+ },
+ {
+ "epoch": 6.56,
+ "learning_rate": 1.719735907400443e-05,
+ "loss": 0.3417,
+ "step": 157000
+ },
+ {
+ "epoch": 6.58,
+ "learning_rate": 1.7092892064685973e-05,
+ "loss": 0.3421,
+ "step": 157500
+ },
+ {
+ "epoch": 6.6,
+ "learning_rate": 1.6988425055367515e-05,
+ "loss": 0.3415,
+ "step": 158000
+ },
+ {
+ "epoch": 6.62,
+ "learning_rate": 1.688395804604906e-05,
+ "loss": 0.3415,
+ "step": 158500
+ },
+ {
+ "epoch": 6.64,
+ "learning_rate": 1.67794910367306e-05,
+ "loss": 0.3416,
+ "step": 159000
+ },
+ {
+ "epoch": 6.66,
+ "learning_rate": 1.6675024027412146e-05,
+ "loss": 0.3414,
+ "step": 159500
+ },
+ {
+ "epoch": 6.69,
+ "learning_rate": 1.6570557018093687e-05,
+ "loss": 0.3415,
+ "step": 160000
+ },
+ {
+ "epoch": 6.71,
+ "learning_rate": 1.6466090008775228e-05,
+ "loss": 0.3416,
+ "step": 160500
+ },
+ {
+ "epoch": 6.73,
+ "learning_rate": 1.6361622999456773e-05,
+ "loss": 0.3415,
+ "step": 161000
+ },
+ {
+ "epoch": 6.75,
+ "learning_rate": 1.6257155990138314e-05,
+ "loss": 0.3414,
+ "step": 161500
+ },
+ {
+ "epoch": 6.77,
+ "learning_rate": 1.615268898081986e-05,
+ "loss": 0.3414,
+ "step": 162000
+ },
+ {
+ "epoch": 6.79,
+ "learning_rate": 1.60482219715014e-05,
+ "loss": 0.3414,
+ "step": 162500
+ },
+ {
+ "epoch": 6.81,
+ "learning_rate": 1.5943754962182945e-05,
+ "loss": 0.3413,
+ "step": 163000
+ },
+ {
+ "epoch": 6.83,
+ "learning_rate": 1.5839287952864487e-05,
+ "loss": 0.3417,
+ "step": 163500
+ },
+ {
+ "epoch": 6.85,
+ "learning_rate": 1.5734820943546028e-05,
+ "loss": 0.3414,
+ "step": 164000
+ },
+ {
+ "epoch": 6.87,
+ "learning_rate": 1.563035393422757e-05,
+ "loss": 0.3413,
+ "step": 164500
+ },
+ {
+ "epoch": 6.89,
+ "learning_rate": 1.5525886924909114e-05,
+ "loss": 0.3412,
+ "step": 165000
+ },
+ {
+ "epoch": 6.92,
+ "learning_rate": 1.5421419915590655e-05,
+ "loss": 0.3413,
+ "step": 165500
+ },
+ {
+ "epoch": 6.94,
+ "learning_rate": 1.53169529062722e-05,
+ "loss": 0.3412,
+ "step": 166000
+ },
+ {
+ "epoch": 6.96,
+ "learning_rate": 1.5212485896953743e-05,
+ "loss": 0.3412,
+ "step": 166500
+ },
+ {
+ "epoch": 6.98,
+ "learning_rate": 1.5108018887635286e-05,
+ "loss": 0.3412,
+ "step": 167000
+ },
+ {
+ "epoch": 7.0,
+ "learning_rate": 1.500355187831683e-05,
+ "loss": 0.3412,
+ "step": 167500
+ },
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.8930347020010898,
+ "eval_loss": 0.33325716853141785,
+ "eval_runtime": 1406.6403,
+ "eval_samples_per_second": 271.899,
+ "eval_steps_per_second": 1.888,
+ "step": 167517
+ },
+ {
+ "epoch": 7.02,
+ "learning_rate": 1.489908486899837e-05,
+ "loss": 0.3407,
+ "step": 168000
+ },
+ {
+ "epoch": 7.04,
+ "learning_rate": 1.4794617859679914e-05,
+ "loss": 0.3407,
+ "step": 168500
+ },
+ {
+ "epoch": 7.06,
+ "learning_rate": 1.4690150850361457e-05,
+ "loss": 0.3406,
+ "step": 169000
+ },
+ {
+ "epoch": 7.08,
+ "learning_rate": 1.4585683841043e-05,
+ "loss": 0.3407,
+ "step": 169500
+ },
+ {
+ "epoch": 7.1,
+ "learning_rate": 1.4481216831724543e-05,
+ "loss": 0.3407,
+ "step": 170000
+ },
+ {
+ "epoch": 7.12,
+ "learning_rate": 1.4376749822406086e-05,
+ "loss": 0.3406,
+ "step": 170500
+ },
+ {
+ "epoch": 7.15,
+ "learning_rate": 1.4272282813087629e-05,
+ "loss": 0.3406,
+ "step": 171000
+ },
+ {
+ "epoch": 7.17,
+ "learning_rate": 1.4167815803769172e-05,
+ "loss": 0.3406,
+ "step": 171500
+ },
+ {
+ "epoch": 7.19,
+ "learning_rate": 1.4063348794450712e-05,
+ "loss": 0.3407,
+ "step": 172000
+ },
+ {
+ "epoch": 7.21,
+ "learning_rate": 1.3958881785132255e-05,
+ "loss": 0.3406,
+ "step": 172500
+ },
+ {
+ "epoch": 7.23,
+ "learning_rate": 1.3854414775813798e-05,
+ "loss": 0.3404,
+ "step": 173000
+ },
+ {
+ "epoch": 7.25,
+ "learning_rate": 1.374994776649534e-05,
+ "loss": 0.3406,
+ "step": 173500
+ },
+ {
+ "epoch": 7.27,
+ "learning_rate": 1.3645480757176884e-05,
+ "loss": 0.3405,
+ "step": 174000
+ },
+ {
+ "epoch": 7.29,
+ "learning_rate": 1.3541013747858427e-05,
+ "loss": 0.3405,
+ "step": 174500
+ },
+ {
+ "epoch": 7.31,
+ "learning_rate": 1.343654673853997e-05,
+ "loss": 0.3402,
+ "step": 175000
+ },
+ {
+ "epoch": 7.33,
+ "learning_rate": 1.3332079729221515e-05,
+ "loss": 0.3404,
+ "step": 175500
+ },
+ {
+ "epoch": 7.35,
+ "learning_rate": 1.3227612719903054e-05,
+ "loss": 0.3404,
+ "step": 176000
+ },
+ {
+ "epoch": 7.38,
+ "learning_rate": 1.3123145710584597e-05,
+ "loss": 0.3403,
+ "step": 176500
+ },
+ {
+ "epoch": 7.4,
+ "learning_rate": 1.301867870126614e-05,
+ "loss": 0.3405,
+ "step": 177000
+ },
+ {
+ "epoch": 7.42,
+ "learning_rate": 1.2914211691947683e-05,
+ "loss": 0.3404,
+ "step": 177500
+ },
+ {
+ "epoch": 7.44,
+ "learning_rate": 1.2809744682629226e-05,
+ "loss": 0.3405,
+ "step": 178000
+ },
+ {
+ "epoch": 7.46,
+ "learning_rate": 1.270527767331077e-05,
+ "loss": 0.3403,
+ "step": 178500
+ },
+ {
+ "epoch": 7.48,
+ "learning_rate": 1.2600810663992313e-05,
+ "loss": 0.3401,
+ "step": 179000
+ },
+ {
+ "epoch": 7.5,
+ "learning_rate": 1.2496343654673854e-05,
+ "loss": 0.3404,
+ "step": 179500
+ },
+ {
+ "epoch": 7.52,
+ "learning_rate": 1.2391876645355397e-05,
+ "loss": 0.34,
+ "step": 180000
+ },
+ {
+ "epoch": 7.54,
+ "learning_rate": 1.228740963603694e-05,
+ "loss": 0.3402,
+ "step": 180500
+ },
+ {
+ "epoch": 7.56,
+ "learning_rate": 1.2182942626718483e-05,
+ "loss": 0.3402,
+ "step": 181000
+ },
+ {
+ "epoch": 7.58,
+ "learning_rate": 1.2078475617400026e-05,
+ "loss": 0.3401,
+ "step": 181500
+ },
+ {
+ "epoch": 7.61,
+ "learning_rate": 1.197400860808157e-05,
+ "loss": 0.3402,
+ "step": 182000
+ },
+ {
+ "epoch": 7.63,
+ "learning_rate": 1.186954159876311e-05,
+ "loss": 0.3403,
+ "step": 182500
+ },
+ {
+ "epoch": 7.65,
+ "learning_rate": 1.1765074589444654e-05,
+ "loss": 0.3401,
+ "step": 183000
+ },
+ {
+ "epoch": 7.67,
+ "learning_rate": 1.1660607580126197e-05,
+ "loss": 0.3401,
+ "step": 183500
+ },
+ {
+ "epoch": 7.69,
+ "learning_rate": 1.155614057080774e-05,
+ "loss": 0.3399,
+ "step": 184000
+ },
+ {
+ "epoch": 7.71,
+ "learning_rate": 1.1451673561489281e-05,
+ "loss": 0.34,
+ "step": 184500
+ },
+ {
+ "epoch": 7.73,
+ "learning_rate": 1.1347206552170824e-05,
+ "loss": 0.3402,
+ "step": 185000
+ },
+ {
+ "epoch": 7.75,
+ "learning_rate": 1.1242739542852367e-05,
+ "loss": 0.34,
+ "step": 185500
+ },
+ {
+ "epoch": 7.77,
+ "learning_rate": 1.1138272533533912e-05,
+ "loss": 0.34,
+ "step": 186000
+ },
+ {
+ "epoch": 7.79,
+ "learning_rate": 1.1033805524215453e-05,
+ "loss": 0.34,
+ "step": 186500
+ },
+ {
+ "epoch": 7.81,
+ "learning_rate": 1.0929338514896996e-05,
+ "loss": 0.3402,
+ "step": 187000
+ },
+ {
+ "epoch": 7.84,
+ "learning_rate": 1.082487150557854e-05,
+ "loss": 0.34,
+ "step": 187500
+ },
+ {
+ "epoch": 7.86,
+ "learning_rate": 1.0720404496260082e-05,
+ "loss": 0.3397,
+ "step": 188000
+ },
+ {
+ "epoch": 7.88,
+ "learning_rate": 1.0615937486941624e-05,
+ "loss": 0.3398,
+ "step": 188500
+ },
+ {
+ "epoch": 7.9,
+ "learning_rate": 1.0511470477623167e-05,
+ "loss": 0.3399,
+ "step": 189000
+ },
+ {
+ "epoch": 7.92,
+ "learning_rate": 1.040700346830471e-05,
+ "loss": 0.3398,
+ "step": 189500
+ },
+ {
+ "epoch": 7.94,
+ "learning_rate": 1.0302536458986253e-05,
+ "loss": 0.3399,
+ "step": 190000
+ },
+ {
+ "epoch": 7.96,
+ "learning_rate": 1.0198069449667794e-05,
+ "loss": 0.3399,
+ "step": 190500
+ },
+ {
+ "epoch": 7.98,
+ "learning_rate": 1.0093602440349337e-05,
+ "loss": 0.3397,
+ "step": 191000
+ },
+ {
+ "epoch": 8.0,
+ "eval_accuracy": 0.8933395795756411,
+ "eval_loss": 0.33218324184417725,
+ "eval_runtime": 1414.7217,
+ "eval_samples_per_second": 270.346,
+ "eval_steps_per_second": 1.877,
+ "step": 191448
+ },
+ {
+ "epoch": 8.0,
+ "learning_rate": 9.989135431030882e-06,
+ "loss": 0.3397,
+ "step": 191500
+ },
+ {
+ "epoch": 8.02,
+ "learning_rate": 9.884668421712423e-06,
+ "loss": 0.3393,
+ "step": 192000
+ },
+ {
+ "epoch": 8.04,
+ "learning_rate": 9.780201412393966e-06,
+ "loss": 0.3395,
+ "step": 192500
+ },
+ {
+ "epoch": 8.06,
+ "learning_rate": 9.67573440307551e-06,
+ "loss": 0.3394,
+ "step": 193000
+ },
+ {
+ "epoch": 8.09,
+ "learning_rate": 9.571267393757053e-06,
+ "loss": 0.3394,
+ "step": 193500
+ },
+ {
+ "epoch": 8.11,
+ "learning_rate": 9.466800384438594e-06,
+ "loss": 0.3396,
+ "step": 194000
+ },
+ {
+ "epoch": 8.13,
+ "learning_rate": 9.362333375120137e-06,
+ "loss": 0.3391,
+ "step": 194500
+ },
+ {
+ "epoch": 8.15,
+ "learning_rate": 9.25786636580168e-06,
+ "loss": 0.3393,
+ "step": 195000
+ },
+ {
+ "epoch": 8.17,
+ "learning_rate": 9.153399356483223e-06,
+ "loss": 0.3393,
+ "step": 195500
+ },
+ {
+ "epoch": 8.19,
+ "learning_rate": 9.048932347164764e-06,
+ "loss": 0.3392,
+ "step": 196000
+ },
+ {
+ "epoch": 8.21,
+ "learning_rate": 8.94446533784631e-06,
+ "loss": 0.3393,
+ "step": 196500
+ },
+ {
+ "epoch": 8.23,
+ "learning_rate": 8.839998328527852e-06,
+ "loss": 0.3395,
+ "step": 197000
+ },
+ {
+ "epoch": 8.25,
+ "learning_rate": 8.735531319209395e-06,
+ "loss": 0.3393,
+ "step": 197500
+ },
+ {
+ "epoch": 8.27,
+ "learning_rate": 8.631064309890937e-06,
+ "loss": 0.3394,
+ "step": 198000
+ },
+ {
+ "epoch": 8.29,
+ "learning_rate": 8.52659730057248e-06,
+ "loss": 0.3391,
+ "step": 198500
+ },
+ {
+ "epoch": 8.32,
+ "learning_rate": 8.422130291254023e-06,
+ "loss": 0.3392,
+ "step": 199000
+ },
+ {
+ "epoch": 8.34,
+ "learning_rate": 8.317663281935566e-06,
+ "loss": 0.3394,
+ "step": 199500
+ },
+ {
+ "epoch": 8.36,
+ "learning_rate": 8.213196272617107e-06,
+ "loss": 0.3389,
+ "step": 200000
+ },
+ {
+ "epoch": 8.38,
+ "learning_rate": 8.10872926329865e-06,
+ "loss": 0.3392,
+ "step": 200500
+ },
+ {
+ "epoch": 8.4,
+ "learning_rate": 8.004262253980193e-06,
+ "loss": 0.3394,
+ "step": 201000
+ },
+ {
+ "epoch": 8.42,
+ "learning_rate": 7.899795244661736e-06,
+ "loss": 0.3389,
+ "step": 201500
+ },
+ {
+ "epoch": 8.44,
+ "learning_rate": 7.79532823534328e-06,
+ "loss": 0.3389,
+ "step": 202000
+ },
+ {
+ "epoch": 8.46,
+ "learning_rate": 7.690861226024822e-06,
+ "loss": 0.339,
+ "step": 202500
+ },
+ {
+ "epoch": 8.48,
+ "learning_rate": 7.586394216706365e-06,
+ "loss": 0.3392,
+ "step": 203000
+ },
+ {
+ "epoch": 8.5,
+ "learning_rate": 7.481927207387908e-06,
+ "loss": 0.339,
+ "step": 203500
+ },
+ {
+ "epoch": 8.52,
+ "learning_rate": 7.37746019806945e-06,
+ "loss": 0.3391,
+ "step": 204000
+ },
+ {
+ "epoch": 8.55,
+ "learning_rate": 7.272993188750993e-06,
+ "loss": 0.339,
+ "step": 204500
+ },
+ {
+ "epoch": 8.57,
+ "learning_rate": 7.168526179432536e-06,
+ "loss": 0.3391,
+ "step": 205000
+ },
+ {
+ "epoch": 8.59,
+ "learning_rate": 7.064059170114077e-06,
+ "loss": 0.339,
+ "step": 205500
+ },
+ {
+ "epoch": 8.61,
+ "learning_rate": 6.95959216079562e-06,
+ "loss": 0.339,
+ "step": 206000
+ },
+ {
+ "epoch": 8.63,
+ "learning_rate": 6.855125151477164e-06,
+ "loss": 0.3391,
+ "step": 206500
+ },
+ {
+ "epoch": 8.65,
+ "learning_rate": 6.750658142158707e-06,
+ "loss": 0.339,
+ "step": 207000
+ },
+ {
+ "epoch": 8.67,
+ "learning_rate": 6.646191132840249e-06,
+ "loss": 0.3389,
+ "step": 207500
+ },
+ {
+ "epoch": 8.69,
+ "learning_rate": 6.541724123521792e-06,
+ "loss": 0.3391,
+ "step": 208000
+ },
+ {
+ "epoch": 8.71,
+ "learning_rate": 6.437257114203335e-06,
+ "loss": 0.339,
+ "step": 208500
+ },
+ {
+ "epoch": 8.73,
+ "learning_rate": 6.332790104884878e-06,
+ "loss": 0.339,
+ "step": 209000
+ },
+ {
+ "epoch": 8.75,
+ "learning_rate": 6.228323095566421e-06,
+ "loss": 0.3391,
+ "step": 209500
+ },
+ {
2598
+ "epoch": 8.78,
2599
+ "learning_rate": 6.123856086247963e-06,
2600
+ "loss": 0.3388,
2601
+ "step": 210000
2602
+ },
2603
+ {
2604
+ "epoch": 8.8,
2605
+ "learning_rate": 6.019389076929506e-06,
2606
+ "loss": 0.3393,
2607
+ "step": 210500
2608
+ },
2609
+ {
2610
+ "epoch": 8.82,
2611
+ "learning_rate": 5.914922067611048e-06,
2612
+ "loss": 0.3389,
2613
+ "step": 211000
2614
+ },
2615
+ {
2616
+ "epoch": 8.84,
2617
+ "learning_rate": 5.810455058292591e-06,
2618
+ "loss": 0.3392,
2619
+ "step": 211500
2620
+ },
2621
+ {
2622
+ "epoch": 8.86,
2623
+ "learning_rate": 5.705988048974134e-06,
2624
+ "loss": 0.3391,
2625
+ "step": 212000
2626
+ },
2627
+ {
2628
+ "epoch": 8.88,
2629
+ "learning_rate": 5.6015210396556775e-06,
2630
+ "loss": 0.339,
2631
+ "step": 212500
2632
+ },
2633
+ {
2634
+ "epoch": 8.9,
2635
+ "learning_rate": 5.49705403033722e-06,
2636
+ "loss": 0.3389,
2637
+ "step": 213000
2638
+ },
2639
+ {
2640
+ "epoch": 8.92,
2641
+ "learning_rate": 5.392587021018763e-06,
2642
+ "loss": 0.3389,
2643
+ "step": 213500
2644
+ },
2645
+ {
2646
+ "epoch": 8.94,
2647
+ "learning_rate": 5.288120011700305e-06,
2648
+ "loss": 0.339,
2649
+ "step": 214000
2650
+ },
2651
+ {
2652
+ "epoch": 8.96,
2653
+ "learning_rate": 5.183653002381848e-06,
2654
+ "loss": 0.3386,
2655
+ "step": 214500
2656
+ },
2657
+ {
2658
+ "epoch": 8.98,
2659
+ "learning_rate": 5.079185993063391e-06,
2660
+ "loss": 0.339,
2661
+ "step": 215000
2662
+ },
2663
+ {
2664
+ "epoch": 9.0,
2665
+ "eval_accuracy": 0.8935473987846363,
2666
+ "eval_loss": 0.33142516016960144,
2667
+ "eval_runtime": 1417.4586,
2668
+ "eval_samples_per_second": 269.824,
2669
+ "eval_steps_per_second": 1.874,
2670
+ "step": 215379
2671
+ },
2672
+ {
2673
+ "epoch": 9.01,
2674
+ "learning_rate": 4.974718983744934e-06,
2675
+ "loss": 0.3389,
2676
+ "step": 215500
2677
+ },
2678
+ {
2679
+ "epoch": 9.03,
2680
+ "learning_rate": 4.870251974426476e-06,
2681
+ "loss": 0.3384,
2682
+ "step": 216000
2683
+ },
2684
+ {
2685
+ "epoch": 9.05,
2686
+ "learning_rate": 4.765784965108019e-06,
2687
+ "loss": 0.3383,
2688
+ "step": 216500
2689
+ },
2690
+ {
2691
+ "epoch": 9.07,
2692
+ "learning_rate": 4.6613179557895615e-06,
2693
+ "loss": 0.3385,
2694
+ "step": 217000
2695
+ },
2696
+ {
2697
+ "epoch": 9.09,
2698
+ "learning_rate": 4.5568509464711046e-06,
2699
+ "loss": 0.3383,
2700
+ "step": 217500
2701
+ },
2702
+ {
2703
+ "epoch": 9.11,
2704
+ "learning_rate": 4.452383937152648e-06,
2705
+ "loss": 0.3387,
2706
+ "step": 218000
2707
+ },
2708
+ {
2709
+ "epoch": 9.13,
2710
+ "learning_rate": 4.34791692783419e-06,
2711
+ "loss": 0.3386,
2712
+ "step": 218500
2713
+ },
2714
+ {
2715
+ "epoch": 9.15,
2716
+ "learning_rate": 4.243449918515733e-06,
2717
+ "loss": 0.3382,
2718
+ "step": 219000
2719
+ },
2720
+ {
2721
+ "epoch": 9.17,
2722
+ "learning_rate": 4.138982909197275e-06,
2723
+ "loss": 0.3385,
2724
+ "step": 219500
2725
+ },
2726
+ {
2727
+ "epoch": 9.19,
2728
+ "learning_rate": 4.034515899878818e-06,
2729
+ "loss": 0.3384,
2730
+ "step": 220000
2731
+ },
2732
+ {
2733
+ "epoch": 9.21,
2734
+ "learning_rate": 3.930048890560361e-06,
2735
+ "loss": 0.3386,
2736
+ "step": 220500
2737
+ },
2738
+ {
2739
+ "epoch": 9.23,
2740
+ "learning_rate": 3.825581881241904e-06,
2741
+ "loss": 0.3387,
2742
+ "step": 221000
2743
+ },
2744
+ {
2745
+ "epoch": 9.26,
2746
+ "learning_rate": 3.7211148719234464e-06,
2747
+ "loss": 0.3383,
2748
+ "step": 221500
2749
+ },
2750
+ {
2751
+ "epoch": 9.28,
2752
+ "learning_rate": 3.6166478626049895e-06,
2753
+ "loss": 0.3383,
2754
+ "step": 222000
2755
+ },
2756
+ {
2757
+ "epoch": 9.3,
2758
+ "learning_rate": 3.512180853286532e-06,
2759
+ "loss": 0.3384,
2760
+ "step": 222500
2761
+ },
2762
+ {
2763
+ "epoch": 9.32,
2764
+ "learning_rate": 3.407713843968075e-06,
2765
+ "loss": 0.3385,
2766
+ "step": 223000
2767
+ },
2768
+ {
2769
+ "epoch": 9.34,
2770
+ "learning_rate": 3.3032468346496178e-06,
2771
+ "loss": 0.3386,
2772
+ "step": 223500
2773
+ },
2774
+ {
2775
+ "epoch": 9.36,
2776
+ "learning_rate": 3.198779825331161e-06,
2777
+ "loss": 0.3385,
2778
+ "step": 224000
2779
+ },
2780
+ {
2781
+ "epoch": 9.38,
2782
+ "learning_rate": 3.0943128160127035e-06,
2783
+ "loss": 0.3387,
2784
+ "step": 224500
2785
+ },
2786
+ {
2787
+ "epoch": 9.4,
2788
+ "learning_rate": 2.989845806694246e-06,
2789
+ "loss": 0.3384,
2790
+ "step": 225000
2791
+ },
2792
+ {
2793
+ "epoch": 9.42,
2794
+ "learning_rate": 2.8853787973757887e-06,
2795
+ "loss": 0.3381,
2796
+ "step": 225500
2797
+ },
2798
+ {
2799
+ "epoch": 9.44,
2800
+ "learning_rate": 2.7809117880573313e-06,
2801
+ "loss": 0.3384,
2802
+ "step": 226000
2803
+ },
2804
+ {
2805
+ "epoch": 9.46,
2806
+ "learning_rate": 2.6764447787388744e-06,
2807
+ "loss": 0.3386,
2808
+ "step": 226500
2809
+ },
2810
+ {
2811
+ "epoch": 9.49,
2812
+ "learning_rate": 2.571977769420417e-06,
2813
+ "loss": 0.3384,
2814
+ "step": 227000
2815
+ },
2816
+ {
2817
+ "epoch": 9.51,
2818
+ "learning_rate": 2.4675107601019596e-06,
2819
+ "loss": 0.3381,
2820
+ "step": 227500
2821
+ },
2822
+ {
2823
+ "epoch": 9.53,
2824
+ "learning_rate": 2.3630437507835027e-06,
2825
+ "loss": 0.3383,
2826
+ "step": 228000
2827
+ },
2828
+ {
2829
+ "epoch": 9.55,
2830
+ "learning_rate": 2.2585767414650453e-06,
2831
+ "loss": 0.3385,
2832
+ "step": 228500
2833
+ },
2834
+ {
2835
+ "epoch": 9.57,
2836
+ "learning_rate": 2.154109732146588e-06,
2837
+ "loss": 0.3383,
2838
+ "step": 229000
2839
+ },
2840
+ {
2841
+ "epoch": 9.59,
2842
+ "learning_rate": 2.049642722828131e-06,
2843
+ "loss": 0.3385,
2844
+ "step": 229500
2845
+ },
2846
+ {
2847
+ "epoch": 9.61,
2848
+ "learning_rate": 1.9451757135096736e-06,
2849
+ "loss": 0.3383,
2850
+ "step": 230000
2851
+ },
2852
+ {
2853
+ "epoch": 9.63,
2854
+ "learning_rate": 1.8407087041912165e-06,
2855
+ "loss": 0.3384,
2856
+ "step": 230500
2857
+ },
2858
+ {
2859
+ "epoch": 9.65,
2860
+ "learning_rate": 1.7362416948727593e-06,
2861
+ "loss": 0.3384,
2862
+ "step": 231000
2863
+ },
2864
+ {
2865
+ "epoch": 9.67,
2866
+ "learning_rate": 1.631774685554302e-06,
2867
+ "loss": 0.3383,
2868
+ "step": 231500
2869
+ },
2870
+ {
2871
+ "epoch": 9.69,
2872
+ "learning_rate": 1.5273076762358448e-06,
2873
+ "loss": 0.3384,
2874
+ "step": 232000
2875
+ },
2876
+ {
2877
+ "epoch": 9.72,
2878
+ "learning_rate": 1.4228406669173876e-06,
2879
+ "loss": 0.3384,
2880
+ "step": 232500
2881
+ },
2882
+ {
2883
+ "epoch": 9.74,
2884
+ "learning_rate": 1.3183736575989304e-06,
2885
+ "loss": 0.3385,
2886
+ "step": 233000
2887
+ },
2888
+ {
2889
+ "epoch": 9.76,
2890
+ "learning_rate": 1.213906648280473e-06,
2891
+ "loss": 0.3383,
2892
+ "step": 233500
2893
+ },
2894
+ {
2895
+ "epoch": 9.78,
2896
+ "learning_rate": 1.1094396389620159e-06,
2897
+ "loss": 0.3388,
2898
+ "step": 234000
2899
+ },
2900
+ {
2901
+ "epoch": 9.8,
2902
+ "learning_rate": 1.0049726296435587e-06,
2903
+ "loss": 0.3383,
2904
+ "step": 234500
2905
+ },
2906
+ {
2907
+ "epoch": 9.82,
2908
+ "learning_rate": 9.005056203251015e-07,
2909
+ "loss": 0.3383,
2910
+ "step": 235000
2911
+ },
2912
+ {
2913
+ "epoch": 9.84,
2914
+ "learning_rate": 7.960386110066441e-07,
2915
+ "loss": 0.3382,
2916
+ "step": 235500
2917
+ },
2918
+ {
2919
+ "epoch": 9.86,
2920
+ "learning_rate": 6.915716016881869e-07,
2921
+ "loss": 0.338,
2922
+ "step": 236000
2923
+ },
2924
+ {
2925
+ "epoch": 9.88,
2926
+ "learning_rate": 5.871045923697297e-07,
2927
+ "loss": 0.3382,
2928
+ "step": 236500
2929
+ },
2930
+ {
2931
+ "epoch": 9.9,
2932
+ "learning_rate": 4.826375830512725e-07,
2933
+ "loss": 0.3383,
2934
+ "step": 237000
2935
+ },
2936
+ {
2937
+ "epoch": 9.92,
2938
+ "learning_rate": 3.7817057373281523e-07,
2939
+ "loss": 0.3383,
2940
+ "step": 237500
2941
+ },
2942
+ {
2943
+ "epoch": 9.95,
2944
+ "learning_rate": 2.7370356441435796e-07,
2945
+ "loss": 0.3386,
2946
+ "step": 238000
2947
+ },
2948
+ {
2949
+ "epoch": 9.97,
2950
+ "learning_rate": 1.6923655509590072e-07,
2951
+ "loss": 0.3382,
2952
+ "step": 238500
2953
+ },
2954
+ {
2955
+ "epoch": 9.99,
2956
+ "learning_rate": 6.476954577744348e-08,
2957
+ "loss": 0.3383,
2958
+ "step": 239000
2959
+ },
2960
+ {
2961
+ "epoch": 10.0,
2962
+ "eval_accuracy": 0.8936249984035948,
2963
+ "eval_loss": 0.3311329483985901,
2964
+ "eval_runtime": 1436.3126,
2965
+ "eval_samples_per_second": 266.282,
2966
+ "eval_steps_per_second": 1.849,
2967
+ "step": 239310
2968
+ },
2969
+ {
2970
+ "epoch": 10.0,
2971
+ "step": 239310,
2972
+ "total_flos": 9.004290600882668e+18,
2973
+ "train_loss": 0.35271068150736834,
2974
+ "train_runtime": 202889.1723,
2975
+ "train_samples_per_second": 169.846,
2976
+ "train_steps_per_second": 1.18
2977
+ }
2978
+ ],
2979
+ "logging_steps": 500,
2980
+ "max_steps": 239310,
2981
+ "num_input_tokens_seen": 0,
2982
+ "num_train_epochs": 10,
2983
+ "save_steps": 500,
2984
+ "total_flos": 9.004290600882668e+18,
2985
+ "train_batch_size": 24,
2986
+ "trial_name": null,
2987
+ "trial_params": null
2988
+ }
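The logged `learning_rate` values above fall by a constant amount every 500 steps, which is what a linear decay to zero would produce. The following is a minimal sketch for checking that reading, not the training code itself. It assumes a linear schedule with no warmup and a base learning rate of 5e-5; neither is recorded in this part of the log, and both are inferred from the logged values. The names `linear_lr` and `logged` are hypothetical.

```python
import math

MAX_STEPS = 239310   # "max_steps" from the log above
NUM_EPOCHS = 10      # "num_train_epochs" from the log above
BASE_LR = 5e-5       # assumption: inferred, not recorded in this part of the log


def linear_lr(step, max_steps=MAX_STEPS, base_lr=BASE_LR):
    """Learning rate under linear decay to zero, assuming no warmup."""
    return base_lr * max(0, max_steps - step) / max_steps


# (step, logged learning_rate) pairs copied from the entries above
logged = [
    (195000, 9.25786636580168e-06),
    (215500, 4.974718983744934e-06),
    (239000, 6.476954577744348e-08),
]

for step, lr in logged:
    expected = linear_lr(step)
    # Compare with a loose relative tolerance; the log stores rounded floats.
    print(step, lr, expected, math.isclose(lr, expected, rel_tol=1e-6))

# Steps per epoch, and the step at which the ninth epoch ends.
steps_per_epoch = MAX_STEPS // NUM_EPOCHS
print(steps_per_epoch, 9 * steps_per_epoch)
```

If the assumptions hold, each comparison prints `True`, and the second printed line gives the step of the evaluation entry logged at epoch 9.0 (`"step": 215379`).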