kaveh committed on
Commit
f867c2e
1 Parent(s): f4ea2cb

trained on clip14 and cxrbert

README.md CHANGED
@@ -2,31 +2,30 @@
2
  tags:
3
  - generated_from_trainer
4
  model-index:
5
- - name: output
6
  results: []
7
  ---
8
 
9
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
10
  should probably proofread and complete it, then remove this comment. -->
11
 
12
- # output
13
 
14
- This model is a fine-tuned version of [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) as Vision model and [allenai/scibert_scivocab_uncased](https://huggingface.co/allenai/scibert_scivocab_uncased) as Text model on ROCO dataset.
15
  It achieves the following results on the evaluation set:
16
- - Loss: 0.6386
17
 
18
  ## Model description
19
 
20
- Fine tuning CLIP model on Radiology images and their captions
21
 
22
  ## Intended uses & limitations
23
 
24
- - Zero-shot classification
25
- - Image Retrieval
26
 
27
  ## Training and evaluation data
28
 
29
- ROCO dataset
30
 
31
  ## Training procedure
32
 
@@ -34,24 +33,105 @@ ROCO dataset
34
 
35
  The following hyperparameters were used during training:
36
  - learning_rate: 5e-05
37
- - train_batch_size: 96
38
- - eval_batch_size: 96
39
  - seed: 42
40
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
41
  - lr_scheduler_type: cosine
42
  - lr_scheduler_warmup_steps: 500
43
- - num_epochs: 5.0
44
 
45
  ### Training results
46
 
47
- | Training Loss | Epoch | Step | Validation Loss |
48
- |:-------------:|:-----:|:----:|:---------------:|
49
- | 1.7414 | 0.73 | 500 | 1.2403 |
50
- | 1.0226 | 1.47 | 1000 | 0.9722 |
51
- | 0.788 | 2.2 | 1500 | 0.8564 |
52
- | 0.5693 | 2.94 | 2000 | 0.7434 |
53
- | 0.3736 | 3.67 | 2500 | 0.6783 |
54
- | 0.265 | 4.41 | 3000 | 0.6500 |
55
 
56
 
57
  ### Framework versions
 
2
  tags:
3
  - generated_from_trainer
4
  model-index:
5
+ - name: output_8_clip14_cxrbert
6
  results: []
7
  ---
8
 
9
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
10
  should probably proofread and complete it, then remove this comment. -->
11
 
12
+ # output_8_clip14_cxrbert
13
 
14
+ This model is a fine-tuned version of [pretrained_weights/clip14-cxrbert](https://huggingface.co/pretrained_weights/clip14-cxrbert) on an unknown dataset.
15
  It achieves the following results on the evaluation set:
16
+ - Loss: 0.3388
17
 
18
  ## Model description
19
 
20
+ More information needed
21
 
22
  ## Intended uses & limitations
23
 
24
+ More information needed
 
25
 
26
  ## Training and evaluation data
27
 
28
+ More information needed
29
 
30
  ## Training procedure
31
 
 
33
 
34
  The following hyperparameters were used during training:
35
  - learning_rate: 5e-05
36
+ - train_batch_size: 24
37
+ - eval_batch_size: 24
38
  - seed: 42
39
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
40
  - lr_scheduler_type: cosine
41
  - lr_scheduler_warmup_steps: 500
42
+ - num_epochs: 8.0
43
 
44
  ### Training results
45
 
46
+ | Training Loss | Epoch | Step | Validation Loss |
47
+ |:-------------:|:-----:|:-----:|:---------------:|
48
+ | 0.7951 | 0.09 | 500 | 1.1912 |
49
+ | 0.5887 | 0.18 | 1000 | 0.9833 |
50
+ | 0.5023 | 0.28 | 1500 | 0.8459 |
51
+ | 0.4709 | 0.37 | 2000 | 0.8479 |
52
+ | 0.4484 | 0.46 | 2500 | 0.7667 |
53
+ | 0.4319 | 0.55 | 3000 | 0.8092 |
54
+ | 0.4181 | 0.64 | 3500 | 0.6964 |
55
+ | 0.4107 | 0.73 | 4000 | 0.6463 |
56
+ | 0.3723 | 0.83 | 4500 | 0.7893 |
57
+ | 0.3746 | 0.92 | 5000 | 0.6863 |
58
+ | 0.3667 | 1.01 | 5500 | 0.6910 |
59
+ | 0.3253 | 1.1 | 6000 | 0.6863 |
60
+ | 0.3274 | 1.19 | 6500 | 0.6445 |
61
+ | 0.3065 | 1.28 | 7000 | 0.5908 |
62
+ | 0.2834 | 1.38 | 7500 | 0.6138 |
63
+ | 0.293 | 1.47 | 8000 | 0.6515 |
64
+ | 0.303 | 1.56 | 8500 | 0.5806 |
65
+ | 0.2638 | 1.65 | 9000 | 0.5587 |
66
+ | 0.2593 | 1.74 | 9500 | 0.5216 |
67
+ | 0.2451 | 1.83 | 10000 | 0.5283 |
68
+ | 0.2468 | 1.93 | 10500 | 0.5001 |
69
+ | 0.2295 | 2.02 | 11000 | 0.4975 |
70
+ | 0.1953 | 2.11 | 11500 | 0.4750 |
71
+ | 0.1954 | 2.2 | 12000 | 0.4572 |
72
+ | 0.1737 | 2.29 | 12500 | 0.4731 |
73
+ | 0.175 | 2.38 | 13000 | 0.4526 |
74
+ | 0.1873 | 2.48 | 13500 | 0.4890 |
75
+ | 0.1809 | 2.57 | 14000 | 0.4210 |
76
+ | 0.1711 | 2.66 | 14500 | 0.4197 |
77
+ | 0.1457 | 2.75 | 15000 | 0.3998 |
78
+ | 0.1583 | 2.84 | 15500 | 0.3923 |
79
+ | 0.1579 | 2.94 | 16000 | 0.3823 |
80
+ | 0.1339 | 3.03 | 16500 | 0.3654 |
81
+ | 0.1164 | 3.12 | 17000 | 0.3592 |
82
+ | 0.1217 | 3.21 | 17500 | 0.3641 |
83
+ | 0.119 | 3.3 | 18000 | 0.3553 |
84
+ | 0.1151 | 3.39 | 18500 | 0.3524 |
85
+ | 0.119 | 3.49 | 19000 | 0.3452 |
86
+ | 0.102 | 3.58 | 19500 | 0.3439 |
87
+ | 0.1085 | 3.67 | 20000 | 0.3422 |
88
+ | 0.1142 | 3.76 | 20500 | 0.3396 |
89
+ | 0.1038 | 3.85 | 21000 | 0.3392 |
90
+ | 0.1143 | 3.94 | 21500 | 0.3390 |
91
+ | 0.0983 | 4.04 | 22000 | 0.3390 |
92
+ | 0.0974 | 4.13 | 22500 | 0.3388 |
93
+ | 0.1007 | 4.22 | 23000 | 0.3389 |
94
+ | 0.0903 | 4.31 | 23500 | 0.3396 |
95
+ | 0.095 | 4.4 | 24000 | 0.3394 |
96
+ | 0.0955 | 4.49 | 24500 | 0.3436 |
97
+ | 0.1032 | 4.59 | 25000 | 0.3426 |
98
+ | 0.1037 | 4.68 | 25500 | 0.3485 |
99
+ | 0.103 | 4.77 | 26000 | 0.3547 |
100
+ | 0.0987 | 4.86 | 26500 | 0.3552 |
101
+ | 0.1076 | 4.95 | 27000 | 0.3537 |
102
+ | 0.1134 | 5.04 | 27500 | 0.3549 |
103
+ | 0.1044 | 5.14 | 28000 | 0.3622 |
104
+ | 0.1099 | 5.23 | 28500 | 0.3774 |
105
+ | 0.1129 | 5.32 | 29000 | 0.3872 |
106
+ | 0.1235 | 5.41 | 29500 | 0.3767 |
107
+ | 0.1099 | 5.5 | 30000 | 0.3880 |
108
+ | 0.1331 | 5.6 | 30500 | 0.4181 |
109
+ | 0.134 | 5.69 | 31000 | 0.4090 |
110
+ | 0.142 | 5.78 | 31500 | 0.4045 |
111
+ | 0.1441 | 5.87 | 32000 | 0.4176 |
112
+ | 0.1577 | 5.96 | 32500 | 0.4377 |
113
+ | 0.1539 | 6.05 | 33000 | 0.4327 |
114
+ | 0.1475 | 6.15 | 33500 | 0.4587 |
115
+ | 0.1616 | 6.24 | 34000 | 0.4709 |
116
+ | 0.1671 | 6.33 | 34500 | 0.4920 |
117
+ | 0.1792 | 6.42 | 35000 | 0.4803 |
118
+ | 0.2025 | 6.51 | 35500 | 0.5275 |
119
+ | 0.1823 | 6.6 | 36000 | 0.5115 |
120
+ | 0.2123 | 6.7 | 36500 | 0.4975 |
121
+ | 0.2043 | 6.79 | 37000 | 0.4890 |
122
+ | 0.2086 | 6.88 | 37500 | 0.5374 |
123
+ | 0.2299 | 6.97 | 38000 | 0.5565 |
124
+ | 0.2151 | 7.06 | 38500 | 0.6073 |
125
+ | 0.222 | 7.15 | 39000 | 0.5468 |
126
+ | 0.236 | 7.25 | 39500 | 0.5504 |
127
+ | 0.2031 | 7.34 | 40000 | 0.5549 |
128
+ | 0.2251 | 7.43 | 40500 | 0.5905 |
129
+ | 0.2251 | 7.52 | 41000 | 0.6012 |
130
+ | 0.2464 | 7.61 | 41500 | 0.5931 |
131
+ | 0.2451 | 7.71 | 42000 | 0.6499 |
132
+ | 0.2463 | 7.8 | 42500 | 0.5696 |
133
+ | 0.2385 | 7.89 | 43000 | 0.5360 |
134
+ | 0.2353 | 7.98 | 43500 | 0.5490 |
135
 
136
 
137
  ### Framework versions
all_results.json CHANGED
@@ -1,11 +1,11 @@
1
  {
2
- "epoch": 5.0,
3
- "eval_loss": 0.6386235952377319,
4
- "eval_runtime": 46.0178,
5
- "eval_samples_per_second": 177.627,
6
- "eval_steps_per_second": 1.869,
7
- "train_loss": 0.7240408047355394,
8
- "train_runtime": 3383.1338,
9
- "train_samples_per_second": 96.687,
10
- "train_steps_per_second": 1.006
11
  }
 
1
  {
2
+ "epoch": 8.0,
3
+ "eval_loss": 0.3388192057609558,
4
+ "eval_runtime": 139.1474,
5
+ "eval_samples_per_second": 58.743,
6
+ "eval_steps_per_second": 2.451,
7
+ "train_loss": 0.21080181559736064,
8
+ "train_runtime": 44575.4239,
9
+ "train_samples_per_second": 11.741,
10
+ "train_steps_per_second": 0.978
11
  }
config.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "_commit_hash": null,
3
- "_name_or_path": "pretrained_weights/clip-allenai",
4
  "architectures": [
5
  "VisionTextDualEncoderModel"
6
  ],
@@ -8,9 +8,11 @@
8
  "model_type": "vision-text-dual-encoder",
9
  "projection_dim": 512,
10
  "text_config": {
11
- "_name_or_path": "allenai/scibert_scivocab_uncased",
12
  "add_cross_attention": false,
13
- "architectures": null,
 
 
14
  "attention_probs_dropout_prob": 0.1,
15
  "bad_words_ids": null,
16
  "begin_suppress_tokens": null,
@@ -84,12 +86,12 @@
84
  "typical_p": 1.0,
85
  "use_bfloat16": false,
86
  "use_cache": true,
87
- "vocab_size": 31090
88
  },
89
  "torch_dtype": "float32",
90
  "transformers_version": null,
91
  "vision_config": {
92
- "_name_or_path": "openai/clip-vit-base-patch32",
93
  "add_cross_attention": false,
94
  "architectures": null,
95
  "attention_dropout": 0.0,
@@ -110,7 +112,7 @@
110
  "forced_bos_token_id": null,
111
  "forced_eos_token_id": null,
112
  "hidden_act": "quick_gelu",
113
- "hidden_size": 768,
114
  "id2label": {
115
  "0": "LABEL_0",
116
  "1": "LABEL_1"
@@ -118,7 +120,7 @@
118
  "image_size": 224,
119
  "initializer_factor": 1.0,
120
  "initializer_range": 0.02,
121
- "intermediate_size": 3072,
122
  "is_decoder": false,
123
  "is_encoder_decoder": false,
124
  "label2id": {
@@ -131,20 +133,20 @@
131
  "min_length": 0,
132
  "model_type": "clip_vision_model",
133
  "no_repeat_ngram_size": 0,
134
- "num_attention_heads": 12,
135
  "num_beam_groups": 1,
136
  "num_beams": 1,
137
  "num_channels": 3,
138
- "num_hidden_layers": 12,
139
  "num_return_sequences": 1,
140
  "output_attentions": false,
141
  "output_hidden_states": false,
142
  "output_scores": false,
143
  "pad_token_id": null,
144
- "patch_size": 32,
145
  "prefix": null,
146
  "problem_type": null,
147
- "projection_dim": 512,
148
  "pruned_heads": {},
149
  "remove_invalid_values": false,
150
  "repetition_penalty": 1.0,
 
1
  {
2
  "_commit_hash": null,
3
+ "_name_or_path": "pretrained_weights/clip14-cxrbert",
4
  "architectures": [
5
  "VisionTextDualEncoderModel"
6
  ],
 
8
  "model_type": "vision-text-dual-encoder",
9
  "projection_dim": 512,
10
  "text_config": {
11
+ "_name_or_path": "microsoft/BiomedVLP-CXR-BERT-general",
12
  "add_cross_attention": false,
13
+ "architectures": [
14
+ "BertForMaskedLM"
15
+ ],
16
  "attention_probs_dropout_prob": 0.1,
17
  "bad_words_ids": null,
18
  "begin_suppress_tokens": null,
 
86
  "typical_p": 1.0,
87
  "use_bfloat16": false,
88
  "use_cache": true,
89
+ "vocab_size": 30522
90
  },
91
  "torch_dtype": "float32",
92
  "transformers_version": null,
93
  "vision_config": {
94
+ "_name_or_path": "openai/clip-vit-large-patch14",
95
  "add_cross_attention": false,
96
  "architectures": null,
97
  "attention_dropout": 0.0,
 
112
  "forced_bos_token_id": null,
113
  "forced_eos_token_id": null,
114
  "hidden_act": "quick_gelu",
115
+ "hidden_size": 1024,
116
  "id2label": {
117
  "0": "LABEL_0",
118
  "1": "LABEL_1"
 
120
  "image_size": 224,
121
  "initializer_factor": 1.0,
122
  "initializer_range": 0.02,
123
+ "intermediate_size": 4096,
124
  "is_decoder": false,
125
  "is_encoder_decoder": false,
126
  "label2id": {
 
133
  "min_length": 0,
134
  "model_type": "clip_vision_model",
135
  "no_repeat_ngram_size": 0,
136
+ "num_attention_heads": 16,
137
  "num_beam_groups": 1,
138
  "num_beams": 1,
139
  "num_channels": 3,
140
+ "num_hidden_layers": 24,
141
  "num_return_sequences": 1,
142
  "output_attentions": false,
143
  "output_hidden_states": false,
144
  "output_scores": false,
145
  "pad_token_id": null,
146
+ "patch_size": 14,
147
  "prefix": null,
148
  "problem_type": null,
149
+ "projection_dim": 768,
150
  "pruned_heads": {},
151
  "remove_invalid_values": false,
152
  "repetition_penalty": 1.0,
eval_results.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
- "epoch": 5.0,
3
- "eval_loss": 0.6386235952377319,
4
- "eval_runtime": 46.0178,
5
- "eval_samples_per_second": 177.627,
6
- "eval_steps_per_second": 1.869
7
  }
 
1
  {
2
+ "epoch": 8.0,
3
+ "eval_loss": 0.3388192057609558,
4
+ "eval_runtime": 139.1474,
5
+ "eval_samples_per_second": 58.743,
6
+ "eval_steps_per_second": 2.451
7
  }
heatmap.png ADDED
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e744e104d89831300bfc10d892010ef847c85329e2c16a16f2707c5cfad086e2
3
- size 792784785
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:445915de1c92c4ae37a45cc24c1bf8ab0ed6f8be400e1c11c1a6c547c3327710
3
+ size 1654528401
runs/Jul07_03-20-36_pop-os/events.out.tfevents.1688696444.pop-os.586616.0 DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:90dbb79765d47658ca4c94cd3d2c278ac6b3909023eb28140e1774790d45fac5
3
- size 9270
 
 
 
 
runs/{Jul07_00-39-43_pop-os/events.out.tfevents.1688686792.pop-os.142684.0 → Jul08_01-50-48_pop-os/events.out.tfevents.1688777456.pop-os.224570.0} RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b88f91ad12c5cd9df07af32511879be7f66cef214518fb249b892a715c25be66
3
- size 10983
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bc0246bec59f3c94b8c51e5a40440d678f8bd0af0d5e981478d4bed7234b3bd7
3
+ size 54222
runs/{Jul07_00-39-43_pop-os/events.out.tfevents.1688690222.pop-os.142684.1 → Jul08_01-50-48_pop-os/events.out.tfevents.1688822179.pop-os.224570.1} RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:594541324e650715a8354cec1d1e5daccc3060cb06922075ceea489d24901cd3
3
- size 359
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:97e679368a0c9b568c036dbf45af5c02f847eb4330a3943b1ef4cea2827cbd43
3
+ size 364
tokenizer.json CHANGED
The diff for this file is too large to render.
 
train_results.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
- "epoch": 5.0,
3
- "train_loss": 0.7240408047355394,
4
- "train_runtime": 3383.1338,
5
- "train_samples_per_second": 96.687,
6
- "train_steps_per_second": 1.006
7
  }
 
1
  {
2
+ "epoch": 8.0,
3
+ "train_loss": 0.21080181559736064,
4
+ "train_runtime": 44575.4239,
5
+ "train_samples_per_second": 11.741,
6
+ "train_steps_per_second": 0.978
7
  }
trainer_state.json CHANGED
@@ -1,109 +1,1243 @@
1
  {
2
- "best_metric": null,
3
- "best_model_checkpoint": null,
4
- "epoch": 5.0,
5
- "global_step": 3405,
6
  "is_hyper_param_search": false,
7
  "is_local_process_zero": true,
8
  "is_world_process_zero": true,
9
  "log_history": [
10
  {
11
- "epoch": 0.73,
12
- "learning_rate": 5e-05,
13
- "loss": 1.7414,
14
  "step": 500
15
  },
16
  {
17
- "epoch": 0.73,
18
- "eval_loss": 1.2403221130371094,
19
- "eval_runtime": 46.0291,
20
- "eval_samples_per_second": 177.583,
21
- "eval_steps_per_second": 1.868,
22
  "step": 500
23
  },
24
  {
25
- "epoch": 1.47,
26
- "learning_rate": 4.643343608987585e-05,
27
- "loss": 1.0226,
28
  "step": 1000
29
  },
30
  {
31
- "epoch": 1.47,
32
- "eval_loss": 0.9721790552139282,
33
- "eval_runtime": 46.5376,
34
- "eval_samples_per_second": 175.643,
35
- "eval_steps_per_second": 1.848,
36
  "step": 1000
37
  },
38
  {
39
- "epoch": 2.2,
40
- "learning_rate": 3.67513746095034e-05,
41
- "loss": 0.788,
42
  "step": 1500
43
  },
44
  {
45
- "epoch": 2.2,
46
- "eval_loss": 0.8563552498817444,
47
- "eval_runtime": 46.4494,
48
- "eval_samples_per_second": 175.976,
49
- "eval_steps_per_second": 1.851,
50
  "step": 1500
51
  },
52
  {
53
- "epoch": 2.94,
54
- "learning_rate": 2.3716350843002614e-05,
55
- "loss": 0.5693,
56
  "step": 2000
57
  },
58
  {
59
- "epoch": 2.94,
60
- "eval_loss": 0.7433565855026245,
61
- "eval_runtime": 63.0881,
62
- "eval_samples_per_second": 129.565,
63
- "eval_steps_per_second": 1.363,
64
  "step": 2000
65
  },
66
  {
67
- "epoch": 3.67,
68
- "learning_rate": 1.104758441703049e-05,
69
- "loss": 0.3736,
70
  "step": 2500
71
  },
72
  {
73
- "epoch": 3.67,
74
- "eval_loss": 0.6783401966094971,
75
- "eval_runtime": 46.4362,
76
- "eval_samples_per_second": 176.027,
77
- "eval_steps_per_second": 1.852,
78
  "step": 2500
79
  },
80
  {
81
- "epoch": 4.41,
82
- "learning_rate": 2.3597925412401912e-06,
83
- "loss": 0.265,
84
  "step": 3000
85
  },
86
  {
87
- "epoch": 4.41,
88
- "eval_loss": 0.6500362753868103,
89
- "eval_runtime": 46.0769,
90
- "eval_samples_per_second": 177.399,
91
- "eval_steps_per_second": 1.866,
92
  "step": 3000
93
  },
94
  {
95
- "epoch": 5.0,
96
- "step": 3405,
97
- "total_flos": 4.36441284864e+16,
98
- "train_loss": 0.7240408047355394,
99
- "train_runtime": 3383.1338,
100
- "train_samples_per_second": 96.687,
101
- "train_steps_per_second": 1.006
102
  }
103
  ],
104
- "max_steps": 3405,
105
- "num_train_epochs": 5,
106
- "total_flos": 4.36441284864e+16,
107
  "trial_name": null,
108
  "trial_params": null
109
  }
 
1
  {
2
+ "best_metric": 0.3388192057609558,
3
+ "best_model_checkpoint": "outputs/output_8_clip14_cxrbert/checkpoint-22500",
4
+ "epoch": 8.0,
5
+ "global_step": 43608,
6
  "is_hyper_param_search": false,
7
  "is_local_process_zero": true,
8
  "is_world_process_zero": true,
9
  "log_history": [
10
  {
11
+ "epoch": 0.09,
12
+ "learning_rate": 4.999999755266707e-05,
13
+ "loss": 0.7951,
14
  "step": 500
15
  },
16
  {
17
+ "epoch": 0.09,
18
+ "eval_loss": 1.1912389993667603,
19
+ "eval_runtime": 139.1386,
20
+ "eval_samples_per_second": 58.747,
21
+ "eval_steps_per_second": 2.451,
22
  "step": 500
23
  },
24
  {
25
+ "epoch": 0.18,
26
+ "learning_rate": 4.993123185382302e-05,
27
+ "loss": 0.5887,
28
  "step": 1000
29
  },
30
  {
31
+ "epoch": 0.18,
32
+ "eval_loss": 0.9833270907402039,
33
+ "eval_runtime": 139.0379,
34
+ "eval_samples_per_second": 58.79,
35
+ "eval_steps_per_second": 2.453,
36
  "step": 1000
37
  },
38
  {
39
+ "epoch": 0.28,
40
+ "learning_rate": 4.972693864808811e-05,
41
+ "loss": 0.5023,
42
  "step": 1500
43
  },
44
  {
45
+ "epoch": 0.28,
46
+ "eval_loss": 0.8458877205848694,
47
+ "eval_runtime": 139.1851,
48
+ "eval_samples_per_second": 58.728,
49
+ "eval_steps_per_second": 2.45,
50
  "step": 1500
51
  },
52
  {
53
+ "epoch": 0.37,
54
+ "learning_rate": 4.938822848423147e-05,
55
+ "loss": 0.4709,
56
  "step": 2000
57
  },
58
  {
59
+ "epoch": 0.37,
60
+ "eval_loss": 0.8479061126708984,
61
+ "eval_runtime": 138.6519,
62
+ "eval_samples_per_second": 58.953,
63
+ "eval_steps_per_second": 2.459,
64
  "step": 2000
65
  },
66
  {
67
+ "epoch": 0.46,
68
+ "learning_rate": 4.891694260878015e-05,
69
+ "loss": 0.4484,
70
  "step": 2500
71
  },
72
  {
73
+ "epoch": 0.46,
74
+ "eval_loss": 0.766708493232727,
75
+ "eval_runtime": 138.9466,
76
+ "eval_samples_per_second": 58.828,
77
+ "eval_steps_per_second": 2.454,
78
  "step": 2500
79
  },
80
  {
81
+ "epoch": 0.55,
82
+ "learning_rate": 4.831564295690475e-05,
83
+ "loss": 0.4319,
84
  "step": 3000
85
  },
86
  {
87
+ "epoch": 0.55,
88
+ "eval_loss": 0.8092461228370667,
89
+ "eval_runtime": 138.9741,
90
+ "eval_samples_per_second": 58.817,
91
+ "eval_steps_per_second": 2.454,
92
  "step": 3000
93
  },
94
  {
95
+ "epoch": 0.64,
96
+ "learning_rate": 4.7587598225603125e-05,
97
+ "loss": 0.4181,
98
+ "step": 3500
99
+ },
100
+ {
101
+ "epoch": 0.64,
102
+ "eval_loss": 0.6963649392127991,
103
+ "eval_runtime": 138.9254,
104
+ "eval_samples_per_second": 58.837,
105
+ "eval_steps_per_second": 2.455,
106
+ "step": 3500
107
+ },
108
+ {
109
+ "epoch": 0.73,
110
+ "learning_rate": 4.673676610488902e-05,
111
+ "loss": 0.4107,
112
+ "step": 4000
113
+ },
114
+ {
115
+ "epoch": 0.73,
116
+ "eval_loss": 0.6463401913642883,
117
+ "eval_runtime": 138.8758,
118
+ "eval_samples_per_second": 58.858,
119
+ "eval_steps_per_second": 2.455,
120
+ "step": 4000
121
+ },
122
+ {
123
+ "epoch": 0.83,
124
+ "learning_rate": 4.576777176357795e-05,
125
+ "loss": 0.3723,
126
+ "step": 4500
127
+ },
128
+ {
129
+ "epoch": 0.83,
130
+ "eval_loss": 0.7892907857894897,
131
+ "eval_runtime": 138.7167,
132
+ "eval_samples_per_second": 58.926,
133
+ "eval_steps_per_second": 2.458,
134
+ "step": 4500
135
+ },
136
+ {
137
+ "epoch": 0.92,
138
+ "learning_rate": 4.468588270662272e-05,
139
+ "loss": 0.3746,
140
+ "step": 5000
141
+ },
142
+ {
143
+ "epoch": 0.92,
144
+ "eval_loss": 0.686305582523346,
145
+ "eval_runtime": 139.0126,
146
+ "eval_samples_per_second": 58.8,
147
+ "eval_steps_per_second": 2.453,
148
+ "step": 5000
149
+ },
150
+ {
151
+ "epoch": 1.01,
152
+ "learning_rate": 4.349698014067534e-05,
153
+ "loss": 0.3667,
154
+ "step": 5500
155
+ },
156
+ {
157
+ "epoch": 1.01,
158
+ "eval_loss": 0.6910073161125183,
159
+ "eval_runtime": 138.8546,
160
+ "eval_samples_per_second": 58.867,
161
+ "eval_steps_per_second": 2.456,
162
+ "step": 5500
163
+ },
164
+ {
165
+ "epoch": 1.1,
166
+ "learning_rate": 4.220752700353382e-05,
167
+ "loss": 0.3253,
168
+ "step": 6000
169
+ },
170
+ {
171
+ "epoch": 1.1,
172
+ "eval_loss": 0.6863256096839905,
173
+ "eval_runtime": 138.8547,
174
+ "eval_samples_per_second": 58.867,
175
+ "eval_steps_per_second": 2.456,
176
+ "step": 6000
177
+ },
178
+ {
179
+ "epoch": 1.19,
180
+ "learning_rate": 4.082453283126738e-05,
181
+ "loss": 0.3274,
182
+ "step": 6500
183
+ },
184
+ {
185
+ "epoch": 1.19,
186
+ "eval_loss": 0.6445034146308899,
187
+ "eval_runtime": 138.846,
188
+ "eval_samples_per_second": 58.871,
189
+ "eval_steps_per_second": 2.456,
190
+ "step": 6500
191
+ },
192
+ {
193
+ "epoch": 1.28,
194
+ "learning_rate": 3.935551565400428e-05,
195
+ "loss": 0.3065,
196
+ "step": 7000
197
+ },
198
+ {
199
+ "epoch": 1.28,
200
+ "eval_loss": 0.5908203125,
201
+ "eval_runtime": 138.7486,
202
+ "eval_samples_per_second": 58.912,
203
+ "eval_steps_per_second": 2.458,
204
+ "step": 7000
205
+ },
206
+ {
207
+ "epoch": 1.38,
208
+ "learning_rate": 3.7808461127518854e-05,
209
+ "loss": 0.2834,
210
+ "step": 7500
211
+ },
212
+ {
213
+ "epoch": 1.38,
214
+ "eval_loss": 0.6137728691101074,
215
+ "eval_runtime": 139.0095,
216
+ "eval_samples_per_second": 58.802,
217
+ "eval_steps_per_second": 2.453,
218
+ "step": 7500
219
+ },
220
+ {
221
+ "epoch": 1.47,
222
+ "learning_rate": 3.6191779122780486e-05,
223
+ "loss": 0.293,
224
+ "step": 8000
225
+ },
226
+ {
227
+ "epoch": 1.47,
228
+ "eval_loss": 0.6515378355979919,
229
+ "eval_runtime": 139.1162,
230
+ "eval_samples_per_second": 58.757,
231
+ "eval_steps_per_second": 2.451,
232
+ "step": 8000
233
+ },
234
+ {
235
+ "epoch": 1.56,
236
+ "learning_rate": 3.4514258009446234e-05,
237
+ "loss": 0.303,
238
+ "step": 8500
239
+ },
240
+ {
241
+ "epoch": 1.56,
242
+ "eval_loss": 0.5806155800819397,
243
+ "eval_runtime": 138.924,
244
+ "eval_samples_per_second": 58.838,
245
+ "eval_steps_per_second": 2.455,
246
+ "step": 8500
247
+ },
248
+ {
249
+ "epoch": 1.65,
250
+ "learning_rate": 3.278501688181439e-05,
251
+ "loss": 0.2638,
252
+ "step": 9000
253
+ },
254
+ {
255
+ "epoch": 1.65,
256
+ "eval_loss": 0.5586961507797241,
257
+ "eval_runtime": 139.1301,
258
+ "eval_samples_per_second": 58.751,
259
+ "eval_steps_per_second": 2.451,
260
+ "step": 9000
261
+ },
262
+ {
263
+ "epoch": 1.74,
264
+ "learning_rate": 3.101345598694112e-05,
265
+ "loss": 0.2593,
266
+ "step": 9500
267
+ },
268
+ {
269
+ "epoch": 1.74,
270
+ "eval_loss": 0.5215563178062439,
271
+ "eval_runtime": 138.9957,
272
+ "eval_samples_per_second": 58.808,
273
+ "eval_steps_per_second": 2.453,
274
+ "step": 9500
275
+ },
276
+ {
277
+ "epoch": 1.83,
278
+ "learning_rate": 2.9209205624395885e-05,
279
+ "loss": 0.2451,
280
+ "step": 10000
281
+ },
282
+ {
283
+ "epoch": 1.83,
284
+ "eval_loss": 0.5282728672027588,
285
+ "eval_runtime": 138.8608,
286
+ "eval_samples_per_second": 58.865,
287
+ "eval_steps_per_second": 2.456,
288
+ "step": 10000
289
+ },
290
+ {
291
+ "epoch": 1.93,
292
+ "learning_rate": 2.7382073795438957e-05,
293
+ "loss": 0.2468,
294
+ "step": 10500
295
+ },
296
+ {
297
+ "epoch": 1.93,
298
+ "eval_loss": 0.5001487135887146,
299
+ "eval_runtime": 138.9207,
300
+ "eval_samples_per_second": 58.839,
301
+ "eval_steps_per_second": 2.455,
302
+ "step": 10500
303
+ },
304
+ {
305
+ "epoch": 2.02,
306
+ "learning_rate": 2.5541992886203175e-05,
307
+ "loss": 0.2295,
308
+ "step": 11000
309
+ },
310
+ {
311
+ "epoch": 2.02,
312
+ "eval_loss": 0.49750879406929016,
313
+ "eval_runtime": 138.9502,
314
+ "eval_samples_per_second": 58.827,
315
+ "eval_steps_per_second": 2.454,
316
+ "step": 11000
317
+ },
318
+ {
319
+ "epoch": 2.11,
320
+ "learning_rate": 2.3698965674712838e-05,
321
+ "loss": 0.1953,
322
+ "step": 11500
323
+ },
324
+ {
325
+ "epoch": 2.11,
326
+ "eval_loss": 0.4750489890575409,
327
+ "eval_runtime": 138.8668,
328
+ "eval_samples_per_second": 58.862,
329
+ "eval_steps_per_second": 2.456,
330
+ "step": 11500
331
+ },
332
+ {
333
+ "epoch": 2.2,
334
+ "learning_rate": 2.1863010955248543e-05,
335
+ "loss": 0.1954,
336
+ "step": 12000
337
+ },
338
+ {
339
+ "epoch": 2.2,
340
+ "eval_loss": 0.45723679661750793,
341
+ "eval_runtime": 139.0817,
342
+ "eval_samples_per_second": 58.771,
343
+ "eval_steps_per_second": 2.452,
344
+ "step": 12000
345
+ },
346
+ {
347
+ "epoch": 2.29,
348
+ "learning_rate": 2.0044109075646793e-05,
349
+ "loss": 0.1737,
350
+ "step": 12500
351
+ },
352
+ {
353
+ "epoch": 2.29,
354
+ "eval_loss": 0.4731180667877197,
355
+ "eval_runtime": 139.0822,
356
+ "eval_samples_per_second": 58.771,
357
+ "eval_steps_per_second": 2.452,
358
+ "step": 12500
359
+ },
360
+ {
361
+ "epoch": 2.38,
362
+ "learning_rate": 1.8252147683596503e-05,
363
+ "loss": 0.175,
364
+ "step": 13000
365
+ },
366
+ {
367
+ "epoch": 2.38,
368
+ "eval_loss": 0.4526049494743347,
369
+ "eval_runtime": 139.059,
370
+ "eval_samples_per_second": 58.781,
371
+ "eval_steps_per_second": 2.452,
372
+ "step": 13000
373
+ },
374
+ {
375
+ "epoch": 2.48,
376
+ "learning_rate": 1.6496867976858525e-05,
377
+ "loss": 0.1873,
378
+ "step": 13500
379
+ },
380
+ {
381
+ "epoch": 2.48,
382
+ "eval_loss": 0.4890150725841522,
383
+ "eval_runtime": 138.99,
384
+ "eval_samples_per_second": 58.81,
385
+ "eval_steps_per_second": 2.453,
386
+ "step": 13500
387
+ },
388
+ {
389
+ "epoch": 2.57,
390
+ "learning_rate": 1.4787811749594674e-05,
391
+ "loss": 0.1809,
392
+ "step": 14000
393
+ },
394
+ {
395
+ "epoch": 2.57,
396
+ "eval_loss": 0.4210197627544403,
397
+ "eval_runtime": 139.0599,
398
+ "eval_samples_per_second": 58.78,
399
+ "eval_steps_per_second": 2.452,
400
+ "step": 14000
401
+ },
402
+ {
403
+ "epoch": 2.66,
404
+ "learning_rate": 1.3134269522665521e-05,
405
+ "loss": 0.1711,
406
+ "step": 14500
407
+ },
408
+ {
409
+ "epoch": 2.66,
410
+ "eval_loss": 0.4197298586368561,
411
+ "eval_runtime": 139.0776,
412
+ "eval_samples_per_second": 58.773,
413
+ "eval_steps_per_second": 2.452,
414
+ "step": 14500
415
+ },
416
+ {
417
+ "epoch": 2.75,
418
+ "learning_rate": 1.1545230039863117e-05,
419
+ "loss": 0.1457,
420
+ "step": 15000
421
+ },
422
+ {
423
+ "epoch": 2.75,
424
+ "eval_loss": 0.3998343348503113,
425
+ "eval_runtime": 138.7787,
426
+ "eval_samples_per_second": 58.9,
427
+ "eval_steps_per_second": 2.457,
428
+ "step": 15000
429
+ },
430
+ {
431
+ "epoch": 2.84,
432
+ "learning_rate": 1.0029331404620077e-05,
433
+ "loss": 0.1583,
434
+ "step": 15500
435
+ },
436
+ {
437
+ "epoch": 2.84,
438
+ "eval_loss": 0.392282098531723,
439
+ "eval_runtime": 139.0688,
440
+ "eval_samples_per_second": 58.777,
441
+ "eval_steps_per_second": 2.452,
442
+ "step": 15500
443
+ },
444
+ {
445
+ "epoch": 2.94,
446
+ "learning_rate": 8.59481412281825e-06,
447
+ "loss": 0.1579,
448
+ "step": 16000
449
+ },
450
+ {
451
+ "epoch": 2.94,
452
+ "eval_loss": 0.3823428750038147,
453
+ "eval_runtime": 139.0069,
454
+ "eval_samples_per_second": 58.803,
455
+ "eval_steps_per_second": 2.453,
456
+ "step": 16000
457
+ },
458
+ {
459
+ "epoch": 3.03,
460
+ "learning_rate": 7.249476306959052e-06,
461
+ "loss": 0.1339,
462
+ "step": 16500
463
+ },
464
+ {
465
+ "epoch": 3.03,
466
+ "eval_loss": 0.3654000163078308,
467
+ "eval_runtime": 139.085,
468
+ "eval_samples_per_second": 58.77,
469
+ "eval_steps_per_second": 2.452,
470
+ "step": 16500
471
+ },
472
+ {
473
+ "epoch": 3.12,
474
+ "learning_rate": 6.00063128520765e-06,
475
+ "loss": 0.1164,
476
+ "step": 17000
477
+ },
478
+ {
479
+ "epoch": 3.12,
480
+ "eval_loss": 0.3591544032096863,
481
+ "eval_runtime": 139.2185,
482
+ "eval_samples_per_second": 58.713,
483
+ "eval_steps_per_second": 2.449,
484
+ "step": 17000
485
+ },
486
+ {
487
+ "epoch": 3.21,
488
+ "learning_rate": 4.855067845750841e-06,
489
+ "loss": 0.1217,
490
+ "step": 17500
491
+ },
492
+ {
493
+ "epoch": 3.21,
494
+ "eval_loss": 0.3641490936279297,
495
+ "eval_runtime": 139.1903,
496
+ "eval_samples_per_second": 58.725,
497
+ "eval_steps_per_second": 2.45,
498
+ "step": 17500
499
+ },
500
+ {
501
+ "epoch": 3.3,
502
+ "learning_rate": 3.8190133325820834e-06,
503
+ "loss": 0.119,
504
+ "step": 18000
505
+ },
506
+ {
507
+ "epoch": 3.3,
508
+ "eval_loss": 0.3553272783756256,
509
+ "eval_runtime": 139.0893,
510
+ "eval_samples_per_second": 58.768,
511
+ "eval_steps_per_second": 2.452,
512
+ "step": 18000
513
+ },
514
+ {
515
+ "epoch": 3.39,
516
+ "learning_rate": 2.8980997933272802e-06,
517
+ "loss": 0.1151,
518
+ "step": 18500
519
+ },
520
+ {
521
+ "epoch": 3.39,
522
+ "eval_loss": 0.35238373279571533,
523
+ "eval_runtime": 139.0702,
524
+ "eval_samples_per_second": 58.776,
525
+ "eval_steps_per_second": 2.452,
526
+ "step": 18500
527
+ },
528
+ {
529
+ "epoch": 3.49,
530
+ "learning_rate": 2.0973333631332525e-06,
531
+ "loss": 0.119,
532
+ "step": 19000
533
+ },
534
+ {
535
+ "epoch": 3.49,
536
+ "eval_loss": 0.3452140688896179,
537
+ "eval_runtime": 138.934,
538
+ "eval_samples_per_second": 58.834,
539
+ "eval_steps_per_second": 2.454,
540
+ "step": 19000
541
+ },
542
+ {
543
+ "epoch": 3.58,
544
+ "learning_rate": 1.4210670510499595e-06,
545
+ "loss": 0.102,
546
+ "step": 19500
547
+ },
548
+ {
549
+ "epoch": 3.58,
550
+ "eval_loss": 0.34390997886657715,
551
+ "eval_runtime": 139.1805,
552
+ "eval_samples_per_second": 58.729,
553
+ "eval_steps_per_second": 2.45,
554
+ "step": 19500
555
+ },
556
+ {
557
+ "epoch": 3.67,
558
+ "learning_rate": 8.729770768409501e-07,
559
+ "loss": 0.1085,
560
+ "step": 20000
561
+ },
562
+ {
563
+ "epoch": 3.67,
564
+ "eval_loss": 0.3422289192676544,
565
+ "eval_runtime": 139.0942,
566
+ "eval_samples_per_second": 58.766,
567
+ "eval_steps_per_second": 2.452,
568
+ "step": 20000
569
+ },
570
+ {
571
+ "epoch": 3.76,
572
+ "learning_rate": 4.5604288685657804e-07,
573
+ "loss": 0.1142,
574
+ "step": 20500
575
+ },
576
+ {
577
+ "epoch": 3.76,
578
+ "eval_loss": 0.33955371379852295,
579
+ "eval_runtime": 138.9826,
580
+ "eval_samples_per_second": 58.813,
581
+ "eval_steps_per_second": 2.454,
582
+ "step": 20500
583
+ },
584
+ {
585
+ "epoch": 3.85,
586
+ "learning_rate": 1.7253095760459415e-07,
587
+ "loss": 0.1038,
588
+ "step": 21000
589
+ },
590
+ {
591
+ "epoch": 3.85,
592
+ "eval_loss": 0.33917009830474854,
593
+ "eval_runtime": 139.1121,
594
+ "eval_samples_per_second": 58.758,
595
+ "eval_steps_per_second": 2.451,
596
+ "step": 21000
597
+ },
598
+ {
599
+ "epoch": 3.94,
600
+ "learning_rate": 2.3982475062916954e-08,
601
+ "loss": 0.1143,
602
+ "step": 21500
603
+ },
604
+ {
605
+ "epoch": 3.94,
606
+ "eval_loss": 0.33897778391838074,
607
+ "eval_runtime": 139.1906,
608
+ "eval_samples_per_second": 58.725,
609
+ "eval_steps_per_second": 2.45,
610
+ "step": 21500
611
+ },
612
+ {
613
+ "epoch": 4.04,
614
+ "learning_rate": 1.1204956710403336e-08,
615
+ "loss": 0.0983,
616
+ "step": 22000
617
+ },
618
+ {
619
+ "epoch": 4.04,
620
+ "eval_loss": 0.3389684855937958,
621
+ "eval_runtime": 139.2217,
622
+ "eval_samples_per_second": 58.712,
623
+ "eval_steps_per_second": 2.449,
624
+ "step": 22000
625
+ },
626
+ {
627
+ "epoch": 4.13,
628
+ "learning_rate": 1.3426786181872375e-07,
629
+ "loss": 0.0974,
630
+ "step": 22500
631
+ },
632
+ {
633
+ "epoch": 4.13,
634
+ "eval_loss": 0.3388192057609558,
635
+ "eval_runtime": 139.1103,
636
+ "eval_samples_per_second": 58.759,
637
+ "eval_steps_per_second": 2.451,
638
+ "step": 22500
639
+ },
640
+ {
641
+ "epoch": 4.22,
642
+ "learning_rate": 3.925022138680762e-07,
643
+ "loss": 0.1007,
644
+ "step": 23000
645
+ },
646
+ {
647
+ "epoch": 4.22,
648
+ "eval_loss": 0.33886849880218506,
649
+ "eval_runtime": 139.2186,
650
+ "eval_samples_per_second": 58.713,
651
+ "eval_steps_per_second": 2.449,
652
+ "step": 23000
653
+ },
654
+ {
655
+ "epoch": 4.31,
656
+ "learning_rate": 7.845042371392303e-07,
657
+ "loss": 0.0903,
658
+ "step": 23500
659
+ },
660
+ {
661
+ "epoch": 4.31,
662
+ "eval_loss": 0.33964774012565613,
663
+ "eval_runtime": 139.1917,
664
+ "eval_samples_per_second": 58.725,
665
+ "eval_steps_per_second": 2.45,
666
+ "step": 23500
667
+ },
668
+ {
669
+ "epoch": 4.4,
670
+ "learning_rate": 1.308142987713265e-06,
671
+ "loss": 0.095,
672
+ "step": 24000
673
+ },
674
+ {
675
+ "epoch": 4.4,
676
+ "eval_loss": 0.3394069969654083,
677
+ "eval_runtime": 139.0407,
678
+ "eval_samples_per_second": 58.789,
679
+ "eval_steps_per_second": 2.453,
680
+ "step": 24000
681
+ },
682
+ {
683
+ "epoch": 4.49,
684
+ "learning_rate": 1.960571937396438e-06,
685
+ "loss": 0.0955,
686
+ "step": 24500
687
+ },
688
+ {
689
+ "epoch": 4.49,
690
+ "eval_loss": 0.3435823619365692,
691
+ "eval_runtime": 138.8656,
692
+ "eval_samples_per_second": 58.863,
693
+ "eval_steps_per_second": 2.456,
694
+ "step": 24500
695
+ },
696
+ {
697
+ "epoch": 4.59,
698
+ "learning_rate": 2.7382444475993473e-06,
699
+ "loss": 0.1032,
700
+ "step": 25000
701
+ },
702
+ {
703
+ "epoch": 4.59,
704
+ "eval_loss": 0.3425971269607544,
705
+ "eval_runtime": 139.1498,
706
+ "eval_samples_per_second": 58.742,
707
+ "eval_steps_per_second": 2.451,
708
+ "step": 25000
709
+ },
710
+ {
711
+ "epoch": 4.68,
712
+ "learning_rate": 3.636933049053598e-06,
713
+ "loss": 0.1037,
714
+ "step": 25500
715
+ },
716
+ {
717
+ "epoch": 4.68,
718
+ "eval_loss": 0.3484514653682709,
719
+ "eval_runtime": 139.2414,
720
+ "eval_samples_per_second": 58.704,
721
+ "eval_steps_per_second": 2.449,
722
+ "step": 25500
723
+ },
724
+ {
725
+ "epoch": 4.77,
726
+ "learning_rate": 4.651752422560337e-06,
727
+ "loss": 0.103,
728
+ "step": 26000
729
+ },
730
+ {
731
+ "epoch": 4.77,
732
+ "eval_loss": 0.35472801327705383,
733
+ "eval_runtime": 139.171,
734
+ "eval_samples_per_second": 58.733,
735
+ "eval_steps_per_second": 2.45,
736
+ "step": 26000
737
+ },
738
+ {
739
+ "epoch": 4.86,
740
+ "learning_rate": 5.777185955846176e-06,
741
+ "loss": 0.0987,
742
+ "step": 26500
743
+ },
744
+ {
745
+ "epoch": 4.86,
746
+ "eval_loss": 0.355197936296463,
747
+ "eval_runtime": 139.255,
748
+ "eval_samples_per_second": 58.698,
749
+ "eval_steps_per_second": 2.449,
750
+ "step": 26500
751
+ },
752
+ {
753
+ "epoch": 4.95,
754
+ "learning_rate": 7.007115732161859e-06,
755
+ "loss": 0.1076,
756
+ "step": 27000
757
+ },
758
+ {
759
+ "epoch": 4.95,
760
+ "eval_loss": 0.35372602939605713,
761
+ "eval_runtime": 139.256,
762
+ "eval_samples_per_second": 58.698,
763
+ "eval_steps_per_second": 2.449,
764
+ "step": 27000
765
+ },
766
+ {
767
+ "epoch": 5.04,
768
+ "learning_rate": 8.334855787604286e-06,
769
+ "loss": 0.1134,
770
+ "step": 27500
771
+ },
772
+ {
773
+ "epoch": 5.04,
774
+ "eval_loss": 0.35491758584976196,
775
+ "eval_runtime": 139.157,
776
+ "eval_samples_per_second": 58.739,
777
+ "eval_steps_per_second": 2.45,
778
+ "step": 27500
779
+ },
780
+ {
781
+ "epoch": 5.14,
782
+ "learning_rate": 9.753188456373041e-06,
783
+ "loss": 0.1044,
784
+ "step": 28000
785
+ },
786
+ {
787
+ "epoch": 5.14,
788
+ "eval_loss": 0.362209677696228,
789
+ "eval_runtime": 139.1058,
790
+ "eval_samples_per_second": 58.761,
791
+ "eval_steps_per_second": 2.451,
792
+ "step": 28000
793
+ },
794
+ {
795
+ "epoch": 5.23,
796
+ "learning_rate": 1.1254403606386926e-05,
797
+ "loss": 0.1099,
798
+ "step": 28500
799
+ },
800
+ {
801
+ "epoch": 5.23,
802
+ "eval_loss": 0.37740227580070496,
803
+ "eval_runtime": 139.2551,
804
+ "eval_samples_per_second": 58.698,
805
+ "eval_steps_per_second": 2.449,
806
+ "step": 28500
807
+ },
808
+ {
809
+ "epoch": 5.32,
810
+ "learning_rate": 1.2830340551973424e-05,
811
+ "loss": 0.1129,
812
+ "step": 29000
813
+ },
814
+ {
815
+ "epoch": 5.32,
816
+ "eval_loss": 0.387184202671051,
817
+ "eval_runtime": 138.9995,
818
+ "eval_samples_per_second": 58.806,
819
+ "eval_steps_per_second": 2.453,
820
+ "step": 29000
821
+ },
822
+ {
823
+ "epoch": 5.41,
824
+ "learning_rate": 1.4472432415791445e-05,
825
+ "loss": 0.1235,
826
+ "step": 29500
827
+ },
828
+ {
829
+ "epoch": 5.41,
830
+ "eval_loss": 0.3766579329967499,
831
+ "eval_runtime": 139.1988,
832
+ "eval_samples_per_second": 58.722,
833
+ "eval_steps_per_second": 2.45,
834
+ "step": 29500
835
+ },
836
+ {
837
+ "epoch": 5.5,
838
+ "learning_rate": 1.6171752698833968e-05,
839
+ "loss": 0.1099,
840
+ "step": 30000
841
+ },
842
+ {
843
+ "epoch": 5.5,
844
+ "eval_loss": 0.3879966139793396,
845
+ "eval_runtime": 139.1987,
846
+ "eval_samples_per_second": 58.722,
847
+ "eval_steps_per_second": 2.45,
848
+ "step": 30000
849
+ },
850
+ {
851
+ "epoch": 5.6,
852
+ "learning_rate": 1.7919063805352744e-05,
853
+ "loss": 0.1331,
854
+ "step": 30500
855
+ },
856
+ {
857
+ "epoch": 5.6,
858
+ "eval_loss": 0.41808027029037476,
859
+ "eval_runtime": 139.3796,
860
+ "eval_samples_per_second": 58.646,
861
+ "eval_steps_per_second": 2.447,
862
+ "step": 30500
863
+ },
864
+ {
865
+ "epoch": 5.69,
866
+ "learning_rate": 1.9704867258922042e-05,
867
+ "loss": 0.134,
868
+ "step": 31000
869
+ },
870
+ {
871
+ "epoch": 5.69,
872
+ "eval_loss": 0.4090297818183899,
873
+ "eval_runtime": 139.2797,
874
+ "eval_samples_per_second": 58.688,
875
+ "eval_steps_per_second": 2.448,
876
+ "step": 31000
877
+ },
878
+ {
879
+ "epoch": 5.78,
880
+ "learning_rate": 2.1519455336663182e-05,
881
+ "loss": 0.142,
882
+ "step": 31500
883
+ },
884
+ {
885
+ "epoch": 5.78,
886
+ "eval_loss": 0.4044671654701233,
887
+ "eval_runtime": 139.2242,
888
+ "eval_samples_per_second": 58.711,
889
+ "eval_steps_per_second": 2.449,
890
+ "step": 31500
891
+ },
892
+ {
893
+ "epoch": 5.87,
894
+ "learning_rate": 2.335296384094446e-05,
895
+ "loss": 0.1441,
896
+ "step": 32000
897
+ },
898
+ {
899
+ "epoch": 5.87,
900
+ "eval_loss": 0.41757142543792725,
901
+ "eval_runtime": 139.1671,
902
+ "eval_samples_per_second": 58.735,
903
+ "eval_steps_per_second": 2.45,
904
+ "step": 32000
905
+ },
906
+ {
907
+ "epoch": 5.96,
908
+ "learning_rate": 2.51954257216856e-05,
909
+ "loss": 0.1577,
910
+ "step": 32500
911
+ },
912
+ {
913
+ "epoch": 5.96,
914
+ "eval_loss": 0.43774479627609253,
915
+ "eval_runtime": 139.219,
916
+ "eval_samples_per_second": 58.713,
917
+ "eval_steps_per_second": 2.449,
918
+ "step": 32500
919
+ },
920
+ {
921
+ "epoch": 6.05,
922
+ "learning_rate": 2.703682525777417e-05,
923
+ "loss": 0.1539,
924
+ "step": 33000
925
+ },
926
+ {
927
+ "epoch": 6.05,
928
+ "eval_loss": 0.43269890546798706,
929
+ "eval_runtime": 139.3068,
930
+ "eval_samples_per_second": 58.676,
931
+ "eval_steps_per_second": 2.448,
932
+ "step": 33000
933
+ },
934
+ {
935
+ "epoch": 6.15,
936
+ "learning_rate": 2.8867152503059856e-05,
937
+ "loss": 0.1475,
938
+ "step": 33500
939
+ },
940
+ {
941
+ "epoch": 6.15,
942
+ "eval_loss": 0.4586590826511383,
943
+ "eval_runtime": 139.2759,
944
+ "eval_samples_per_second": 58.689,
945
+ "eval_steps_per_second": 2.448,
946
+ "step": 33500
947
+ },
948
+ {
949
+ "epoch": 6.24,
950
+ "learning_rate": 3.0676457700956226e-05,
951
+ "loss": 0.1616,
952
+ "step": 34000
953
+ },
954
+ {
955
+ "epoch": 6.24,
956
+ "eval_loss": 0.47090479731559753,
957
+ "eval_runtime": 139.1928,
958
+ "eval_samples_per_second": 58.724,
959
+ "eval_steps_per_second": 2.45,
960
+ "step": 34000
961
+ },
962
+ {
963
+ "epoch": 6.33,
964
+ "learning_rate": 3.2454905371848176e-05,
965
+ "loss": 0.1671,
966
+ "step": 34500
967
+ },
968
+ {
969
+ "epoch": 6.33,
970
+ "eval_loss": 0.49197548627853394,
971
+ "eval_runtime": 139.1637,
972
+ "eval_samples_per_second": 58.737,
973
+ "eval_steps_per_second": 2.45,
974
+ "step": 34500
975
+ },
976
+ {
977
+ "epoch": 6.42,
978
+ "learning_rate": 3.4192827779284355e-05,
979
+ "loss": 0.1792,
980
+ "step": 35000
981
+ },
982
+ {
983
+ "epoch": 6.42,
984
+ "eval_loss": 0.48025813698768616,
985
+ "eval_runtime": 139.2895,
986
+ "eval_samples_per_second": 58.684,
987
+ "eval_steps_per_second": 2.448,
988
+ "step": 35000
989
+ },
990
+ {
991
+ "epoch": 6.51,
992
+ "learning_rate": 3.588077748430818e-05,
993
+ "loss": 0.2025,
994
+ "step": 35500
995
+ },
996
+ {
997
+ "epoch": 6.51,
998
+ "eval_loss": 0.5274905562400818,
999
+ "eval_runtime": 138.8747,
1000
+ "eval_samples_per_second": 58.859,
1001
+ "eval_steps_per_second": 2.455,
1002
+ "step": 35500
1003
+ },
1004
+ {
1005
+ "epoch": 6.6,
1006
+ "learning_rate": 3.7509578702240475e-05,
1007
+ "loss": 0.1823,
1008
+ "step": 36000
1009
+ },
1010
+ {
1011
+ "epoch": 6.6,
1012
+ "eval_loss": 0.5114786028862,
1013
+ "eval_runtime": 139.2682,
1014
+ "eval_samples_per_second": 58.692,
1015
+ "eval_steps_per_second": 2.449,
1016
+ "step": 36000
1017
+ },
1018
+ {
1019
+ "epoch": 6.7,
1020
+ "learning_rate": 3.9070377182734444e-05,
1021
+ "loss": 0.2123,
1022
+ "step": 36500
1023
+ },
1024
+ {
1025
+ "epoch": 6.7,
1026
+ "eval_loss": 0.4975065290927887,
1027
+ "eval_runtime": 138.9217,
1028
+ "eval_samples_per_second": 58.839,
1029
+ "eval_steps_per_second": 2.455,
1030
+ "step": 36500
1031
+ },
1032
+ {
1033
+ "epoch": 6.79,
1034
+ "learning_rate": 4.0554688341953205e-05,
1035
+ "loss": 0.2043,
1036
+ "step": 37000
1037
+ },
1038
+ {
1039
+ "epoch": 6.79,
1040
+ "eval_loss": 0.48896968364715576,
1041
+ "eval_runtime": 139.3258,
1042
+ "eval_samples_per_second": 58.668,
1043
+ "eval_steps_per_second": 2.448,
1044
+ "step": 37000
1045
+ },
1046
+ {
1047
+ "epoch": 6.88,
1048
+ "learning_rate": 4.19544433852203e-05,
1049
+ "loss": 0.2086,
1050
+ "step": 37500
1051
+ },
1052
+ {
1053
+ "epoch": 6.88,
1054
+ "eval_loss": 0.5374048352241516,
1055
+ "eval_runtime": 139.1786,
1056
+ "eval_samples_per_second": 58.73,
1057
+ "eval_steps_per_second": 2.45,
1058
+ "step": 37500
1059
+ },
1060
+ {
1061
+ "epoch": 6.97,
1062
+ "learning_rate": 4.326203316941825e-05,
1063
+ "loss": 0.2299,
1064
+ "step": 38000
1065
+ },
1066
+ {
1067
+ "epoch": 6.97,
1068
+ "eval_loss": 0.5565398335456848,
1069
+ "eval_runtime": 139.0129,
1070
+ "eval_samples_per_second": 58.8,
1071
+ "eval_steps_per_second": 2.453,
1072
+ "step": 38000
1073
+ },
1074
+ {
1075
+ "epoch": 7.06,
1076
+ "learning_rate": 4.44703495666965e-05,
1077
+ "loss": 0.2151,
1078
+ "step": 38500
1079
+ },
1080
+ {
1081
+ "epoch": 7.06,
1082
+ "eval_loss": 0.6073034405708313,
1083
+ "eval_runtime": 139.179,
1084
+ "eval_samples_per_second": 58.73,
1085
+ "eval_steps_per_second": 2.45,
1086
+ "step": 38500
1087
+ },
1088
+ {
1089
+ "epoch": 7.15,
1090
+ "learning_rate": 4.5572824104633835e-05,
1091
+ "loss": 0.222,
1092
+ "step": 39000
1093
+ },
1094
+ {
1095
+ "epoch": 7.15,
1096
+ "eval_loss": 0.5468436479568481,
1097
+ "eval_runtime": 139.1916,
1098
+ "eval_samples_per_second": 58.725,
1099
+ "eval_steps_per_second": 2.45,
1100
+ "step": 39000
1101
+ },
1102
+ {
1103
+ "epoch": 7.25,
1104
+ "learning_rate": 4.656346367280503e-05,
1105
+ "loss": 0.236,
1106
+ "step": 39500
1107
+ },
1108
+ {
1109
+ "epoch": 7.25,
1110
+ "eval_loss": 0.5504103899002075,
1111
+ "eval_runtime": 139.2016,
1112
+ "eval_samples_per_second": 58.721,
1113
+ "eval_steps_per_second": 2.45,
1114
+ "step": 39500
1115
+ },
1116
+ {
1117
+ "epoch": 7.34,
1118
+ "learning_rate": 4.743688310164889e-05,
1119
+ "loss": 0.2031,
1120
+ "step": 40000
1121
+ },
1122
+ {
1123
+ "epoch": 7.34,
1124
+ "eval_loss": 0.5548919439315796,
1125
+ "eval_runtime": 139.2056,
1126
+ "eval_samples_per_second": 58.719,
1127
+ "eval_steps_per_second": 2.45,
1128
+ "step": 40000
1129
+ },
1130
+ {
1131
+ "epoch": 7.43,
1132
+ "learning_rate": 4.818833443653748e-05,
1133
+ "loss": 0.2251,
1134
+ "step": 40500
1135
+ },
1136
+ {
1137
+ "epoch": 7.43,
1138
+ "eval_loss": 0.5905419588088989,
1139
+ "eval_runtime": 139.1367,
1140
+ "eval_samples_per_second": 58.748,
1141
+ "eval_steps_per_second": 2.451,
1142
+ "step": 40500
1143
+ },
1144
+ {
1145
+ "epoch": 7.52,
1146
+ "learning_rate": 4.881373274791077e-05,
1147
+ "loss": 0.2251,
1148
+ "step": 41000
1149
+ },
1150
+ {
1151
+ "epoch": 7.52,
1152
+ "eval_loss": 0.6011632680892944,
1153
+ "eval_runtime": 139.1129,
1154
+ "eval_samples_per_second": 58.758,
1155
+ "eval_steps_per_second": 2.451,
1156
+ "step": 41000
1157
+ },
1158
+ {
1159
+ "epoch": 7.61,
1160
+ "learning_rate": 4.9309678337171785e-05,
1161
+ "loss": 0.2464,
1162
+ "step": 41500
1163
+ },
1164
+ {
1165
+ "epoch": 7.61,
1166
+ "eval_loss": 0.5931146740913391,
1167
+ "eval_runtime": 139.1046,
1168
+ "eval_samples_per_second": 58.762,
1169
+ "eval_steps_per_second": 2.451,
1170
+ "step": 41500
1171
+ },
1172
+ {
1173
+ "epoch": 7.71,
1174
+ "learning_rate": 4.9673475217629615e-05,
1175
+ "loss": 0.2451,
1176
+ "step": 42000
1177
+ },
1178
+ {
1179
+ "epoch": 7.71,
1180
+ "eval_loss": 0.6498579978942871,
1181
+ "eval_runtime": 139.0889,
1182
+ "eval_samples_per_second": 58.768,
1183
+ "eval_steps_per_second": 2.452,
1184
+ "step": 42000
1185
+ },
1186
+ {
1187
+ "epoch": 7.8,
1188
+ "learning_rate": 4.990314577002693e-05,
1189
+ "loss": 0.2463,
1190
+ "step": 42500
1191
+ },
1192
+ {
1193
+ "epoch": 7.8,
1194
+ "eval_loss": 0.5696046948432922,
1195
+ "eval_runtime": 139.0384,
1196
+ "eval_samples_per_second": 58.79,
1197
+ "eval_steps_per_second": 2.453,
1198
+ "step": 42500
1199
+ },
1200
+ {
1201
+ "epoch": 7.89,
1202
+ "learning_rate": 4.999744149298381e-05,
1203
+ "loss": 0.2385,
1204
+ "step": 43000
1205
+ },
1206
+ {
1207
+ "epoch": 7.89,
1208
+ "eval_loss": 0.5360204577445984,
1209
+ "eval_runtime": 139.0432,
1210
+ "eval_samples_per_second": 58.787,
1211
+ "eval_steps_per_second": 2.452,
1212
+ "step": 43000
1213
+ },
1214
+ {
1215
+ "epoch": 7.98,
1216
+ "learning_rate": 4.995584978991786e-05,
1217
+ "loss": 0.2353,
1218
+ "step": 43500
1219
+ },
1220
+ {
1221
+ "epoch": 7.98,
1222
+ "eval_loss": 0.5489608645439148,
1223
+ "eval_runtime": 139.1804,
1224
+ "eval_samples_per_second": 58.73,
1225
+ "eval_steps_per_second": 2.45,
1226
+ "step": 43500
1227
+ },
1228
+ {
1229
+ "epoch": 8.0,
1230
+ "step": 43608,
1231
+ "total_flos": 1.5655045448788992e+17,
1232
+ "train_loss": 0.21080181559736064,
1233
+ "train_runtime": 44575.4239,
1234
+ "train_samples_per_second": 11.741,
1235
+ "train_steps_per_second": 0.978
1236
  }
1237
  ],
1238
+ "max_steps": 43608,
1239
+ "num_train_epochs": 8,
1240
+ "total_flos": 1.5655045448788992e+17,
1241
  "trial_name": null,
1242
  "trial_params": null
1243
  }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:30f3c1d4ceb7be6ac3c7be9b9e5cf5e45c26648236900f5c940e0066980c9625
+ oid sha256:5774ed6a6ee7d12e490235aaa201c6682bceb991af75ae978177e58f9bbfb1a7
3
- size 4027
+ size 4091
vocab.txt CHANGED
The diff for this file is too large to render.