alexredna committed
Commit b2d35fb
1 Parent(s): a187e09

Training in progress, step 150
Files changed (37)
  1. README.md +33 -12
  2. adapter_config.json +29 -0
  3. adapter_model.safetensors +3 -0
  4. all_results.json +11 -11
  5. eval_results.json +6 -6
  6. model.safetensors +1 -1
  7. runs/Jan11_18-28-24_98f107f1aa39/events.out.tfevents.1704997763.98f107f1aa39.216186.0 +2 -2
  8. runs/Jan11_21-28-54_98f107f1aa39/events.out.tfevents.1705008598.98f107f1aa39.15082.0 +3 -0
  9. runs/Jan11_21-34-47_98f107f1aa39/events.out.tfevents.1705008958.98f107f1aa39.18703.0 +3 -0
  10. runs/Jan11_21-36-47_98f107f1aa39/events.out.tfevents.1705009077.98f107f1aa39.20189.0 +3 -0
  11. runs/Jan11_21-38-38_98f107f1aa39/events.out.tfevents.1705009197.98f107f1aa39.21596.0 +3 -0
  12. runs/Jan11_21-40-30_98f107f1aa39/events.out.tfevents.1705009300.98f107f1aa39.23004.0 +3 -0
  13. runs/Jan11_21-44-58_98f107f1aa39/events.out.tfevents.1705009550.98f107f1aa39.25868.0 +3 -0
  14. runs/Jan11_21-46-54_98f107f1aa39/events.out.tfevents.1705009660.98f107f1aa39.27299.0 +3 -0
  15. runs/Jan11_21-49-06_98f107f1aa39/events.out.tfevents.1705009795.98f107f1aa39.29314.0 +3 -0
  16. runs/Jan11_21-51-55_98f107f1aa39/events.out.tfevents.1705009965.98f107f1aa39.32004.0 +3 -0
  17. runs/Jan11_21-53-23_98f107f1aa39/events.out.tfevents.1705010052.98f107f1aa39.34285.0 +3 -0
  18. runs/Jan11_21-54-48_98f107f1aa39/events.out.tfevents.1705010137.98f107f1aa39.36490.0 +3 -0
  19. runs/Jan11_21-54-48_98f107f1aa39/events.out.tfevents.1705011051.98f107f1aa39.36490.1 +3 -0
  20. runs/Jan11_22-21-42_98f107f1aa39/events.out.tfevents.1705011749.98f107f1aa39.72772.0 +3 -0
  21. runs/Jan11_22-24-16_98f107f1aa39/events.out.tfevents.1705011906.98f107f1aa39.76528.0 +3 -0
  22. runs/Jan11_22-26-44_98f107f1aa39/events.out.tfevents.1705012049.98f107f1aa39.80091.0 +3 -0
  23. runs/Jan11_22-32-25_98f107f1aa39/events.out.tfevents.1705012389.98f107f1aa39.87865.0 +3 -0
  24. runs/Jan11_22-32-25_98f107f1aa39/events.out.tfevents.1705012890.98f107f1aa39.87865.1 +3 -0
  25. runs/Jan12_09-13-28_98f107f1aa39/events.out.tfevents.1705050917.98f107f1aa39.4962.0 +3 -0
  26. runs/Jan12_09-22-09_98f107f1aa39/events.out.tfevents.1705051431.98f107f1aa39.10530.0 +3 -0
  27. runs/Jan12_09-22-09_98f107f1aa39/events.out.tfevents.1705053028.98f107f1aa39.10530.1 +3 -0
  28. runs/Jan12_11-08-05_98f107f1aa39/events.out.tfevents.1705060868.98f107f1aa39.71724.0 +3 -0
  29. runs/Jan12_12-28-50_98f107f1aa39/events.out.tfevents.1705065451.98f107f1aa39.132106.0 +3 -0
  30. runs/Jan12_15-06-53_98f107f1aa39/events.out.tfevents.1705074452.98f107f1aa39.241703.0 +3 -0
  31. runs/Jan12_16-13-49_98f107f1aa39/events.out.tfevents.1705077215.98f107f1aa39.305319.0 +3 -0
  32. runs/Jan12_16-35-22_98f107f1aa39/events.out.tfevents.1705077372.98f107f1aa39.336626.0 +3 -0
  33. runs/Jan12_19-50-30_98f107f1aa39/events.out.tfevents.1705089913.98f107f1aa39.624217.0 +3 -0
  34. runs/Jan12_21-29-42_98f107f1aa39/events.out.tfevents.1705095280.98f107f1aa39.697537.0 +3 -0
  35. train_results.json +6 -6
  36. trainer_state.json +61 -345
  37. training_args.bin +1 -1
README.md CHANGED
@@ -1,10 +1,13 @@
 ---
 license: apache-2.0
-base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+library_name: peft
 tags:
 - trl
 - sft
 - generated_from_trainer
+datasets:
+- generator
+base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 model-index:
 - name: Tukan-1.1B-Chat-v0.1
   results: []
@@ -15,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # Tukan-1.1B-Chat-v0.1
 
-This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) on the None dataset.
+This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.5518
+- Loss: 1.2478
 
 ## Model description
 
@@ -36,24 +39,23 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 4e-05
-- train_batch_size: 1
-- eval_batch_size: 1
+- learning_rate: 2e-05
+- train_batch_size: 2
+- eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
-- gradient_accumulation_steps: 40
-- total_train_batch_size: 40
+- gradient_accumulation_steps: 25
+- total_train_batch_size: 50
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- num_epochs: 3
+- num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.4434        | 1.0   | 366  | 1.3898          |
-| 0.9304        | 2.0   | 733  | 1.4106          |
-| 0.5651        | 2.99  | 1098 | 1.5518          |
+| 1.3274        | 0.49  | 20   | 1.2587          |
+| 1.3066        | 0.99  | 40   | 1.2478          |
 
 
 ### Framework versions
@@ -62,3 +64,22 @@ The following hyperparameters were used during training:
 - Pytorch 2.2.0a0+gitd925d94
 - Datasets 2.14.6
 - Tokenizers 0.15.0
+## Training procedure
+
+
+The following `bitsandbytes` quantization config was used during training:
+- quant_method: bitsandbytes
+- load_in_8bit: False
+- load_in_4bit: True
+- llm_int8_threshold: 6.0
+- llm_int8_skip_modules: None
+- llm_int8_enable_fp32_cpu_offload: False
+- llm_int8_has_fp16_weight: False
+- bnb_4bit_quant_type: nf4
+- bnb_4bit_use_double_quant: False
+- bnb_4bit_compute_dtype: float16
+
+### Framework versions
+
+
+- PEFT 0.6.1
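The updated hyperparameters are internally consistent: the reported total_train_batch_size follows from the per-device batch size and the gradient-accumulation steps. A minimal sanity-check sketch (the single-optimizer-process assumption is mine; the card only states `distributed_type: multi-GPU`, but 2 × 25 = 50 only works out with one process):

```python
# Effective batch size implied by the new hyperparameters above.
# Assumption: one optimizer process, so no extra world-size factor.
train_batch_size = 2             # per-device micro-batch
gradient_accumulation_steps = 25

total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)    # 50, matching total_train_batch_size in the card
```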
adapter_config.json ADDED
@@ -0,0 +1,29 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 256,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 128,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "lm_head",
+    "up_proj",
+    "down_proj",
+    "gate_proj",
+    "o_proj",
+    "v_proj",
+    "q_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM"
+}
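The config above is the serialized form of a PEFT `LoraConfig`. One quantity worth reading off it is the LoRA scaling factor, alpha / r, which sets how strongly the learned low-rank update is applied; a stdlib-only sketch (the W + (alpha / r)·BA scaling convention is PEFT's default, an assumption since the config does not spell it out):

```python
import json

# Subset of the committed adapter_config.json (only the fields used below).
adapter_config = json.loads("""{
  "peft_type": "LORA",
  "r": 128,
  "lora_alpha": 256,
  "lora_dropout": 0.1,
  "task_type": "CAUSAL_LM"
}""")

# LoRA applies its update as W + (lora_alpha / r) * B @ A, so
# alpha / r is the effective scale of the low-rank delta.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
print(scaling)  # 2.0
```

With r=128 on every attention and MLP projection plus `lm_head`, this is an unusually heavy adapter, which matches the ~210 MB `adapter_model.safetensors` committed below.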
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d8eaa4c9a40159f160b328d27da9d86690717d2d35b3e1f6d30319e24afd9f86
+size 210609288
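The three lines above are not the weights themselves but a Git LFS pointer file: `key value` pairs per the spec URL on the first line. A small sketch parsing one:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a key -> value dict."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d8eaa4c9a40159f160b328d27da9d86690717d2d35b3e1f6d30319e24afd9f86
size 210609288"""

info = parse_lfs_pointer(pointer)
print(info["size"])               # 210609288 -> the adapter is ~210 MB
print(info["oid"].split(":")[0])  # hash algorithm: sha256
```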
all_results.json CHANGED
@@ -1,13 +1,13 @@
 {
-  "epoch": 2.99,
-  "eval_loss": 1.5518141984939575,
-  "eval_runtime": 10.9934,
-  "eval_samples": 300,
-  "eval_samples_per_second": 27.289,
-  "eval_steps_per_second": 27.289,
-  "train_loss": 0.9967951453231506,
-  "train_runtime": 11703.2983,
-  "train_samples": 14671,
-  "train_samples_per_second": 3.761,
-  "train_steps_per_second": 0.094
+  "epoch": 0.99,
+  "eval_loss": 1.2477926015853882,
+  "eval_runtime": 2.1689,
+  "eval_samples": 91,
+  "eval_samples_per_second": 4.611,
+  "eval_steps_per_second": 2.305,
+  "train_loss": 1.3475643575191498,
+  "train_runtime": 1594.7957,
+  "train_samples": 15296,
+  "train_samples_per_second": 1.268,
+  "train_steps_per_second": 0.025
 }
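For a causal LM, `eval_loss` is the mean token-level cross-entropy in nats, so exponentiating it gives the evaluation perplexity, a sketch:

```python
import math

# eval_loss from the new all_results.json above.
eval_loss = 1.2477926015853882

perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # 3.48
```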
eval_results.json CHANGED
@@ -1,8 +1,8 @@
 {
-  "epoch": 2.99,
-  "eval_loss": 1.5518141984939575,
-  "eval_runtime": 10.9934,
-  "eval_samples": 300,
-  "eval_samples_per_second": 27.289,
-  "eval_steps_per_second": 27.289
+  "epoch": 0.99,
+  "eval_loss": 1.2477926015853882,
+  "eval_runtime": 2.1689,
+  "eval_samples": 91,
+  "eval_samples_per_second": 4.611,
+  "eval_steps_per_second": 2.305
 }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9569d170203a666fb31ac8047261c7df03f5accdc1faaa973802e97e731f8326
+oid sha256:51cc53ade716e569cc36a3712785ea0967b1ac9121437876226f25fe74cd364d
 size 4400216536
runs/Jan11_18-28-24_98f107f1aa39/events.out.tfevents.1704997763.98f107f1aa39.216186.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:189c7284a6afb9464ce6e0417078f99f3ad54cafd1009e62a36b2457cd2eca9d
-size 4953
+oid sha256:677724e85e3788013b9956db69a9f135ce02419c71659ccc2b607d34776f310c
+size 8195
runs/Jan11_21-28-54_98f107f1aa39/events.out.tfevents.1705008598.98f107f1aa39.15082.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f70b12b1298faf337f744254d8bf103182d86c04f6570826d39b87e46b469924
+size 4995
runs/Jan11_21-34-47_98f107f1aa39/events.out.tfevents.1705008958.98f107f1aa39.18703.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9c98de5f7d050214136b73e07247b3971b4ad18279ff896289e1963ee83e55aa
+size 4843
runs/Jan11_21-36-47_98f107f1aa39/events.out.tfevents.1705009077.98f107f1aa39.20189.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d19cf57ab7f7bd419dd878fef6a2b9a34542958c6ea1e1f959ebe83267943e26
+size 4843
runs/Jan11_21-38-38_98f107f1aa39/events.out.tfevents.1705009197.98f107f1aa39.21596.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2ee83aaf85f059701c2ad5447b8a5f891e7d16aaf66799e25871704cb89efef5
+size 4842
runs/Jan11_21-40-30_98f107f1aa39/events.out.tfevents.1705009300.98f107f1aa39.23004.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:572826d99eec1c89b6c0c73b2e74c8d8bdbbc226ab1f3c356b290a2a2438f0fe
+size 4995
runs/Jan11_21-44-58_98f107f1aa39/events.out.tfevents.1705009550.98f107f1aa39.25868.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:168bef11ed6a0e6fe76339697e22a477dfc96b340e1e458561588e2007184c90
+size 4996
runs/Jan11_21-46-54_98f107f1aa39/events.out.tfevents.1705009660.98f107f1aa39.27299.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eec75e9353818e6983a0cdde292571c4202ef37d1fcfd24bb3b20d2b07b97b13
+size 4608
runs/Jan11_21-49-06_98f107f1aa39/events.out.tfevents.1705009795.98f107f1aa39.29314.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ee210c1636427d4f452754754075955dc5337aba2a5e33e635fff0e1d243ba1c
+size 4608
runs/Jan11_21-51-55_98f107f1aa39/events.out.tfevents.1705009965.98f107f1aa39.32004.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7fe5b842531fb0952801ee93b4a19347c529d8ecee31bb4aa2cdf15dcbd5f1dd
+size 4455
runs/Jan11_21-53-23_98f107f1aa39/events.out.tfevents.1705010052.98f107f1aa39.34285.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:498eceaeb0fa4fc16e3596e28fc996da50c591b7cc2328c139563dc9c10082db
+size 4455
runs/Jan11_21-54-48_98f107f1aa39/events.out.tfevents.1705010137.98f107f1aa39.36490.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:09cbb83b4785f26dcf641428f567bd32e1e33424eeaccc69603aa4593c53c7a4
+size 5797
runs/Jan11_21-54-48_98f107f1aa39/events.out.tfevents.1705011051.98f107f1aa39.36490.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:44d0b064c458805a5a7e3b4a48466830afa426dc498ecb4863017a9ded40b81d
+size 354
runs/Jan11_22-21-42_98f107f1aa39/events.out.tfevents.1705011749.98f107f1aa39.72772.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c1d7e68b9e108dbeaaf8cf9ff819fe1ab0b1884a12e248ba58029a6a2c50b2b1
+size 4837
runs/Jan11_22-24-16_98f107f1aa39/events.out.tfevents.1705011906.98f107f1aa39.76528.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6b43bdce68e02691934bff30da0c9086405ba85f36cf1cb117021940f0f72bba
+size 4866
runs/Jan11_22-26-44_98f107f1aa39/events.out.tfevents.1705012049.98f107f1aa39.80091.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8222521efd116cb8fec8be47d9d400fb3aa985642d7eebd187dacf2db5b4b44a
+size 4837
runs/Jan11_22-32-25_98f107f1aa39/events.out.tfevents.1705012389.98f107f1aa39.87865.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d8505f491612a6693a29650a3bad357366cde69d8b3577bc2855ef2b1f59210f
+size 5105
runs/Jan11_22-32-25_98f107f1aa39/events.out.tfevents.1705012890.98f107f1aa39.87865.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bf733dceb337a23b43551cbd7cde0f5d46d9273b54fc22e26631a8df23bde1e0
+size 354
runs/Jan12_09-13-28_98f107f1aa39/events.out.tfevents.1705050917.98f107f1aa39.4962.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:db07073acd9b5adcb05ba6d136a9c020a7f1d23cba796f92faade96c79238f28
+size 5145
runs/Jan12_09-22-09_98f107f1aa39/events.out.tfevents.1705051431.98f107f1aa39.10530.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8d7897aab0dde5394f09286a94e29543423b08d2dc9c10bbc288f068dea59f23
+size 7107
runs/Jan12_09-22-09_98f107f1aa39/events.out.tfevents.1705053028.98f107f1aa39.10530.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5ec382a40f497e8f243e3b1c81f3605f1480ae2b166100dd10b48f914c67e996
+size 354
runs/Jan12_11-08-05_98f107f1aa39/events.out.tfevents.1705060868.98f107f1aa39.71724.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:98f675a1b2fa7c0bc560a5bcc479b90834532920e1d12db98311c390bf593963
+size 4855
runs/Jan12_12-28-50_98f107f1aa39/events.out.tfevents.1705065451.98f107f1aa39.132106.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1d7614cd14ccfa1667097e7f98dac34b09e3e98471f201cee4ad0ef22c2f1f4a
+size 5471
runs/Jan12_15-06-53_98f107f1aa39/events.out.tfevents.1705074452.98f107f1aa39.241703.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9ea6b7f8d3b1733e70e104f39f76457a61cc06033af82b8b41a4994c6354359e
+size 5317
runs/Jan12_16-13-49_98f107f1aa39/events.out.tfevents.1705077215.98f107f1aa39.305319.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:86accd726de83d12ff45660e44130b9f14c6002b9fc20b1dad43c5e971bca90f
+size 4855
runs/Jan12_16-35-22_98f107f1aa39/events.out.tfevents.1705077372.98f107f1aa39.336626.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:710a538e82b928340ea36f8bca496d8cc76bbb43f7c803793cd4b01fe159b60f
+size 8509
runs/Jan12_19-50-30_98f107f1aa39/events.out.tfevents.1705089913.98f107f1aa39.624217.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f6103cc8f8862308be333a2fbf4ac3af4ac973ba3d859833282967ec78eaec3a
+size 6095
runs/Jan12_21-29-42_98f107f1aa39/events.out.tfevents.1705095280.98f107f1aa39.697537.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0c50b15d601b11a72f8fde357b8245cdee96b6d388fe16f5acd13bf343710759
+size 6409
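The run directories above follow the Trainer's default TensorBoard naming, `MonDD_HH-MM-SS_<hostname>`, and each tfevents filename embeds the unix time at which the event writer opened (the writer can open a few minutes after the directory is named, and the directory name carries no year, so the embedded timestamp is the more reliable field). A sketch decoding the last entry, UTC assumed for the host clock:

```python
from datetime import datetime, timezone

# Example file from the listing above.
fname = "runs/Jan12_21-29-42_98f107f1aa39/events.out.tfevents.1705095280.98f107f1aa39.697537.0"

run_dir = fname.split("/")[1]                           # 'Jan12_21-29-42_98f107f1aa39'
stamp = int(fname.split("tfevents.")[1].split(".")[0])  # 1705095280

started = datetime.fromtimestamp(stamp, tz=timezone.utc)
print(started.isoformat())        # a 2024-01-12 timestamp (UTC assumed)
print(run_dir.rsplit("_", 1)[1])  # hostname: '98f107f1aa39'
```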
train_results.json CHANGED
@@ -1,8 +1,8 @@
 {
-  "epoch": 2.99,
-  "train_loss": 0.9967951453231506,
-  "train_runtime": 11703.2983,
-  "train_samples": 14671,
-  "train_samples_per_second": 3.761,
-  "train_steps_per_second": 0.094
+  "epoch": 0.99,
+  "train_loss": 1.3475643575191498,
+  "train_runtime": 1594.7957,
+  "train_samples": 15296,
+  "train_samples_per_second": 1.268,
+  "train_steps_per_second": 0.025
 }
trainer_state.json CHANGED
@@ -1,384 +1,100 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 2.993660963806148,
-  "eval_steps": 500,
-  "global_step": 1098,
+  "epoch": 0.9891196834817013,
+  "eval_steps": 20,
+  "global_step": 40,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.0,
-      "learning_rate": 3.999991813565924e-05,
-      "loss": 2.2897,
+      "epoch": 0.02,
+      "learning_rate": 1.9969173337331283e-05,
+      "loss": 1.6723,
       "step": 1
     },
     {
-      "epoch": 0.05,
-      "learning_rate": 3.996726317608652e-05,
-      "loss": 1.6172,
-      "step": 20
-    },
-    {
-      "epoch": 0.11,
-      "learning_rate": 3.986915987431006e-05,
-      "loss": 1.5144,
-      "step": 40
+      "epoch": 0.12,
+      "learning_rate": 1.9238795325112867e-05,
+      "loss": 1.4829,
+      "step": 5
     },
     {
-      "epoch": 0.16,
-      "learning_rate": 3.970601125372218e-05,
-      "loss": 1.5003,
-      "step": 60
+      "epoch": 0.25,
+      "learning_rate": 1.7071067811865477e-05,
+      "loss": 1.3734,
+      "step": 10
     },
     {
-      "epoch": 0.22,
-      "learning_rate": 3.947835141108928e-05,
-      "loss": 1.4788,
-      "step": 80
-    },
-    {
-      "epoch": 0.27,
-      "learning_rate": 3.9186925632429396e-05,
-      "loss": 1.4834,
-      "step": 100
-    },
-    {
-      "epoch": 0.33,
-      "learning_rate": 3.883268795318252e-05,
-      "loss": 1.4782,
-      "step": 120
-    },
-    {
-      "epoch": 0.38,
-      "learning_rate": 3.8416798035001545e-05,
-      "loss": 1.4776,
-      "step": 140
-    },
-    {
-      "epoch": 0.44,
-      "learning_rate": 3.794061736938837e-05,
-      "loss": 1.4813,
-      "step": 160
+      "epoch": 0.37,
+      "learning_rate": 1.3826834323650899e-05,
+      "loss": 1.3486,
+      "step": 15
     },
     {
       "epoch": 0.49,
-      "learning_rate": 3.740570482060311e-05,
-      "loss": 1.4974,
-      "step": 180
-    },
-    {
-      "epoch": 0.55,
-      "learning_rate": 3.681381152243763e-05,
-      "loss": 1.4778,
-      "step": 200
-    },
-    {
-      "epoch": 0.6,
-      "learning_rate": 3.6166875145559684e-05,
-      "loss": 1.5029,
-      "step": 220
-    },
-    {
-      "epoch": 0.65,
-      "learning_rate": 3.54670135541946e-05,
-      "loss": 1.5029,
-      "step": 240
+      "learning_rate": 1e-05,
+      "loss": 1.3274,
+      "step": 20
     },
     {
-      "epoch": 0.71,
-      "learning_rate": 3.4716517872910405e-05,
-      "loss": 1.4741,
-      "step": 260
+      "epoch": 0.49,
+      "eval_loss": 1.258691668510437,
+      "eval_runtime": 2.1716,
+      "eval_samples_per_second": 4.605,
+      "eval_steps_per_second": 2.302,
+      "step": 20
     },
     {
-      "epoch": 0.76,
-      "learning_rate": 3.391784498620369e-05,
-      "loss": 1.4563,
-      "step": 280
+      "epoch": 0.62,
+      "learning_rate": 6.173165676349103e-06,
+      "loss": 1.2978,
+      "step": 25
     },
     {
-      "epoch": 0.82,
-      "learning_rate": 3.307360949544012e-05,
-      "loss": 1.4634,
-      "step": 300
+      "epoch": 0.74,
+      "learning_rate": 2.9289321881345257e-06,
+      "loss": 1.3259,
+      "step": 30
     },
     {
       "epoch": 0.87,
-      "learning_rate": 3.2186575159479966e-05,
-      "loss": 1.4616,
-      "step": 320
-    },
-    {
-      "epoch": 0.93,
-      "learning_rate": 3.1259645847009384e-05,
-      "loss": 1.4308,
-      "step": 340
-    },
-    {
-      "epoch": 0.98,
-      "learning_rate": 3.0295856030196618e-05,
-      "loss": 1.4434,
-      "step": 360
-    },
-    {
-      "epoch": 1.0,
-      "eval_loss": 1.3897957801818848,
-      "eval_runtime": 11.4488,
-      "eval_samples_per_second": 26.204,
-      "eval_steps_per_second": 26.204,
-      "step": 366
-    },
-    {
-      "epoch": 1.04,
-      "learning_rate": 2.9298360850793944e-05,
-      "loss": 1.1296,
-      "step": 380
-    },
-    {
-      "epoch": 1.09,
-      "learning_rate": 2.827042579120562e-05,
-      "loss": 0.9657,
-      "step": 400
-    },
-    {
-      "epoch": 1.15,
-      "learning_rate": 2.721541598433567e-05,
-      "loss": 0.9303,
-      "step": 420
-    },
-    {
-      "epoch": 1.2,
-      "learning_rate": 2.613678519721155e-05,
-      "loss": 0.9411,
-      "step": 440
-    },
-    {
-      "epoch": 1.25,
-      "learning_rate": 2.5038064524447827e-05,
-      "loss": 0.9468,
-      "step": 460
-    },
-    {
-      "epoch": 1.31,
-      "learning_rate": 2.392285082856394e-05,
-      "loss": 0.938,
-      "step": 480
-    },
-    {
-      "epoch": 1.36,
-      "learning_rate": 2.2794794964998705e-05,
-      "loss": 0.938,
-      "step": 500
-    },
-    {
-      "epoch": 1.42,
-      "learning_rate": 2.1657589830369113e-05,
-      "loss": 0.9383,
-      "step": 520
-    },
-    {
-      "epoch": 1.47,
-      "learning_rate": 2.0514958273099778e-05,
-      "loss": 0.9431,
-      "step": 540
-    },
-    {
-      "epoch": 1.53,
-      "learning_rate": 1.93706409059995e-05,
-      "loss": 0.937,
-      "step": 560
-    },
-    {
-      "epoch": 1.58,
-      "learning_rate": 1.82283838606831e-05,
-      "loss": 0.9408,
-      "step": 580
-    },
-    {
-      "epoch": 1.64,
-      "learning_rate": 1.7091926523926205e-05,
-      "loss": 0.9567,
-      "step": 600
+      "learning_rate": 7.612046748871327e-07,
+      "loss": 1.2801,
+      "step": 35
     },
     {
-      "epoch": 1.69,
-      "learning_rate": 1.5964989296100682e-05,
-      "loss": 0.9302,
-      "step": 620
-    },
-    {
-      "epoch": 1.74,
-      "learning_rate": 1.4851261411765414e-05,
-      "loss": 0.9309,
-      "step": 640
-    },
-    {
-      "epoch": 1.8,
-      "learning_rate": 1.375438886228411e-05,
-      "loss": 0.9354,
-      "step": 660
-    },
-    {
-      "epoch": 1.85,
-      "learning_rate": 1.2677962460007555e-05,
-      "loss": 0.9429,
-      "step": 680
-    },
-    {
-      "epoch": 1.91,
-      "learning_rate": 1.162550608309446e-05,
-      "loss": 0.9209,
-      "step": 700
-    },
-    {
-      "epoch": 1.96,
-      "learning_rate": 1.060046513945361e-05,
-      "loss": 0.9304,
-      "step": 720
-    },
-    {
-      "epoch": 2.0,
-      "eval_loss": 1.4105572700500488,
-      "eval_runtime": 11.4541,
-      "eval_samples_per_second": 26.191,
-      "eval_steps_per_second": 26.191,
-      "step": 733
-    },
-    {
-      "epoch": 2.02,
-      "learning_rate": 9.606195287572577e-06,
-      "loss": 0.7909,
-      "step": 740
-    },
-    {
-      "epoch": 2.07,
-      "learning_rate": 8.645951451157741e-06,
-      "loss": 0.5917,
-      "step": 760
-    },
-    {
-      "epoch": 2.13,
-      "learning_rate": 7.72287716354776e-06,
-      "loss": 0.5678,
-      "step": 780
-    },
-    {
-      "epoch": 2.18,
-      "learning_rate": 6.8399942767839075e-06,
-      "loss": 0.5837,
-      "step": 800
-    },
-    {
-      "epoch": 2.24,
-      "learning_rate": 6.000193069026181e-06,
-      "loss": 0.5701,
-      "step": 820
-    },
-    {
-      "epoch": 2.29,
-      "learning_rate": 5.206222782700667e-06,
-      "loss": 0.5467,
-      "step": 840
-    },
-    {
-      "epoch": 2.34,
-      "learning_rate": 4.460682624352952e-06,
-      "loss": 0.5695,
-      "step": 860
-    },
-    {
-      "epoch": 2.4,
-      "learning_rate": 3.766013255671479e-06,
-      "loss": 0.5557,
-      "step": 880
-    },
-    {
-      "epoch": 2.45,
-      "learning_rate": 3.1244888035362875e-06,
-      "loss": 0.5468,
-      "step": 900
-    },
-    {
-      "epoch": 2.51,
-      "learning_rate": 2.5382094152499705e-06,
-      "loss": 0.5793,
-      "step": 920
-    },
-    {
-      "epoch": 2.56,
-      "learning_rate": 2.009094383322356e-06,
-      "loss": 0.5462,
-      "step": 940
-    },
-    {
-      "epoch": 2.62,
-      "learning_rate": 1.5388758623164802e-06,
-      "loss": 0.5617,
-      "step": 960
-    },
-    {
-      "epoch": 2.67,
-      "learning_rate": 1.1290931983246334e-06,
-      "loss": 0.5574,
-      "step": 980
-    },
-    {
-      "epoch": 2.73,
-      "learning_rate": 7.810878896382101e-07,
-      "loss": 0.5632,
-      "step": 1000
-    },
-    {
-      "epoch": 2.78,
-      "learning_rate": 4.959991951083498e-07,
-      "loss": 0.57,
-      "step": 1020
-    },
-    {
-      "epoch": 2.84,
-      "learning_rate": 2.747604045743102e-07,
-      "loss": 0.5498,
-      "step": 1040
-    },
-    {
-      "epoch": 2.89,
-      "learning_rate": 1.180957835689478e-07,
-      "loss": 0.5369,
-      "step": 1060
-    },
-    {
-      "epoch": 2.94,
-      "learning_rate": 2.651820230338942e-08,
-      "loss": 0.5651,
-      "step": 1080
+      "epoch": 0.99,
+      "learning_rate": 0.0,
+      "loss": 1.3066,
+      "step": 40
     },
     {
-      "epoch": 2.99,
-      "eval_loss": 1.5518141984939575,
-      "eval_runtime": 11.4288,
-      "eval_samples_per_second": 26.25,
-      "eval_steps_per_second": 26.25,
-      "step": 1098
+      "epoch": 0.99,
+      "eval_loss": 1.2477926015853882,
+      "eval_runtime": 2.1661,
+      "eval_samples_per_second": 4.617,
+      "eval_steps_per_second": 2.308,
+      "step": 40
     },
     {
-      "epoch": 2.99,
-      "step": 1098,
-      "total_flos": 6.035394717233971e+16,
-      "train_loss": 0.9967951453231506,
-      "train_runtime": 11703.2983,
-      "train_samples_per_second": 3.761,
-      "train_steps_per_second": 0.094
+      "epoch": 0.99,
+      "step": 40,
+      "total_flos": 2.6717900760940544e+16,
+      "train_loss": 1.3475643575191498,
+      "train_runtime": 1594.7957,
+      "train_samples_per_second": 1.268,
+      "train_steps_per_second": 0.025
     }
   ],
-  "logging_steps": 20,
-  "max_steps": 1098,
+  "logging_steps": 5,
+  "max_steps": 40,
   "num_input_tokens_seen": 0,
-  "num_train_epochs": 3,
-  "save_steps": 20,
-  "total_flos": 6.035394717233971e+16,
-  "train_batch_size": 1,
+  "num_train_epochs": 1,
+  "save_steps": 50,
+  "total_flos": 2.6717900760940544e+16,
+  "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null
 }
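The `log_history` array in trainer_state.json is the machine-readable form of the loss table in the README: entries with a `loss` key are training logs and entries with an `eval_loss` key are evaluations. A stdlib sketch extracting both from an abbreviated copy of the new state shown above:

```python
import json

# Abbreviated log_history from the new trainer_state.json above.
state = json.loads("""{
  "log_history": [
    {"epoch": 0.49, "learning_rate": 1e-05, "loss": 1.3274, "step": 20},
    {"epoch": 0.49, "eval_loss": 1.258691668510437, "step": 20},
    {"epoch": 0.99, "learning_rate": 0.0, "loss": 1.3066, "step": 40},
    {"epoch": 0.99, "eval_loss": 1.2477926015853882, "step": 40}
  ]
}""")

# Training entries carry "loss"; evaluation entries carry "eval_loss".
train_curve = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
evals = [e["eval_loss"] for e in state["log_history"] if "eval_loss" in e]

print(train_curve)  # [(20, 1.3274), (40, 1.3066)]
print(evals[-1])    # 1.2477926015853882
```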
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5155e401b22d60f288ef4b9dbcc0137ad8db5e6b11882ee86900641b7be4d32a
+oid sha256:4de88ab3e4ac613739269a27c3e99895152741695b9ef3d3402002d4cdf97523
 size 4728