VERSIL91 committed on
Commit 06ee4f0 · verified · 1 Parent(s): 506e327

End of training
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
 - axolotl
 - generated_from_trainer
 model-index:
-- name: 3d9c9a86-41a1-463a-a26d-1a82887c0da8
+- name: 20b2db71-bebe-45cf-98f2-4bbd5debff43
   results: []
 ---
 
@@ -18,6 +18,12 @@ should probably proofread and complete it, then remove this comment. -->
 
 axolotl version: `0.4.1`
 ```yaml
+accelerate_config:
+  dynamo_backend: inductor
+  mixed_precision: bf16
+  num_machines: 1
+  num_processes: auto
+  use_cpu: false
 adapter: lora
 base_model: unsloth/SmolLM2-360M
 bf16: auto
@@ -40,40 +46,37 @@ datasets:
 debug: null
 deepspeed: null
 device_map: auto
-do_eval: true
 early_stopping_patience: null
-eval_batch_size: 2
 eval_max_new_tokens: 128
-eval_steps: null
 eval_table_size: null
 evals_per_epoch: 4
-flash_attention: true
-fp16: false
+flash_attention: false
+fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps: 4
-gradient_checkpointing: false
-group_by_length: true
+gradient_accumulation_steps: 16
+gradient_checkpointing: true
+group_by_length: false
 hub_model_id: null
 hub_repo: null
 hub_strategy: checkpoint
 hub_token: null
 learning_rate: 0.0001
-load_in_4bit: false
-load_in_8bit: false
 local_rank: null
-logging_steps: 5
+logging_steps: 1
 lora_alpha: 16
 lora_dropout: 0.05
 lora_fan_in_fan_out: null
 lora_model_dir: null
 lora_r: 8
 lora_target_linear: true
+lora_target_modules:
+- q_proj
+- v_proj
 lr_scheduler: cosine
-max_grad_norm: 1.0
 max_memory:
-  0: 75GB
-max_steps: 200
+  0: 70GiB
+max_steps: 100
 micro_batch_size: 2
 mlflow_experiment_name: /tmp/a87830592f0aef9a_train_data.json
 model_type: AutoModelForCausalLM
@@ -81,24 +84,27 @@ num_epochs: 1
 optimizer: adamw_bnb_8bit
 output_dir: miner_id_24
 pad_to_sequence_len: true
+quantization_config:
+  llm_int8_enable_fp32_cpu_offload: true
+  load_in_8bit: true
 resume_from_checkpoint: null
 s2_attention: null
 sample_packing: false
-save_steps: null
-saves_per_epoch: null
-sequence_len: 1024
+saves_per_epoch: 4
+sequence_len: 512
 strict: false
 tf32: false
 tokenizer_type: AutoTokenizer
+torch_compile: true
 train_on_inputs: false
 trust_remote_code: true
 val_set_size: 0.05
-wandb_entity: sn56-miner
-wandb_mode: disabled
-wandb_name: sn56a5/d1f354f0
-wandb_project: god
-wandb_run: 8g06
-wandb_runid: sn56a5/d1f354f0
+wandb_entity: null
+wandb_mode: online
+wandb_name: 20b2db71-bebe-45cf-98f2-4bbd5debff43
+wandb_project: Gradients-On-Demand
+wandb_run: your_name
+wandb_runid: 20b2db71-bebe-45cf-98f2-4bbd5debff43
 warmup_steps: 10
 weight_decay: 0.0
 xformers_attention: null
@@ -107,11 +113,11 @@ xformers_attention: null
 
 </details><br>
 
-# 3d9c9a86-41a1-463a-a26d-1a82887c0da8
+# 20b2db71-bebe-45cf-98f2-4bbd5debff43
 
 This model is a fine-tuned version of [unsloth/SmolLM2-360M](https://huggingface.co/unsloth/SmolLM2-360M) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.2026
+- Loss: nan
 
 ## Model description
 
@@ -134,25 +140,22 @@ The following hyperparameters were used during training:
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
-- distributed_type: multi-GPU
-- num_devices: 4
-- gradient_accumulation_steps: 4
+- gradient_accumulation_steps: 16
 - total_train_batch_size: 32
-- total_eval_batch_size: 8
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- training_steps: 200
+- training_steps: 100
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| No log | 0.0003 | 1 | 0.5289 |
-| 0.3662 | 0.0167 | 50 | 0.3411 |
-| 0.2267 | 0.0335 | 100 | 0.2339 |
-| 0.1912 | 0.0502 | 150 | 0.2059 |
-| 0.1909 | 0.0669 | 200 | 0.2026 |
+| 32.0444 | 0.0003 | 1 | nan |
+| 0.0 | 0.0084 | 25 | nan |
+| 0.0 | 0.0167 | 50 | nan |
+| 0.0 | 0.0251 | 75 | nan |
+| 0.0 | 0.0335 | 100 | nan |
 
 
 ### Framework versions
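The README above records a LoRA adapter trained on top of unsloth/SmolLM2-360M. As a minimal sketch of how such an adapter is typically attached with peft (assuming this repo holds a standard peft checkpoint; the repo id below is assembled from the committer name and the model name, which is an assumption):

```python
# Minimal sketch: attach this commit's LoRA adapter to the base model with peft.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("unsloth/SmolLM2-360M")
tokenizer = AutoTokenizer.from_pretrained("unsloth/SmolLM2-360M")

# "VERSIL91/20b2db71-bebe-45cf-98f2-4bbd5debff43" is an assumed repo id built
# from the committer name and the README's model name.
model = PeftModel.from_pretrained(base, "VERSIL91/20b2db71-bebe-45cf-98f2-4bbd5debff43")
```

Note that the effective batch size in the README is consistent: micro_batch_size 2 × gradient_accumulation_steps 16 = total_train_batch_size 32.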
adapter_config.json CHANGED
@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "k_proj",
+    "gate_proj",
     "q_proj",
-    "v_proj",
+    "up_proj",
     "down_proj",
-    "gate_proj",
     "o_proj",
-    "up_proj"
+    "k_proj",
+    "v_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c8e0fb66e7a88caec0a08aeb69dbec5cc8d7ee535d28fe3f36e8b869cabd6384
+oid sha256:5f1a04cf82e6d853f68c54944a28d8e79cc282e58f6aed24deccf8d305b4f627
 size 17528138
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:33301e5b49f7775f53a83a53756b9192184d0948a701ff258a1b6201ff69e090
+oid sha256:3ef249b0f207339e330f5bb95b2f74751d4ce2da3dcb80ce68a26897590a0919
 size 17425352
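Both weight files changed only in their LFS oid; the byte sizes are unchanged. A small sketch for checking a downloaded file against the sha256 recorded in its pointer:

```python
# Sketch: verify a downloaded LFS object against the oid in its pointer file.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

# oid from the adapter_model.safetensors pointer above
expected = "3ef249b0f207339e330f5bb95b2f74751d4ce2da3dcb80ce68a26897590a0919"
print(sha256_of("adapter_model.safetensors") == expected)
```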
last-checkpoint/adapter_config.json CHANGED
@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "k_proj",
+    "gate_proj",
     "q_proj",
-    "v_proj",
+    "up_proj",
     "down_proj",
-    "gate_proj",
     "o_proj",
-    "up_proj"
+    "k_proj",
+    "v_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
last-checkpoint/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:33301e5b49f7775f53a83a53756b9192184d0948a701ff258a1b6201ff69e090
+oid sha256:3ef249b0f207339e330f5bb95b2f74751d4ce2da3dcb80ce68a26897590a0919
 size 17425352
last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9df1216f40cd7740e0fc5dd3b6cd38bcbe6759bb03764ede2f4d27879faf0598
+oid sha256:a2877331f933cd0d3eda6c6f20f3203f32ec583e75c93d59882ed85052b3507b
 size 10251668
last-checkpoint/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f4c6893fc4a9ed236abb30b22cb769913941a10e57b959801e5127eb964077c8
+size 14244
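rng_state.pth is new in this checkpoint; it holds the RNG snapshots used for exact resumption. A sketch for inspecting it, assuming it is an ordinary PyTorch pickle as transformers saves it:

```python
# Sketch: peek at the newly added RNG checkpoint (assumed to be a pickled dict).
import torch

rng = torch.load("last-checkpoint/rng_state.pth", weights_only=False)
print(type(rng), list(rng) if isinstance(rng, dict) else rng)
```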
last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2d754412c61116546142914503e7369d0cc35d3c380a07e5218f595d76b6d96
+oid sha256:49d60a69e2379be2053e816cbaff31e6c931b5922dd86c71c9eaf473299cbf62
 size 1064
last-checkpoint/trainer_state.json CHANGED
@@ -1,339 +1,759 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.06691201070592172,
-  "eval_steps": 50,
-  "global_step": 200,
+  "epoch": 0.03345600535296086,
+  "eval_steps": 25,
+  "global_step": 100,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
-    {"epoch": 0.00033456005352960856, "eval_loss": 0.528853178024292, "eval_runtime": 39.5233, "eval_samples_per_second": 127.393, "eval_steps_per_second": 15.94, "step": 1},
-    {"epoch": 0.0016728002676480427, "grad_norm": 0.0893344134092331, "learning_rate": 5e-05, "loss": 0.5191, "step": 5},
-    {"epoch": 0.0033456005352960855, "grad_norm": 0.12134993076324463, "learning_rate": 0.0001, "loss": 0.584, "step": 10},
-    {"epoch": 0.005018400802944129, "grad_norm": 0.08858965337276459, "learning_rate": 9.98292246503335e-05, "loss": 0.5242, "step": 15},
-    {"epoch": 0.006691201070592171, "grad_norm": 0.09778374433517456, "learning_rate": 9.931806517013612e-05, "loss": 0.5202, "step": 20},
-    {"epoch": 0.008364001338240215, "grad_norm": 0.10649651288986206, "learning_rate": 9.847001329696653e-05, "loss": 0.4911, "step": 25},
-    {"epoch": 0.010036801605888258, "grad_norm": 0.06970233470201492, "learning_rate": 9.729086208503174e-05, "loss": 0.3985, "step": 30},
-    {"epoch": 0.0117096018735363, "grad_norm": 0.0955878272652626, "learning_rate": 9.578866633275288e-05, "loss": 0.43, "step": 35},
-    {"epoch": 0.013382402141184342, "grad_norm": 0.060985565185546875, "learning_rate": 9.397368756032445e-05, "loss": 0.4198, "step": 40},
-    {"epoch": 0.015055202408832385, "grad_norm": 0.0769026130437851, "learning_rate": 9.185832391312644e-05, "loss": 0.4067, "step": 45},
-    {"epoch": 0.01672800267648043, "grad_norm": 0.1008349359035492, "learning_rate": 8.945702546981969e-05, "loss": 0.3662, "step": 50},
-    {"epoch": 0.01672800267648043, "eval_loss": 0.34106191992759705, "eval_runtime": 39.432, "eval_samples_per_second": 127.688, "eval_steps_per_second": 15.977, "step": 50},
-    {"epoch": 0.01840080294412847, "grad_norm": 0.07255814969539642, "learning_rate": 8.678619553365659e-05, "loss": 0.3496, "step": 55},
-    {"epoch": 0.020073603211776515, "grad_norm": 0.10015455633401871, "learning_rate": 8.386407858128706e-05, "loss": 0.328, "step": 60},
-    {"epoch": 0.021746403479424557, "grad_norm": 0.07114718109369278, "learning_rate": 8.07106356344834e-05, "loss": 0.3169, "step": 65},
-    {"epoch": 0.0234192037470726, "grad_norm": 0.07913073152303696, "learning_rate": 7.734740790612136e-05, "loss": 0.3134, "step": 70},
-    {"epoch": 0.025092004014720642, "grad_norm": 0.1750458925962448, "learning_rate": 7.379736965185368e-05, "loss": 0.2852, "step": 75},
-    {"epoch": 0.026764804282368684, "grad_norm": 0.061932601034641266, "learning_rate": 7.008477123264848e-05, "loss": 0.2737, "step": 80},
-    {"epoch": 0.02843760455001673, "grad_norm": 0.06881576776504517, "learning_rate": 6.623497346023418e-05, "loss": 0.2502, "step": 85},
-    {"epoch": 0.03011040481766477, "grad_norm": 0.062322817742824554, "learning_rate": 6.227427435703997e-05, "loss": 0.2659, "step": 90},
-    {"epoch": 0.031783205085312814, "grad_norm": 0.06230627000331879, "learning_rate": 5.8229729514036705e-05, "loss": 0.2645, "step": 95},
-    {"epoch": 0.03345600535296086, "grad_norm": 0.11283061653375626, "learning_rate": 5.4128967273616625e-05, "loss": 0.2267, "step": 100},
-    {"epoch": 0.03345600535296086, "eval_loss": 0.23386509716510773, "eval_runtime": 39.348, "eval_samples_per_second": 127.961, "eval_steps_per_second": 16.011, "step": 100},
-    {"epoch": 0.0351288056206089, "grad_norm": 0.0672779530286789, "learning_rate": 5e-05, "loss": 0.2497, "step": 105},
-    {"epoch": 0.03680160588825694, "grad_norm": 0.06097684055566788, "learning_rate": 4.5871032726383386e-05, "loss": 0.2516, "step": 110},
-    {"epoch": 0.038474406155904986, "grad_norm": 0.06827884912490845, "learning_rate": 4.17702704859633e-05, "loss": 0.2279, "step": 115},
-    {"epoch": 0.04014720642355303, "grad_norm": 0.0606272853910923, "learning_rate": 3.772572564296005e-05, "loss": 0.2541, "step": 120},
-    {"epoch": 0.04182000669120107, "grad_norm": 0.14288191497325897, "learning_rate": 3.3765026539765834e-05, "loss": 0.1806, "step": 125},
-    {"epoch": 0.04349280695884911, "grad_norm": 0.052643969655036926, "learning_rate": 2.991522876735154e-05, "loss": 0.246, "step": 130},
-    {"epoch": 0.04516560722649716, "grad_norm": 0.06944818049669266, "learning_rate": 2.6202630348146324e-05, "loss": 0.2419, "step": 135},
-    {"epoch": 0.0468384074941452, "grad_norm": 0.06182454898953438, "learning_rate": 2.2652592093878666e-05, "loss": 0.2159, "step": 140},
-    {"epoch": 0.04851120776179324, "grad_norm": 0.06247089058160782, "learning_rate": 1.928936436551661e-05, "loss": 0.2275, "step": 145},
-    {"epoch": 0.050184008029441285, "grad_norm": 0.1229231059551239, "learning_rate": 1.6135921418712956e-05, "loss": 0.1912, "step": 150},
-    {"epoch": 0.050184008029441285, "eval_loss": 0.2059282809495926, "eval_runtime": 39.4366, "eval_samples_per_second": 127.673, "eval_steps_per_second": 15.975, "step": 150},
-    {"epoch": 0.05185680829708933, "grad_norm": 0.06170298531651497, "learning_rate": 1.3213804466343421e-05, "loss": 0.2107, "step": 155},
-    {"epoch": 0.05352960856473737, "grad_norm": 0.06112481653690338, "learning_rate": 1.0542974530180327e-05, "loss": 0.2021, "step": 160},
-    {"epoch": 0.05520240883238541, "grad_norm": 0.060423221439123154, "learning_rate": 8.141676086873572e-06, "loss": 0.2191, "step": 165},
-    {"epoch": 0.05687520910003346, "grad_norm": 0.05647290125489235, "learning_rate": 6.026312439675552e-06, "loss": 0.2134, "step": 170},
-    {"epoch": 0.0585480093676815, "grad_norm": 0.10841598361730576, "learning_rate": 4.2113336672471245e-06, "loss": 0.212, "step": 175},
-    {"epoch": 0.06022080963532954, "grad_norm": 0.05532608553767204, "learning_rate": 2.7091379149682685e-06, "loss": 0.2154, "step": 180},
-    {"epoch": 0.061893609902977584, "grad_norm": 0.06235940009355545, "learning_rate": 1.5299867030334814e-06, "loss": 0.1993, "step": 185},
-    {"epoch": 0.06356641017062563, "grad_norm": 0.06464989483356476, "learning_rate": 6.819348298638839e-07, "loss": 0.1926, "step": 190},
-    {"epoch": 0.06523921043827367, "grad_norm": 0.06490304321050644, "learning_rate": 1.7077534966650766e-07, "loss": 0.2289, "step": 195},
-    {"epoch": 0.06691201070592172, "grad_norm": 0.10941293835639954, "learning_rate": 0.0, "loss": 0.1909, "step": 200},
-    {"epoch": 0.06691201070592172, "eval_loss": 0.20260320603847504, "eval_runtime": 39.2566, "eval_samples_per_second": 128.259, "eval_steps_per_second": 16.048, "step": 200}
+    {"epoch": 0.00033456005352960856, "grad_norm": NaN, "learning_rate": 1e-05, "loss": 32.0444, "step": 1},
+    {"epoch": 0.00033456005352960856, "eval_loss": NaN, "eval_runtime": 225.0524, "eval_samples_per_second": 22.373, "eval_steps_per_second": 11.189, "step": 1},
+    {"epoch": 0.0006691201070592171, "grad_norm": NaN, "learning_rate": 2e-05, "loss": 0.0, "step": 2},
+    {"epoch": 0.0010036801605888257, "grad_norm": NaN, "learning_rate": 3e-05, "loss": 0.0, "step": 3},
+    {"epoch": 0.0013382402141184342, "grad_norm": NaN, "learning_rate": 4e-05, "loss": 0.0, "step": 4},
+    {"epoch": 0.0016728002676480427, "grad_norm": NaN, "learning_rate": 5e-05, "loss": 0.0, "step": 5},
+    {"epoch": 0.0020073603211776514, "grad_norm": NaN, "learning_rate": 6e-05, "loss": 0.0, "step": 6},
+    {"epoch": 0.00234192037470726, "grad_norm": NaN, "learning_rate": 7e-05, "loss": 0.0, "step": 7},
+    {"epoch": 0.0026764804282368685, "grad_norm": NaN, "learning_rate": 8e-05, "loss": 0.0, "step": 8},
+    {"epoch": 0.003011040481766477, "grad_norm": NaN, "learning_rate": 9e-05, "loss": 0.0, "step": 9},
+    {"epoch": 0.0033456005352960855, "grad_norm": NaN, "learning_rate": 0.0001, "loss": 0.0, "step": 10},
+    {"epoch": 0.0036801605888256944, "grad_norm": NaN, "learning_rate": 9.99695413509548e-05, "loss": 0.0, "step": 11},
+    {"epoch": 0.004014720642355303, "grad_norm": NaN, "learning_rate": 9.987820251299122e-05, "loss": 0.0, "step": 12},
+    {"epoch": 0.004349280695884911, "grad_norm": NaN, "learning_rate": 9.972609476841367e-05, "loss": 0.0, "step": 13},
+    {"epoch": 0.00468384074941452, "grad_norm": NaN, "learning_rate": 9.951340343707852e-05, "loss": 0.0, "step": 14},
+    {"epoch": 0.005018400802944129, "grad_norm": NaN, "learning_rate": 9.924038765061042e-05, "loss": 0.0, "step": 15},
+    {"epoch": 0.005352960856473737, "grad_norm": NaN, "learning_rate": 9.890738003669029e-05, "loss": 0.0, "step": 16},
+    {"epoch": 0.005687520910003346, "grad_norm": NaN, "learning_rate": 9.851478631379982e-05, "loss": 0.0, "step": 17},
+    {"epoch": 0.006022080963532954, "grad_norm": NaN, "learning_rate": 9.806308479691595e-05, "loss": 0.0, "step": 18},
+    {"epoch": 0.006356641017062563, "grad_norm": NaN, "learning_rate": 9.755282581475769e-05, "loss": 0.0, "step": 19},
+    {"epoch": 0.006691201070592171, "grad_norm": NaN, "learning_rate": 9.698463103929542e-05, "loss": 0.0, "step": 20},
+    {"epoch": 0.00702576112412178, "grad_norm": NaN, "learning_rate": 9.635919272833938e-05, "loss": 0.0, "step": 21},
+    {"epoch": 0.007360321177651389, "grad_norm": NaN, "learning_rate": 9.567727288213005e-05, "loss": 0.0, "step": 22},
+    {"epoch": 0.007694881231180997, "grad_norm": NaN, "learning_rate": 9.493970231495835e-05, "loss": 0.0, "step": 23},
+    {"epoch": 0.008029441284710606, "grad_norm": NaN, "learning_rate": 9.414737964294636e-05, "loss": 0.0, "step": 24},
+    {"epoch": 0.008364001338240215, "grad_norm": NaN, "learning_rate": 9.330127018922194e-05, "loss": 0.0, "step": 25},
+    {"epoch": 0.008364001338240215, "eval_loss": NaN, "eval_runtime": 127.579, "eval_samples_per_second": 39.466, "eval_steps_per_second": 19.737, "step": 25},
+    {"epoch": 0.008698561391769822, "grad_norm": NaN, "learning_rate": 9.24024048078213e-05, "loss": 0.0, "step": 26},
+    {"epoch": 0.009033121445299431, "grad_norm": NaN, "learning_rate": 9.145187862775209e-05, "loss": 0.0, "step": 27},
+    {"epoch": 0.00936768149882904, "grad_norm": NaN, "learning_rate": 9.045084971874738e-05, "loss": 0.0, "step": 28},
+    {"epoch": 0.009702241552358649, "grad_norm": NaN, "learning_rate": 8.940053768033609e-05, "loss": 0.0, "step": 29},
+    {"epoch": 0.010036801605888258, "grad_norm": NaN, "learning_rate": 8.83022221559489e-05, "loss": 0.0, "step": 30},
+    {"epoch": 0.010371361659417865, "grad_norm": NaN, "learning_rate": 8.715724127386972e-05, "loss": 0.0, "step": 31},
+    {"epoch": 0.010705921712947474, "grad_norm": NaN, "learning_rate": 8.596699001693255e-05, "loss": 0.0, "step": 32},
+    {"epoch": 0.011040481766477083, "grad_norm": NaN, "learning_rate": 8.473291852294987e-05, "loss": 0.0, "step": 33},
+    {"epoch": 0.011375041820006692, "grad_norm": NaN, "learning_rate": 8.345653031794292e-05, "loss": 0.0, "step": 34},
+    {"epoch": 0.0117096018735363, "grad_norm": NaN, "learning_rate": 8.213938048432697e-05, "loss": 0.0, "step": 35},
+    {"epoch": 0.012044161927065908, "grad_norm": NaN, "learning_rate": 8.07830737662829e-05, "loss": 0.0, "step": 36},
+    {"epoch": 0.012378721980595517, "grad_norm": NaN, "learning_rate": 7.938926261462366e-05, "loss": 0.0, "step": 37},
+    {"epoch": 0.012713282034125126, "grad_norm": NaN, "learning_rate": 7.795964517353735e-05, "loss": 0.0, "step": 38},
+    {"epoch": 0.013047842087654735, "grad_norm": NaN, "learning_rate": 7.649596321166024e-05, "loss": 0.0, "step": 39},
+    {"epoch": 0.013382402141184342, "grad_norm": NaN, "learning_rate": 7.500000000000001e-05, "loss": 0.0, "step": 40},
+    {"epoch": 0.01371696219471395, "grad_norm": NaN, "learning_rate": 7.347357813929454e-05, "loss": 0.0, "step": 41},
+    {"epoch": 0.01405152224824356, "grad_norm": NaN, "learning_rate": 7.191855733945387e-05, "loss": 0.0, "step": 42},
+    {"epoch": 0.014386082301773169, "grad_norm": NaN, "learning_rate": 7.033683215379002e-05, "loss": 0.0, "step": 43},
+    {"epoch": 0.014720642355302778, "grad_norm": NaN, "learning_rate": 6.873032967079561e-05, "loss": 0.0, "step": 44},
+    {"epoch": 0.015055202408832385, "grad_norm": NaN, "learning_rate": 6.710100716628344e-05, "loss": 0.0, "step": 45},
+    {"epoch": 0.015389762462361994, "grad_norm": NaN, "learning_rate": 6.545084971874738e-05, "loss": 0.0, "step": 46},
+    {"epoch": 0.015724322515891603, "grad_norm": NaN, "learning_rate": 6.378186779084995e-05, "loss": 0.0, "step": 47},
+    {"epoch": 0.01605888256942121, "grad_norm": NaN, "learning_rate": 6.209609477998338e-05, "loss": 0.0, "step": 48},
+    {"epoch": 0.01639344262295082, "grad_norm": NaN, "learning_rate": 6.0395584540887963e-05, "loss": 0.0, "step": 49},
+    {"epoch": 0.01672800267648043, "grad_norm": NaN, "learning_rate": 5.868240888334653e-05, "loss": 0.0, "step": 50},
+    {"epoch": 0.01672800267648043, "eval_loss": NaN, "eval_runtime": 65.6977, "eval_samples_per_second": 76.639, "eval_steps_per_second": 38.327, "step": 50},
+    {"epoch": 0.01706256273001004, "grad_norm": NaN, "learning_rate": 5.695865504800327e-05, "loss": 0.0, "step": 51},
+    {"epoch": 0.017397122783539644, "grad_norm": NaN, "learning_rate": 5.522642316338268e-05, "loss": 0.0, "step": 52},
+    {"epoch": 0.017731682837069253, "grad_norm": NaN, "learning_rate": 5.348782368720626e-05, "loss": 0.0, "step": 53},
+    {"epoch": 0.018066242890598862, "grad_norm": NaN, "learning_rate": 5.174497483512506e-05, "loss": 0.0, "step": 54},
+    {"epoch": 0.01840080294412847, "grad_norm": NaN, "learning_rate": 5e-05, "loss": 0.0, "step": 55},
+    {"epoch": 0.01873536299765808, "grad_norm": NaN, "learning_rate": 4.825502516487497e-05, "loss": 0.0, "step": 56},
+    {"epoch": 0.01906992305118769, "grad_norm": NaN, "learning_rate": 4.6512176312793736e-05, "loss": 0.0, "step": 57},
+    {"epoch": 0.019404483104717297, "grad_norm": NaN, "learning_rate": 4.477357683661734e-05, "loss": 0.0, "step": 58},
+    {"epoch": 0.019739043158246906, "grad_norm": NaN, "learning_rate": 4.3041344951996746e-05, "loss": 0.0, "step": 59},
+    {"epoch": 0.020073603211776515, "grad_norm": NaN, "learning_rate": 4.131759111665349e-05, "loss": 0.0, "step": 60},
+    {"epoch": 0.02040816326530612, "grad_norm": NaN, "learning_rate": 3.960441545911204e-05, "loss": 0.0, "step": 61},
+    {"epoch": 0.02074272331883573, "grad_norm": NaN, "learning_rate": 3.790390522001662e-05, "loss": 0.0, "step": 62},
+    {"epoch": 0.02107728337236534, "grad_norm": NaN, "learning_rate": 3.6218132209150045e-05, "loss": 0.0, "step": 63},
+    {"epoch": 0.021411843425894948, "grad_norm": NaN, "learning_rate": 3.4549150281252636e-05, "loss": 0.0, "step": 64},
+    {"epoch": 0.021746403479424557, "grad_norm": NaN, "learning_rate": 3.289899283371657e-05, "loss": 0.0, "step": 65},
+    {"epoch": 0.022080963532954166, "grad_norm": NaN, "learning_rate": 3.12696703292044e-05, "loss": 0.0, "step": 66},
+    {"epoch": 0.022415523586483774, "grad_norm": NaN, "learning_rate": 2.9663167846209998e-05, "loss": 0.0, "step": 67},
+    {"epoch": 0.022750083640013383, "grad_norm": NaN, "learning_rate": 2.8081442660546125e-05, "loss": 0.0, "step": 68},
+    {"epoch": 0.023084643693542992, "grad_norm": NaN, "learning_rate": 2.6526421860705473e-05, "loss": 0.0, "step": 69},
+    {"epoch": 0.0234192037470726, "grad_norm": NaN, "learning_rate": 2.500000000000001e-05, "loss": 0.0, "step": 70},
+    {"epoch": 0.023753763800602207, "grad_norm": NaN, "learning_rate": 2.350403678833976e-05, "loss": 0.0, "step": 71},
+    {"epoch": 0.024088323854131816, "grad_norm": NaN, "learning_rate": 2.2040354826462668e-05, "loss": 0.0, "step": 72},
+    {"epoch": 0.024422883907661425, "grad_norm": NaN, "learning_rate": 2.061073738537635e-05, "loss": 0.0, "step": 73},
+    {"epoch": 0.024757443961191034, "grad_norm": NaN, "learning_rate": 1.9216926233717085e-05, "loss": 0.0, "step": 74},
+    {"epoch": 0.025092004014720642, "grad_norm": NaN, "learning_rate": 1.7860619515673033e-05, "loss": 0.0, "step": 75},
+    {"epoch": 0.025092004014720642, "eval_loss": NaN, "eval_runtime": 74.8837, "eval_samples_per_second": 67.238, "eval_steps_per_second": 33.625, "step": 75},
+    {"epoch": 0.02542656406825025, "grad_norm": NaN, "learning_rate": 1.6543469682057106e-05, "loss": 0.0, "step": 76},
+    {"epoch": 0.02576112412177986, "grad_norm": NaN, "learning_rate": 1.526708147705013e-05, "loss": 0.0, "step": 77},
+    {"epoch": 0.02609568417530947, "grad_norm": NaN, "learning_rate": 1.4033009983067452e-05, "loss": 0.0, "step": 78},
+    {"epoch": 0.026430244228839078, "grad_norm": NaN, "learning_rate": 1.2842758726130283e-05, "loss": 0.0, "step": 79},
+    {"epoch": 0.026764804282368684, "grad_norm": NaN, "learning_rate": 1.1697777844051105e-05, "loss": 0.0, "step": 80},
+    {"epoch": 0.027099364335898293, "grad_norm": NaN, "learning_rate": 1.0599462319663905e-05, "loss": 0.0, "step": 81},
+    {"epoch": 0.0274339243894279, "grad_norm": NaN, "learning_rate": 9.549150281252633e-06, "loss": 0.0, "step": 82},
+    {"epoch": 0.02776848444295751, "grad_norm": NaN, "learning_rate": 8.548121372247918e-06, "loss": 0.0, "step": 83},
+    {"epoch": 0.02810304449648712, "grad_norm": NaN, "learning_rate": 7.597595192178702e-06, "loss": 0.0, "step": 84},
+    {"epoch": 0.02843760455001673, "grad_norm": NaN, "learning_rate": 6.698729810778065e-06, "loss": 0.0, "step": 85},
+    {"epoch": 0.028772164603546337, "grad_norm": NaN, "learning_rate": 5.852620357053651e-06, "loss": 0.0, "step": 86},
+    {"epoch": 0.029106724657075946, "grad_norm": NaN, "learning_rate": 5.060297685041659e-06, "loss": 0.0, "step": 87},
+    {"epoch": 0.029441284710605555, "grad_norm": NaN, "learning_rate": 4.322727117869951e-06, "loss": 0.0, "step": 88},
+    {"epoch": 0.02977584476413516, "grad_norm": NaN, "learning_rate": 3.6408072716606346e-06, "loss": 0.0, "step": 89},
+    {"epoch": 0.03011040481766477, "grad_norm": NaN, "learning_rate": 3.0153689607045845e-06, "loss": 0.0, "step": 90},
+    {"epoch": 0.03044496487119438, "grad_norm": NaN, "learning_rate": 2.4471741852423237e-06, "loss": 0.0, "step": 91},
+    {"epoch": 0.030779524924723987, "grad_norm": NaN, "learning_rate": 1.9369152030840556e-06, "loss": 0.0, "step": 92},
+    {"epoch": 0.031114084978253596, "grad_norm": NaN, "learning_rate": 1.4852136862001764e-06, "loss": 0.0, "step": 93},
+    {"epoch": 0.031448645031783205, "grad_norm": NaN, "learning_rate": 1.0926199633097157e-06, "loss": 0.0, "step": 94},
+    {"epoch": 0.031783205085312814, "grad_norm": NaN, "learning_rate": 7.596123493895991e-07, "loss": 0.0, "step": 95},
+    {"epoch": 0.03211776513884242, "grad_norm": NaN, "learning_rate": 4.865965629214819e-07, "loss": 0.0, "step": 96},
+    {"epoch": 0.03245232519237203, "grad_norm": NaN, "learning_rate": 2.7390523158633554e-07, "loss": 0.0, "step": 97},
+    {"epoch": 0.03278688524590164, "grad_norm": NaN, "learning_rate": 1.2179748700879012e-07, "loss": 0.0, "step": 98},
+    {"epoch": 0.03312144529943125, "grad_norm": NaN, "learning_rate": 3.04586490452119e-08, "loss": 0.0, "step": 99},
+    {"epoch": 0.03345600535296086, "grad_norm": NaN, "learning_rate": 0.0, "loss": 0.0, "step": 100},
+    {"epoch": 0.03345600535296086, "eval_loss": NaN, "eval_runtime": 83.2987, "eval_samples_per_second": 60.445, "eval_steps_per_second": 30.229, "step": 100}
   ],
-  "logging_steps": 5,
-  "max_steps": 200,
+  "logging_steps": 1,
+  "max_steps": 100,
   "num_input_tokens_seen": 0,
   "num_train_epochs": 1,
-  "save_steps": 500,
+  "save_steps": 25,
   "stateful_callbacks": {
     "TrainerControl": {
       "args": {
@@ -346,7 +766,7 @@
       "attributes": {}
     }
   },
-  "total_flos": 1.670528771293184e+16,
+  "total_flos": 6271342215168000.0,
   "train_batch_size": 2,
   "trial_name": null,
   "trial_params": null
last-checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0662807773092334a74bd9571af7ccfea1da3a5b81276092de9ac7b5492fa0e4
-size 6712
+oid sha256:239f06c62ee4317bc3f67ccabbea4f161d802c558557cbb5c5e70285a5b6026c
+size 6776
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0662807773092334a74bd9571af7ccfea1da3a5b81276092de9ac7b5492fa0e4
-size 6712
+oid sha256:239f06c62ee4317bc3f67ccabbea4f161d802c558557cbb5c5e70285a5b6026c
+size 6776