Thalesian committed (verified)
Commit f4ab870 · Parent: f454b9b

End of training
README.md CHANGED
@@ -9,12 +9,11 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/thalesian-university-of-new-mexico-press/huggingface/runs/xzm8h2sw)
 # train_2
 
 This model was trained from scratch on the None dataset.
 It achieves the following results on the evaluation set:
- - Loss: 1.0677
+ - Loss: 0.1944
 
 ## Model description
 
@@ -33,61 +32,42 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
- - learning_rate: 5e-07
- - train_batch_size: 8
- - eval_batch_size: 8
+ - learning_rate: 5e-06
+ - train_batch_size: 36
+ - eval_batch_size: 36
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 10000
- - num_epochs: 1000
+ - num_epochs: 200
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
- | 3.0178 | 1.0 | 1762 | 30.5008 |
- | 2.6192 | 2.0 | 3524 | 23.6284 |
- | 2.0989 | 3.0 | 5286 | 15.6537 |
- | 1.1218 | 4.0 | 7048 | 8.6861 |
- | 0.7012 | 5.0 | 8810 | 4.8304 |
- | 0.5128 | 6.0 | 10572 | 2.8713 |
- | 0.458 | 7.0 | 12334 | 2.2613 |
- | 0.4257 | 8.0 | 14096 | 1.9845 |
- | 0.3936 | 9.0 | 15858 | 1.7669 |
- | 0.3828 | 10.0 | 17620 | 1.6619 |
- | 0.3696 | 11.0 | 19382 | 1.6021 |
- | 0.3523 | 12.0 | 21144 | 1.5797 |
- | 0.3577 | 13.0 | 22906 | 1.5279 |
- | 0.3586 | 14.0 | 24668 | 1.4815 |
- | 0.3536 | 15.0 | 26430 | 1.4177 |
- | 0.3421 | 16.0 | 28192 | 1.3723 |
- | 0.3426 | 17.0 | 29954 | 1.3208 |
- | 0.3316 | 18.0 | 31716 | 1.2861 |
- | 0.3281 | 19.0 | 33478 | 1.2838 |
- | 0.3242 | 20.0 | 35240 | 1.2195 |
- | 0.3205 | 21.0 | 37002 | 1.2078 |
- | 0.3269 | 22.0 | 38764 | 1.1876 |
- | 0.3225 | 23.0 | 40526 | 1.1729 |
- | 0.3261 | 24.0 | 42288 | 1.1359 |
- | 0.3141 | 25.0 | 44050 | 1.1536 |
- | 0.3112 | 26.0 | 45812 | 1.1326 |
- | 0.3144 | 27.0 | 47574 | 1.1352 |
- | 0.3036 | 28.0 | 49336 | 1.1404 |
- | 0.3024 | 29.0 | 51098 | 1.0948 |
- | 0.2996 | 30.0 | 52860 | 1.1103 |
- | 0.3043 | 31.0 | 54622 | 1.0854 |
- | 0.2985 | 32.0 | 56384 | 1.0956 |
- | 0.3056 | 33.0 | 58146 | 1.1020 |
- | 0.2919 | 34.0 | 59908 | 1.1080 |
- | 0.2993 | 35.0 | 61670 | 1.0691 |
- | 0.2981 | 36.0 | 63432 | 1.0809 |
- | 0.2942 | 37.0 | 65194 | 1.0515 |
- | 0.2911 | 38.0 | 66956 | 1.0637 |
- | 0.296 | 39.0 | 68718 | 1.0840 |
- | 0.2945 | 40.0 | 70480 | 1.0684 |
- | 0.3031 | 41.0 | 72242 | 1.0645 |
- | 0.2903 | 42.0 | 74004 | 1.0677 |
+ | 1.7988 | 1.0 | 905 | 7.8026 |
+ | 1.0102 | 2.0 | 1810 | 2.3353 |
+ | 0.3427 | 3.0 | 2715 | 1.2595 |
+ | 0.2379 | 4.0 | 3620 | 0.8825 |
+ | 0.189 | 5.0 | 4525 | 0.6379 |
+ | 0.1718 | 6.0 | 5430 | 0.4706 |
+ | 0.1389 | 7.0 | 6335 | 0.3872 |
+ | 0.1151 | 8.0 | 7240 | 0.3151 |
+ | 0.0996 | 9.0 | 8145 | 0.2532 |
+ | 0.089 | 10.0 | 9050 | 0.2301 |
+ | 0.0825 | 11.0 | 9955 | 0.2173 |
+ | 0.0775 | 12.0 | 10860 | 0.2077 |
+ | 0.0735 | 13.0 | 11765 | 0.2059 |
+ | 0.0704 | 14.0 | 12670 | 0.1962 |
+ | 0.0689 | 15.0 | 13575 | 0.1961 |
+ | 0.0677 | 16.0 | 14480 | 0.1960 |
+ | 0.0661 | 17.0 | 15385 | 0.1944 |
+ | 0.066 | 18.0 | 16290 | 0.1926 |
+ | 0.0643 | 19.0 | 17195 | 0.1929 |
+ | 0.0639 | 20.0 | 18100 | 0.1916 |
+ | 0.062 | 21.0 | 19005 | 0.1904 |
+ | 0.0617 | 22.0 | 19910 | 0.1932 |
+ | 0.0611 | 23.0 | 20815 | 0.1944 |
 
 
 ### Framework versions
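
As a side note on the updated hyperparameters: they describe a standard Hugging Face Trainer configuration. The sketch below is not the author's training script; it only shows how the values listed in the new model card (learning_rate 5e-06, batch size 36, linear schedule with 10,000 warmup steps, 200 epochs, Adam betas/epsilon, seed 42) would typically be expressed as `Seq2SeqTrainingArguments`. The output directory is a placeholder, and whether the run used `Seq2SeqTrainer` or the plain `Trainer` is not recorded in this commit.

```python
from transformers import Seq2SeqTrainingArguments

# Hyperparameters as listed in the updated README; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="train_2",
    learning_rate=5e-6,
    per_device_train_batch_size=36,
    per_device_eval_batch_size=36,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    num_train_epochs=200,
    adam_beta1=0.9,      # optimizer: Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,   # and epsilon=1e-08
)
```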
added_tokens.json CHANGED
The diff for this file is too large to render. See raw diff
 
config.json CHANGED
@@ -1,12 +1,12 @@
 {
- "_name_or_path": "/Users/lee/GitHub/results/HIT-T5small/train_1/checkpoint-24288",
+ "_name_or_path": "/Users/lee/GitHub/results/GMY-T5_trans/train_1/checkpoint-12768",
 "architectures": [
 "T5ForConditionalGeneration"
 ],
 "classifier_dropout": 0.0,
- "d_ff": 2048,
+ "d_ff": 3072,
 "d_kv": 64,
- "d_model": 512,
+ "d_model": 768,
 "decoder_start_token_id": 0,
 "dense_act_fn": "relu",
 "dropout_rate": 0.1,
@@ -18,9 +18,9 @@
 "layer_norm_epsilon": 1e-06,
 "model_type": "t5",
 "n_positions": 512,
- "num_decoder_layers": 6,
- "num_heads": 8,
- "num_layers": 6,
+ "num_decoder_layers": 12,
+ "num_heads": 12,
+ "num_layers": 12,
 "output_past": true,
 "pad_token_id": 0,
 "relative_attention_max_distance": 128,
@@ -57,5 +57,5 @@
 "torch_dtype": "float32",
 "transformers_version": "4.44.0.dev0",
 "use_cache": true,
- "vocab_size": 47354
+ "vocab_size": 33280
 }
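
The config change moves the model from t5-small-scale dimensions (d_model 512, 6 layers, 8 heads, d_ff 2048) to t5-base-scale dimensions (d_model 768, 12 layers, 12 heads, d_ff 3072) with a smaller 33,280-token vocabulary, which is consistent with the safetensors file growing to roughly 895 MB (about 224M float32 parameters). Below is a hedged sketch for inspecting the updated checkpoint; the repo id "Thalesian/train_2" is a placeholder for the actual repository:

```python
from transformers import AutoConfig, T5ForConditionalGeneration

repo_id = "Thalesian/train_2"  # placeholder repo id

config = AutoConfig.from_pretrained(repo_id)
print(config.d_model, config.num_layers, config.num_heads, config.d_ff, config.vocab_size)
# expected after this commit: 768 12 12 3072 33280

model = T5ForConditionalGeneration.from_pretrained(repo_id)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~224M, matching the ~895 MB float32 safetensors file
```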
generation_config.json CHANGED
@@ -1,5 +1,4 @@
 {
- "_from_model_config": true,
 "decoder_start_token_id": 0,
 "eos_token_id": 1,
 "pad_token_id": 0,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:f2e04878b9df543342773e9e27c73fc8eca92d05f069fe6a5e32fe8d4bae4186
- size 273224744
+ oid sha256:d45bcd946c68b05b1f5d3aaba6c251d1989f92d9ea24506f269332ab990f581d
+ size 895183656
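
The weight, tokenizer, and training-args entries in this commit are Git LFS pointer files: each records only the LFS spec version, the sha256 oid of the real file, and its size in bytes. Below is a minimal sketch (not part of the repository) for checking a locally downloaded file against the oid shown above; the file path is a placeholder:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large weight files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# oid recorded in the new model.safetensors pointer
expected = "d45bcd946c68b05b1f5d3aaba6c251d1989f92d9ea24506f269332ab990f581d"
assert sha256_of("model.safetensors") == expected
```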
runs/Feb05_01-07-24_Lees-MacBook-Pro.local/events.out.tfevents.1738742844.Lees-MacBook-Pro.local ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8b2e61b3f20aecd298dc722753db7988d63ec7356b4326501b4da105ed453675
+ size 21353
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:120446cfffabdcc38eb797df240520b60bca71a4ead69df14190a9c36876c3ad
- size 5252571
+ oid sha256:a1281df2937736c4b7e63c9141a8827fdbf4c528ce7322301a1b4a607d94de24
+ size 2637564
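
The tokenizer.json shrinks from about 5.3 MB to 2.6 MB, in line with the vocabulary dropping from 47,354 to 33,280 entries in config.json. A short hedged check (placeholder repo id again):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Thalesian/train_2")  # placeholder repo id
print(len(tokenizer))  # expected to be close to the configured vocab_size of 33280
```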
tokenizer_config.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:40d69a1bc5d85ec36b9ce046efbf437dcadf8f9bb9fbba366977d0899bed3f19
- size 2698627
+ oid sha256:e48d592b6409e0b1b35dad738d975eb2c845af0d84ff1b2f7d46556d3bc4a61a
+ size 224388
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:861d38d03b38c9f10d2f7a6ffc8c417fa995a3f5b7e52a4faf367bb62b80dd92
+ oid sha256:d28f9a5811731e3fc7aff1d03408237fdccbd6ffee4f0bfba0d69430379fd3a1
 size 5432