Thalesian committed (verified)
Commit f4ab870 · Parent: f454b9b

End of training
README.md CHANGED
@@ -9,12 +9,11 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/thalesian-university-of-new-mexico-press/huggingface/runs/xzm8h2sw)
 # train_2
 
 This model was trained from scratch on the None dataset.
 It achieves the following results on the evaluation set:
- - Loss: 1.0677
+ - Loss: 0.1944
 
 ## Model description
 
@@ -33,61 +32,42 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
- - learning_rate: 5e-07
- - train_batch_size: 8
- - eval_batch_size: 8
+ - learning_rate: 5e-06
+ - train_batch_size: 36
+ - eval_batch_size: 36
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 10000
- - num_epochs: 1000
+ - num_epochs: 200
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
- | 3.0178 | 1.0 | 1762 | 30.5008 |
- | 2.6192 | 2.0 | 3524 | 23.6284 |
- | 2.0989 | 3.0 | 5286 | 15.6537 |
- | 1.1218 | 4.0 | 7048 | 8.6861 |
- | 0.7012 | 5.0 | 8810 | 4.8304 |
- | 0.5128 | 6.0 | 10572 | 2.8713 |
- | 0.458 | 7.0 | 12334 | 2.2613 |
- | 0.4257 | 8.0 | 14096 | 1.9845 |
- | 0.3936 | 9.0 | 15858 | 1.7669 |
- | 0.3828 | 10.0 | 17620 | 1.6619 |
- | 0.3696 | 11.0 | 19382 | 1.6021 |
- | 0.3523 | 12.0 | 21144 | 1.5797 |
- | 0.3577 | 13.0 | 22906 | 1.5279 |
- | 0.3586 | 14.0 | 24668 | 1.4815 |
- | 0.3536 | 15.0 | 26430 | 1.4177 |
- | 0.3421 | 16.0 | 28192 | 1.3723 |
- | 0.3426 | 17.0 | 29954 | 1.3208 |
- | 0.3316 | 18.0 | 31716 | 1.2861 |
- | 0.3281 | 19.0 | 33478 | 1.2838 |
- | 0.3242 | 20.0 | 35240 | 1.2195 |
- | 0.3205 | 21.0 | 37002 | 1.2078 |
- | 0.3269 | 22.0 | 38764 | 1.1876 |
- | 0.3225 | 23.0 | 40526 | 1.1729 |
- | 0.3261 | 24.0 | 42288 | 1.1359 |
- | 0.3141 | 25.0 | 44050 | 1.1536 |
- | 0.3112 | 26.0 | 45812 | 1.1326 |
- | 0.3144 | 27.0 | 47574 | 1.1352 |
- | 0.3036 | 28.0 | 49336 | 1.1404 |
- | 0.3024 | 29.0 | 51098 | 1.0948 |
- | 0.2996 | 30.0 | 52860 | 1.1103 |
- | 0.3043 | 31.0 | 54622 | 1.0854 |
- | 0.2985 | 32.0 | 56384 | 1.0956 |
- | 0.3056 | 33.0 | 58146 | 1.1020 |
- | 0.2919 | 34.0 | 59908 | 1.1080 |
- | 0.2993 | 35.0 | 61670 | 1.0691 |
- | 0.2981 | 36.0 | 63432 | 1.0809 |
- | 0.2942 | 37.0 | 65194 | 1.0515 |
- | 0.2911 | 38.0 | 66956 | 1.0637 |
- | 0.296 | 39.0 | 68718 | 1.0840 |
- | 0.2945 | 40.0 | 70480 | 1.0684 |
- | 0.3031 | 41.0 | 72242 | 1.0645 |
- | 0.2903 | 42.0 | 74004 | 1.0677 |
+ | 1.7988 | 1.0 | 905 | 7.8026 |
+ | 1.0102 | 2.0 | 1810 | 2.3353 |
+ | 0.3427 | 3.0 | 2715 | 1.2595 |
+ | 0.2379 | 4.0 | 3620 | 0.8825 |
+ | 0.189 | 5.0 | 4525 | 0.6379 |
+ | 0.1718 | 6.0 | 5430 | 0.4706 |
+ | 0.1389 | 7.0 | 6335 | 0.3872 |
+ | 0.1151 | 8.0 | 7240 | 0.3151 |
+ | 0.0996 | 9.0 | 8145 | 0.2532 |
+ | 0.089 | 10.0 | 9050 | 0.2301 |
+ | 0.0825 | 11.0 | 9955 | 0.2173 |
+ | 0.0775 | 12.0 | 10860 | 0.2077 |
+ | 0.0735 | 13.0 | 11765 | 0.2059 |
+ | 0.0704 | 14.0 | 12670 | 0.1962 |
+ | 0.0689 | 15.0 | 13575 | 0.1961 |
+ | 0.0677 | 16.0 | 14480 | 0.1960 |
+ | 0.0661 | 17.0 | 15385 | 0.1944 |
+ | 0.066 | 18.0 | 16290 | 0.1926 |
+ | 0.0643 | 19.0 | 17195 | 0.1929 |
+ | 0.0639 | 20.0 | 18100 | 0.1916 |
+ | 0.062 | 21.0 | 19005 | 0.1904 |
+ | 0.0617 | 22.0 | 19910 | 0.1932 |
+ | 0.0611 | 23.0 | 20815 | 0.1944 |
 
 
 ### Framework versions
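
As a side note on the updated hyperparameters: they describe a standard Hugging Face Trainer configuration. The sketch below is not the author's training script; it only shows how the values listed in the new model card (learning_rate 5e-06, batch size 36, linear schedule with 10,000 warmup steps, 200 epochs, Adam betas/epsilon, seed 42) would typically be expressed as `Seq2SeqTrainingArguments`. The output directory is a placeholder, and whether the run used `Seq2SeqTrainer` or the plain `Trainer` is not recorded in this commit.

```python
from transformers import Seq2SeqTrainingArguments

# Hyperparameters as listed in the updated README; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="train_2",
    learning_rate=5e-6,
    per_device_train_batch_size=36,
    per_device_eval_batch_size=36,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    num_train_epochs=200,
    adam_beta1=0.9,      # optimizer: Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,   # and epsilon=1e-08
)
```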
added_tokens.json CHANGED
The diff for this file is too large to render. See raw diff
 
config.json CHANGED
@@ -1,12 +1,12 @@
 {
- "_name_or_path": "/Users/lee/GitHub/results/HIT-T5small/train_1/checkpoint-24288",
+ "_name_or_path": "/Users/lee/GitHub/results/GMY-T5_trans/train_1/checkpoint-12768",
 "architectures": [
 "T5ForConditionalGeneration"
 ],
 "classifier_dropout": 0.0,
- "d_ff": 2048,
+ "d_ff": 3072,
 "d_kv": 64,
- "d_model": 512,
+ "d_model": 768,
 "decoder_start_token_id": 0,
 "dense_act_fn": "relu",
 "dropout_rate": 0.1,
@@ -18,9 +18,9 @@
 "layer_norm_epsilon": 1e-06,
 "model_type": "t5",
 "n_positions": 512,
- "num_decoder_layers": 6,
- "num_heads": 8,
- "num_layers": 6,
+ "num_decoder_layers": 12,
+ "num_heads": 12,
+ "num_layers": 12,
 "output_past": true,
 "pad_token_id": 0,
 "relative_attention_max_distance": 128,
@@ -57,5 +57,5 @@
 "torch_dtype": "float32",
 "transformers_version": "4.44.0.dev0",
 "use_cache": true,
- "vocab_size": 47354
+ "vocab_size": 33280
 }
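
The config change moves the model from t5-small-scale dimensions (d_model 512, 6 layers, 8 heads, d_ff 2048) to t5-base-scale dimensions (d_model 768, 12 layers, 12 heads, d_ff 3072) with a smaller 33,280-token vocabulary, which is consistent with the safetensors file growing to roughly 895 MB (about 224M float32 parameters). Below is a hedged sketch for inspecting the updated checkpoint; the repo id "Thalesian/train_2" is a placeholder for the actual repository:

```python
from transformers import AutoConfig, T5ForConditionalGeneration

repo_id = "Thalesian/train_2"  # placeholder repo id

config = AutoConfig.from_pretrained(repo_id)
print(config.d_model, config.num_layers, config.num_heads, config.d_ff, config.vocab_size)
# expected after this commit: 768 12 12 3072 33280

model = T5ForConditionalGeneration.from_pretrained(repo_id)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~224M, matching the ~895 MB float32 safetensors file
```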
generation_config.json CHANGED
@@ -1,5 +1,4 @@
 {
- "_from_model_config": true,
 "decoder_start_token_id": 0,
 "eos_token_id": 1,
 "pad_token_id": 0,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:f2e04878b9df543342773e9e27c73fc8eca92d05f069fe6a5e32fe8d4bae4186
- size 273224744
+ oid sha256:d45bcd946c68b05b1f5d3aaba6c251d1989f92d9ea24506f269332ab990f581d
+ size 895183656
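
The weight, tokenizer, and training-args entries in this commit are Git LFS pointer files: each records only the LFS spec version, the sha256 oid of the real file, and its size in bytes. Below is a minimal sketch (not part of the repository) for checking a locally downloaded file against the oid shown above; the file path is a placeholder:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large weight files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# oid recorded in the new model.safetensors pointer
expected = "d45bcd946c68b05b1f5d3aaba6c251d1989f92d9ea24506f269332ab990f581d"
assert sha256_of("model.safetensors") == expected
```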
runs/Feb05_01-07-24_Lees-MacBook-Pro.local/events.out.tfevents.1738742844.Lees-MacBook-Pro.local ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8b2e61b3f20aecd298dc722753db7988d63ec7356b4326501b4da105ed453675
+ size 21353
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:120446cfffabdcc38eb797df240520b60bca71a4ead69df14190a9c36876c3ad
- size 5252571
+ oid sha256:a1281df2937736c4b7e63c9141a8827fdbf4c528ce7322301a1b4a607d94de24
+ size 2637564
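
The tokenizer.json shrinks from about 5.3 MB to 2.6 MB, in line with the vocabulary dropping from 47,354 to 33,280 entries in config.json. A short hedged check (placeholder repo id again):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Thalesian/train_2")  # placeholder repo id
print(len(tokenizer))  # expected to be close to the configured vocab_size of 33280
```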
tokenizer_config.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:40d69a1bc5d85ec36b9ce046efbf437dcadf8f9bb9fbba366977d0899bed3f19
- size 2698627
+ oid sha256:e48d592b6409e0b1b35dad738d975eb2c845af0d84ff1b2f7d46556d3bc4a61a
+ size 224388
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:861d38d03b38c9f10d2f7a6ffc8c417fa995a3f5b7e52a4faf367bb62b80dd92
+ oid sha256:d28f9a5811731e3fc7aff1d03408237fdccbd6ffee4f0bfba0d69430379fd3a1
 size 5432