lengoctuong committed on
Commit 01463ac
1 Parent(s): 8955b1f

Update new-train

README.md CHANGED
@@ -1,234 +1,59 @@
  ---
- license: mit
- base_model: gpt2
- datasets:
- - wikitext
- language:
- - en
- - vi
- metrics:
- - perplexity
- library_name: transformers
- pipeline_tag: text-generation
  tags:
- - code
- - text-generation-inference
  - generated_from_trainer
  model-index:
- - name: gpt2-finetuned-wikitext2
  results: []
  ---

- # gpt2-finetuned-wikitext2

- This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: {loss: 3.6248738765716553, perplexity: 37.519989013671875}
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

  The following hyperparameters were used during training:
- - learning_rate: 5e-04
- - train_batch_size: 8
- - eval_batch_size: 8
- - optimizer: AdamW
- - lr_scheduler_type: linear
- - num_epochs: 2.0
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Data Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- | Epoch | Step | Validation Loss |
- |:-----:|:----:|:---------------:|
- | 1.0 | 1000 | 3.6487 |
- | 1.0 | 2000 | 3.6033 |
- | 2.0 | 1000 | 3.6578 |
- | 2.0 | 2000 | 3.6434 |
-
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]

  ---
  tags:
  - generated_from_trainer
  model-index:
+ - name: gpt2-finetuned-wikitext2-2-finetuned-wikitext2-2
  results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # gpt2-finetuned-wikitext2-2-finetuned-wikitext2-2

+ This model was trained from scratch on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 2.7010

+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

  The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 2
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 2.9705 | 0.43 | 250 | 3.5828 |
+ | 2.921 | 0.86 | 500 | 3.5685 |
+ | 2.877 | 1.28 | 750 | 3.5727 |
+ | 2.8577 | 1.71 | 1000 | 3.5707 |
+
+
+ ### Framework versions
+
+ - Transformers 4.32.0
+ - Pytorch 2.0.1+cu118
+ - Datasets 2.14.4
+ - Tokenizers 0.13.3
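
The hyperparameters added above map directly onto the `transformers` Trainer API. The sketch below is an editor's reconstruction, not the training script from this commit: the output directory and the evaluation cadence (inferred from the 250-step intervals in the results table) are assumptions.

```python
# Hedged sketch: TrainingArguments matching the hyperparameters in this README.
# output_dir and the eval settings are assumptions, not taken from the commit.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-finetuned-wikitext2-2-finetuned-wikitext2-2",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="cosine",       # Adam betas/eps above are the defaults
    warmup_steps=100,
    num_train_epochs=2,
    evaluation_strategy="steps",      # results above are logged every 250 steps
    eval_steps=250,
)
```

For a quick smoke test of the updated checkpoint, a minimal generation sketch follows; the repo id is inferred from the committer and model name above and may differ from the actual repository.

```python
# Hedged sketch: load the fine-tuned checkpoint and generate a continuation.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="lengoctuong/gpt2-finetuned-wikitext2-2-finetuned-wikitext2-2",  # assumed repo id
)
print(generator("The history of machine translation", max_new_tokens=40)[0]["generated_text"])
```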
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:36f4f1420484adc8acd05811996154e4d12e2c182864d118a79ce49abef219ad
+ oid sha256:8baf578a08127f3c23d2ae604c1461f3f20df4aa709ef794b49d0fac6b624f05
  size 497807197
special_tokens_map.json CHANGED
@@ -1,5 +1,6 @@
  {
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
+ "pad_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
  }
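
The new `pad_token` entry matters because GPT-2 ships without a padding token, which breaks batched tokenization; reusing `<|endoftext|>` (the EOS token) as pad is the standard workaround. A sketch of how this mapping is typically produced follows; the commit does not include the actual script, so this is an assumption based on common `transformers` usage.

```python
# Hedged sketch: give GPT-2 a pad token by aliasing it to EOS, then save.
# Saving the tokenizer writes the special_tokens_map.json change shown above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # both become "<|endoftext|>"
tokenizer.save_pretrained("./gpt2-finetuned-wikitext2-2")  # hypothetical output dir
```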
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67abf1b2952c59fd49bf7b1a9b06f5ce14b121c095f15eeb25f8e59d7fcaab5f
+ size 4091
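
`training_args.bin` is the `TrainingArguments` object that the Hugging Face `Trainer` serializes with `torch.save`, so it can be unpickled to cross-check the hyperparameters listed in the README. A sketch, assuming a compatible `transformers` version is importable (unpickling needs the class definition):

```python
# Hedged sketch: inspect the serialized TrainingArguments from this commit.
import torch  # transformers must also be installed for unpickling to resolve

args = torch.load("training_args.bin")
print(args.learning_rate, args.per_device_train_batch_size, args.lr_scheduler_type)
```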
training_loss.json DELETED
@@ -1,159 +0,0 @@
- {
-   "Epoch1.0": {
-     "Print1.0": {
-       "lr": 0.000245,
-       "samples": 1600,
-       "steps": 49,
-       "loss/train": 54179.7109375
-     },
-
-     "Print2.0": {
-       "lr": 0.000495,
-       "samples": 3200,
-       "steps": 99,
-       "loss/train": 59412.3828125
-     },
-
-     "Print3.0": {
-       "lr": 0.0004946366024518389,
-       "samples": 4800,
-       "steps": 149,
-       "loss/train": 57198.5625
-     },
-
-     "Print4.0": {
-       "lr": 0.0004891637478108582,
-       "samples": 6400,
-       "steps": 199,
-       "loss/train": 60152.0234375
-     },
-
-     "Print5.0": {
-       "lr": 0.0004836908931698774,
-       "samples": 8000,
-       "steps": 249,
-       "loss/train": 56401.99609375
-     },
-
-     "Print6.0": {
-       "lr": 0.0004782180385288967,
-       "samples": 9600,
-       "steps": 299,
-       "loss/train": 53559.71484375
-     },
-
-     "Print7.0": {
-       "lr": 0.0004727451838879159,
-       "samples": 11200,
-       "steps": 349,
-       "loss/train": 58140.875
-     },
-
-     "Print8.0": {
-       "lr": 0.0004672723292469352,
-       "samples": 12800,
-       "steps": 399,
-       "loss/train": 58182.828125
-     },
-
-     "Print9.0": {
-       "lr": 0.00046179947460595446,
-       "samples": 14400,
-       "steps": 449,
-       "loss/train": 60645.5
-     },
-
-     "Print10.0": {
-       "lr": 0.0004563266199649738,
-       "samples": 16000,
-       "steps": 499,
-       "loss/train": 59840.78515625
-     },
-
-     "Print11.0": {
-       "lr": 0.000450853765323993,
-       "samples": 17600,
-       "steps": 549,
-       "loss/train": 60670.56640625
-     }
-   },
-
-   "Epoch2.0": {
-     "Print1.0": {
-       "lr": 0.00044176882661996496,
-       "samples": 1600,
-       "steps": 632,
-       "loss/train": 47929.0
-     },
-
-     "Print2.0": {
-       "lr": 0.00043629597197898423,
-       "samples": 3200,
-       "steps": 682,
-       "loss/train": 43820.296875
-     },
-
-     "Print3.0": {
-       "lr": 0.0004308231173380035,
-       "samples": 4800,
-       "steps": 732,
-       "loss/train": 49377.87890625
-     },
-
-     "Print4.0": {
-       "lr": 0.0004253502626970228,
-       "samples": 6400,
-       "steps": 782,
-       "loss/train": 51437.4375
-     },
-
-     "Print5.0": {
-       "lr": 0.000419877408056042,
-       "samples": 8000,
-       "steps": 832,
-       "loss/train": 53303.359375
-     },
-
-     "Print6.0": {
-       "lr": 0.0004144045534150613,
-       "samples": 9600,
-       "steps": 882,
-       "loss/train": 48758.8515625
-     },
-
-     "Print7.0": {
-       "lr": 0.0004089316987740806,
-       "samples": 11200,
-       "steps": 932,
-       "loss/train": 46764.76953125
-     },
-
-     "Print8.0": {
-       "lr": 0.00040345884413309983,
-       "samples": 12800,
-       "steps": 982,
-       "loss/train": 47659.34765625
-     },
-
-     "Print9.0": {
-       "lr": 0.0003979859894921191,
-       "samples": 14400,
-       "steps": 1032,
-       "loss/train": 45718.71484375
-     },
-
-     "Print10.0": {
-       "lr": 0.0003925131348511384,
-       "samples": 16000,
-       "steps": 1082,
-       "loss/train": 48865.2578125
-     },
-
-     "Print11.0": {
-       "lr": 0.00038704028021015766,
-       "samples": 17600,
-       "steps": 1132,
-       "loss/train": 47677.7109375
-     }
-   }
- }
valid_loss.csv DELETED
@@ -1,5 +0,0 @@
- epoch, step, loss/eval, perplexity
- 1.0, 1000, 3.648764133453369, 38.427146911621094
- 1.0, 2000, 3.6033315658569336, 36.720367431640625
- 2.0, 1000, 3.6578149795532227, 38.77652359008789
- 2.0, 2000, 3.6434502601623535, 38.22349166870117
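
In the deleted CSV, the `perplexity` column is simply the exponential of the `loss/eval` column (cross-entropy in nats), which can be verified in one line:

```python
# Perplexity is exp(cross-entropy loss): check against the first deleted row.
import math

print(math.exp(3.648764133453369))  # ~38.42714..., matching the logged perplexity
```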