Update README.md

lengoctuong committed
Commit • 40ac109 • 1 Parent(s): 9c9249b

README.md CHANGED
@@ -1,5 +1,6 @@
 ---
 license: mit
+base_model: gpt2
 datasets:
 - wikitext
 language:
@@ -12,7 +13,18 @@ pipeline_tag: text-generation
 tags:
 - code
 - text-generation-inference
+- generated_from_trainer
+model-index:
+- name: gpt2-finetuned-wikitext2
+  results: []
 ---
+
+# gpt2-finetuned-wikitext2
+
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the wikitext dataset.
+It achieves the following results on the evaluation set:
+- Loss:
+
 # Model Card for Model ID
 
 <!-- Provide a quick summary of what the model is/does. -->
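The new intro positions gpt2-finetuned-wikitext2 as an ordinary text-generation checkpoint, so it should load through the standard transformers pipeline API. A minimal sketch follows; the repo id is an assumption inferred from the commit author and model name, not something this diff confirms:

```python
# Minimal inference sketch for the fine-tuned checkpoint.
# The repo id below is a hypothetical path inferred from the commit; adjust as needed.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="lengoctuong/gpt2-finetuned-wikitext2",  # assumption, not confirmed by the diff
)

# Generate a short continuation from a prompt.
result = generator("The history of natural language processing", max_new_tokens=40)
print(result[0]["generated_text"])
```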
@@ -103,6 +115,14 @@ Use the code below to get started with the model.
 
 - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
+The following hyperparameters were used during training:
+- learning_rate: 5e-04
+- train_batch_size: 8
+- eval_batch_size: 8
+- optimizer: AdamW
+- lr_scheduler_type: linear
+- num_epochs: 2.0
+
 #### Speeds, Sizes, Times [optional]
 
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
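The hyperparameters added in this hunk map almost one-to-one onto transformers TrainingArguments. The sketch below shows the implied configuration; only the listed values come from the card, and the output directory is an assumption:

```python
# Sketch of the training configuration implied by the card's hyperparameter list.
# Values marked "from the card" appear in the diff above; the rest are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-finetuned-wikitext2",  # assumed output path, not stated in the card
    learning_rate=5e-4,                     # from the card
    per_device_train_batch_size=8,          # from the card (train_batch_size: 8)
    per_device_eval_batch_size=8,           # from the card (eval_batch_size: 8)
    optim="adamw_torch",                    # from the card (optimizer: AdamW)
    lr_scheduler_type="linear",             # from the card
    num_train_epochs=2.0,                   # from the card
)
```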
@@ -135,7 +155,13 @@ Use the code below to get started with the model.
 
 ### Results
 
--
+| Epoch | Step | Validation Loss |
+|:-----:|:----:|:---------------:|
+| 1.0   | 1000 | 3.6487          |
+| 1.0   | 2000 | 3.6033          |
+| 2.0   | 1000 | 3.6578          |
+| 2.0   | 2000 | 3.6434          |
+
 
 #### Summary
 
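Trainer evaluation loss for a causal LM is normally mean per-token cross-entropy, so each table entry converts to perplexity as exp(loss); under that assumption the final checkpoint sits around exp(3.6434) ≈ 38.2 perplexity. A quick conversion:

```python
# Convert the reported validation losses to perplexity.
# Assumes each loss is mean per-token cross-entropy, the usual Trainer convention.
import math

for epoch, step, loss in [
    (1.0, 1000, 3.6487),
    (1.0, 2000, 3.6033),
    (2.0, 1000, 3.6578),
    (2.0, 2000, 3.6434),
]:
    print(f"epoch {epoch}, step {step}: perplexity ~ {math.exp(loss):.1f}")
```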