Update README.md
language: Chinese
datasets: CLUECorpusSmall
widget:
- text: "作为电子为主的电商平台,京东商城绝对是extra0者。如今的刘强extra1已经是身价过extra2的老板。"

## Model description
The Text-to-Text Transfer Transformer (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. Following their paper, we released a series of Chinese T5 models.

|           |           Link            |
| --------- | :-----------------------: |
| **Small** | [**2/128 (Tiny)**][2_128] |
| **Base**  | [**4/256 (Mini)**][4_256] |

## How to use

You can use the model directly with a pipeline for text2text generation:
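The snippet itself is collapsed in this hunk, so the code below is only a minimal sketch: the checkpoint id `uer/t5-small-chinese-cluecorpussmall` and the input sentence are assumptions chosen to match the sample output further down, not taken from this diff; substitute the actual model name of this repository.

```python
from transformers import BertTokenizer, T5ForConditionalGeneration, Text2TextGenerationPipeline

# Hypothetical checkpoint id; replace it with the actual model name of this repository.
model_name = "uer/t5-small-chinese-cluecorpussmall"

# The UER Chinese T5 checkpoints pair a BERT-style vocabulary extended with
# sentinel tokens (extra0, extra1, ...) with the T5 architecture, hence BertTokenizer.
tokenizer = BertTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
text2text_generator = Text2TextGenerationPipeline(model, tokenizer)

# extra0 marks the span the model should fill in
# ("中国的首都是extra0京" roughly reads "The capital of China is extra0-jing").
print(text2text_generator("中国的首都是extra0京", max_length=50, do_sample=False))
```

With greedy decoding the model fills the sentinel span (here with 北, "north", completing 北京/Beijing) and returns results in the format shown below: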
```
[{'generated_text': 'extra0 北 extra1 extra2 extra3 extra4 extra5'}]
```
## Training data
[CLUECorpusSmall](https://github.com/CLUEbenchmark/CLUECorpus2020/) is used as training data.
## Training procedure
The model is pre-trained with [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud](https://cloud.tencent.com/). We pre-train for 1,000,000 steps with a sequence length of 128, and then for an additional 250,000 steps with a sequence length of 512.

Stage1: