Update README.md
README.md
CHANGED
@@ -10,16 +10,35 @@ widget:
---

# LASSL roberta-ko-small

## How to use

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("lassl/roberta-ko-small")
tokenizer = AutoTokenizer.from_pretrained("lassl/roberta-ko-small")
```

## Evaluation

`roberta-ko-small` was pretrained on Korean text with the [LASSL](https://github.com/lassl/lassl) framework. The scores below were evaluated on 2021/12/15.

| nsmc | klue_nli | klue_sts | korquadv1 | klue_mrc | avg |
| ------- | -------- | -------- | --------- | -------- | ------- |
| 87.8846 | 66.3086 | 83.8353 | 83.1780 | 42.4585 | 72.7330 |
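
The `avg` column appears to be the unweighted mean of the five task scores. The short check below reproduces it from the table values. The equal weighting is an assumption; the source does not say how `avg` was computed.

```python
# Scores copied from the evaluation table above.
scores = {
    "nsmc": 87.8846,
    "klue_nli": 66.3086,
    "klue_sts": 83.8353,
    "korquadv1": 83.1780,
    "klue_mrc": 42.4585,
}

# Assumption: `avg` is the plain (unweighted) mean across the five tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.4f}")  # 72.7330
```
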

## Corpora

This model was trained on 6,860,062 examples containing 3,512,351,744 tokens in total. The examples were extracted from the corpora below. For training details, see `config.json`.
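
The two counts above divide evenly, which suggests that every example is a fixed-length sequence of 512 tokens. This is an inference from the numbers, not something the source states; `config.json` is the place to confirm it.

```python
# Counts copied from the paragraph above.
num_examples = 6_860_062
num_tokens = 3_512_351_744

# If every example has the same length, the division leaves no remainder.
tokens_per_example, remainder = divmod(num_tokens, num_examples)
print(tokens_per_example, remainder)  # 512 0
```
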

```bash
corpora/
├── [707M] kowiki_latest.txt
├── [ 26M] modu_dialogue_v1.2.txt
├── [1.3G] modu_news_v1.1.txt
├── [9.7G] modu_news_v2.0.txt
├── [ 15M] modu_np_v1.1.txt
├── [1008M] modu_spoken_v1.2.txt
├── [6.5G] modu_written_v1.0.txt
└── [413M] petition.txt
```
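
For a rough sense of the total corpus size, the listed sizes can be summed. The sketch below assumes that `M` and `G` are binary units (MiB and GiB), which is what `tree -h` prints and what the `1008M` entry hints at. The listed sizes are rounded, so the result is only approximate.

```python
import re

# Listing copied from the directory tree above.
listing = """\
[707M] kowiki_latest.txt
[ 26M] modu_dialogue_v1.2.txt
[1.3G] modu_news_v1.1.txt
[9.7G] modu_news_v2.0.txt
[ 15M] modu_np_v1.1.txt
[1008M] modu_spoken_v1.2.txt
[6.5G] modu_written_v1.0.txt
[413M] petition.txt
"""

# Assumption: sizes use binary units, so 1G = 1024M.
UNIT_TO_MIB = {"M": 1, "G": 1024}


def total_mib(text: str) -> float:
    """Sum every bracketed size in the listing, in MiB."""
    total = 0.0
    for size, unit in re.findall(r"\[\s*([\d.]+)([MG])\]", text):
        total += float(size) * UNIT_TO_MIB[unit]
    return total


total = total_mib(listing)
print(f"{total / 1024:.1f}G")  # 19.6G
```

Under these assumptions the eight files add up to roughly 19.6G of raw text.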