lassl/roberta-ko-small


seopbo committed on Feb 19, 2022

Commit cdf55ff (1 parent: 1a1d527)

Update README.md

Files changed (1)

README.md +24 -5

@@ -10,16 +10,35 @@ widget:
 ---
 # LASSL roberta-ko-small
 `roberta-ko-small` was pretrained on Korean text with the [LASSL](https://github.com/lassl/lassl) framework. The scores below were evaluated on 2021/12/15.
 | nsmc | klue_nli | klue_sts | korquadv1 | klue_mrc | avg |
 | ---- | -------- | -------- | --------- | ---- | -------- |
 | 87.8846 | 66.3086 | 83.8353 | 83.1780 | 42.4585 | 72.7330 |
-## How to use
-```python
-from transformers import AutoModel, AutoTokenizer
-model = AutoModel.from_pretrained("lassl/roberta-ko-small")
-tokenizer = AutoTokenizer.from_pretrained("lassl/roberta-ko-small")
 ```

 ---
 # LASSL roberta-ko-small
+## How to use
+```python
+from transformers import AutoModel, AutoTokenizer
+model = AutoModel.from_pretrained("lassl/roberta-ko-small")
+tokenizer = AutoTokenizer.from_pretrained("lassl/roberta-ko-small")
+```
+## Evaluation
 `roberta-ko-small` was pretrained on Korean text with the [LASSL](https://github.com/lassl/lassl) framework. The scores below were evaluated on 2021/12/15.
 | nsmc | klue_nli | klue_sts | korquadv1 | klue_mrc | avg |
 | ---- | -------- | -------- | --------- | ---- | -------- |
 | 87.8846 | 66.3086 | 83.8353 | 83.1780 | 42.4585 | 72.7330 |
+## Corpora
+This model was trained on 6,860,062 examples (3,512,351,744 tokens in total) extracted from the corpora below. For the training configuration, see `config.json`.
+```bash
+corpora/
+├── [707M]  kowiki_latest.txt
+├── [ 26M]  modu_dialogue_v1.2.txt
+├── [1.3G]  modu_news_v1.1.txt
+├── [9.7G]  modu_news_v2.0.txt
+├── [ 15M]  modu_np_v1.1.txt
+├── [1008M]  modu_spoken_v1.2.txt
+├── [6.5G]  modu_written_v1.0.txt
+└── [413M]  petition.txt
 ```
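
As a quick sanity check on the figures in the new Corpora section, the sketch below divides the quoted token count by the quoted example count. The variable names are illustrative only. The result is consistent with every example being a fixed-length sequence, but that reading is an inference from the arithmetic, not something the README states.

```python
# Figures quoted in the Corpora section of the README.
num_examples = 6_860_062
num_tokens = 3_512_351_744

# Integer division with remainder: a zero remainder means the token
# count is an exact multiple of the example count.
tokens_per_example, remainder = divmod(num_tokens, num_examples)

print(f"tokens per example: {tokens_per_example}")
print(f"remainder: {remainder}")
```

The division comes out to exactly 512 tokens per example with no remainder. Whether 512 is also the model's maximum sequence length would need to be confirmed against `config.json`.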