docs: init readme
README.md
ADDED
---
datasets:
- lst20
language:
- thai
widget:
- text: วัน ที่ _ 12 _ มีนาคม นี้ _ ฉัน จะ ไป เที่ยว วัดพระแก้ว _ ที่ กรุงเทพ
library_name: transformers
---
<!-- # HoogBERTa

This repository includes the Thai pretrained language representation (HoogBERTa_base) and the fine-tuned model for multitask sequence labeling. -->

# Documentation

## Prerequisite

Since we use subword-nmt BPE encoding, input needs to be pre-tokenized following the [BEST](https://huggingface.co/datasets/best2009) standard before it is passed to HoogBERTa. We use AttaCut for this pre-tokenization step:

```bash
pip install attacut
```
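
Below is a minimal pre-tokenization sketch. It assumes AttaCut's `tokenize` function and the convention visible in the widget example above (word tokens separated by spaces, with original spaces kept as standalone `_` tokens); the exact token boundaries depend on AttaCut's segmentation model.

```python
# Minimal pre-tokenization sketch, assuming attacut's tokenize() and the
# underscore-as-space convention from the widget example in this card.
from attacut import tokenize

sentence = "วันที่ 12 มีนาคมนี้ ฉันจะไปเที่ยววัดพระแก้ว ที่กรุงเทพ"

# Tokenize each space-delimited chunk, then rejoin with " _ " so that
# the original spaces survive as explicit "_" tokens.
chunks = [" ".join(tokenize(chunk)) for chunk in sentence.split(" ")]
pretokenized = " _ ".join(chunks)

print(pretokenized)
# Along the lines of the widget example:
# วัน ที่ _ 12 _ มีนาคม นี้ _ ฉัน จะ ไป เที่ยว วัดพระแก้ว _ ที่ กรุงเทพ
```

Since the card lists `library_name: transformers`, the pre-tokenized string can then be encoded with the Hugging Face `transformers` API. Continuing from the sketch above; note that the model id below is a placeholder assumption, not confirmed by this card:

```python
# Feature-extraction sketch with transformers. "lst-nectec/HoogBERTaEncoder"
# is an assumed repo id, used here for illustration only.
from transformers import AutoModel, AutoTokenizer

model_id = "lst-nectec/HoogBERTaEncoder"  # hypothetical id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer(pretokenized, return_tensors="pt")
outputs = model(**inputs)
token_embeddings = outputs.last_hidden_state  # contextual token representations
```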

# Citation

Please cite as:

```bibtex
@inproceedings{porkaew2021hoogberta,
  title     = {HoogBERTa: Multi-task Sequence Labeling using Thai Pretrained Language Representation},
  author    = {Peerachet Porkaew and Prachya Boonkwan and Thepchai Supnithi},
  booktitle = {The Joint International Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2021)},
  year      = {2021},
  address   = {Online}
}
```

Download the full-text [PDF](https://drive.google.com/file/d/1hwdyIssR5U_knhPE2HJigrc0rlkqWeLF/view?usp=sharing).
Check out the code on [GitHub](https://github.com/lstnlp/HoogBERTa).