---
datasets:
- lst20
language:
- th
widget:
- text: วัน ที่ _ 12 _ มีนาคม นี้ _ ฉัน จะ ไป เที่ยว วัดพระแก้ว _ ที่ กรุงเทพ
library_name: transformers
---
<!-- # HoogBERTa

This repository includes the Thai pretrained language representation (HoogBERTa_base) and the fine-tuned model for multitask sequence labeling. -->

# Documentation

## Prerequisite

Since HoogBERTa uses subword-nmt BPE encoding, input text must be pre-tokenized into words following the [BEST](https://huggingface.co/datasets/best2009) standard before it is passed to the model. We use attacut for this step:

```
pip install attacut
```

# Citation

Please cite as:

```bibtex
@inproceedings{porkaew2021hoogberta,
  title     = {HoogBERTa: Multi-task Sequence Labeling using Thai Pretrained Language Representation},
  author    = {Porkaew, Peerachet and Boonkwan, Prachya and Supnithi, Thepchai},
  booktitle = {The Joint International Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2021)},
  year      = {2021},
  address   = {Online}
}
```

Download the full-text [PDF](https://drive.google.com/file/d/1hwdyIssR5U_knhPE2HJigrc0rlkqWeLF/view?usp=sharing).

Check out the code on [GitHub](https://github.com/lstnlp/HoogBERTa).