---
datasets:
- lst20
language:
- th
widget:
- text: วัน ที่ _ 12 _ มีนาคม นี้ _ ฉัน จะ ไป เที่ยว วัดพระแก้ว _ ที่ กรุงเทพ
library_name: transformers
---
<!-- # HoogBERTa

This repository includes the Thai pretrained language representation (HoogBERTa_base) and the fine-tuned model for multitask sequence labeling. -->

# Documentation

## Prerequisite

Since HoogBERTa uses subword-nmt BPE encoding, input text must be pre-tokenized into words following the [BEST](https://huggingface.co/datasets/best2009) standard before it is passed to the model. We use attacut for this step:

```
pip install attacut
```

# Citation

Please cite as:

```bibtex
@inproceedings{porkaew2021hoogberta,
  title     = {HoogBERTa: Multi-task Sequence Labeling using Thai Pretrained Language Representation},
  author    = {Porkaew, Peerachet and Boonkwan, Prachya and Supnithi, Thepchai},
  booktitle = {The Joint International Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2021)},
  year      = {2021},
  address   = {Online}
}
```

Download the full-text [PDF](https://drive.google.com/file/d/1hwdyIssR5U_knhPE2HJigrc0rlkqWeLF/view?usp=sharing).

Check out the code on [GitHub](https://github.com/lstnlp/HoogBERTa).