---
license: apache-2.0
pipeline_tag: fill-mask
---

# manchuBERT

manchuBERT is a BERT-base model trained from scratch on romanized Manchu data. [ManNER & ManPOS](https://aclanthology.org/2024.lrec-main.961.pdf) are fine-tuned manchuBERT models.

## Data

manchuBERT uses the data augmentation method from [Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data](https://arxiv.org/pdf/2311.17492.pdf).

| **Data** | **Number of sentences (before augmentation)** |
|:----------------------------:|:-----------------------:|
| Manwén Lǎodàng – Taizong | 2,220 |
| Ilan gurun i bithe | 41,904 |
| Gin ping mei bithe | 21,376 |
| Yùzhì Qīngwénjiàn | 11,954 |
| Yùzhì Zēngdìng Qīngwénjiàn | 18,420 |
| Manwén Lǎodàng – Taizu | 22,578 |
| Manchu-Korean Dictionary | 40,583 |

## Citation

```
@misc{jean_seo_2024,
  author    = {Jean Seo},
  title     = {manchuBERT (Revision 64133be)},
  year      = 2024,
  url       = {https://huggingface.co/seemdog/manchuBERT},
  doi       = {10.57967/hf/1599},
  publisher = {Hugging Face}
}
```
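## Usage

Since manchuBERT is a fill-mask model, it can be queried through the `transformers` fill-mask pipeline. The sketch below is illustrative, assuming the checkpoint id `seemdog/manchuBERT` (taken from the URL above) and the standard BERT `[MASK]` token; the masked sentence reuses a title from the Data table and is a hypothetical prompt, not a reported result.

```python
# Minimal sketch: masked-token prediction with manchuBERT.
# Assumptions: the checkpoint is hosted as "seemdog/manchuBERT" and
# the tokenizer uses the default BERT [MASK] token.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="seemdog/manchuBERT")

# Mask one token of a title listed in the Data table ("Ilan gurun i bithe").
predictions = fill_mask("ilan [MASK] i bithe")

# Each prediction carries the filled-in token and its probability.
for p in predictions:
    print(f"{p['token_str']}\t{p['score']:.4f}")
```

The pipeline returns the top candidate tokens for the masked position, ranked by score, which is a quick way to sanity-check the model on romanized Manchu input.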