---
license: apache-2.0
pipeline_tag: fill-mask
---

# manchuBERT

manchuBERT is a BERT-base model trained from scratch on romanized Manchu data. [ManNER & ManPOS](https://aclanthology.org/2024.lrec-main.961.pdf) are fine-tuned manchuBERT models.

## Data

manchuBERT uses the data augmentation method from [Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data](https://arxiv.org/pdf/2311.17492.pdf).

| **Data** | **Number of sentences (before augmentation)** |
|:----------------------------:|:-----------------------:|
| Manwén Lǎodàng – Taizong | 2,220 |
| Ilan gurun i bithe | 41,904 |
| Gin ping mei bithe | 21,376 |
| Yùzhì Qīngwénjiàn | 11,954 |
| Yùzhì Zēngdìng Qīngwénjiàn | 18,420 |
| Manwén Lǎodàng – Taizu | 22,578 |
| Manchu-Korean Dictionary | 40,583 |

## Citation

```
@misc{jean_seo_2024,
  author    = {Jean Seo},
  title     = {manchuBERT (Revision 64133be)},
  year      = 2024,
  url       = {https://huggingface.co/seemdog/manchuBERT},
  doi       = {10.57967/hf/1599},
  publisher = {Hugging Face}
}
```
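## Usage

Since manchuBERT is a fill-mask model, it can be queried through the `transformers` fill-mask pipeline. The sketch below is illustrative, assuming the checkpoint id `seemdog/manchuBERT` (taken from the URL above) and the standard BERT `[MASK]` token; the masked sentence reuses a title from the Data table and is a hypothetical prompt, not a reported result.

```python
# Minimal sketch: masked-token prediction with manchuBERT.
# Assumptions: the checkpoint is hosted as "seemdog/manchuBERT" and
# the tokenizer uses the default BERT [MASK] token.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="seemdog/manchuBERT")

# Mask one token of a title listed in the Data table ("Ilan gurun i bithe").
predictions = fill_mask("ilan [MASK] i bithe")

# Each prediction carries the filled-in token and its probability.
for p in predictions:
    print(f"{p['token_str']}\t{p['score']:.4f}")
```

The pipeline returns the top candidate tokens for the masked position, ranked by score, which is a quick way to sanity-check the model on romanized Manchu input.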