Tags: Fill-Mask · Transformers · PyTorch · xlm-roberta · Inference Endpoints

MC^2XLMR-large

GitHub Repo

We continually pretrain XLM-RoBERTa-large on MC^2. The resulting model supports Tibetan, Uyghur, Kazakh (written in the Kazakh Arabic script), and Mongolian (written in the traditional Mongolian script).

See the paper for details.

We have also released another model trained on MC^2: MC^2Llama-13B.
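As a quick illustration (not part of the original release), the model can be loaded for masked-token prediction with the transformers library. The sketch below assumes the model id pkupie/mc2-xlmr-large shown on this page and uses a placeholder input sentence, which should be replaced with real text in one of the supported languages.

# Minimal fill-mask sketch for pkupie/mc2-xlmr-large (assumed model id from this page).
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_id = "pkupie/mc2-xlmr-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# XLM-RoBERTa uses "<mask>" as its mask token; place it where a token should be
# predicted. Replace the placeholder below with a Tibetan, Uyghur, Kazakh, or
# Mongolian sentence.
text = f"... {tokenizer.mask_token} ..."
for prediction in fill_mask(text, top_k=5):
    print(prediction["token_str"], prediction["score"])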

Citation

@article{zhang2024mc,
  title={MC$^2$: Towards Transparent and Culturally-Aware NLP for Minority Languages in China},
  author={Zhang, Chen and Tao, Mingxu and Huang, Quzhe and Lin, Jiuheng and Chen, Zhibin and Feng, Yansong},
  journal={arXiv preprint arXiv:2311.08348},
  year={2024}
}
