---
license: llama2
datasets:
- pkupie/mc2_corpus
language:
- bo
- ug
- mn
- kk
---

# MC^2Llama-13B

[Github Repo](https://github.com/luciusssss/mc2_corpus)

We continually pretrain [llama_chinese_13b](https://huggingface.co/quzhe/llama_chinese_13B) with [MC^2](https://huggingface.co/datasets/pkupie/mc2_corpus), which covers Tibetan, Uyghur, Kazakh written in the Kazakh Arabic script, and Mongolian written in the traditional Mongolian script.

See details in the [paper](https://arxiv.org/abs/2311.08348).
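For readers who want to inspect the pretraining data itself, MC^2 is hosted on the Hub as `pkupie/mc2_corpus`. The snippet below is only a sketch: the configuration name `"bo"` (Tibetan) and the `train` split are assumptions, so check the dataset card for the actual configurations and splits.

```python
from datasets import load_dataset

# Sketch: load one language subset of MC^2.
# NOTE: the configuration name "bo" and the "train" split are assumptions;
# see https://huggingface.co/datasets/pkupie/mc2_corpus for the real layout.
mc2_bo = load_dataset("pkupie/mc2_corpus", "bo", split="train")
print(mc2_bo[0])
```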
## Usage

The model and tokenizer can be loaded via:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = LlamaTokenizer.from_pretrained("pkupie/mc2-llama-13b")
model = LlamaForCausalLM.from_pretrained("pkupie/mc2-llama-13b")
```
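Once loaded, text can be generated through the standard `transformers` `generate` API. The sketch below is illustrative rather than prescribed by the model authors: the Tibetan prompt and the decoding settings are arbitrary placeholders.

```python
# Minimal generation sketch (assumes `model` and `tokenizer` from above).
# The prompt and decoding settings are placeholders, not recommended values.
prompt = "བོད་ཀྱི་"  # illustrative Tibetan prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```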
## Citation

```
@misc{zhang2023mc2,
      title={MC^2: A Multilingual Corpus of Minority Languages in China},
      author={Chen Zhang and Mingxu Tao and Quzhe Huang and Jiuheng Lin and Zhibin Chen and Yansong Feng},
      year={2023},
      eprint={2311.08348},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```