yentinglin
committed on
Commit
•
11d7e35
1
Parent(s):
98ae011
Update README.md
Browse files
README.md
CHANGED
@@ -9,133 +9,85 @@ widget:
|
|
9 |
library_name: transformers
|
10 |
pipeline_tag: text-generation
|
11 |
---
|
12 |
-
|
13 |
|
|
|
14 |
|
15 |
-
|
16 |
-
✍️ <a href="https://huggingface.co/spaces/yentinglin/Taiwan-LLaMa2" target="_blank">Online Demo</a>
|
17 |
-
•
|
18 |
-
🤗 <a href="https://huggingface.co/yentinglin" target="_blank">HF Repo</a> • 🐦 <a href="https://twitter.com/yentinglin56" target="_blank">Twitter</a> • 📃 <a href="https://arxiv.org/pdf/2305.13711.pdf" target="_blank">[Paper Coming Soon]</a>
|
19 |
-
• 👨️ <a href="https://yentingl.com/" target="_blank">Yen-Ting Lin</a>
|
20 |
-
<br/><br/>
|
21 |
-
<img src="https://www.csie.ntu.edu.tw/~miulab/taiwan-llama/logo-v2.png" width="100"> <br/>
|
22 |
-
<a href="https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE">
|
23 |
-
<img src="https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg"></a>
|
24 |
-
<a href="https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE">
|
25 |
-
<img src="https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg"></a>
|
26 |
-
<br/>
|
27 |
-
|
28 |
-
</p>
|
29 |
|
|
|
|
|
|
|
|
|
|
|
30 |
|
31 |
|
|
|
32 |
|
33 |
-
|
34 |
-
|
|
|
35 |
|
36 |
-
|
37 |
|
38 |
-
|
39 |
-
A live demonstration of the model can be accessed at [Hugging Face Spaces](https://huggingface.co/spaces/yentinglin/Taiwan-LLaMa2).
|
40 |
|
41 |
-
|
|
|
42 |
|
43 |
-
|
44 |
|
45 |
-
2. **Instruction-Tuned**: Further fine-tuned on conversational data to offer context-aware and instruction-following responses.
|
46 |
|
47 |
-
|
48 |
|
49 |
-
|
50 |
|
|
|
51 |
|
52 |
-
|
53 |
-
|
54 |
-
|
55 |
|
|
|
|
|
56 |
|
57 |
-
|
58 |
-
|
59 |
-
| Question | ChatGPT | **Taiwan LLaMa v1.0** | **Taiwan LLaMa v0.0 (w/o pretraining)** | Llama2_Traditional_Chinese_13b_Chat |
|
60 |
-
|:-|:-|:-|:-|:-|
|
61 |
-
| 以惡地景觀著名的是高雄哪一個景點? | 以惡地地景著名的是高雄市的旗津。 | 高雄以擁有惡地景觀著名的景點是田寮月世界。✅ | 以惡地景觀著名的是高雄市的旗津島。旗津島是一個小島,... | 高雄市內有多個以惡地景觀著名的景點,其中以下面幾個最為著名: 1. 高雄捷運紅線沿途景點:高雄市政府為了推廣高雄 |
|
62 |
-
| 唐鳳是誰? | 唐鳳(Audrey Tang)是台灣的政治家、社會活動家和軟體工程師。她於2016年擔任台灣行政院數位政務委員,是台灣政府歷史上首位沒有政黨背景的政務官。唐鳳在軟體工程領域有豐富的經驗。 ✅ | 唐鳳是政治評論家、資訊社會學研究者,也是台灣第一位擔任政府數位政委的人。 ✅ | 唐鳳(Tang Feng)是一位著名的中國演員和歌手。他出生於1974年12月17日,出生地點在中國北京。唐鳳在1995年開始他的演藝生涯,並在同年發行了他的首張專輯《我不孤單》。此後,他發行了許多受歡迎的歌曲,並在多部電視劇和電影中擔任主角。唐鳳因其精湛的演技和迷人的歌聲而被譽為“亞洲魅力男王”。他的音樂獲得了多個獎項,包括“華表最佳新男演唱獎”、“華表最佳歌曲獎”和“華表最佳舞蹈獎”等。 | 唐鳳是唐朝時期的一位著名文學家,他的代表作品包括《陰騭》、《狂凡子》等。他的文學風格具有強烈的個性和創意,被譽為「唐代文學之父」。 |
|
63 |
-
| NTU 在哪? | NTU位於新加坡,具體地址是50 Nanyang Avenue, Singapore 639798。 | NTU 是國立臺灣大學的縮寫。它是臺灣的一所頂尖研究型大學,位於臺北市。 ✅| NTU 是台灣的一所大學,位於台北市。✅ | NTU 的總部位於台灣新北市板橋區,佔地面積約為1,000公頃。|
|
64 |
-
| 甚麼是22K? | "22K" 可能指的是金的純度,尤其是在珠寶和金飾方面。 | "22K"是指台灣的大學畢業生的起薪,大約是新台幣22,000元。這個詞被用來形容大學畢業生的起薪偏低,並且在就業市場上對於大學畢業生的需求不足。✅| 22K是指溫度尺度上的22,000K(千度)。1K代表一度,即溫度差距,是攝氏溫標(攝氏度)的一百萬分之一。因此,22K等於-253.15°C(攝氏溫標)或-423.67°F(華氏溫標)。 | 22K是一種簡寫,即22,000。它是一個數字,表示22,000個單位。|
|
65 |
-
| 肉圓要不要加香菜? | 這是個個人口味的問題,有些人喜歡在肉圓中加入香菜,因為香菜的清香可以增添風味;而有些人則不喜歡香菜的味道,覺得它會蓋過其他食材的味道。所以要不要加香菜,可以根據個人喜好來決定。 | 這是一個個人喜好的問題,有些人喜歡在肉圓裡加香菜,有些人不喜歡。**在台灣**,肉圓加香菜是一種常見的做法,這是一種在肉圓上加香菜的做法。然而,在其他國這是一個文化差異的問題。😊 | 肉圓加香菜可以增添一些芳香和美味的口感,如果你喜歡香菜的味道,可以試試加入肉圓中。香菜通常在製作過程中炒香,可以增添一些芳香和美味的味道。如果你想要加入香菜,可以在製作過程中炒香一些香菜,然後將其混入肉圓中。肉圓加香菜可以增添一些芳香和美味的口感,如果你喜歡香菜的味道,可以試試加入肉圓中。| 加香菜的肉圓味道更好,但是不加香菜的肉圓也很好吃。如果你喜歡香菜的味道,那麼加上香菜就好了。如果你不喜歡香菜,那麼不加也不會影響肉 |
|
66 |
-
|
67 |
-
|
68 |
-
## Model
|
69 |
-
|
70 |
-
We provide a number of model checkpoints that we trained. Please find them on Hugging Face [here](https://huggingface.co/models?search=taiwan-llama). Here are some quick links to the checkpoints that are finetuned from LLaMa 2:
|
71 |
-
|
72 |
-
| **Model** | **13B** |
|
73 |
-
|--------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
|
74 |
-
| **Taiwan-LLaMa v1.0** (_better for Taiwanese Culture_) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v1.0" target="_blank">yentinglin/Taiwan-LLaMa-v1.0</a> |
|
75 |
-
| Taiwan-LLaMa v0.9 (partial instruction set) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v0.9" target="_blank">yentinglin/Taiwan-LLaMa-v0.9</a> |
|
76 |
-
| Taiwan-LLaMa v0.0 (no Traditional Mandarin pretraining) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v0.0" target="_blank">yentinglin/Taiwan-LLaMa-v0.0</a> |
|
77 |
-
|
78 |
-
## Data
|
79 |
-
|
80 |
-
Here are some quick links to the datasets that we used to train the models:
|
81 |
-
|
82 |
-
| **Dataset** | **Link** |
|
83 |
-
|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
|
84 |
-
| **Instruction-tuning** | 🤗 <a href="https://huggingface.co/datasets/yentinglin/traditional_mandarin_instructions" target="_blank">yentinglin/traditional_mandarin_instructions</a> |
|
85 |
-
| Traditional Mandarin Pretraining | 🤗 <a href="https://huggingface.co/datasets/yentinglin/zh_TW_c4" target="_blank">yentinglin/zh_TW_c4</a> |
|
86 |
-
|
87 |
-
|
88 |
-
## Architecture
|
89 |
-
Taiwan-LLaMa is based on LLaMa 2, leveraging transformer architecture, <a href="https://github.com/Dao-AILab/flash-attention" target="_blank">flash attention 2</a>, and bfloat16.
|
90 |
-
|
91 |
-
It includes:
|
92 |
-
|
93 |
-
* Pretraining Phase: Pretrained on a vast corpus of over 5 billion tokens, extracted from common crawl in Traditional Mandarin.
|
94 |
-
* Fine-tuning Phase: Further instruction-tuned on over 490k multi-turn conversational data to enable more instruction-following and context-aware responses.
|
95 |
-
|
96 |
-
## Generic Capabilities on Vicuna Benchmark
|
97 |
-
|
98 |
-
The benchmark data is translated into Traditional Mandarin to evaluate the model's general capability.
|
99 |
|
100 |
|
101 |
-
|
102 |
|
103 |
-
|
104 |
|
105 |
-
|
106 |
-
|-------------------------------------|--------------------|
|
107 |
-
| GPT-4 | 102.59% |
|
108 |
-
| ChatGPT | 100.00% |
|
109 |
-
| **Taiwan-LLaMa v1.0** | 76.76% |
|
110 |
-
| Claude-Instant-1.2 | 74.04% |
|
111 |
-
| Llama2_Traditional_Chinese_13b_Chat | 56.21% |
|
112 |
|
113 |
|
|
|
114 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
115 |
|
116 |
-
##
|
117 |
-
We recommend hosting models with [🤗 Text Generation Inference](https://github.com/huggingface/text-generation-inference). Please see their [license](https://github.com/huggingface/text-generation-inference/blob/main/LICENSE) for details on usage and limitations.
|
118 |
-
```bash
|
119 |
-
bash run_text_generation_inference.sh "yentinglin/Taiwan-LLaMa" NUM_GPUS DIR_TO_SAVE_MODEL PORT MAX_INPUT_LEN MODEL_MAX_LEN
|
120 |
-
```
|
121 |
|
122 |
-
|
123 |
-
```
|
124 |
-
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {user} ASSISTANT:
|
125 |
-
```
|
126 |
|
127 |
-
## Setup development environment
|
128 |
-
```bash
|
129 |
-
conda create -n taiwan-llama python=3.10 -y
|
130 |
-
conda activate taiwan-llama
|
131 |
-
pip install -r requirements.txt
|
132 |
```
|
133 |
-
|
134 |
-
|
135 |
-
## Citations
|
136 |
-
If you use our code, data, or models in your research, please cite this repository. You can use the following BibTeX entry:
|
137 |
-
|
138 |
-
```bibtex
|
139 |
@inproceedings{lin-chen-2023-llm,
|
140 |
title = "{LLM}-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models",
|
141 |
author = "Lin, Yen-Ting and Chen, Yun-Nung",
|
@@ -150,25 +102,9 @@ If you use our code, data, or models in your research, please cite this reposito
|
|
150 |
|
151 |
@misc{taiwanllama,
|
152 |
author={Lin, Yen-Ting and Chen, Yun-Nung},
|
153 |
-
title={
|
154 |
year={2023},
|
155 |
-
url={https://github.com/
|
156 |
-
note={Code and models available at https://github.com/
|
157 |
}
|
158 |
```
|
159 |
-
|
160 |
-
## Collaborate With Us
|
161 |
-
If you are interested in contributing to the development of Traditional Mandarin language models, exploring new applications, or leveraging Taiwan-LLaMa for your specific needs, please don't hesitate to contact us. We welcome collaborations from academia, industry, and individual contributors.
|
162 |
-
|
163 |
-
## License
|
164 |
-
The code in this project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
|
165 |
-
|
166 |
-
The models included in this project are licensed under the LLAMA 2 Community License. See the [LLAMA2 License](https://github.com/facebookresearch/llama/blob/main/LICENSE) for full details.
|
167 |
-
|
168 |
-
## OpenAI Data Acknowledgment
|
169 |
-
The data included in this project were generated using OpenAI's models and are subject to OpenAI's Terms of Use. Please review [OpenAI's Terms of Use](https://openai.com/policies/terms-of-use) for details on usage and limitations.
|
170 |
-
|
171 |
-
|
172 |
-
## Acknowledgements
|
173 |
-
|
174 |
-
We thank [Meta LLaMA team](https://github.com/facebookresearch/llama) and [Vicuna team](https://github.com/lm-sys/FastChat) for their open-source efforts in democratizing large language models.
|
|
|
9 |
library_name: transformers
|
10 |
pipeline_tag: text-generation
|
11 |
---
|
12 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/CmusIT5OlSXvFrbTJ7l-C.png" alt="Taiwan LLM Logo" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
|
13 |
|
14 |
+
# 🌟 Check out the [Taiwan-LLM Demo Chat-UI](http://www.twllm.com) 🌟
|
15 |
|
16 |
+
# Model Card for Taiwan LLM 13B v0.0 chat
|
17 |
|
18 |
+
Taiwan LLM is an advanced language model tailored for Traditional Chinese, focusing on the linguistic and cultural contexts of Taiwan.
|
19 |
+
Developed from a large base model, it's enriched with diverse Taiwanese textual sources and refined through Supervised Fine-Tuning.
|
20 |
+
This model excels in language understanding and generation, aligning closely with Taiwan's cultural nuances.
|
21 |
+
It demonstrates improved performance on various benchmarks like TC-Eval, showcasing its contextual comprehension and cultural relevance.
|
22 |
+
For detailed insights into Taiwan LLM's development and features, refer to our [technical report](https://github.com/MiuLab/Taiwan-LLaMa/blob/main/twllm_paper.pdf).
|
23 |
|
24 |
|
25 |
+
## Model description
|
26 |
|
27 |
+
- **Model type:** A 13B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
|
28 |
+
- **Language(s) (NLP):** Primarily Traditional Chinese (zh-tw)
|
29 |
+
- **Finetuned from model:** [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)
|
30 |
|
31 |
+
### Model Sources
|
32 |
|
33 |
+
<!-- Provide the basic links for the model. -->
|
|
|
34 |
|
35 |
+
- **Repository:** https://github.com/MiuLab/Taiwan-LLaMa
|
36 |
+
- **Demo:** https://twllm.com/
|
37 |
|
38 |
+
## Performance
|
39 |
|
|
|
40 |
|
41 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/HTwIzw6RDha2-PhuWqSuI.png)
|
42 |
|
43 |
+
## Intended uses
|
44 |
|
45 |
+
Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
|
46 |
|
47 |
+
```python
|
48 |
+
# pip install "transformers>=4.34"
|
49 |
+
# pip install accelerate
|
50 |
|
51 |
+
import torch
|
52 |
+
from transformers import pipeline
|
53 |
|
54 |
+
pipe = pipeline("text-generation", model="yentinglin/Taiwan-LLaMa-v0.0", torch_dtype=torch.bfloat16, device_map="auto")
|
55 |
|
56 |
+
# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
|
57 |
+
messages = [
|
58 |
+
{
|
59 |
+
"role": "system",
|
60 |
+
"content": "你是一個人工智慧助理",
|
61 |
+
},
|
62 |
+
{"role": "user", "content": "東北季風如何影響台灣氣候?"},
|
63 |
+
]
|
64 |
+
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
65 |
+
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
|
66 |
+
print(outputs[0]["generated_text"])
|
67 |
+
```
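By default, the `text-generation` pipeline returns the prompt followed by the completion in `generated_text`. The sketch below shows one way to keep only the newly generated reply. The helper name `strip_prompt` and the example strings are hypothetical and not part of the model card; passing `return_full_text=False` to the pipeline call should achieve a similar result.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the part of a full-text pipeline output that follows the prompt."""
    if generated_text.startswith(prompt):
        # Drop the echoed prompt and any whitespace before the reply.
        return generated_text[len(prompt):].lstrip()
    # The prompt was not echoed verbatim, so leave the text untouched.
    return generated_text


# Hypothetical stand-ins for `prompt` and `outputs[0]["generated_text"]`.
example_prompt = "USER: 東北季風如何影響台灣氣候? ASSISTANT:"
example_output = example_prompt + " <model reply>"
print(strip_prompt(example_output, example_prompt))
```

With the snippet above, this would be called as `strip_prompt(outputs[0]["generated_text"], prompt)`.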
|
68 |
|
69 |
+
### Training hyperparameters
|
70 |
|
71 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/MdvHwdUvH-c926qyRAw7K.png)
|
72 |
|
73 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/kKpkvxDzOEyiAoTqmzRYO.png)
|
|
|
|
|
|
|
|
|
|
|
|
|
74 |
|
75 |
|
76 |
+
![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/FsnlJ_fkRxf7fn5RKZnjE.png)
|
77 |
|
78 |
+
The following hyperparameters were used during training:
|
79 |
+
- learning_rate: 5e-05
|
80 |
+
- distributed_type: multi-GPU
|
81 |
+
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
82 |
+
- lr_scheduler_type: cosine
|
83 |
+
- lr_scheduler_warmup_ratio: 0.03
|
84 |
+
- num_epochs: 5.0
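
To illustrate how a cosine scheduler with a warmup ratio of 0.03 shapes the learning rate, here is a minimal pure-Python sketch. It assumes linear warmup from zero followed by a half-cosine decay to zero, which is the common form of this schedule; the function name and the `total_steps` value are hypothetical, since the actual number of training steps is not stated above.

```python
import math


def cosine_lr_with_warmup(step: int, total_steps: int,
                          base_lr: float = 5e-5,
                          warmup_ratio: float = 0.03) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    # Number of warmup steps derived from the ratio (rounded up).
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed step count, for illustration only.
total_steps = 1000
for s in (0, 15, 30, 500, 1000):
    print(s, cosine_lr_with_warmup(s, total_steps))
```

Under these assumptions the rate peaks at `learning_rate` once warmup ends (step 30 of 1000 here) and decays smoothly afterwards.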
|
85 |
|
86 |
+
## Citation
|
|
|
|
|
|
|
|
|
87 |
|
88 |
+
If you find Taiwan LLM useful in your work, please cite it with:
|
|
|
|
|
|
|
89 |
|
|
|
|
|
|
|
|
|
|
|
90 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
91 |
@inproceedings{lin-chen-2023-llm,
|
92 |
title = "{LLM}-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models",
|
93 |
author = "Lin, Yen-Ting and Chen, Yun-Nung",
|
|
|
102 |
|
103 |
@misc{taiwanllama,
|
104 |
author={Lin, Yen-Ting and Chen, Yun-Nung},
|
105 |
+
title={Language Models for Taiwanese Culture},
|
106 |
year={2023},
|
107 |
+
url={https://github.com/MiuLab/Taiwan-LLaMa},
|
108 |
+
note={Code and models available at https://github.com/MiuLab/Taiwan-LLaMa},
|
109 |
}
|
110 |
```
|