Merge remote-tracking branch 'origin/main'
README.md
CHANGED
@@ -64,16 +64,16 @@ They are known to work with:
 
 
 ## Overview
-Taiwan-LLaMa is a full parameter fine-tuned model based on LLaMa 2 for Traditional
+Taiwan-LLaMa is a full parameter fine-tuned model based on LLaMa 2 for Traditional Mandarin applications.
 
-**Taiwan-LLaMa v1.0** pretrained on over 5 billion tokens and instruction-tuned on over 490k conversations both in traditional
+**Taiwan-LLaMa v1.0** pretrained on over 5 billion tokens and instruction-tuned on over 490k conversations both in traditional mandarin.
 
 ## Demo
 A live demonstration of the model can be accessed at [Hugging Face Spaces](https://huggingface.co/spaces/yentinglin/Taiwan-LLaMa2).
 
 ## Key Features
 
-1. **Traditional
+1. **Traditional Mandarin Support**: The model is fine-tuned to understand and generate text in Traditional Mandarin, making it suitable for Taiwanese culture and related applications.
 
 2. **Instruction-Tuned**: Further fine-tuned on conversational data to offer context-aware and instruction-following responses.
 
@@ -106,7 +106,7 @@ We provide a number of model checkpoints that we trained. Please find them on Hu
 |--------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
 | **Taiwan-LLaMa v1.0** (_better for Taiwanese Culture_) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v1.0" target="_blank">yentinglin/Taiwan-LLaMa-v1.0</a> |
 | Taiwan-LLaMa v0.9 (partial instruction set) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v0.9" target="_blank">yentinglin/Taiwan-LLaMa-v0.9</a> |
-| Taiwan-LLaMa v0.0 (no Traditional
+| Taiwan-LLaMa v0.0 (no Traditional Mandarin pretraining) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v0.0" target="_blank">yentinglin/Taiwan-LLaMa-v0.0</a> |
 
 ## Data
 
@@ -114,8 +114,8 @@ Here are some quick links to the datasets that we used to train the models:
 
 | **Dataset** | **Link** |
 |---------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
-| **Instruction-tuning** | 🤗 <a href="https://huggingface.co/datasets/yentinglin/
-| Traditional
+| **Instruction-tuning** | 🤗 <a href="https://huggingface.co/datasets/yentinglin/traditional_mandarin_instructions" target="_blank">yentinglin/traditional_mandarin_instructions</a> |
+| Traditional Mandarin Pretraining | 🤗 <a href="https://huggingface.co/datasets/yentinglin/zh_TW_c4" target="_blank">yentinglin/zh_TW_c4</a> |
 
 
 ## Architecture
@@ -123,12 +123,12 @@ Taiwan-LLaMa is based on LLaMa 2, leveraging transformer architecture, <a href="
 
 It includes:
 
-* Pretraining Phase: Pretrained on a vast corpus of over 5 billion tokens, extracted from common crawl in Traditional
+* Pretraining Phase: Pretrained on a vast corpus of over 5 billion tokens, extracted from common crawl in Traditional Mandarin.
 * Fine-tuning Phase: Further instruction-tuned on over 490k multi-turn conversational data to enable more instruction-following and context-aware responses.
 
 ## Generic Capabilities on Vicuna Benchmark
 
-The data is translated into traditional
+The data is translated into traditional mandarin for evaluating the general capability.
 
 
 <img src="./images/zhtw_vicuna_bench_chatgptbaseline.png" width="700">
@@ -191,7 +191,7 @@ If you use our code, data, or models in your research, please cite this reposito
 ```
 
 ## Collaborate With Us
-If you are interested in contributing to the development of Traditional
+If you are interested in contributing to the development of Traditional Mandarin language models, exploring new applications, or leveraging Taiwan-LLaMa for your specific needs, please don't hesitate to contact us. We welcome collaborations from academia, industry, and individual contributors.
 
 ## License
 The code in this project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.