yentinglin committed
Commit: e7ea1a0
Parent(s): ac4e06b
Update README.md

README.md CHANGED
@@ -29,16 +29,16 @@ pipeline_tag: text-generation


## Overview
- Taiwan-LLaMa is a full parameter fine-tuned model based on LLaMa 2 for
+ Taiwan-LLaMa is a full parameter fine-tuned model based on LLaMa 2 for Traditional Mandarin applications.

- **Taiwan-LLaMa v1.0** pretrained on over 5 billion tokens and instruction-tuned on over 490k conversations both in traditional
+ **Taiwan-LLaMa v1.0** is pretrained on over 5 billion tokens and instruction-tuned on over 490k conversations, both in Traditional Mandarin.

## Demo
A live demonstration of the model can be accessed at [Hugging Face Spaces](https://huggingface.co/spaces/yentinglin/Taiwan-LLaMa2).

## Key Features

- 1. **Traditional
+ 1. **Traditional Mandarin Support**: The model is fine-tuned to understand and generate text in Traditional Mandarin, making it suitable for Taiwanese culture and related applications.

2. **Instruction-Tuned**: Further fine-tuned on conversational data to offer context-aware and instruction-following responses.

@@ -48,8 +48,8 @@ A live demonstration of the model can be accessed at [Hugging Face Spaces](https


## Work in progress
- - [ ] **Improved
- - [ ] **
+ - [ ] **Improved pretraining**: A refined pretraining process (e.g., more data from Taiwan, improved training strategies) is under development, aiming to improve the model's coverage of Taiwanese culture.
+ - [ ] **Extend max length**: Utilizing the RoPE mechanism described in [the paper](https://arxiv.org/abs/2104.09864), the model's context length will be extended from 4k to 8k.


## Taiwanese Culture Examples
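For readers unfamiliar with the "Extend max length" item in the hunk above: RoPE encodes token positions by rotating query/key feature pairs through position-dependent angles, and most context-extension recipes rescale those positions. The sketch below is illustrative only and is not part of this commit; the function names and the interpolation factor are assumptions, not the repository's implementation.

```python
# Illustrative sketch only (not from this repo): the RoPE rotation behind the
# "Extend max length" item, with a hypothetical position-interpolation scale
# for stretching a model trained at 4k toward 8k-token inputs.
import torch

def rope_angles(head_dim: int, num_pos: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Angle matrix (num_pos, head_dim // 2): angle[m, i] = (m * scale) * base**(-2i / head_dim)."""
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    positions = torch.arange(num_pos, dtype=torch.float32) * scale
    return torch.outer(positions, inv_freq)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive (even, odd) feature pairs of x, shape (num_pos, head_dim)."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x_even * cos - x_odd * sin
    rotated[..., 1::2] = x_even * sin + x_odd * cos
    return rotated

# Hypothetical usage: scaling positions by 4096 / 8192 = 0.5 keeps 8k positions
# inside the angle range seen during 4k training (position interpolation).
queries = torch.randn(8192, 128)  # (sequence length, head dim)
queries_rotated = apply_rope(queries, rope_angles(128, 8192, scale=0.5))
```

Whether the repository uses position interpolation or another RoPE-based scheme is not stated in this diff.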
@@ -71,7 +71,7 @@ We provide a number of model checkpoints that we trained. Please find them on Hu

|--------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
| **Taiwan-LLaMa v1.0** (_better for Taiwanese Culture_) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v1.0" target="_blank">yentinglin/Taiwan-LLaMa-v1.0</a> |
| Taiwan-LLaMa v0.9 (partial instruction set) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v0.9" target="_blank">yentinglin/Taiwan-LLaMa-v0.9</a> |
- | Taiwan-LLaMa v0.0 (no Traditional
+ | Taiwan-LLaMa v0.0 (no Traditional Mandarin pretraining) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v0.0" target="_blank">yentinglin/Taiwan-LLaMa-v0.0</a> |

## Data

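As a usage note for the checkpoint table in the hunk above, the v1.0 model can be loaded with the standard `transformers` text-generation API. This is a minimal sketch under assumed defaults, not code from the README; the prompt, generation settings, and dtype/device handling are placeholders, and any chat template the model expects is omitted.

```python
# Minimal sketch (assumed standard transformers usage, not part of this commit):
# load the Taiwan-LLaMa v1.0 checkpoint listed above and generate a short reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yentinglin/Taiwan-LLaMa-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Placeholder prompt ("What makes Taiwan's night markets special?"); the repo's own
# prompt template, if it defines one, should be preferred.
inputs = tokenizer("台灣的夜市有什麼特色？", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```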
@@ -79,8 +79,8 @@ Here are some quick links to the datasets that we used to train the models:

| **Dataset** | **Link** |
|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
- | **Instruction-tuning** | 🤗 <a href="https://huggingface.co/datasets/yentinglin/
- | Traditional
+ | **Instruction-tuning** | 🤗 <a href="https://huggingface.co/datasets/yentinglin/traditional_mandarin_instructions" target="_blank">yentinglin/traditional_mandarin_instructions</a> |
+ | Traditional Mandarin Pretraining | 🤗 <a href="https://huggingface.co/datasets/yentinglin/zh_TW_c4" target="_blank">yentinglin/zh_TW_c4</a> |


## Architecture
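Both datasets in the table above are hosted on the Hugging Face Hub, so they can be pulled with the `datasets` library. A minimal sketch, assuming the datasets are publicly accessible with a default configuration and a `train` split; check the dataset cards for the actual splits and fields.

```python
# Minimal sketch (assumed standard datasets usage, not part of this commit):
# stream the Traditional Mandarin pretraining corpus and peek at one instruction example.
from datasets import load_dataset

pretrain = load_dataset("yentinglin/zh_TW_c4", split="train", streaming=True)
print(next(iter(pretrain)))   # one raw pretraining document

instructions = load_dataset("yentinglin/traditional_mandarin_instructions", split="train")
print(instructions[0])        # one instruction-tuning conversation
```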
@@ -88,12 +88,12 @@ Taiwan-LLaMa is based on LLaMa 2, leveraging transformer architecture, <a href="

It includes:

- * Pretraining Phase: Pretrained on a vast corpus of over 5 billion tokens, extracted from common crawl in Traditional
+ * Pretraining Phase: Pretrained on a vast corpus of over 5 billion tokens, extracted from Common Crawl in Traditional Mandarin.
* Fine-tuning Phase: Further instruction-tuned on over 490k multi-turn conversational data to enable more instruction-following and context-aware responses.

## Generic Capabilities on Vicuna Benchmark

- The data is translated into traditional
+ The data is translated into Traditional Mandarin to evaluate the model's general capabilities.


<img src="./images/zhtw_vicuna_bench_chatgptbaseline.png" width="700">
@@ -156,7 +156,7 @@ If you use our code, data, or models in your research, please cite this reposito
```

## Collaborate With Us
- If you are interested in contributing to the development of Traditional
+ If you are interested in contributing to the development of Traditional Mandarin language models, exploring new applications, or leveraging Taiwan-LLaMa for your specific needs, please don't hesitate to contact us. We welcome collaborations from academia, industry, and individual contributors.

## License
The code in this project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.