liswei committed
Commit 7bb344a · verified · 1 Parent(s): 94b8161

Update README.md

Files changed (1)
  1. README.md +44 -99
README.md CHANGED
@@ -1,115 +1,60 @@
  ---
- license: other
- base_model: liswei/OpenELM-270M-zh-cp
- tags:
- - llama-factory
- - full
- - generated_from_trainer
- model-index:
- - name: zh-sft
-   results: []
  ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # zh-sft
-
- This model is a fine-tuned version of [saves/OpenELM-270M/zh-cp-galore](https://huggingface.co/saves/OpenELM-270M/zh-cp-galore) on the en2tw-alignment-sft and the TaiwanChat datasets.
- It achieves the following results on the evaluation set:
- - Loss: 1.5575
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 2
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 2
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 64
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 1000
- - num_epochs: 3.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:-----:|:---------------:|
- | 2.285 | 0.0588 | 500 | 2.2416 |
- | 2.0921 | 0.1176 | 1000 | 2.1271 |
- | 2.1212 | 0.1764 | 1500 | 2.0457 |
- | 1.9794 | 0.2351 | 2000 | 1.9954 |
- | 1.8983 | 0.2939 | 2500 | 1.9546 |
- | 1.8976 | 0.3527 | 3000 | 1.9214 |
- | 1.9345 | 0.4115 | 3500 | 1.8950 |
- | 1.8782 | 0.4703 | 4000 | 1.8705 |
- | 1.806 | 0.5291 | 4500 | 1.8493 |
- | 1.8282 | 0.5878 | 5000 | 1.8275 |
- | 1.7949 | 0.6466 | 5500 | 1.8115 |
- | 1.7408 | 0.7054 | 6000 | 1.7943 |
- | 1.6978 | 0.7642 | 6500 | 1.7782 |
- | 1.7152 | 0.8230 | 7000 | 1.7644 |
- | 1.7186 | 0.8818 | 7500 | 1.7511 |
- | 1.6821 | 0.9406 | 8000 | 1.7357 |
- | 1.6238 | 0.9993 | 8500 | 1.7211 |
- | 1.4753 | 1.0581 | 9000 | 1.7177 |
- | 1.4412 | 1.1169 | 9500 | 1.7048 |
- | 1.4273 | 1.1757 | 10000 | 1.6991 |
- | 1.4464 | 1.2345 | 10500 | 1.6840 |
- | 1.4484 | 1.2933 | 11000 | 1.6749 |
- | 1.4752 | 1.3520 | 11500 | 1.6666 |
- | 1.4023 | 1.4108 | 12000 | 1.6602 |
- | 1.3717 | 1.4696 | 12500 | 1.6467 |
- | 1.411 | 1.5284 | 13000 | 1.6376 |
- | 1.41 | 1.5872 | 13500 | 1.6298 |
- | 1.4263 | 1.6460 | 14000 | 1.6193 |
- | 1.3655 | 1.7048 | 14500 | 1.6108 |
- | 1.3813 | 1.7635 | 15000 | 1.6027 |
- | 1.3913 | 1.8223 | 15500 | 1.5948 |
- | 1.4214 | 1.8811 | 16000 | 1.5872 |
- | 1.3626 | 1.9399 | 16500 | 1.5810 |
- | 1.4187 | 1.9987 | 17000 | 1.5737 |
- | 1.154 | 2.0575 | 17500 | 1.5879 |
- | 1.2142 | 2.1162 | 18000 | 1.5826 |
- | 1.1634 | 2.1750 | 18500 | 1.5811 |
- | 1.1774 | 2.2338 | 19000 | 1.5750 |
- | 1.196 | 2.2926 | 19500 | 1.5732 |
- | 1.1546 | 2.3514 | 20000 | 1.5697 |
- | 1.1804 | 2.4102 | 20500 | 1.5666 |
- | 1.1517 | 2.4690 | 21000 | 1.5646 |
- | 1.1941 | 2.5277 | 21500 | 1.5633 |
- | 1.1836 | 2.5865 | 22000 | 1.5611 |
- | 1.1603 | 2.6453 | 22500 | 1.5599 |
- | 1.2281 | 2.7041 | 23000 | 1.5588 |
- | 1.1626 | 2.7629 | 23500 | 1.5578 |
- | 1.077 | 2.8217 | 24000 | 1.5579 |
- | 1.1677 | 2.8804 | 24500 | 1.5575 |
- | 1.1624 | 2.9392 | 25000 | 1.5574 |
- | 1.217 | 2.9980 | 25500 | 1.5575 |
-
-
- ### Framework versions
-
- - Transformers 4.40.1
- - Pytorch 2.3.0+cu118
- - Datasets 2.18.0
- - Tokenizers 0.19.1
  ---
+ library_name: transformers
+ license: apache-2.0
+ datasets:
+ - liswei/zhtw-news-and-articles-2B
+ - liswei/PromptPair-TW
+ - yentinglin/TaiwanChat
+ base_model:
+ - liswei/Taiwan-ELM-270M
+ language:
+ - zh
+ pipeline_tag: text-generation
  ---
+
+ <center>
+ <img src="https://huggingface.co/liswei/Taiwan-ELM/resolve/main/Taiwan%20ELM%20Logo.jpeg" alt="Efficient LLM for Taiwan">
+ </center>
+
+ > Efficient LLM for Taiwan
+
+ # Taiwan ELM
+
+ Taiwan ELM is a family of Efficient LLMs for Taiwan based on [apple/OpenELM](https://huggingface.co/apple/OpenELM).
+ The project aims to provide an efficient model for researchers without access to large-scale computing resources.
+
+ The model is trained using a custom fork of [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) on 2B Traditional Chinese tokens and 500K instruction samples.
+ We will extend the model to larger datasets and different base models if there is sufficient demand.
+
+ ## What is being released?
+
+ We release both pre-trained base models and instruction-tuned variants with 270M and 1.1B parameters.
+ Along with the models, the datasets used to train the base and instruction-tuned models are also released.
+
+ List of released models:
+ * [Taiwan-ELM-270M](https://huggingface.co/liswei/Taiwan-ELM-270M)
+ * [Taiwan-ELM-1_1B](https://huggingface.co/liswei/Taiwan-ELM-1_1B)
+ * [Taiwan-ELM-270M-Instruct](https://huggingface.co/liswei/Taiwan-ELM-270M-Instruct)
+ * [Taiwan-ELM-1_1B-Instruct](https://huggingface.co/liswei/Taiwan-ELM-1_1B-Instruct)
+
+ List of released datasets (a short loading sketch follows the list):
+ * [liswei/Taiwan-Text-Excellence-2B](https://huggingface.co/datasets/liswei/Taiwan-Text-Excellence-2B)
+ * [liswei/PromptPair-TW](https://huggingface.co/datasets/liswei/PromptPair-TW)
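+
+ Both released datasets live on the Hugging Face Hub and can be pulled with the `datasets` library. The snippet below is a minimal sketch for illustration only; split and column names are whatever the dataset cards define and are not assumed here:
+ ```python
+ from datasets import load_dataset
+
+ # Download the instruction-pair dataset; returns a DatasetDict keyed by split.
+ prompt_pairs = load_dataset("liswei/PromptPair-TW")
+ print(prompt_pairs)  # inspect the available splits and columns before use
+ ```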
+
+ ## Usage Examples
+
+ We adapt the LLaMA2 template:
+ ```jinja2
+ <s>[INST] <<SYS>>
+ {{ system_prompt }}
+ <</SYS>>
+
+ {{ user_message }} [/INST]
+ ```
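+
+ As a concrete illustration of the template above, the sketch below fills in the placeholders with plain Python string formatting. The helper name and the example messages are made up for illustration and are not part of the release:
+ ```python
+ # Hypothetical helper that renders the LLaMA2-style template shown above.
+ # Depending on tokenizer settings, the leading <s> may instead be added
+ # automatically as the BOS token during tokenization.
+ def build_prompt(system_prompt: str, user_message: str) -> str:
+     return (
+         "<s>[INST] <<SYS>>\n"
+         f"{system_prompt}\n"
+         "<</SYS>>\n\n"
+         f"{user_message} [/INST]"
+     )
+
+ prompt = build_prompt("你是一個樂於助人的助理。", "請簡單介紹台灣的夜市文化。")
+ ```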
+ The model can be loaded via `AutoModelForCausalLM` with `trust_remote_code=True`:
+ ```python
+ from transformers import AutoModelForCausalLM
+ taiwanelm_270m = AutoModelForCausalLM.from_pretrained("liswei/Taiwan-ELM-270M", trust_remote_code=True)
+ ```
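+
+ For an end-to-end round trip, the minimal sketch below pairs an instruct variant with its tokenizer and decodes one completion. It assumes the instruct models follow the template shown above and that the tokenizer adds the BOS token itself; `max_new_tokens` is an arbitrary placeholder:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "liswei/Taiwan-ELM-270M-Instruct"
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+
+ # The leading <s> from the template is omitted here on the assumption that
+ # the tokenizer prepends it as the BOS token.
+ prompt = (
+     "[INST] <<SYS>>\n你是一個樂於助人的助理。\n<</SYS>>\n\n"
+     "請簡單介紹台灣的夜市文化。 [/INST]"
+ )
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=128)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```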
+
+ We also support additional generation methods and speculative generation; please refer to [OpenELM#usage](https://huggingface.co/apple/OpenELM#usage) for details.
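+
+ One way to approximate speculative decoding with plain `transformers` is assisted generation, where a small model drafts tokens that a larger one verifies. The pairing below (270M drafting for 1.1B) is only an assumption for illustration; it requires both checkpoints to share a tokenizer, and the OpenELM usage guide linked above remains the authoritative reference:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Assumed pairing: the 1.1B model as the target, the 270M model as the draft.
+ tokenizer = AutoTokenizer.from_pretrained("liswei/Taiwan-ELM-1_1B-Instruct", trust_remote_code=True)
+ target = AutoModelForCausalLM.from_pretrained("liswei/Taiwan-ELM-1_1B-Instruct", trust_remote_code=True)
+ draft = AutoModelForCausalLM.from_pretrained("liswei/Taiwan-ELM-270M-Instruct", trust_remote_code=True)
+
+ inputs = tokenizer("[INST] 請推薦三個台灣的旅遊景點。 [/INST]", return_tensors="pt")
+ # assistant_model enables transformers' assisted (speculative-style) generation.
+ outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```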