liswei committed · verified · Commit 5bae26d · Parent(s): 19cfd32

Update README.md

Files changed (1)
  1. README.md +44 -49
README.md CHANGED
@@ -1,65 +1,60 @@
  ---
- base_model: liswei/OpenELM-1_1B-zh-cp
- tags:
- - llama-factory
- - full
- - generated_from_trainer
- model-index:
- - name: OpenELM-1_1B-zh-sft
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # OpenELM-1_1B-zh-sft

- This model is a fine-tuned version of [liswei/OpenELM-1_1B-zh-cp](https://huggingface.co/liswei/OpenELM-1_1B-zh-cp) on the en2tw-alignment-sft and the TaiwanChat datasets.
- It achieves the following results on the evaluation set:
- - Loss: 1.3355

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 12
- - eval_batch_size: 12
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 384
- - total_eval_batch_size: 96
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1.0

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.4543 | 0.3527 | 500 | 1.4672 |
- | 1.3683 | 0.7053 | 1000 | 1.3570 |
-
-
- ### Framework versions
-
- - Transformers 4.41.1
- - Pytorch 2.3.0+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1

  ---
+ library_name: transformers
+ license: apache-2.0
+ datasets:
+ - liswei/zhtw-news-and-articles-2B
+ - liswei/PromptPair-TW
+ - yentinglin/TaiwanChat
+ base_model:
+ - liswei/Taiwan-ELM-1_1B
+ language:
+ - zh
+ pipeline_tag: text-generation
  ---

+ <center>
+ <img src="https://huggingface.co/liswei/Taiwan-ELM/resolve/main/Taiwan%20ELM%20Logo.jpeg" alt="Efficient LLM for Taiwan">
+ </center>

+ > Efficient LLM for Taiwan

+ # Taiwan ELM

+ Taiwan ELM is a family of efficient LLMs for Taiwan based on [apple/OpenELM](https://huggingface.co/apple/OpenELM).
+ The project aims to provide an efficient model for researchers without access to large-scale computing resources.

+ The model is trained using a custom fork of [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) on 2B Traditional Chinese tokens and 500K instruction samples.
+ We plan to extend training to larger datasets and different base models if there is sufficient demand.

+ ## What is being released?

+ We release both pre-trained base models and instruction-tuned variants with 270M and 1.1B parameters.
+ Along with the models, the datasets used to train the base and instruction-tuned variants are also released.

+ List of released models:
+ * [Taiwan-ELM-270M](https://huggingface.co/liswei/Taiwan-ELM-270M)
+ * [Taiwan-ELM-1_1B](https://huggingface.co/liswei/Taiwan-ELM-1_1B)
+ * [Taiwan-ELM-270M-Instruct](https://huggingface.co/liswei/Taiwan-ELM-270M-Instruct)
+ * [Taiwan-ELM-1_1B-Instruct](https://huggingface.co/liswei/Taiwan-ELM-1_1B-Instruct)

+ List of released datasets:
+ * [liswei/Taiwan-Text-Excellence-2B](https://huggingface.co/datasets/liswei/Taiwan-Text-Excellence-2B)
+ * [liswei/PromptPair-TW](https://huggingface.co/datasets/liswei/PromptPair-TW)

+ ## Usage Examples

+ We adapt the LLaMA2 chat template:
+ ```jinja2
+ <s>[INST] <<SYS>>
+ {{ system_prompt }}
+ <</SYS>>
+
+ {{ user_message }} [/INST]
+ ```
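
+ As an illustrative sketch, the template can be filled from plain Python; the helper function and the Traditional Chinese strings below are placeholders, not part of the released code:
+ ```python
+ # Build a single-turn prompt following the LLaMA2-style template above.
+ def build_prompt(system_prompt: str, user_message: str) -> str:
+     return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"
+ # Placeholder system prompt and user message in Traditional Chinese.
+ prompt = build_prompt("你是一個樂於助人的繁體中文助理。", "請簡單介紹台灣的夜市文化。")
+ ```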

+ The model can be loaded via `AutoModelForCausalLM` with `trust_remote_code=True`:
+ ```python
+ taiwanelm_270m = AutoModelForCausalLM.from_pretrained("liswei/Taiwan-ELM-270M", trust_remote_code=True)
+ ```
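
+ A minimal end-to-end generation sketch, assuming the instruct repositories ship a tokenizer loadable via `AutoTokenizer` (the model id and prompt strings below are illustrative):
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ model_id = "liswei/Taiwan-ELM-1_1B-Instruct"  # any instruct variant listed above
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+ # Prompt follows the template above; the leading <s> is added by the tokenizer.
+ prompt = "[INST] <<SYS>>\n你是一個樂於助人的繁體中文助理。\n<</SYS>>\n\n請簡單介紹台灣的夜市文化。 [/INST]"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=128)
+ # Decode only the newly generated tokens.
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```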

+ We also support additional generation methods and speculative generation; please refer to [OpenELM#usage](https://huggingface.co/apple/OpenELM#usage).
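
+ As a sketch only, speculative decoding can also be tried through the standard assisted-generation interface of `generate()`, assuming the 270M and 1.1B checkpoints share a tokenizer; see the OpenELM usage guide linked above for the documented methods:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ # Assisted (speculative) decoding: the small model drafts tokens, the larger model verifies them.
+ tokenizer = AutoTokenizer.from_pretrained("liswei/Taiwan-ELM-1_1B", trust_remote_code=True)
+ target = AutoModelForCausalLM.from_pretrained("liswei/Taiwan-ELM-1_1B", trust_remote_code=True)
+ draft = AutoModelForCausalLM.from_pretrained("liswei/Taiwan-ELM-270M", trust_remote_code=True)
+ inputs = tokenizer("台灣的夜市文化", return_tensors="pt")
+ outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```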