tags:
- merge
- Yi
---

# Yi 34B Merge v8

A merge of several Yi 34B 200K models using the new DARE Ties method via mergekit, quantized with exllamav2 on ~300K tokens of a sci-fi story, a fantasy story, and a vicuna chat for optimal long context storywriting performance.

See the main model card: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-megamerge-v8

## Prompt template: Orca-Vicuna
```
SYSTEM: {system_message}
USER: {prompt}
ASSISTANT:
```

It might recognize ChatML, and possibly Alpaca-like formats. Raw prompting as described here is also effective: https://old.reddit.com/r/LocalLLaMA/comments/18zqy4s/the_secret_to_writing_quality_stories_with_llms/
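
For illustration, here is a tiny, hypothetical helper (not from any library) that assembles a prompt in the Orca-Vicuna layout above:

```python
# Hypothetical helper: build a prompt in the Orca-Vicuna layout shown above.
def orca_vicuna_prompt(system_message: str, prompt: str) -> str:
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"


print(orca_vicuna_prompt("You are a creative writing assistant.", "Continue the story."))
```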
## Running

Being a Yi model, try running a lower temperature with 0.05+ MinP, a little repetition penalty, maybe mirostat with a low tau, and no other samplers. Yi tends to run "hot" by default, and it really needs a low temperature + MinP to cull the huge vocabulary.
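
As a rough sketch of those settings with exllamav2's Python API (the `ExLlamaV2Sampler.Settings` attribute names are assumptions to check against your installed version, and the exact temperature is a matter of taste):

```python
from exllamav2.generator import ExLlamaV2Sampler

# Sketch of the recommended sampling setup; attribute names assume a recent
# exllamav2 release, so verify them against the version you have installed.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8                 # run cooler than Yi's "hot" default
settings.min_p = 0.05                      # MinP to cull the huge vocabulary
settings.token_repetition_penalty = 1.05   # a little repetition penalty
settings.top_k = 0                         # leave other samplers disabled
settings.top_p = 1.0
# Optionally, mirostat with a low tau instead of a fixed temperature:
# settings.mirostat = True
# settings.mirostat_tau = 1.5
```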

24GB GPUs can run 4bpw Yi-34B-200K models at **45K context** with exllamav2, and up to roughly **90K context** with more aggressive quantization, using performant UIs like [exui](https://github.com/turboderp/exui). I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/). 16GB GPUs can still run the high context with aggressive quantization.
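
As a sketch of how that looks with exllamav2's Python API (class and attribute names are assumptions to verify against your installed version; the model path is a placeholder):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer

# Sketch: load an exl2 quant with a capped context so the weights + KV cache
# fit in 24GB of VRAM, following the usual exllamav2 loading pattern.
config = ExLlamaV2Config()
config.model_dir = "/path/to/Yi-34B-200K-merge-exl2-4bpw"  # placeholder path
config.prepare()
config.max_seq_len = 45056  # cap context well below the model's native 200K

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocated as layers are loaded
model.load_autosplit(cache)               # split weights across available VRAM
tokenizer = ExLlamaV2Tokenizer(config)
```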

I recommend exl2 quantizations profiled on data similar to the desired task. This model is especially sensitive to the quantization data at low bpw; see the Quantization Commands section below for the exact conversion settings I used. I've uploaded my own fiction-oriented quantizations here: https://huggingface.co/collections/brucethemoose/most-recent-merge-65742644ca03b6c514afa204

To load/train this in full-context backends like transformers, you *must* change `max_position_embeddings` in config.json to a lower value than 200,000, otherwise you will OOM! I do not recommend running high context without context-efficient backends like exllamav2 or unsloth.
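
A minimal sketch of that override with transformers, done at load time instead of hand-editing config.json (the 32K value and model path are placeholders):

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch: lower the advertised context window before loading, per the advice
# above, so full-context backends are not sized for the native 200K positions.
model_path = "/path/to/Yi-34B-200K-merge"       # placeholder path
config = AutoConfig.from_pretrained(model_path)
config.max_position_embeddings = 32768          # any value well under 200,000

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```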
## Testing Notes
See: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-merge-v5#testing-notes

A "4k" merge model was created to try to extend the context of SUS Chat and DPO-bagel before adding them to the merge: https://huggingface.co/brucethemoose/SUS-Bagel-200K-DARE-Test
### Models Merged
The following models were included in the merge:

* https://huggingface.co/kyujinpy/PlatYi-34B-200k-Q-FastChat
* https://huggingface.co/jondurbin/bagel-34b-v0.2
* https://huggingface.co/NousResearch/Nous-Capybara-34B
* https://huggingface.co/migtissera/Tess-M-Creative-v1.0
* https://huggingface.co/brucethemoose/SUS-Bagel-200K-DARE-Test
* https://huggingface.co/Mihaiii/Pallas-0.5
* https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k
* https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-v2
* https://huggingface.co/migtissera/Tess-34B-v1.4
* https://huggingface.co/SUSTech/SUS-Chat-34B
* https://huggingface.co/jondurbin/bagel-dpo-34b-v0.2
* https://huggingface.co/chargoddard/Yi-34B-200K-Llama
* https://huggingface.co/chargoddard/Yi-34B-Llama
### Configuration
The following YAML configuration was used to produce this model:

```yaml
models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-34B-v1.4
    parameters:
      weight: [0.23, 0.125, 0.125, 0.125, 0.125, 0.125]
      density: 0.59
  - model: /home/alpha/Models/Raw/Mihaiii_Pallas-0.5
    parameters:
      weight: [0.23, 0.125, 0.125, 0.125, 0.125, 0.125]
      density: 0.59
  - model: /home/alpha//Storage/Models/Raw/bhenrym14_airoboros-3_1-yi-34b-200k
    parameters:
      weight: [0.02, 0.106, 0.106, 0.106, 0.106, 0.106]
      density: 0.59
  - model: /home/alpha/Storage/Models/Raw/jondurbin_bagel-34b-v0.2
    # Only the SFT in the main merge since the DPO version seems to have no long context ability at all
    parameters:
      weight: [0.02, 0.100, 0.100, 0.100, 0.100, 0.100]
      density: 0.4
  - model: /home/alpha/Storage/Models/Raw/kyujinpy_PlatYi-34B-200k-Q-FastChat
    parameters:
      weight: [0.02, 0.100, 0.100, 0.100, 0.100, 0.100]
      density: 0.59
  # - model: /home/alpha/Storage/Models/Raw/ehartford_dolphin-2.2-yi-34b-200k
  # Dolphin 200K seems to be funky according to multiple leaderboards and perplexity tests?
  #   parameters:
  #     weight: 0.15
  #     density: 0.6
  - model: /home/alpha/Models/Raw/adamo1139_Yi-34B-200K-AEZAKMI-v2
    parameters:
      weight: [0.02, 0.110, 0.110, 0.110, 0.110, 0.110]
      density: 0.59
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: [0.22, 0.126, 0.126, 0.126, 0.126, 0.126]
      density: 0.59
  - model: /home/alpha/Storage/Models/Raw/4kmerge
    parameters:
      weight: [0.02, 0.108, 0.108, 0.108, 0.108, 0.108]
      density: 0.5
  - model: /home/alpha/Models/Raw/migtissera_Tess-M-Creative-v1.0
    parameters:
      weight: [0.22, 0.100, 0.100, 0.100, 0.100, 0.10]
      density: 0.59
merge_method: dare_ties
tokenizer_source: union
base_model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
parameters:
  int8_mask: true
dtype: bfloat16
```

The following YAML configuration was used to produce the "4kmerge" model referenced above:

```yaml
models:
  - model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
    # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    parameters:
      weight: 0.5
      density: 1
  - model: /home/alpha/Models/Raw/SUSTech_SUS-Chat-34B
    parameters:
      weight: 0.2
      density: 0.12
  - model: /home/alpha/Models/Raw/jondurbin_bagel-dpo-34b-v0.2
    parameters:
      weight: 0.2
      density: 0.15
  - model: /home/alpha/Models/Raw/jondurbin_bagel-34b-v0.2
    parameters:
      weight: 0.1
      density: 0.12
merge_method: dare_ties
tokenizer_source: union
base_model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
parameters:
  int8_mask: true
dtype: bfloat16
```
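
For reference, a merge defined by one of the configurations above can be reproduced by pointing mergekit at the saved YAML. This sketch follows the Python entry points documented in mergekit's README; option names can shift between versions, and the config/output paths are placeholders:

```python
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Sketch based on mergekit's documented Python API; verify names against the
# installed mergekit version. The config/output paths are placeholders.
with open("v8-merge-config.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./merged-model",
    options=MergeOptions(
        cuda=True,            # run the merge on GPU where supported
        copy_tokenizer=True,  # still subject to tokenizer_source in the config
        lazy_unpickle=True,   # reduce peak RAM while reading shards
    ),
)
```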
## Quantization Commands

First pass (measurement):
```
python /home/alpha/AI/exllamav2/convert.py --in_dir /home/alpha/FastModels/v8/v8 -o /home/alpha/FastModels/scratch -om /home/alpha/FastModels/v8meas.json --cal_dataset /home/alpha/Documents/stories.parquet -ml 32768 -mr 8 -ss 4096 -b 4.0 -hb 6 -nr
```

Second pass (quantization, using the measurement file):
```
python /home/alpha/AI/exllamav2/convert.py --in_dir /home/alpha/FastModels/v8/v8 -o /home/alpha/FastModels/scratch -m /home/alpha/FastModels/v8meas.json --cal_dataset /home/alpha/Documents/stories.parquet -l 12288 -r 26 -ml 32768 -mr 8 -ss 4096 -b 4.0 -hb 6 -cf /home/alpha/FastModels/v8-exl2-4bpw-fiction -nr
```