YOYO-AI committed · Commit c4ed3cd · verified · 1 Parent(s): 1f4095a

Update README.md

Files changed (1)
  1. README.md +228 -0
README.md CHANGED
@@ -14,3 +14,231 @@ tags:
  - merge
  ---
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/8YkBIMWfWNXm0dbNwj2HH.png)
+ # ZYH-LLM-Qwen2.5-14B-V3
+ This is the third-generation model of the **ZYH-LLM series**.
+
+ It relies heavily on model merging techniques to provide a powerful, unified 14-billion-parameter model that can serve as a solid foundation for further *model merging* and *model fine-tuning*.
+
+ The merge recipe is detailed below, in the hope that it is useful for your own merges:
+
+ ## First stage:
+
+ ### Step 1:
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: Qwen/Qwen2.5-14B
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-1010
+ ```
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct-1M
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: Qwen/Qwen2.5-14B
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-1010-1M
+ ```
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: EVA-Qwen2.5-14B-YOYO-1010
+ ```
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct-1M
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: EVA-Qwen2.5-14B-YOYO-1010-1M
+ ```
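
All four Step 1 configs use `della` with a single input model, `density: 1`, `weight: 1`, and `lambda: 0.9`. As a rough sketch, and assuming that at density 1 no delta parameters are dropped or rescaled (an assumption about the merge tool's behaviour, not something this README states), each merge would reduce to adding the model-minus-base difference back to the base, scaled by `lambda`. The function name and the toy numbers below are hypothetical.

```python
def della_density_one(base, model, weight=1.0, lam=0.9):
    """Toy per-tensor merge under the stated assumption (density = 1, one model).

    With nothing pruned, the delta (model - base) is weighted, normalized by
    the total weight, and scaled by lambda before being added back to base.
    """
    total_weight = weight  # only one model contributes
    merged = []
    for b, m in zip(base, model):
        delta = (m - b) * weight / total_weight  # mirrors normalize: true
        merged.append(b + lam * delta)
    return merged


# Hypothetical 3-element "tensors" standing in for real weights.
base = [1.0, 2.0, -1.0]
instruct = [2.0, 2.0, -3.0]
print(della_density_one(base, instruct))
```

Under this reading, each Step 1 output sits 90% of the way from its `base_model` toward the listed model.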
+ ### Step 2:
+ ```yaml
+ models:
+   - model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: Qwen/Qwen2.5-14B
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: EVA-Qwen2.5-14B-base
+ ```
+ ```yaml
+ merge_method: sce
+ models:
+   - model: EVA-Qwen2.5-14B-base
+ base_model: Qwen/Qwen2.5-14B-Instruct-1M
+ parameters:
+   select_topk: 1
+ dtype: bfloat16
+ tokenizer_source: base
+ normalize: true
+ int8_mask: true
+ name: Qwen2.5-14B-pro
+ ```
+ ### Step 3:
+ ```yaml
+ models:
+   - model: Qwen2.5-14B-YOYO-1010-1M
+   - model: Qwen2.5-14B-YOYO-1010
+   - model: EVA-Qwen2.5-14B-YOYO-1010-1M
+   - model: EVA-Qwen2.5-14B-YOYO-1010
+ merge_method: sce
+ base_model: Qwen2.5-14B-pro
+ parameters:
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: ZYH-LLM-Qwen2.5-14B-V3-preview
+ ```
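
The first-stage configs refer to one another by `name`, so they have to be built in dependency order. The sketch below copies the `models` and `base_model` entries from the configs above into a plain dictionary and derives one valid build order using Python's standard `graphlib`. The `first_stage` dictionary and the `build_order` helper are illustrative names, not part of any merge tool.

```python
from graphlib import TopologicalSorter

# Each merge name maps to the inputs it consumes (models plus base_model),
# copied from the first-stage configs.
first_stage = {
    "Qwen2.5-14B-YOYO-1010": {"Qwen/Qwen2.5-14B-Instruct", "Qwen/Qwen2.5-14B"},
    "Qwen2.5-14B-YOYO-1010-1M": {"Qwen/Qwen2.5-14B-Instruct-1M", "Qwen/Qwen2.5-14B"},
    "EVA-Qwen2.5-14B-YOYO-1010": {
        "Qwen/Qwen2.5-14B-Instruct", "EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2",
    },
    "EVA-Qwen2.5-14B-YOYO-1010-1M": {
        "Qwen/Qwen2.5-14B-Instruct-1M", "EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2",
    },
    "EVA-Qwen2.5-14B-base": {"EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2", "Qwen/Qwen2.5-14B"},
    "Qwen2.5-14B-pro": {"EVA-Qwen2.5-14B-base", "Qwen/Qwen2.5-14B-Instruct-1M"},
    "ZYH-LLM-Qwen2.5-14B-V3-preview": {
        "Qwen2.5-14B-YOYO-1010-1M", "Qwen2.5-14B-YOYO-1010",
        "EVA-Qwen2.5-14B-YOYO-1010-1M", "EVA-Qwen2.5-14B-YOYO-1010",
        "Qwen2.5-14B-pro",
    },
}


def build_order(graph):
    """Return the merges (not the upstream checkpoints) in a valid build order."""
    order = TopologicalSorter(graph).static_order()
    return [name for name in order if name in graph]


order = build_order(first_stage)
print(order)
```

Any order this produces builds `EVA-Qwen2.5-14B-base` before `Qwen2.5-14B-pro` and leaves `ZYH-LLM-Qwen2.5-14B-V3-preview` for last.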
+
+ ## Second stage:
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: arcee-ai/Virtuoso-Small-v2
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-della1
+ ```
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct-1M
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: arcee-ai/Virtuoso-Small-v2
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-della2
+ ```
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: Azure99/Blossom-V6-14B
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-della3
+ ```
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct-1M
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: Azure99/Blossom-V6-14B
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-della4
+ ```
+ ## Final stage:
+ ```yaml
+ merge_method: model_stock
+ base_model: ZYH-LLM-Qwen2.5-14B-V3-preview
+ models:
+   - model: Qwen2.5-14B-YOYO-della1
+   - model: Qwen2.5-14B-YOYO-della2
+   - model: Qwen2.5-14B-YOYO-della3
+   - model: Qwen2.5-14B-YOYO-della4
+ dtype: bfloat16
+ tokenizer_source: base
+ int8_mask: true
+ normalize: true
+ name: ZYH-LLM-Qwen2.5-14B-V3
+ ```
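
The final stage uses `model_stock` with the first-stage preview as the base and the four second-stage merges as inputs. For intuition, here is a toy sketch of the interpolation rule from the Model Stock method as I understand it: average the inputs, then pull that average back toward the base by a ratio `t = N*cos(theta) / (1 + (N-1)*cos(theta))`, where `cos(theta)` is the mean pairwise cosine between the inputs' differences from the base. This is not the merge tool's actual code. The function name and the flat-list tensors are hypothetical, and it assumes at least two inputs with nonzero differences from the base.

```python
import math
from itertools import combinations


def model_stock(base, models):
    """Toy Model Stock interpolation on flat lists of floats (needs >= 2 models)."""
    n = len(models)
    # Differences of each fine-tuned model from the base.
    deltas = [[m - b for m, b in zip(model, base)] for model in models]

    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0

    # Mean pairwise cosine between the deltas.
    pairs = list(combinations(deltas, 2))
    cos_theta = sum(cosine(u, v) for u, v in pairs) / len(pairs)

    # Interpolation ratio between the average of the models and the base.
    t = n * cos_theta / (1 + (n - 1) * cos_theta)
    average = [sum(column) / n for column in zip(*models)]
    return [t * a + (1 - t) * b for a, b in zip(average, base)]


# Orthogonal deltas give t = 0, so the result falls back to the base.
print(model_stock([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch, inputs that agree closely (cosine near 1) keep their average almost unchanged, while inputs that disagree are pulled back toward the base.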