kromeurus committed on
Commit 2486430 · verified · 1 Parent(s): b0ecee6

Update README.md

Files changed (1)
  1. README.md +278 -26
README.md CHANGED
@@ -1,55 +1,307 @@
---
base_model:
- - kromcomp/L3.1-Syrupv4-10B
- - kromcomp/L3.1-Goldv5-10B
library_name: transformers
tags:
- mergekit
- merge
-
---
- # ir

- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the [DELLA](https://arxiv.org/abs/2406.11617) merge method using [kromcomp/L3.1-Syrupv4-10B](https://huggingface.co/kromcomp/L3.1-Syrupv4-10B) as a base.

- ### Models Merged

- The following models were included in the merge:
- * [kromcomp/L3.1-Goldv5-10B](https://huggingface.co/kromcomp/L3.1-Goldv5-10B)

- ### Configuration

- The following YAML configuration was used to produce this model:

```yaml
- base_model: kromcomp/L3.1-Syrupv4-10B
chat_template: llama3
dtype: float32
- merge_method: della
parameters:
-   int8_mask: 1.0
-   normalize: 0.0
slices:
- sources:
-   - layer_range: [0, 44]
-     model: kromcomp/L3.1-Goldv5-10B
    parameters:
      density: 0.7
      epsilon: 0.0125
      lambda: 1.05
-       weight: [0.2, 0.8]
-   - layer_range: [0, 44]
-     model: kromcomp/L3.1-Syrupv4-10B
    parameters:
      density: 0.7
      epsilon: 0.0175
-       lambda: 1.0
-       weight: [0.8, 0.2]
tokenizer:
-   source: kromcomp/L3.1-Goldv5-10B
- ```

---
base_model:
+ - Delta-Vector/Baldur-8B
+ - kromcomp/L3.1-Spark-r64-LoRA
+ - NarrativAI/Cakrawala-Llama-3.1-8B
+ - maximalists/BRAG-Llama-3.1-8b-v0.1
+ - NeverSleep/Lumimaid-v0.2-8B
+ - kromcomp/L3.1-Aura-r32-LoRA
+ - grimjim/BadApple-o1-Llama-3.1-8B
+ - crestf411/L3.1-8B-Slush-v1.1
+ - SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA
+ - ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.3
+ - kromcomp/L3-T900-r64-LoRA
+ - invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B
library_name: transformers
tags:
- mergekit
- merge
+ - roleplay
+ - RP
+ - storytelling
+ license: llama3.1
---
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/667eea5cdebd46a5ec4dcc3d/Ztk0b0LwLnf51kCnSGylf.jpeg)
+
+ It's technically 10.6B parameters, but for simple naming conventions, just truncate the 6.
+
+ I took a break from model merging for a bit, then came back to see the AI community launching itself another year into the future, and now I need to learn everything again. It's great.
+
+ This project went through several iterations, and though this is the final one, some previous versions had potential but didn't work out somehow. I might revisit those and try to make them their own models. Another flavor, perhaps?
+ ### Quants
+
+ [My quants](https://huggingface.co/kromquant/L3.1-Tivir-10B-GGUFs)
+
+ Check the Model Tree for additional quants.
+ ### Details
+
+ General Roleplay/Storytelling use model. The best way I can explain the model: it's weirdly direct until it decides to be creative, at which point it'll spit out some serious prose out of nowhere. Minimal slop, though if you want to kill it entirely you can use DRY and/or XTC. Surprisingly picky about instructions, so I recommend you run this model without instructs to taste, then slowly introduce directions. The fewer, the better it seems.
+
+ I'd also opt for a higher Min P even at lower temps since, for some reason, the low Min P outputs are very dry and sharp in writing. Otherwise, it's a solid model that can run hot and negative if prompted, with good recall and character adhesion, and it can interweave said details throughout the story.
+
+ Recommended Settings Range:
+ ```
+ Template: Llama 3
+ Temperature: 1.1-1.3
+ Min P: 0.08-0.12
+ Repeat Penalty: 1.02-1.07
+ Repeat Penalty Tokens: 256
+ ```

+ ### Merge Theory

+ Where to fucking begin.

+ To start: the majority of this model's creation process was experimentation and fooling around with LoRAs and new merge methods. Learned a lot at the cost of a few brain cells. Worth it.

+ As per usual, the idea was to make stable models and creative models, then mush them together into a better model. After trial and error, I made two stable models: one (Soda) that was generally COA competent, and the other (Cider) more adept at recall. Those got merged via SCE to retain context length and intellect.

+ The creative model was the next challenge. I knew I wanted to use [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)'s Unaligned Llama project for its appropriately named unhinged creativity, but it's Llama 3, not 3.1. Trying to pull a LoRA directly from the model didn't work due to different layer names, and doing some merging tricks to fix it resulted in a LoRA that made any model spam cusses like a 2012 COD lobby. So the only feasible way to integrate it was to use ye ol' faithful model stock. Usual rules apply: a higher L3.1-to-L3 model ratio keeps the jank at bay, though some jank is inevitable.

+ If I had to place bets, I'd say that 50% of my time making this model was spent attempting to master DELLA. The theory is as straightforward as AI merging methods go; it's trying to find default values that work that has made me want to chuck my keyboard against a wall on multiple occasions. What I've gleaned is the following:

+ You don't need to set values for `epsilon` and `lambda`, but setting them gives you more control over the resulting merge, so it doesn't hurt to test. All of this is my opinion and flawed testing, ymmv.
+
+ `epsilon` dictates the range within which parameters will be 'nulled', so to speak, which is useful to avoid interference and slop. This is a double-edged sword though, as the bigger that range is, the more 'nulled' the model parameters will be when merging into the base. Keep in mind that `epsilon` is *half* of that range, since the drop probabilities are assigned between `density - epsilon` and `density + epsilon`. In my experimenting, anything above a total of 0.05 per model runs the risk of creating a stylistically duller model, and higher than a 0.1 total becomes a noticeably dumber model. I've made `epsilon: 0.0175` my personal default value to start.
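+
+ To make that range concrete, here's the arithmetic for my usual starting values (just a sketch of how I read the method, using the `density`/`epsilon` numbers from the config below):
+ ```yaml
+ # drop probabilities get assigned between density - epsilon and density + epsilon
+ parameters:
+   density: 0.7
+   epsilon: 0.0175     # half-width of the range
+ # lower bound: 0.7 - 0.0175 = 0.6825
+ # upper bound: 0.7 + 0.0175 = 0.7175
+ # total range: 2 * 0.0175 = 0.035, which stays under the 0.05-per-model limit above
+ ```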
+
+ `lambda` is less complicated, as it's just the multiplication factor applied to the final parameters after the drop probabilities are assigned from the above range. Setting `lambda: 1` (I think this is the default setting too) keeps things simple, and this is usually the best value to keep it at. But there is a tiny amount of wiggle room. If `lambda` > 1, you'll get a more expressive merge that lacks creativity, with exponentially diminishing returns. If `lambda` < 1, the merge gets repetitive yet retains more sanity somehow. There's a time and place for either option. For me: `lambda: 1` for the base model, and `lambda: 1-1.1` or `lambda: 0.9-1` for additional models depending on the intended purpose.
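+
+ Put together, a minimal DELLA block following those defaults might look like the sketch below (placeholder model names; the actual values I landed on are in the full config under Config):
+ ```yaml
+ # sketch only: per-model density/epsilon/lambda, with lambda left at 1 on the base side
+ models:
+ - model: placeholder/creative-model
+   parameters:
+     density: 0.7
+     epsilon: 0.0125   # tighter half-range for the non-base model
+     lambda: 1.05      # > 1 for a slightly more expressive contribution
+ - model: placeholder/stable-model
+   parameters:
+     density: 0.7
+     epsilon: 0.0175   # my usual starting point
+     lambda: 1         # default; keeps things sane
+ base_model: placeholder/stable-model
+ merge_method: della
+ dtype: float32
+ ```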
+
+ As for why I expanded each model the way I did, there are two main reasons (see the sketch after this list):
+
+ 1) I wasn't going to finetune on top of the resulting merge, so the usual DUS stack would cause more problems than intended. The strengths of a DUS stack, where you tack on an additional number of layers in the middle of the model, only come out after there's 'healing' to 'repair' the empty added layers via finetuning. I attempted a makeshift version of this strategy using pulled LoRAs in mergekit, and it didn't work nearly as well. Having a handful of voided layers packed together makes the resulting merge less chatty and sometimes less coherent.
+ 2) It gave me more control over where I wanted extra 'brainpower'. While the duplicated layers are 'empty' due to being zeroed out, that's only for two modules (`o_proj` and `down_proj`). The other modules still hold value, so they still affect the final merge, though to a lesser extent. By being able to split and place where these layers go, I can keep similar layers closer to each other and limit problems down the line.
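+
+ For reference, below is the expansion pattern each intermediate model goes through, shown as an excerpt of the config with my own annotations added as comments:
+ ```yaml
+ # repeat a 4-layer span right after its source, but zero out o_proj and down_proj
+ # in the duplicate so only the remaining modules carry value into the merge
+ slices:
+ - sources:
+   - layer_range: [0, 12]    # original layers, untouched
+     model: soda
+ - sources:
+   - layer_range: [8, 12]    # duplicated span, placed next to where it came from
+     model: soda
+     parameters:
+       scale:
+       - filter: o_proj
+         value: 0            # zeroed module
+       - filter: down_proj
+         value: 0            # zeroed module
+       - value: 1            # everything else kept as-is
+ merge_method: passthrough
+ dtype: float32
+ ```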
+
+ ### Config

```yaml
+ models:
+ - model: Delta-Vector/Baldur-8B+kromcomp/L3.1-Spark-r64-LoRA
+ - model: NarrativAI/Cakrawala-Llama-3.1-8B
+ - model: maximalists/BRAG-Llama-3.1-8b-v0.1
+ base_model: Delta-Vector/Baldur-8B+kromcomp/L3.1-Spark-r64-LoRA
+ parameters:
+   normalize: false
+ merge_method: model_stock
chat_template: llama3
+ tokenizer:
+   source: union
dtype: float32
+ name: soda
+ ---
+ slices:
+ - sources:
+   - layer_range: [0, 12]
+     model: soda
+ - sources:
+   - layer_range: [8, 12]
+     model: soda
+     parameters:
+       scale:
+       - filter: o_proj
+         value: 0
+       - filter: down_proj
+         value: 0
+       - value: 1
+ - sources:
+   - layer_range: [12, 20]
+     model: soda
+ - sources:
+   - layer_range: [16, 20]
+     model: soda
+     parameters:
+       scale:
+       - filter: o_proj
+         value: 0
+       - filter: down_proj
+         value: 0
+       - value: 1
+ - sources:
+   - layer_range: [20, 28]
+     model: soda
+ - sources:
+   - layer_range: [24, 28]
+     model: soda
+     parameters:
+       scale:
+       - filter: o_proj
+         value: 0
+       - filter: down_proj
+         value: 0
+       - value: 1
+ - sources:
+   - layer_range: [28, 32]
+     model: soda
+ parameters:
+   int8_mask: true
+ merge_method: passthrough
+ dtype: float32
+ name: pop
+ ---
+ models:
+ - model: NeverSleep/Lumimaid-v0.2-8B+kromcomp/L3.1-Aura-r32-LoRA
+ - model: grimjim/BadApple-o1-Llama-3.1-8B
+ - model: crestf411/L3.1-8B-Slush-v1.1
+ base_model: crestf411/L3.1-8B-Slush-v1.1
+ parameters:
+   normalize: false
+ merge_method: model_stock
+ chat_template: llama3
+ tokenizer:
+   source: union
+ dtype: float32
+ name: cider
+ ---
+ slices:
+ - sources:
+   - layer_range: [0, 12]
+     model: cider
+ - sources:
+   - layer_range: [8, 12]
+     model: cider
+     parameters:
+       scale:
+       - filter: o_proj
+         value: 0
+       - filter: down_proj
+         value: 0
+       - value: 1
+ - sources:
+   - layer_range: [12, 20]
+     model: cider
+ - sources:
+   - layer_range: [16, 20]
+     model: cider
+     parameters:
+       scale:
+       - filter: o_proj
+         value: 0
+       - filter: down_proj
+         value: 0
+       - value: 1
+ - sources:
+   - layer_range: [20, 28]
+     model: cider
+ - sources:
+   - layer_range: [24, 28]
+     model: cider
+     parameters:
+       scale:
+       - filter: o_proj
+         value: 0
+       - filter: down_proj
+         value: 0
+       - value: 1
+ - sources:
+   - layer_range: [28, 32]
+     model: cider
+ parameters:
+   int8_mask: true
+ merge_method: passthrough
+ dtype: float32
+ name: float
+ ---
+ models:
+ - model: float
+   parameters:
+     select_topk: 0.6
+ - model: pop
+   parameters:
+     select_topk: 0.6
+ base_model: float
+ merge_method: sce
+ chat_template: llama3
+ tokenizer:
+   source: union
parameters:
+   int8_mask: true
+ dtype: float32
+ name: syrup
+ ---
+ models:
+ - model: SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA
+ - model: ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.3+kromcomp/L3-T900-r64-LoRA
+ - model: invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B
+ base_model: invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B
+ parameters:
+   normalize: false
+ merge_method: model_stock
+ chat_template: llama3
+ tokenizer:
+   source: union
+ dtype: float32
+ name: semialign
+ ---
slices:
- sources:
+   - layer_range: [0, 12]
+     model: semialign
+ - sources:
+   - layer_range: [8, 12]
+     model: semialign
+     parameters:
+       scale:
+       - filter: o_proj
+         value: 0
+       - filter: down_proj
+         value: 0
+       - value: 1
+ - sources:
+   - layer_range: [12, 20]
+     model: semialign
+ - sources:
+   - layer_range: [16, 20]
+     model: semialign
    parameters:
+       scale:
+       - filter: o_proj
+         value: 0
+       - filter: down_proj
+         value: 0
+       - value: 1
+ - sources:
+   - layer_range: [20, 28]
+     model: semialign
+ - sources:
+   - layer_range: [24, 28]
+     model: semialign
+     parameters:
+       scale:
+       - filter: o_proj
+         value: 0
+       - filter: down_proj
+         value: 0
+       - value: 1
+ - sources:
+   - layer_range: [28, 32]
+     model: semialign
+ parameters:
+   int8_mask: true
+ merge_method: passthrough
+ dtype: float32
+ name: midal
+ ---
+ models:
+ - model: midal
+   parameters:
+     weight: [0.2, 0.8]
    density: 0.7
    epsilon: 0.0125
    lambda: 1.05
+ - model: syrup
  parameters:
+     weight: [0.8, 0.2]
    density: 0.7
    epsilon: 0.0175
+     lambda: 1
+ base_model: syrup
+ merge_method: della
+ chat_template: llama3
tokenizer:
+   source: midal
+ parameters:
+   normalize: false
+   int8_mask: true
+ dtype: float32
+ name: ir
+ ```