---
base_model:
- Delta-Vector/Baldur-8B
- kromcomp/L3.1-Spark-r64-LoRA
- NarrativAI/Cakrawala-Llama-3.1-8B
- maximalists/BRAG-Llama-3.1-8b-v0.1
- NeverSleep/Lumimaid-v0.2-8B
- kromcomp/L3.1-Aura-r32-LoRA
- grimjim/BadApple-o1-Llama-3.1-8B
- crestf411/L3.1-8B-Slush-v1.1
- SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA
- ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.3
- kromcomp/L3-T900-r64-LoRA
- invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B
library_name: transformers
tags:
- mergekit
- merge
- roleplay
- RP
- storytelling
license: llama3.1
---
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/667eea5cdebd46a5ec4dcc3d/Ztk0b0LwLnf51kCnSGylf.jpeg)
It's technically 10.6B parameters, but for the sake of a simple name, just truncate the 6.
I took a break from model merging for a bit then came back to see the AI community launching themselves another year into the future and I need to learn everything again. It's great.
This project went through several iterations, and though this is the final one, some previous versions showed potential but didn't quite work out. I might revisit those and make them their own models. Another flavor, perhaps?
### Quants
[My quants](https://huggingface.co/kromquant/L3.1-Tivir-10B-GGUFs)
Check the Model Tree for additional quants.
### Details
General Roleplay/Storytelling use model. The best way I can explain it: the model is weirdly direct until it decides to be creative, at which point it'll spit out some serious prose out of nowhere. Minimal slop, though if you want to kill it entirely you can use DRY and/or XTC. It's surprisingly picky about instructions, so I recommend running this model without instructs at first to get a taste, then slowly introducing directions. The fewer, the better, it seems.
I'd also opt for a higher Min P even at lower temps; for some reason, low Min P outputs come out very dry and sharp in writing. Otherwise, it's a solid model that can run hot and negative if prompted, with good recall and character adhesion, and it can interweave those details throughout the story.
Recommended Settings Range:
```
Template: Llama 3
Temperature: 1.1-1.3
Min P: 0.08-0.12
Repeat Penalty: 1.02-1.07
Repeat Penalty Tokens: 256
```
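For reference, here's roughly how those numbers map onto a local backend. This is a minimal sketch assuming llama-cpp-python and a local GGUF quant; the file name and prompt are placeholders, and parameter names may be spelled differently in your frontend.
```python
# Sketch: running a GGUF quant with the settings above via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="L3.1-Tivir-10B-Q6_K.gguf",  # hypothetical local quant file
    n_ctx=8192,
    last_n_tokens_size=256,  # lookback window for the repeat penalty ("Repeat Penalty Tokens")
)

# Llama 3 template, minus the BOS token which llama.cpp adds on its own.
prompt = (
    "<|start_header_id|>system<|end_header_id|>\n\nYou are a storyteller.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\nContinue the scene.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

out = llm.create_completion(
    prompt,
    max_tokens=512,
    temperature=1.2,      # middle of the 1.1-1.3 range
    min_p=0.1,            # middle of the 0.08-0.12 range
    repeat_penalty=1.05,  # middle of the 1.02-1.07 range
    top_p=1.0,            # leave the other samplers neutral
    top_k=0,
)
print(out["choices"][0]["text"])
```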
### Merge Theory
Where to fucking begin.
To start: the majority of this model's creation process was experimentation and fooling around with LoRAs and new merge methods. Learned a lot at the cost of a few brain cells. Worth it.
As per usual, the idea was to make stable models and creative models, then mush them together into a better model. After trial and error, I made two stable models: one (Soda) that was generally COA competent, and the other (Cider) more adept at recall. Those got merged via SCE to retain context length and intellect.
The creative model was the next challenge. I knew I wanted to use [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)'s Unaligned Llama project for its appropriately named unhinged creativity, but it's Llama 3, not 3.1. Trying to pull a LoRA directly from the model didn't work due to different layer names, and the merging tricks I tried to fix it resulted in a LoRA that made any model spam cusses like a 2012 COD lobby. So, the only feasible way to integrate it was ye ol' faithful Model Stock. Usual rules apply: a higher L3.1-to-L3 model ratio keeps the jank at bay, though some jank is inevitable.
If I had to place bets, I'd say 50% of my time making this model went into attempting to master DELLA. The theory is as straightforward as AI merging methods go; it's finding default values that actually work that has made me want to chuck my keyboard against a wall on multiple occasions. What I've gleaned is the following:
You don't need to set values for `epsilon` and `lambda`, but setting them gives you more control over the resulting merge, so it doesn't hurt to test. All of this is my opinion and flawed testing; YMMV.
`epsilon` dictates the range of which parameters will be 'nulled', so to speak, which is useful for avoiding interference and slop. It's a double-edged sword, though: the bigger that range is, the more 'nulled' the model parameters will be when merging into the base. Keep in mind that `epsilon` is *half* of that range, since the drop probabilities are assigned between `density - epsilon` and `density + epsilon`. In my experimenting, anything above a total of 0.05 per model runs the risk of creating a stylistically duller model, and anything higher than a 0.1 total becomes a noticeably dumber model. I've made `epsilon: 0.0175` my personal default value to start from.
`lambda` is less complicated, as it's just the multiplication factor applied to the final parameters after the drop probabilities are assigned from the range above. Setting `lambda: 1` (I think this is the default too) keeps things simple and is usually the best value to stick with. But there is a tiny amount of wiggle room. If `lambda` > 1, you get a more expressive merge that lacks creativity, with exponentially diminishing returns. If `lambda` < 1, the merge gets repetitive yet retains more sanity somehow. There's a time and place for either option. For me: `lambda: 1` for the base model, and `lambda: 1-1.1` or `lambda: 0.9-1` for the additional models, depending on the intended purpose.
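As a rough illustration of that `epsilon`/`lambda` behavior, here's a toy sketch of the idea described above (not mergekit's actual DELLA implementation): keep probabilities get spread across `density - epsilon` to `density + epsilon` by delta magnitude, survivors are rescaled, and `lambda` multiplies the result.
```python
# Illustrative sketch of the epsilon/lambda idea, not mergekit's DELLA code.
import numpy as np

def della_sketch(delta, density=0.7, epsilon=0.0175, lam=1.0, seed=0):
    """Assign keep probabilities across [density - epsilon, density + epsilon]
    by delta magnitude, stochastically drop, rescale survivors, apply lambda."""
    rng = np.random.default_rng(seed)
    n = delta.size
    # Rank parameters by |delta|: larger deltas get keep probabilities near the
    # top of the window, smaller ones near the bottom.
    order = np.argsort(np.abs(delta))
    keep_p = np.empty(n)
    keep_p[order] = np.linspace(density - epsilon, density + epsilon, n)
    # Stochastic drop, then inverse-probability rescaling of the survivors.
    kept = rng.random(n) < keep_p
    pruned = np.where(kept, delta / keep_p, 0.0)
    return lam * pruned  # lambda is a plain multiplier on the final delta

# Toy usage: a fake task vector (finetune minus base) for one tensor.
delta = np.random.default_rng(1).normal(size=1000)
merged_delta = della_sketch(delta, density=0.7, epsilon=0.0175, lam=1.05)
```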
As for why I expanded each model the way I did, there are two main reasons:
1) I wasn't going to finetune on top of the resulting merge, so the usual DUS stack would cause more problems than intended. The strengths of a DUS stack, where you tack on an additional block of layers in the middle of the model, only come out after 'healing' to 'repair' the empty added layers via finetuning. I attempted a makeshift version of this strategy using pulled LoRAs in mergekit, and it didn't work nearly as well. Having a handful of voided layers packed together makes the resulting merge less chatty and sometimes less coherent.
2) It gave me more control over where I wanted extra 'brainpower'. While the added layers are 'empty' due to being zeroed out, that only applies to two modules (`o_proj` and `down_proj`). The others still hold values, so they still affect the final merge, though to a lesser extent. By being able to split up and choose where these layers go, I can keep similar layers closer to each other and limit problems down the line.
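For intuition on why the duplicated layers still behave: in a Llama block, `o_proj` closes out the attention branch and `down_proj` closes out the MLP branch, so zeroing just those two reduces the layer to a pass-through on the residual stream. A toy sketch (grossly simplified block, not the real modeling code):
```python
# Toy illustration: zeroing o_proj and down_proj turns a (simplified) Llama block
# into an identity on the residual stream, since both sublayer outputs become zero.
import numpy as np

def toy_block(x, W_qkv, W_o, W_gate, W_up, W_down):
    # Attention path (heavily simplified: no heads, norms, RoPE, or softmax).
    attn = x @ W_qkv
    x = x + attn @ W_o            # o_proj closes the attention branch
    # MLP path (SwiGLU-ish shape, again simplified).
    h = (x @ W_gate) * (x @ W_up)
    x = x + h @ W_down            # down_proj closes the MLP branch
    return x

d, rng = 8, np.random.default_rng(0)
x = rng.normal(size=(4, d))
W = [rng.normal(size=(d, d)) for _ in range(5)]

zeroed = toy_block(x, W[0], np.zeros((d, d)), W[2], W[3], np.zeros((d, d)))
print(np.allclose(zeroed, x))  # True: the duplicated layer just forwards the residual
```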
### Config
```yaml
models:
- model: Delta-Vector/Baldur-8B+kromcomp/L3.1-Spark-r64-LoRA
- model: NarrativAI/Cakrawala-Llama-3.1-8B
- model: maximalists/BRAG-Llama-3.1-8b-v0.1
base_model: Delta-Vector/Baldur-8B+kromcomp/L3.1-Spark-r64-LoRA
parameters:
normalize: false
merge_method: model_stock
chat_template: llama3
tokenizer:
source: union
dtype: float32
name: soda
---
slices:
- sources:
- layer_range: [0, 12]
model: soda
- sources:
- layer_range: [8, 12]
model: soda
parameters:
scale:
- filter: o_proj
value: 0
- filter: down_proj
value: 0
- value: 1
- sources:
- layer_range: [12, 20]
model: soda
- sources:
- layer_range: [16, 20]
model: soda
parameters:
scale:
- filter: o_proj
value: 0
- filter: down_proj
value: 0
- value: 1
- sources:
- layer_range: [20, 28]
model: soda
- sources:
- layer_range: [24, 28]
model: soda
parameters:
scale:
- filter: o_proj
value: 0
- filter: down_proj
value: 0
- value: 1
- sources:
- layer_range: [28, 32]
model: soda
parameters:
int8_mask: true
merge_method: passthrough
dtype: float32
name: pop
---
models:
- model: NeverSleep/Lumimaid-v0.2-8B+kromcomp/L3.1-Aura-r32-LoRA
- model: grimjim/BadApple-o1-Llama-3.1-8B
- model: crestf411/L3.1-8B-Slush-v1.1
base_model: crestf411/L3.1-8B-Slush-v1.1
parameters:
normalize: false
merge_method: model_stock
chat_template: llama3
tokenizer:
source: union
dtype: float32
name: cider
---
slices:
- sources:
- layer_range: [0, 12]
model: cider
- sources:
- layer_range: [8, 12]
model: cider
parameters:
scale:
- filter: o_proj
value: 0
- filter: down_proj
value: 0
- value: 1
- sources:
- layer_range: [12, 20]
model: cider
- sources:
- layer_range: [16, 20]
model: cider
parameters:
scale:
- filter: o_proj
value: 0
- filter: down_proj
value: 0
- value: 1
- sources:
- layer_range: [20, 28]
model: cider
- sources:
- layer_range: [24, 28]
model: cider
parameters:
scale:
- filter: o_proj
value: 0
- filter: down_proj
value: 0
- value: 1
- sources:
- layer_range: [28, 32]
model: cider
parameters:
int8_mask: true
merge_method: passthrough
dtype: float32
name: float
---
models:
- model: float
parameters:
select_topk: 0.6
- model: pop
parameters:
select_topk: 0.6
base_model: float
merge_method: sce
chat_template: llama3
tokenizer:
source: union
parameters:
int8_mask: true
dtype: float32
name: syrup
---
models:
- model: SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA
- model: ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.3+kromcomp/L3-T900-r64-LoRA
- model: invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B
base_model: invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B
parameters:
normalize: false
merge_method: model_stock
chat_template: llama3
tokenizer:
source: union
dtype: float32
name: semialign
---
slices:
- sources:
- layer_range: [0, 12]
model: semialign
- sources:
- layer_range: [8, 12]
model: semialign
parameters:
scale:
- filter: o_proj
value: 0
- filter: down_proj
value: 0
- value: 1
- sources:
- layer_range: [12, 20]
model: semialign
- sources:
- layer_range: [16, 20]
model: semialign
parameters:
scale:
- filter: o_proj
value: 0
- filter: down_proj
value: 0
- value: 1
- sources:
- layer_range: [20, 28]
model: semialign
- sources:
- layer_range: [24, 28]
model: semialign
parameters:
scale:
- filter: o_proj
value: 0
- filter: down_proj
value: 0
- value: 1
- sources:
- layer_range: [28, 32]
model: semialign
parameters:
int8_mask: true
merge_method: passthrough
dtype: float32
name: midal
---
models:
- model: midal
parameters:
weight: [0.2, 0.8]
density: 0.7
epsilon: 0.0125
lambda: 1.05
- model: syrup
parameters:
weight: [0.8, 0.2]
density: 0.7
epsilon: 0.0175
lambda: 1
base_model: syrup
merge_method: della
chat_template: llama3
tokenizer:
source: midal
parameters:
normalize: false
int8_mask: true
dtype: float32
name: ir
``` |