It's technically 10.6B parameters, but for simple naming conventions, just truncate the .6.
I took a break from model merging for a bit then came back to see the AI community launching themselves another year into the future and I need to learn everything again. It's great.
This project went through several iterations, and though this is the final one, some previous versions had potential but didn't quite work out. I might revisit those and try to make them their own models. Another flavor, perhaps?
### Quants
Check the Model Tree for additional quants.
### Details
General roleplay/storytelling use model. The best way I can explain it: it's weirdly direct until it decides to be creative, at which point it'll spit out some serious prose out of nowhere. Minimal slop, though if you want to kill it entirely you can use DRY and/or XTC. It's surprisingly picky about instructions, so I recommend running the model without instructs first to taste, then slowly introducing directions. The fewer, the better, it seems.
I'd also opt for a higher Min P even at lower temps; for some reason, low Min P outputs come out very dry and sharp in writing. Otherwise, it's a solid model that can run hot and negative if prompted, with good recall and character adhesion, and it'll interweave those details throughout the story.
Recommended Settings Range:
Template: Llama 3
Temperature: 1.1-1.3
Min P: 0.08-0.12
Repeat Penalty: 1.05-1.1
Repeat Penalty Tokens: 256
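If it helps, the range above roughly translates to a sampler preset like the one below. This is a hedged sketch in text-generation-webui's preset format; the filename and the DRY/XTC values are my own illustrative picks, not something shipped with the model, so adapt the keys to whatever frontend you actually run.

```yaml
# Hypothetical preset, e.g. presets/IR-Roleplay.yaml in text-generation-webui.
# Values sit in the middle of the recommended range above.
temperature: 1.2
min_p: 0.1
repetition_penalty: 1.08
repetition_penalty_range: 256
# Optional slop-killers mentioned above (untuned example values):
# dry_multiplier: 0.8
# xtc_threshold: 0.1
# xtc_probability: 0.5
```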
### Merge Theory
Where to fucking begin.
To start: the majority of this model's creation process was experimentation and fooling around with LoRAs and new merge methods. Learned a lot at the cost of a few brain cells. Worth it.
As per usual, the idea was to make stable models and creative models, then mush them together into a better model. After trial and error, I made two stable models: one (Soda) that was generally COA competent and the other (Cider) more adept at recall. Those got merged via SCE to retain context length and intellect.
The creative model was the next challenge. I knew I wanted to use SicariusSicariiStuff's Unaligned Llama project for its appropriately named unhinged creativity, but it's Llama 3, not 3.1. Trying to pull a LoRA directly from the model didn't work due to different layer names, and the merging tricks I tried to fix that resulted in a LoRA that made any model spam cusses like a 2012 COD lobby. So, the only feasible way to integrate it was ye ol' faithful Model Stock (the semialign recipe in the config below). The usual rules apply: a higher ratio of L3.1 to L3 models keeps the jank at bay. Though, some jank is inevitable.
If I had to place bets, I'd say 50% of my time making this model was spent attempting to master DELLA. The theory is as straightforward as AI merging methods go; it's finding default values that actually work that has made me want to chuck my keyboard against a wall on multiple occasions. What I've gleaned is the following:
You don't need to set values for `epsilon` and `lambda`, but setting them gives you more control over the resulting merge, so it doesn't hurt to test. All of this is my opinion and flawed testing; YMMV.
`epsilon` dictates the range of which parameters will be 'nulled', so to speak, which is useful to avoid interference and slop. This is a double-edged sword though: the bigger that range is, the more of the model's parameters get 'nulled' when merging into the base. Keep in mind that `epsilon` is half of that range, since drop probabilities are assigned between `density - epsilon` and `density + epsilon` (e.g., with `density: 0.7` and `epsilon: 0.0175`, drop probabilities span 0.6825 to 0.7175). In my experimenting, anything above a total of 0.05 per model runs the risk of creating a stylistically duller model, and higher than a 0.1 total becomes a noticeably dumber model. I've made `epsilon: 0.0175` my personal default value to start from.
`lambda` is less complicated, as it's just the multiplication factor applied to the final parameters after the drop probabilities are assigned from the range above. Setting `lambda: 1` (I think this is the default too) keeps things simple, and it's usually the best value to stay at. But there is a tiny amount of wiggle room. If `lambda` > 1, you'll get a more expressive merge that lacks creativity, with exponentially diminishing returns. If `lambda` < 1, the merge gets repetitive yet somehow retains more sanity. There's a time and place for either option. For me: `lambda: 1` for the base model, and `lambda: 1-1.1` or `lambda: 0.9-1` for the additional models depending on the intended purpose.
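To make the knob placement concrete, here's a minimal DELLA sketch in mergekit's config format. The model names are placeholders and the values just mirror my usual starting points; the actual recipe is in the Config section below.

```yaml
# Minimal DELLA sketch; 'creative-model' and 'stable-model' are placeholders.
models:
  - model: creative-model
    parameters:
      weight: 0.4
      density: 0.7
      epsilon: 0.0175   # drop probabilities land between density - epsilon and density + epsilon
      lambda: 1.05      # >1 leans expressive, <1 leans repetitive-but-sane
  - model: stable-model
    parameters:
      weight: 0.6
      density: 0.7
      epsilon: 0.0175
      lambda: 1         # keep the base model at 1
base_model: stable-model
merge_method: della
dtype: float32
```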
As for why I expanded each model the way I did, there are two main reasons:
- I wasn't going to finetune on top of the resulting merge, so the usual DUS stack would cause more problems than intended. The strengths of a DUS stack, where you tack on a number of additional layers in the middle of the model, only come out after some 'healing' to 'repair' the empty added layers via finetuning. I attempted a makeshift version of that strategy using pulled LoRAs in mergekit and it didn't work nearly as well. Having a handful of voided layers packed together makes the resulting merge less chatty and sometimes less coherent.
- It gave me more control over where I wanted extra 'brainpower'. While the duplicated layers are 'empty' due to being zeroed out, that only applies to two modules (`o_proj` and `down_proj`). The other modules still hold value, so they still affect the final merge, though to a lesser extent. By being able to split these layers up and choose where they go, I can keep similar layers close to each other and limit problems down the line.
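For a rough sanity check on the headline parameter count (my back-of-the-envelope arithmetic, assuming stock Llama-3.1-8B dimensions): each decoder layer is roughly 218M parameters, and the embedding plus output head add about 1.05B. The expansion duplicates 4 layers at three points, so 32 + 12 = 44 layers, and 44 × 0.218B + 1.05B ≈ 10.6B, which lines up with the figure at the top.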
### Config
```yaml
models:
  - model: Delta-Vector/Baldur-8B+kromcomp/L3.1-Spark-r64-LoRA
  - model: NarrativAI/Cakrawala-Llama-3.1-8B
  - model: maximalists/BRAG-Llama-3.1-8b-v0.1
base_model: Delta-Vector/Baldur-8B+kromcomp/L3.1-Spark-r64-LoRA
parameters:
  normalize: false
merge_method: model_stock
chat_template: llama3
tokenizer:
  source: union
dtype: float32
name: soda
---
slices:
  - sources:
      - layer_range: [0, 12]
        model: soda
  - sources:
      - layer_range: [8, 12]
        model: soda
        parameters:
          scale:
            - filter: o_proj
              value: 0
            - filter: down_proj
              value: 0
            - value: 1
  - sources:
      - layer_range: [12, 20]
        model: soda
  - sources:
      - layer_range: [16, 20]
        model: soda
        parameters:
          scale:
            - filter: o_proj
              value: 0
            - filter: down_proj
              value: 0
            - value: 1
  - sources:
      - layer_range: [20, 28]
        model: soda
  - sources:
      - layer_range: [24, 28]
        model: soda
        parameters:
          scale:
            - filter: o_proj
              value: 0
            - filter: down_proj
              value: 0
            - value: 1
  - sources:
      - layer_range: [28, 32]
        model: soda
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
name: pop
---
models:
  - model: NeverSleep/Lumimaid-v0.2-8B+kromcomp/L3.1-Aura-r32-LoRA
  - model: grimjim/BadApple-o1-Llama-3.1-8B
  - model: crestf411/L3.1-8B-Slush-v1.1
base_model: crestf411/L3.1-8B-Slush-v1.1
parameters:
  normalize: false
merge_method: model_stock
chat_template: llama3
tokenizer:
  source: union
dtype: float32
name: cider
---
slices:
  - sources:
      - layer_range: [0, 12]
        model: cider
  - sources:
      - layer_range: [8, 12]
        model: cider
        parameters:
          scale:
            - filter: o_proj
              value: 0
            - filter: down_proj
              value: 0
            - value: 1
  - sources:
      - layer_range: [12, 20]
        model: cider
  - sources:
      - layer_range: [16, 20]
        model: cider
        parameters:
          scale:
            - filter: o_proj
              value: 0
            - filter: down_proj
              value: 0
            - value: 1
  - sources:
      - layer_range: [20, 28]
        model: cider
  - sources:
      - layer_range: [24, 28]
        model: cider
        parameters:
          scale:
            - filter: o_proj
              value: 0
            - filter: down_proj
              value: 0
            - value: 1
  - sources:
      - layer_range: [28, 32]
        model: cider
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
name: float
---
models:
  - model: float
    parameters:
      select_topk: 0.6
  - model: pop
    parameters:
      select_topk: 0.6
base_model: float
merge_method: sce
chat_template: llama3
tokenizer:
  source: union
parameters:
  int8_mask: true
dtype: float32
name: syrup
---
models:
  - model: SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA
  - model: ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.3+kromcomp/L3-T900-r64-LoRA
  - model: invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B
base_model: invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B
parameters:
  normalize: false
merge_method: model_stock
chat_template: llama3
tokenizer:
  source: union
dtype: float32
name: semialign
---
slices:
  - sources:
      - layer_range: [0, 12]
        model: semialign
  - sources:
      - layer_range: [8, 12]
        model: semialign
        parameters:
          scale:
            - filter: o_proj
              value: 0
            - filter: down_proj
              value: 0
            - value: 1
  - sources:
      - layer_range: [12, 20]
        model: semialign
  - sources:
      - layer_range: [16, 20]
        model: semialign
        parameters:
          scale:
            - filter: o_proj
              value: 0
            - filter: down_proj
              value: 0
            - value: 1
  - sources:
      - layer_range: [20, 28]
        model: semialign
  - sources:
      - layer_range: [24, 28]
        model: semialign
        parameters:
          scale:
            - filter: o_proj
              value: 0
            - filter: down_proj
              value: 0
            - value: 1
  - sources:
      - layer_range: [28, 32]
        model: semialign
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
name: midal
---
models:
  - model: midal
    parameters:
      weight: [0.2, 0.8]
      density: 0.7
      epsilon: 0.0125
      lambda: 1.05
  - model: syrup
    parameters:
      weight: [0.8, 0.2]
      density: 0.7
      epsilon: 0.0175
      lambda: 1
base_model: syrup
merge_method: della
chat_template: llama3
tokenizer:
  source: midal
parameters:
  normalize: false
  int8_mask: true
dtype: float32
name: ir
```