|
--- |
|
base_model: |
|
- Delta-Vector/Baldur-8B |
|
- kromcomp/L3.1-Spark-r64-LoRA |
|
- NarrativAI/Cakrawala-Llama-3.1-8B |
|
- maximalists/BRAG-Llama-3.1-8b-v0.1 |
|
- NeverSleep/Lumimaid-v0.2-8B |
|
- kromcomp/L3.1-Aura-r32-LoRA |
|
- grimjim/BadApple-o1-Llama-3.1-8B |
|
- crestf411/L3.1-8B-Slush-v1.1 |
|
- SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA |
|
- ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.3 |
|
- kromcomp/L3-T900-r64-LoRA |
|
- invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
- roleplay |
|
- RP |
|
- storytelling |
|
license: llama3.1 |
|
--- |
|
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/667eea5cdebd46a5ec4dcc3d/Ztk0b0LwLnf51kCnSGylf.jpeg) |
|
|
|
It's technically 10.6B parameters, but for the sake of a simple name the .6 gets truncated.
|
|
|
I took a break from model merging for a bit, then came back to find the AI community had launched itself another year into the future, and now I need to learn everything again. It's great.
|
|
|
This project went through several iterations, and though this is the final one, some previous versions showed potential but didn't quite work out. I might revisit those and turn them into their own models. Another flavor, perhaps?
|
### Quants |
|
|
|
[My quants](https://huggingface.co/kromquant/L3.1-Tivir-10B-GGUFs) |
|
|
|
Check the Model Tree for additional quants. |
|
### Details |
|
|
|
General Roleplay/Storytelling use model. The best way I can explain it: the model is weirdly direct until it decides to be creative, at which point it'll spit out some serious prose out of nowhere. Minimal slop, though if you want to kill it entirely you can use DRY and/or XTC. Surprisingly picky about instructions, so I recommend running the model without instructs first to taste, then slowly introducing directions. The fewer, the better it seems.
|
|
|
I'd also opt for a higher Min P even at lower temps; for some reason, low Min P outputs come out very dry and sharp in the writing. Otherwise, it's a solid model that can run hot and negative if prompted, with good recall and character adhesion, weaving those details throughout the story.
|
|
|
Recommended Settings Range: |
|
``` |
|
Template: Llama 3 |
|
Temperature: 1.1-1.3 |
|
Min P: 0.08-0.12 |
|
Repeat Penalty: 1.02-1.07 |
|
Repeat Penalty Tokens: 256 |
|
``` |
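
If you're running one of the GGUF quants, these settings map onto llama-cpp-python roughly as below. This is a sketch, not a canonical setup: the quant filename is a placeholder and the exact sampler keyword names may vary between library versions.

```python
from llama_cpp import Llama

# Sketch only: hypothetical local quant filename; sampler kwargs assume a
# reasonably recent llama-cpp-python build.
llm = Llama(
    model_path="L3.1-Tivir-10B.Q6_K.gguf",
    n_ctx=8192,
    last_n_tokens_size=256,   # "Repeat Penalty Tokens"
)

out = llm(
    "Continue the scene:\n",
    max_tokens=512,
    temperature=1.2,          # middle of the 1.1-1.3 range
    min_p=0.1,                # middle of the 0.08-0.12 range
    repeat_penalty=1.05,
)
print(out["choices"][0]["text"])
```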
|
|
|
### Merge Theory |
|
|
|
Where to fucking begin. |
|
|
|
To start: the majority of this model's creation process was experimentation and fooling around with LoRAs and new merge methods. Learned a lot at the cost of a few brain cells. Worth it.
|
|
|
As per usual, the idea was to make stable models and creative models, then mush them together into a better model. After trial and error, I made two stable models: one (Soda) that was generally COA competent, and the other (Cider) more adept at recall. Those got merged via SCE to retain context length and intellect.
|
|
|
The creative model was the next challenge. I knew I wanted to use [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)'s Unaligned Llama project for its appropriately named unhinged creativity, but it's Llama 3, not 3.1. Trying to pull a LoRA directly from the model didn't work due to different layer names, and the merging tricks I tried to work around that produced a LoRA that made any model spam cusses like a 2012 COD lobby. So the only feasible way to integrate it was ye ol' faithful Model Stock. Usual rules apply: a higher ratio of L3.1 to L3 models keeps the jank at bay. Though some jank is inevitable.
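
For reference, "pulling a LoRA" here means approximating the weight delta between a finetune and its base with a low-rank factorization. Below is a rough torch sketch of that idea, not the actual extraction tooling used for this merge; the model IDs and rank are illustrative, and the `key not in base` check is exactly where mismatched layer names break things.

```python
import torch
from transformers import AutoModelForCausalLM

def extract_lora(finetuned_id: str, base_id: str, rank: int = 64):
    """Low-rank approximation of (finetuned - base) per weight matrix."""
    ft = AutoModelForCausalLM.from_pretrained(finetuned_id, torch_dtype=torch.float32).state_dict()
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float32).state_dict()
    lora = {}
    for key, w_ft in ft.items():
        if key not in base or w_ft.dim() != 2:
            continue  # mismatched layer names (or non-matrix params) get skipped here
        delta = w_ft - base[key]
        U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
        # Keep the top-r components: B @ A approximates delta.
        B = U[:, :rank] * S[:rank]
        A = Vh[:rank, :]
        lora[key] = (B, A)
    return lora
```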
|
|
|
If I had to place bets, I'd say 50% of my time making this model went to attempting to master DELLA. The theory is as straightforward as AI merging methods go; it's finding default values that actually work that has made me want to chuck my keyboard against a wall on multiple occasions. What I've gleaned is the following:
|
|
|
You don't need to set values for `epsilon` and `lambda`, but setting them gives you more control over the resulting merge, so it doesn't hurt to test. All of this is my opinion and flawed testing; YMMV.
|
|
|
`epsilon` dictates the range of parameters that will be 'nulled', so to speak, which is useful for avoiding interference and slop. This is a double-edged sword though: the bigger that range is, the more 'nulled' the model parameters will be when merging into the base. Keep in mind that `epsilon` is *half* of that range, since the drop probabilities are assigned between `density - epsilon` and `density + epsilon`. In my experimenting, anything above a total of 0.05 per model runs the risk of creating a stylistically duller model, and above a 0.1 total you get a noticeably dumber model. I've made `epsilon: 0.0175` my personal default starting value.
|
|
|
`lambda` is less complicated, as it's just the multiplication factor applied to the final parameters after the drop probabilities from the above range are applied. Setting `lambda: 1` (I think this is the default too) keeps things simple, and it's usually the best value to stick with. But there is a tiny amount of wiggle room. If `lambda` > 1, you get a more expressive merge that lacks creativity, with exponentially diminishing returns. If `lambda` < 1, the merge gets repetitive yet somehow retains more sanity. There's a time and place for either option. For me: `lambda: 1` for the base model, and `lambda: 1-1.1` or `lambda: 0.9-1` for the additional models depending on the intended purpose.
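
To make that concrete, here's a minimal numpy sketch of how I read the drop-and-rescale step. It follows the description above rather than mergekit's actual implementation: I'm modeling the `density ± epsilon` range as keep probabilities assigned by magnitude rank (higher-magnitude deltas kept more often), with survivors rescaled dropout-style and `lambda` applied at the end. Treat the details as assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def della_delta(delta, density=0.7, epsilon=0.0175, lam=1.0):
    """Sketch of a DELLA-style per-model delta: rank-based drop, rescale, scale by lambda."""
    # Rank parameters by |delta|: rank 0 = smallest magnitude.
    ranks = np.argsort(np.argsort(np.abs(delta)))
    # Keep probabilities spread linearly over [density - eps, density + eps];
    # the largest-magnitude delta gets the highest keep probability.
    keep_p = (density - epsilon) + 2 * epsilon * ranks / max(len(delta) - 1, 1)
    mask = rng.random(delta.shape) < keep_p
    rescaled = np.where(mask, delta / keep_p, 0.0)  # dropout-style rescale of survivors
    return lam * rescaled

base = np.zeros(10)
delta = rng.normal(size=10)
# 0.8 stands in for a per-model weight like the ones in the config below.
merged = base + 0.8 * della_delta(delta, epsilon=0.0175, lam=1.0)
```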
|
|
|
As for why I expanded each model the way I did, there are two main reasons.
|
|
|
1) I wasn't going to finetune on top of the resulting merge, so the usual DUS stack would cause more problems than intended. The strengths of a DUS stack, where you tack on an additional number of layers in the middle of the model, only come out after some 'healing' to 'repair' the empty added layers via finetuning. I attempted a makeshift version of this strategy using pulled LoRAs in mergekit, and it didn't work nearly as well. Having a handful of voided layers packed together makes the resulting merge less chatty and sometimes less coherent.
|
2) It gave me more control over where I wanted the extra 'brainpower'. While the duplicated layers are 'empty' due to being zeroed out, that only applies to two modules (`o_proj` and `down_proj`). The others still hold value, so they still affect the final merge, though to a lesser extent. By being able to split and place these layers where I want, I can keep similar layers close to each other and limit problems down the line (see the sketch below).
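
To illustrate why zeroing just those two modules works, here's a self-contained toy block (a simplified stand-in, not the real Llama module): the attention and MLP outputs only re-enter the residual stream through `o_proj` and `down_proj`, so zeroing both makes a duplicated layer an exact pass-through while its other weights still carry information into later merges.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyBlock(nn.Module):
    """Pre-norm block mirroring the Llama layout (LayerNorm stands in for RMSNorm)."""
    def __init__(self, d=64, d_ff=128):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.qkv = nn.Linear(d, 3 * d, bias=False)
        self.o_proj = nn.Linear(d, d, bias=False)
        self.up_proj = nn.Linear(d, d_ff, bias=False)
        self.down_proj = nn.Linear(d_ff, d, bias=False)

    def forward(self, x):
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1) @ v
        x = x + self.o_proj(attn)                                  # residual add via o_proj
        x = x + self.down_proj(F.silu(self.up_proj(self.norm2(x))))  # residual add via down_proj
        return x

block = ToyBlock()
nn.init.zeros_(block.o_proj.weight)
nn.init.zeros_(block.down_proj.weight)
x = torch.randn(2, 16, 64)
print(torch.allclose(block(x), x))  # True: zeroed o_proj/down_proj -> identity layer
```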
|
|
|
### Config |
|
|
|
```yaml |
|
models: |
|
- model: Delta-Vector/Baldur-8B+kromcomp/L3.1-Spark-r64-LoRA |
|
- model: NarrativAI/Cakrawala-Llama-3.1-8B |
|
- model: maximalists/BRAG-Llama-3.1-8b-v0.1 |
|
base_model: Delta-Vector/Baldur-8B+kromcomp/L3.1-Spark-r64-LoRA |
|
parameters: |
|
normalize: false |
|
merge_method: model_stock |
|
chat_template: llama3 |
|
tokenizer: |
|
source: union |
|
dtype: float32 |
|
name: soda |
|
--- |
|
slices: |
|
- sources: |
|
- layer_range: [0, 12] |
|
model: soda |
|
- sources: |
|
- layer_range: [8, 12] |
|
model: soda |
|
parameters: |
|
scale: |
|
- filter: o_proj |
|
value: 0 |
|
- filter: down_proj |
|
value: 0 |
|
- value: 1 |
|
- sources: |
|
- layer_range: [12, 20] |
|
model: soda |
|
- sources: |
|
- layer_range: [16, 20] |
|
model: soda |
|
parameters: |
|
scale: |
|
- filter: o_proj |
|
value: 0 |
|
- filter: down_proj |
|
value: 0 |
|
- value: 1 |
|
- sources: |
|
- layer_range: [20, 28] |
|
model: soda |
|
- sources: |
|
- layer_range: [24, 28] |
|
model: soda |
|
parameters: |
|
scale: |
|
- filter: o_proj |
|
value: 0 |
|
- filter: down_proj |
|
value: 0 |
|
- value: 1 |
|
- sources: |
|
- layer_range: [28, 32] |
|
model: soda |
|
parameters: |
|
int8_mask: true |
|
merge_method: passthrough |
|
dtype: float32 |
|
name: pop |
|
--- |
|
models: |
|
- model: NeverSleep/Lumimaid-v0.2-8B+kromcomp/L3.1-Aura-r32-LoRA |
|
- model: grimjim/BadApple-o1-Llama-3.1-8B |
|
- model: crestf411/L3.1-8B-Slush-v1.1 |
|
base_model: crestf411/L3.1-8B-Slush-v1.1 |
|
parameters: |
|
normalize: false |
|
merge_method: model_stock |
|
chat_template: llama3 |
|
tokenizer: |
|
source: union |
|
dtype: float32 |
|
name: cider |
|
--- |
|
slices: |
|
- sources: |
|
- layer_range: [0, 12] |
|
model: cider |
|
- sources: |
|
- layer_range: [8, 12] |
|
model: cider |
|
parameters: |
|
scale: |
|
- filter: o_proj |
|
value: 0 |
|
- filter: down_proj |
|
value: 0 |
|
- value: 1 |
|
- sources: |
|
- layer_range: [12, 20] |
|
model: cider |
|
- sources: |
|
- layer_range: [16, 20] |
|
model: cider |
|
parameters: |
|
scale: |
|
- filter: o_proj |
|
value: 0 |
|
- filter: down_proj |
|
value: 0 |
|
- value: 1 |
|
- sources: |
|
- layer_range: [20, 28] |
|
model: cider |
|
- sources: |
|
- layer_range: [24, 28] |
|
model: cider |
|
parameters: |
|
scale: |
|
- filter: o_proj |
|
value: 0 |
|
- filter: down_proj |
|
value: 0 |
|
- value: 1 |
|
- sources: |
|
- layer_range: [28, 32] |
|
model: cider |
|
parameters: |
|
int8_mask: true |
|
merge_method: passthrough |
|
dtype: float32 |
|
name: float |
|
--- |
|
|
models: |
|
- model: float |
|
parameters: |
|
select_topk: 0.6 |
|
- model: pop |
|
parameters: |
|
select_topk: 0.6 |
|
base_model: float |
|
merge_method: sce |
|
chat_template: llama3 |
|
tokenizer: |
|
source: union |
|
parameters: |
|
int8_mask: true |
|
dtype: float32 |
|
name: syrup |
|
--- |
|
models: |
|
- model: SicariusSicariiStuff/LLAMA-3_8B_Unaligned_BETA |
|
- model: ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.3+kromcomp/L3-T900-r64-LoRA |
|
- model: invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B |
|
base_model: invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B |
|
parameters: |
|
normalize: false |
|
merge_method: model_stock |
|
chat_template: llama3 |
|
tokenizer: |
|
source: union |
|
dtype: float32 |
|
name: semialign |
|
--- |
|
slices: |
|
- sources: |
|
- layer_range: [0, 12] |
|
model: semialign |
|
- sources: |
|
- layer_range: [8, 12] |
|
model: semialign |
|
parameters: |
|
scale: |
|
- filter: o_proj |
|
value: 0 |
|
- filter: down_proj |
|
value: 0 |
|
- value: 1 |
|
- sources: |
|
- layer_range: [12, 20] |
|
model: semialign |
|
- sources: |
|
- layer_range: [16, 20] |
|
model: semialign |
|
parameters: |
|
scale: |
|
- filter: o_proj |
|
value: 0 |
|
- filter: down_proj |
|
value: 0 |
|
- value: 1 |
|
- sources: |
|
- layer_range: [20, 28] |
|
model: semialign |
|
- sources: |
|
- layer_range: [24, 28] |
|
model: semialign |
|
parameters: |
|
scale: |
|
- filter: o_proj |
|
value: 0 |
|
- filter: down_proj |
|
value: 0 |
|
- value: 1 |
|
- sources: |
|
- layer_range: [28, 32] |
|
model: semialign |
|
parameters: |
|
int8_mask: true |
|
merge_method: passthrough |
|
dtype: float32 |
|
name: midal |
|
--- |
|
models: |
|
- model: midal |
|
parameters: |
|
weight: [0.2, 0.8] |
|
density: 0.7 |
|
epsilon: 0.0125 |
|
lambda: 1.05 |
|
- model: syrup |
|
parameters: |
|
weight: [0.8, 0.2] |
|
density: 0.7 |
|
epsilon: 0.0175 |
|
lambda: 1 |
|
base_model: syrup |
|
merge_method: della |
|
chat_template: llama3 |
|
tokenizer: |
|
source: midal |
|
parameters: |
|
normalize: false |
|
int8_mask: true |
|
dtype: float32 |
|
name: ir |
|
``` |