|
--- |
|
base_model:
- SanjiWatsuki/Kunoichi-7B
- Sao10K/Fimbulvetr-11B-v2
- Sao10K/Frostwind-v2.1-m7
- Gryphe/MythoMist-7b
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
|
|
--- |
|
![cute](https://huggingface.co/matchaaaaa/Chaifighter-Latte-14B/resolve/main/chaifighter-latte-cute.png) |
|
|
|
**Thanks again to [@Brooketh](https://huggingface.co/brooketh) for the [GGUFs](https://huggingface.co/backyardai/Chaifighter-Latte-14B-GGUF)!!** |
|
|
|
# Chaifighter Latte 14B |
|
|
|
Finally here, Chaifighter Latte is the successor to the Chaifighter 20B models. Like its predecessors, it is Mistral-based, but it is dramatically reduced in size. Chaifighter Latte is formulated for creative, rich, verbose writing without sacrificing intelligence, awareness, or context-following ability. It retains the great taste of the original, and despite being significantly lighter at 14 billion parameters, it performs even better. Try it for yourself!
|
|
|
## Prompt Template: Alpaca |
|
|
|
``` |
|
Below is an instruction that describes a task. Write a response that appropriately completes the request. |
|
|
|
### Instruction: |
|
{prompt} |
|
|
|
### Response: |
|
``` |
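For reference, here's a minimal sketch of filling that template from Python. The `build_prompt` helper is purely illustrative, not part of any library:

```python
def build_prompt(instruction: str) -> str:
    """Fill the Alpaca template above with a user instruction."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```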
|
|
|
## Recommended Settings: Universal-Light |
|
|
|
Here are some setting ranges that tend to work for me. They aren't strict values, and there's a bit of leeway in them. Feel free to experiment a bit! (If you want to see these wired into an actual `transformers` call, there's a sketch right after the list.)
|
|
|
* Temperature: **1.0** *to* **1.25** (adjust to taste, but keep it low; Chaifighter is creative enough on its own)
|
* Min-P: **0.1** (increasing might help if it goes cuckoo, but I suggest keeping it there) |
|
* Repetition Penalty: **1.05** *to* **1.1** (high values aren't needed and usually degrade output) |
|
* Rep. Penalty Range: **256** *or* **512** |
|
* *(all other samplers disabled)* |
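Here's a hedged sketch of those settings in a plain `transformers` generation call, using the `build_prompt` helper from above. Two assumptions worth flagging: `min_p` needs a reasonably recent transformers release, and the repetition-penalty *range* isn't exposed by `generate()`, so that knob lives in frontends like SillyTavern or koboldcpp:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "matchaaaaa/Chaifighter-Latte-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# build_prompt is the Alpaca helper sketched earlier
prompt = build_prompt("Write the opening scene of a slow-burn fantasy story.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.1,          # 1.0 to 1.25
    min_p=0.1,                # keep at 0.1
    repetition_penalty=1.05,  # 1.05 to 1.1; the range knob isn't available here
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```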
|
|
|
## The Deets |
|
|
|
### Mergekit |
|
|
|
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit). |
|
|
|
### Merge Method |
|
|
|
This model was merged using the passthrough merge method, which stacks layer slices from the source models rather than averaging their weights.
|
|
|
### Models Merged |
|
|
|
* [SanjiWatsuki/Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B) |
|
* [Sao10K/Fimbulvetr-11B-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2) |
|
* [Sao10K/Frostwind-v2.1-m7](https://huggingface.co/Sao10K/Frostwind-v2.1-m7) |
|
* [Gryphe/MythoMist-7b](https://huggingface.co/Gryphe/MythoMist-7b) |
|
|
|
|
|
### The Sauce |
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
```yaml
slices:
  - sources:
      - model: SanjiWatsuki/Kunoichi-7B
        layer_range: [16, 24]
merge_method: passthrough
dtype: float32
name: Kuno-splice
---
slices:
  - sources:
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: Fimbul-splice
---
models:
  - model: Kuno-splice
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
  - model: Fimbul-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: Kuno-splice
dtype: float32
name: Kuno-Fimbul-splice
---
models:
  - model: Sao10K/Frostwind-v2.1-m7
  - model: Gryphe/MythoMist-7b
    parameters:
      weight: 0.37
      density: 0.8
merge_method: dare_ties
base_model: Sao10K/Frostwind-v2.1-m7
dtype: float32
name: Frosty-Mytho
---
slices:
  - sources:
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [32, 40]
merge_method: passthrough
dtype: float32
name: Fimbul-splice-2
---
slices:
  - sources:
      - model: Frosty-Mytho
        layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: Frosty-Mytho-splice
---
models:
  - model: Fimbul-splice-2
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
  - model: Frosty-Mytho-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: Fimbul-splice-2
dtype: float32
name: Fimbul-Frosty-Mytho-splice
---
slices:
  - sources: # kunoichi
      - model: SanjiWatsuki/Kunoichi-7B
        layer_range: [0, 16]
  - sources: # kunoichi gradient fimbul splice
      - model: Kuno-Fimbul-splice
        layer_range: [0, 8]
  - sources: # fimbulvetr
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [16, 32]
  - sources: # fimbulvetr gradient fwmm splice
      - model: Fimbul-Frosty-Mytho-splice
        layer_range: [0, 8]
  - sources: # frostwind + mythomist
      - model: Frosty-Mytho
        layer_range: [16, 32]
merge_method: passthrough
dtype: float32
name: Chaifighter-Latte-14B
```
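A note on reproducing it: a multi-document config like this (each stage named and separated by `---`) is the format consumed by mergekit's `mergekit-mega` entry point rather than plain `mergekit-yaml`, so something like `mergekit-mega config.yaml ./output-dir` should work; double-check the invocation against whatever mergekit version you have installed.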
|
|
|
### The Thought Process |
|
|
|
So, I wanted the first layers to be Kunoichi. Kunoichi was chosen for its strong context- and instruction-following abilities, as well as being a really smart model overall. Plus, it's no slouch at RP. I think this is partly what gave previous Chaifighter models the awareness that many people liked. To best harness its stellar prompt-processing performance, I put Kunoichi at the head of the stack.
|
Next, I applied a gradient merge that I call a "splice". Splicing models like this addresses what I believe significantly hurt the earlier Chaifighter models and many other frankenmerges: layer dissimilarity. Blending the end of one stack from model A into the beginning of another stack from model B should, in theory, smooth over those differences and help bring everything together.
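To make the gradient concrete: as I understand it, mergekit treats a list-valued `weight` as anchor points interpolated across the layer range, so the two opposing ramps in the splice configs fade one model out while fading the other in. A rough illustrative sketch (not mergekit's actual code):

```python
import numpy as np

def gradient(anchors, num_layers):
    # expand the anchor list across the layer range by linear interpolation,
    # mirroring how mergekit handles list-valued parameters
    xs = np.linspace(0, 1, len(anchors))
    return np.interp(np.linspace(0, 1, num_layers), xs, anchors)

w_a = gradient([1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0], 8)  # fades out
w_b = gradient([0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1], 8)  # fades in
# each spliced layer i is then roughly w_a[i] * A_i + w_b[i] * B_i
# (the two ramps sum to 1, so every layer stays a proper weighted average)
```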
|
The second model I introduced is Fimbulvetr-v2. This should be no surprise, as it's also a well-established ingredient of the Chaifighter recipe. Boasting incredibly strong coherence, it is the glue that can hold a story together, even with multiple characters and over longer contexts. I felt like the best place for Fimbulvetr was right after Kunoichi. |
|
Another splice. |
|
Lastly, I picked Frostwind and MythoMist for the final layers of this merge. I wanted to introduce MythoMist, as I felt it was what gave Chaifighter its flavorful writing. I paired it with Frostwind, which is a very creative writer as well, and I felt the two (with more emphasis on Frostwind for consistency) produced high-quality outputs up to my standards.
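For intuition on the `dare_*` methods in the config: DARE randomly drops a fraction of each fine-tune's parameter deltas and rescales the survivors so the expected delta is preserved; `density` is the fraction kept. A toy sketch, not mergekit's implementation:

```python
import torch

def dare(task_vector: torch.Tensor, density: float = 0.8) -> torch.Tensor:
    # drop a (1 - density) fraction of the fine-tune deltas at random,
    # then rescale the survivors by 1 / density to preserve the expectation
    mask = (torch.rand_like(task_vector) < density).to(task_vector.dtype)
    return task_vector * mask / density

# e.g. the Frosty-Mytho stage keeps ~80% of MythoMist's deltas (density: 0.8)
```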
|
|
|
I'm super tired right now, so sorry if some of this is hard to follow or if there are any goofy mistakes anywhere. I'll fix them eventually.
|
|
|
Thanks for looking at my model, and have a fantastic day! :) |