|
--- |
|
base_model:
- SanjiWatsuki/Kunoichi-7B
- Sao10K/Fimbulvetr-11B-v2
- Sao10K/Frostwind-v2.1-m7
- Gryphe/MythoMist-7b
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
|
|
--- |
|
![cute](https://huggingface.co/matchaaaaa/Chaifighter-Latte-14B/resolve/main/chaifighter-latte-cute.png) |
|
|
|
**Thanks again to [@Brooketh](https://huggingface.co/brooketh) for the [GGUFs](https://huggingface.co/backyardai/Chaifighter-Latte-14B-GGUF)!!** |
|
|
|
# Chaifighter Latte 14B |
|
|
|
Finally here, Chaifighter Latte is the successor to the Chaifighter 20B models. Like its predecessors, it is Mistral-based, but it is dramatically reduced in size. Chaifighter Latte is formulated for creative, rich, verbose writing without sacrificing intelligence, awareness, or context-following ability. It retains the great taste of the original, and despite being significantly lighter at 14 billion parameters, it performs even better. Try it for yourself!
|
|
|
## Prompt Template: Alpaca |
|
|
|
``` |
|
Below is an instruction that describes a task. Write a response that appropriately completes the request. |
|
|
|
### Instruction: |
|
{prompt} |
|
|
|
### Response: |
|
``` |
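For reference, here's a minimal sketch of filling that template from Python. The `build_prompt` helper is purely illustrative, not part of any library:

```python
def build_prompt(instruction: str) -> str:
    """Fill the Alpaca template above with a user instruction."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```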
|
|
|
## Recommended Settings: Universal-Light |
|
|
|
Here are some setting ranges that tend to work for me. They aren't strict values, and there's a bit of leeway in them. Feel free to experiment a bit! (If you want to see these wired into an actual `transformers` call, there's a sketch right after the list.)
|
|
|
* Temperature: **1.0** *to* **1.25** (adjust to taste, but keep it low; Chaifighter is creative enough on its own)
|
* Min-P: **0.1** (increasing might help if it goes cuckoo, but I suggest keeping it there) |
|
* Repetition Penalty: **1.05** *to* **1.1** (high values aren't needed and usually degrade output) |
|
* Rep. Penalty Range: **256** *or* **512** |
|
* *(all other samplers disabled)* |
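Here's a hedged sketch of those settings in a plain `transformers` generation call, using the `build_prompt` helper from above. Two assumptions worth flagging: `min_p` needs a reasonably recent transformers release, and the repetition-penalty *range* isn't exposed by `generate()`, so that knob lives in frontends like SillyTavern or koboldcpp:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "matchaaaaa/Chaifighter-Latte-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# build_prompt is the Alpaca helper sketched earlier
prompt = build_prompt("Write the opening scene of a slow-burn fantasy story.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.1,          # 1.0 to 1.25
    min_p=0.1,                # keep at 0.1
    repetition_penalty=1.05,  # 1.05 to 1.1; the range knob isn't available here
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```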
|
|
|
## The Deets |
|
|
|
### Mergekit |
|
|
|
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit). |
|
|
|
### Merge Method |
|
|
|
This model was merged using the passthrough merge method, which stacks layer slices from the source models rather than averaging their weights.
|
|
|
### Models Merged |
|
|
|
* [SanjiWatsuki/Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B) |
|
* [Sao10K/Fimbulvetr-11B-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2) |
|
* [Sao10K/Frostwind-v2.1-m7](https://huggingface.co/Sao10K/Frostwind-v2.1-m7) |
|
* [Gryphe/MythoMist-7b](https://huggingface.co/Gryphe/MythoMist-7b) |
|
|
|
|
|
### The Sauce |
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
```yaml
slices:
  - sources:
      - model: SanjiWatsuki/Kunoichi-7B
        layer_range: [16, 24]
merge_method: passthrough
dtype: float32
name: Kuno-splice
---
slices:
  - sources:
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: Fimbul-splice
---
models:
  - model: Kuno-splice
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
  - model: Fimbul-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: Kuno-splice
dtype: float32
name: Kuno-Fimbul-splice
---
models:
  - model: Sao10K/Frostwind-v2.1-m7
  - model: Gryphe/MythoMist-7b
    parameters:
      weight: 0.37
      density: 0.8
merge_method: dare_ties
base_model: Sao10K/Frostwind-v2.1-m7
dtype: float32
name: Frosty-Mytho
---
slices:
  - sources:
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [32, 40]
merge_method: passthrough
dtype: float32
name: Fimbul-splice-2
---
slices:
  - sources:
      - model: Frosty-Mytho
        layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: Frosty-Mytho-splice
---
models:
  - model: Fimbul-splice-2
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
  - model: Frosty-Mytho-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: Fimbul-splice-2
dtype: float32
name: Fimbul-Frosty-Mytho-splice
---
slices:
  - sources: # kunoichi
      - model: SanjiWatsuki/Kunoichi-7B
        layer_range: [0, 16]
  - sources: # kunoichi gradient fimbul splice
      - model: Kuno-Fimbul-splice
        layer_range: [0, 8]
  - sources: # fimbulvetr
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [16, 32]
  - sources: # fimbulvetr gradient fwmm splice
      - model: Fimbul-Frosty-Mytho-splice
        layer_range: [0, 8]
  - sources: # frostwind + mythomist
      - model: Frosty-Mytho
        layer_range: [16, 32]
merge_method: passthrough
dtype: float32
name: Chaifighter-Latte-14B
```
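A note on reproducing it: a multi-document config like this (each stage named and separated by `---`) is the format consumed by mergekit's `mergekit-mega` entry point rather than plain `mergekit-yaml`, so something like `mergekit-mega config.yaml ./output-dir` should work; double-check the invocation against whatever mergekit version you have installed.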
|
|
|
### The Thought Process |
|
|
|
So, I wanted the first layers to be Kunoichi. Kunoichi was chosen for its strong context- and instruction-following abilities, as well as being a really smart model overall. Plus, it's no slouch at RP. I think this is partly what gave previous Chaifighter models the awareness that many people liked. To best harness its stellar prompt-processing performance, I put Kunoichi at the head of the stack.
|
Next, I applied a gradient merge that I call a "splice". Splicing models like this addresses what I believe significantly hurt the earlier Chaifighter models and many other frankenmerges: layer dissimilarity. Blending the end of one stack from model A into the beginning of another stack from model B should, in theory, smooth over those differences and help bring everything together.
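To make the gradient concrete: as I understand it, mergekit treats a list-valued `weight` as anchor points interpolated across the layer range, so the two opposing ramps in the splice configs fade one model out while fading the other in. A rough illustrative sketch (not mergekit's actual code):

```python
import numpy as np

def gradient(anchors, num_layers):
    # expand the anchor list across the layer range by linear interpolation,
    # mirroring how mergekit handles list-valued parameters
    xs = np.linspace(0, 1, len(anchors))
    return np.interp(np.linspace(0, 1, num_layers), xs, anchors)

w_a = gradient([1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0], 8)  # fades out
w_b = gradient([0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1], 8)  # fades in
# each spliced layer i is then roughly w_a[i] * A_i + w_b[i] * B_i
# (the two ramps sum to 1, so every layer stays a proper weighted average)
```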
|
The second model I introduced is Fimbulvetr-v2. This should be no surprise, as it's also a well-established ingredient of the Chaifighter recipe. Boasting incredibly strong coherence, it is the glue that can hold a story together, even with multiple characters and over longer contexts. I felt like the best place for Fimbulvetr was right after Kunoichi. |
|
Another splice. |
|
Lastly, I picked Frostwind and MythoMist for the final layers of this merge. I wanted to introduce MythoMist, as I felt it was what gave Chaifighter its flavorful writing. I paired it with Frostwind, which is a very creative writer as well, and I felt the two (with more emphasis on Frostwind for consistency) produced high-quality outputs up to my standards.
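For intuition on the `dare_*` methods in the config: DARE randomly drops a fraction of each fine-tune's parameter deltas and rescales the survivors so the expected delta is preserved; `density` is the fraction kept. A toy sketch, not mergekit's implementation:

```python
import torch

def dare(task_vector: torch.Tensor, density: float = 0.8) -> torch.Tensor:
    # drop a (1 - density) fraction of the fine-tune deltas at random,
    # then rescale the survivors by 1 / density to preserve the expectation
    mask = (torch.rand_like(task_vector) < density).to(task_vector.dtype)
    return task_vector * mask / density

# e.g. the Frosty-Mytho stage keeps ~80% of MythoMist's deltas (density: 0.8)
```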
|
|
|
I'm super tired right now, so sorry if some of this is hard to follow or if there are any goofy mistakes anywhere. I'll fix them eventually.
|
|
|
Thanks for looking at my model, and have a fantastic day! :) |