---
base_model: []
library_name: transformers
tags:
- mergekit
- merge
---
![cute](https://huggingface.co/matchaaaaa/Chaifighter-Latte-14B/resolve/main/chaifighter-latte-cute.png)
**Thanks again to [@Brooketh](https://huggingface.co/brooketh) for the [GGUFs](https://huggingface.co/backyardai/Chaifighter-Latte-14B-GGUF)!!**
# Chaifighter Latte 14B
Finally here, Chaifighter Latte is the successor to the Chaifighter 20B models. Like its predecessors, it is Mistral-based, but now it is dramatically reduced in size. Chaifighter Latte is formulated for creative, rich, verbose writing without sacrificing intelligence, awareness, and context-following abilities. Chaifighter Latte retains the great taste of the original, and despite being significantly lighter at 14 billion parameters, it performs even better. Try it for yourself!
## Prompt Template: Alpaca
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
```
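If you are building prompts in code, the template above can be filled in with a small helper. This is only a sketch: `ALPACA_TEMPLATE` and `build_alpaca_prompt` are hypothetical names, and the line breaks follow the template exactly as shown here, so adjust the whitespace if your frontend expects blank lines between sections.

```python
# Alpaca template as shown in this model card; {prompt} is the only placeholder.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{prompt}\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Fill the Alpaca template with a single instruction (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Write a short scene set in a tea shop."))
```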
## Recommended Settings: Universal-Light
Here are some settings ranges that tend to work for me. They aren't strict values, and there's a bit of leeway in them. Feel free to experiment a bit!
* Temperature: **1.0** *to* **1.25** (adjust to taste, but keep it low. Chaifighter is creative enough on its own)
* Min-P: **0.1** (increasing might help if it goes cuckoo, but I suggest keeping it there)
* Repetition Penalty: **1.05** *to* **1.1** (high values aren't needed and usually degrade output)
* Rep. Penalty Range: **256** *or* **512**
* *(all other samplers disabled)*
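For anyone running the model from Python, the ranges above could be written down roughly like this. It is a sketch, not an official preset: the key names follow the sampling arguments that recent versions of `transformers` accept in `generate()` (`min_p` in particular needs a fairly recent release), and `REP_PENALTY_RANGE`, `RECOMMENDED_RANGES`, and `within_recommended` are hypothetical names. The repetition penalty range is, as far as I know, not a `generate()` argument, so it is kept separate for backends that support it.

```python
# One concrete pick from the suggested ranges (values chosen for illustration).
SAMPLER_SETTINGS = {
    "do_sample": True,
    "temperature": 1.1,          # suggested: 1.0 to 1.25
    "min_p": 0.1,                # suggested: 0.1
    "repetition_penalty": 1.05,  # suggested: 1.05 to 1.1
    "top_k": 0,                  # 0 turns top-k off
    "top_p": 1.0,                # 1.0 turns nucleus sampling off
}

# Assumption: set this in frontends/backends that expose a penalty range.
REP_PENALTY_RANGE = 256  # or 512

# Ranges copied from the list above.
RECOMMENDED_RANGES = {
    "temperature": (1.0, 1.25),
    "min_p": (0.1, 0.1),
    "repetition_penalty": (1.05, 1.1),
}


def within_recommended(settings: dict) -> bool:
    """Return True if every ranged setting falls inside the suggested range."""
    return all(
        low <= settings[name] <= high
        for name, (low, high) in RECOMMENDED_RANGES.items()
    )


if __name__ == "__main__":
    print(within_recommended(SAMPLER_SETTINGS))
```

With a loaded model, a dict like this would typically be passed as `model.generate(**inputs, **SAMPLER_SETTINGS)`.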
## The Deets
### Mergekit
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
### Merge Method
This model was merged using the passthrough merge method.
### Models Merged
* [SanjiWatsuki/Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B)
* [Sao10K/Fimbulvetr-11B-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2)
* [Sao10K/Frostwind-v2.1-m7](https://huggingface.co/Sao10K/Frostwind-v2.1-m7)
* [Gryphe/MythoMist-7b](https://huggingface.co/Gryphe/MythoMist-7b)
### The Sauce
The following YAML configuration was used to produce this model:
```yaml
slices:
  - sources:
    - model: SanjiWatsuki/Kunoichi-7B
      layer_range: [16, 24]
merge_method: passthrough
dtype: float32
name: Kuno-splice
---
slices:
  - sources:
    - model: Sao10K/Fimbulvetr-11B-v2
      layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: Fimbul-splice
---
models:
  - model: Kuno-splice
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
  - model: Fimbul-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: Kuno-splice
dtype: float32
name: Kuno-Fimbul-splice
---
models:
  - model: Sao10K/Frostwind-v2.1-m7
  - model: Gryphe/MythoMist-7b
    parameters:
      weight: 0.37
      density: 0.8
merge_method: dare_ties
base_model: Sao10K/Frostwind-v2.1-m7
dtype: float32
name: Frosty-Mytho
---
slices:
  - sources:
    - model: Sao10K/Fimbulvetr-11B-v2
      layer_range: [32, 40]
merge_method: passthrough
dtype: float32
name: Fimbul-splice-2
---
slices:
  - sources:
    - model: Frosty-Mytho
      layer_range: [8, 16]
merge_method: passthrough
dtype: float32
name: Frosty-Mytho-splice
---
models:
  - model: Fimbul-splice-2
    parameters:
      weight: [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0] # 0.125 / 0.875 values removed here - "math gets screwy"
  - model: Frosty-Mytho-splice
    parameters:
      weight: [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1] # 0.125 / 0.875 values removed here - "math gets screwy"
merge_method: dare_linear # according to some paper, "DARE is all you need"
base_model: Fimbul-splice-2
dtype: float32
name: Fimbul-Frosty-Mytho-splice
---
slices:
  - sources: # kunoichi
    - model: SanjiWatsuki/Kunoichi-7B
      layer_range: [0, 16]
  - sources: # kunoichi gradient fimbul splice
    - model: Kuno-Fimbul-splice
      layer_range: [0, 8]
  - sources: # fimbulvetr
    - model: Sao10K/Fimbulvetr-11B-v2
      layer_range: [16, 32]
  # insert splice here
  - sources: # fimbulvetr gradient fwmm splice
    - model: Fimbul-Frosty-Mytho-splice
      layer_range: [0, 8]
  - sources: # frostwind + mythomist
    - model: Frosty-Mytho
      layer_range: [16, 32]
merge_method: passthrough
dtype: float32
name: Chaifighter-Latte-14B
```
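As a sanity check on the "14B" in the name, the final passthrough stack can be tallied up. The sketch below treats each `layer_range` as end-exclusive and assumes every source model uses standard Mistral-7B layer dimensions (hidden size 4096, MLP size 14336, 8 KV heads of dimension 128, 32000-token vocabulary, untied output head). Those dimensions are not stated in this card, so treat the parameter figure as an estimate.

```python
# Layer ranges from the final passthrough config (end index exclusive).
FINAL_SLICES = [
    ("SanjiWatsuki/Kunoichi-7B", 0, 16),
    ("Kuno-Fimbul-splice", 0, 8),
    ("Sao10K/Fimbulvetr-11B-v2", 16, 32),
    ("Fimbul-Frosty-Mytho-splice", 0, 8),
    ("Frosty-Mytho", 16, 32),
]

# Assumed Mistral-7B style dimensions (not given in this model card).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads of dimension 128
VOCAB = 32000


def params_per_layer() -> int:
    """Parameters in one decoder layer under the assumed dimensions."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o + k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate, up, down projections
    norms = 2 * HIDDEN                                # two RMSNorm weight vectors
    return attn + mlp + norms


def total_params(num_layers: int) -> int:
    """Estimated total parameters for a stack of num_layers decoder layers."""
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied output head
    final_norm = HIDDEN
    return num_layers * params_per_layer() + embeddings + final_norm


num_layers = sum(end - start for _, start, end in FINAL_SLICES)

if __name__ == "__main__":
    print(num_layers)                                # 64
    print(f"{total_params(num_layers) / 1e9:.2f}B")  # 14.22B
```

Under those assumptions the stack comes out to 64 layers, twice the depth of a 32-layer 7B model, and lands at roughly 14.2 billion parameters.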
### The Thought Process
So, I wanted the first layers to be Kunoichi. Kunoichi was chosen for its strong context and instruction-following abilities, as well as being a really smart model overall. Plus, it's no slouch at RP. I think this is partly what gave previous Chaifighter models the awareness that many people liked. To best harness its stellar prompt processing performance, I put Kunoichi at the head of the stack.
Next, I applied a gradient merge that I call a "splice". Splicing models like this addresses what I believe has significantly hurt the earlier Chaifighter models and many other frankenmerges: layer dissimilarity. Splicing the end of a stack from model A with the beginning of a stack from model B should, in theory, help smooth over those differences and bring everything together.
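To make the splice gradient concrete, here is a rough sketch of how the two weight lists from the config could be spread over the 8 layers of a splice. It assumes the listed values are evenly spaced anchors that get linearly interpolated per layer. This approximates how gradient lists behave rather than reproducing mergekit's own code, and `interpolate_gradient` is a hypothetical helper.

```python
# Gradient anchors copied from the Kuno-Fimbul-splice config.
KUNO_WEIGHTS = [1, 1, 0.75, 0.625, 0.5, 0.375, 0.25, 0, 0]
FIMBUL_WEIGHTS = [0, 0, 0.25, 0.375, 0.5, 0.625, 0.75, 1, 1]


def interpolate_gradient(anchors, num_layers):
    """Linearly interpolate evenly spaced anchor values across num_layers layers."""
    if num_layers == 1:
        return [float(anchors[0])]
    last = len(anchors) - 1
    weights = []
    for layer in range(num_layers):
        pos = layer / (num_layers - 1) * last  # position along the anchor list
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        weights.append((1 - frac) * anchors[lo] + frac * anchors[hi])
    return weights


kuno = interpolate_gradient(KUNO_WEIGHTS, 8)
fimbul = interpolate_gradient(FIMBUL_WEIGHTS, 8)

if __name__ == "__main__":
    for layer, (k, f) in enumerate(zip(kuno, fimbul)):
        print(f"layer {layer}: kunoichi={k:.3f} fimbulvetr={f:.3f}")
```

The two lists mirror each other, so at every layer the weights add up to 1: the splice starts as pure model A and fades into pure model B.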
The second model I introduced is Fimbulvetr-v2. This should be no surprise, as it's also a well-established ingredient of the Chaifighter recipe. Boasting incredibly strong coherence, it is the glue that can hold a story together, even with multiple characters and over longer contexts. I felt like the best place for Fimbulvetr was right after Kunoichi.
Another splice.
Lastly, I picked Frostwind and MythoMist as the final layers in this merge. I wanted to introduce MythoMist into the merge as I felt like it was what gave Chaifighter its flavorful writing. I paired it with Frostwind, as it's a very creative writer as well, and I felt like the two (with more emphasis on Frostwind for consistency) produced high-quality outputs that met my standards.
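The Frostwind + MythoMist blend uses `dare_ties` with Frostwind as the base, a MythoMist weight of 0.37, and a density of 0.8. The core DARE idea (drop a random share of the differences from the base, then rescale the survivors) can be sketched on plain lists of numbers. This toy version is an illustration only: it skips the sign-election step and any normalization options of the real method, and `dare_merge` is a hypothetical helper, not mergekit's API.

```python
import random


def dare_merge(base, tuned, weight, density, seed=0):
    """Toy DARE: keep each delta with probability `density`, rescale, add to base."""
    rng = random.Random(seed)
    merged = []
    for b, t in zip(base, tuned):
        delta = t - b                          # what the second model changes
        keep = rng.random() < density          # drop roughly (1 - density) of deltas
        rescaled = delta / density if keep else 0.0
        merged.append(b + weight * rescaled)   # weighted delta on top of the base
    return merged


if __name__ == "__main__":
    frostwind = [0.0] * 10  # stand-in values for the base model's weights
    mythomist = [1.0] * 10  # stand-in values for the second model's weights
    print(dare_merge(frostwind, mythomist, weight=0.37, density=0.8))
```

Because surviving deltas are scaled by `1 / density`, the expected contribution of the second model stays at `weight` times its delta, which is why Frostwind still dominates at a weight of 0.37.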
I'm super tired right now, sorry if some of this is hard to follow or if there are any goofy mistakes anywhere. I'll fix them, eventually.
Thanks for looking at my model, and have a fantastic day! :)