|
Ooof, my man ain't feeling so hot; I'd pass on this one for now. Inverting and merging 20b Llama 2 models works quite well, evening out the gradients between slices. However, these 13b Mistrals seem to HATE it, I assume due to the unbalanced nature of my recipe. More study is required.
|
|
|
### Recipe |
|
- merge_method: dare_ties
- base_model: athirdpath/BigMistral-13b
- model: athirdpath/NeuralHermes-Mistral-13b (weight: 0.60, density: 0.35)
- model: athirdpath/NeuralHermes-Mistral-13b-INV (weight: 0.40, density: 0.30)
- int8_mask: true
- dtype: bfloat16
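
Written out as a mergekit config file, the recipe above should look roughly like this. This is a sketch assuming mergekit's usual dare_ties schema; it is not a verbatim copy of my config, and the parameter nesting is filled in from the standard examples:

```yaml
# Sketch of the recipe above in mergekit's YAML format (assumed schema).
merge_method: dare_ties
base_model: athirdpath/BigMistral-13b
models:
  - model: athirdpath/NeuralHermes-Mistral-13b
    parameters:
      weight: 0.60
      density: 0.35
  - model: athirdpath/NeuralHermes-Mistral-13b-INV
    parameters:
      weight: 0.40
      density: 0.30
parameters:
  int8_mask: true
dtype: bfloat16
```

Running it should be something like `mergekit-yaml recipe.yml ./merged-model`, assuming a recent mergekit install; the config filename and output directory here are placeholders.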