Update README.md
README.md
CHANGED
@@ -29,10 +29,10 @@ Lamarck 14B v0.7: A generalist merge with emphasis on multi-step reasoning, pro
Lamarck is produced by a custom toolchain to automate complex sequences of LoRAs and various layer-targeting merges:

- **Extracted LoRA adapters from special-purpose merges**
-- **Custom base models and model_stocks
+- **Custom base models and model_stocks, with LoRAs from [huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2](https://huggingface.co/huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2) to minimize IFEVAL loss often seen in model_stock merges**
- **Separate branches for aggressive breadcrumbs and conservative DELLA merges**
- **Highly targeted weight/density gradients for every 2-4 layers, at each stage**
-- **Finalization through SLERP+TIES merges recombining the
+- **Finalization through SLERP+TIES merges recombining the breadcrumbs and DELLA branches to taste**

Lamarck's performance comes from an ancestry that goes back through careful merges to select finetuning work, upcycled and combined. Through intermediate merges, [arcee-ai/Virtuoso-Small](https://huggingface.co/arcee-ai/Virtuoso-Small), [sthenno-com/miscii-14b-1225](https://huggingface.co/sthenno-com/miscii-14b-1225), and [VAGOsolutions/SauerkrautLM-v2-14b-DPO](https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-DPO) are emphasized in early layers for extra BBH; later layers add synergistic influence from [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B), [EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2), and [CultriX/Qwen2.5-14B-Wernicke](https://huggingface.co/CultriX/Qwen2.5-14B-Wernicke).
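The branch structure and per-layer gradients described in the list above map naturally onto mergekit's declarative configs. As a hedged illustration only — the base model, donors, weights, and densities below are placeholders, not Lamarck v0.7's published recipe — a conservative DELLA branch with layer-targeted weight/density gradients might look like this:

```yaml
# Hypothetical sketch of one conservative DELLA branch in mergekit syntax.
# Donor models, weights, and densities are illustrative placeholders only.
merge_method: della
base_model: Qwen/Qwen2.5-14B
models:
  - model: arcee-ai/Virtuoso-Small
    parameters:
      # List-valued parameters are interpolated across the layer stack,
      # giving this donor more influence in early layers.
      weight: [0.6, 0.5, 0.4, 0.3]
      density: [0.7, 0.6, 0.5, 0.5]
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
    parameters:
      # Mirror-image gradient: emphasized in later layers instead.
      weight: [0.2, 0.3, 0.4, 0.5]
      density: [0.4, 0.5, 0.6, 0.6]
parameters:
  epsilon: 0.05   # DELLA: how strongly drop probability varies with delta magnitude
  lambda: 1.0     # DELLA: rescaling applied to the surviving deltas
dtype: bfloat16
```

An aggressive breadcrumbs branch would follow the same shape with `merge_method: breadcrumbs` and its own density/gamma settings; the "every 2-4 layers" targeting mentioned above implies finer-grained gradients (or explicit `slices`) than the four-point lists shown here.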
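The finalization step, recombining the breadcrumbs and DELLA branches, can likewise be sketched as a SLERP pass (TIES being the other method named in the list). Again, the branch paths, interpolation curves, and layer range below are assumptions for illustration, not the actual configuration:

```yaml
# Hypothetical SLERP finalization pass recombining two branch merges.
# ./della-branch and ./breadcrumbs-branch are placeholder paths to local
# outputs of the earlier stages.
merge_method: slerp
base_model: ./della-branch
slices:
  - sources:
      - model: ./della-branch
        layer_range: [0, 48]   # Qwen2.5-14B models have 48 decoder layers
      - model: ./breadcrumbs-branch
        layer_range: [0, 48]
parameters:
  t:
    # t = 0 keeps the DELLA branch, t = 1 the breadcrumbs branch; per-filter
    # gradients let attention and MLP blocks lean different ways by depth.
    - filter: self_attn
      value: [0.2, 0.4, 0.5, 0.6]
    - filter: mlp
      value: [0.6, 0.5, 0.4, 0.2]
    - value: 0.4
dtype: bfloat16
```

In practice the blend parameter `t` is what gets tuned "to taste", trading off the aggressive and conservative branches at each depth.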