sometimesanotion committed on
Commit e9db55f · verified · 1 Parent(s): ec64395

Update README.md

Files changed (1): README.md +11 -56
README.md CHANGED
@@ -15,64 +15,19 @@ language:
  - en
  pipeline_tag: text-generation
  ---
- # output
-
- This is an experimental merge which pushes the merge techniques behind [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6) further, adding a merge of DeepSeek's R1 distillation to its mid-to-upper layers alongside modifications that yielded promising results in previous 0.7 release candidates. How this will interact with Lamarck's merge from reasoning-heavy Qwenvergence models is unknown.
-
- Initial tests and sample output are very promising and suggest potential synergy. If this model lives up to expectations, I will document its lineage and design.
-
- ## Merge Details
-
- ### Merge Method
-
- This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method with sometimesanotion/finalize-slerp as the base.
-
- ### Models Merged
-
- The following models were included in the merge:
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- name: Lamarck-14B-v0.7-rc4-r1
- merge_method: ties
- base_model: sometimesanotion/finalize-slerp
- tokenizer_source: base
- dtype: float32
- out_dtype: bfloat16
- parameters:
-   density: 1.00
-   weight: 1.00
-   int8_mask: true
-   normalize: true
-   rescale: false
- slices:
-   - sources:
-     - { layer_range: [ 0, 2 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 2, 6 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 6, 10 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 10, 14 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 14, 18 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 18, 22 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 22, 26 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 26, 30 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 30, 34 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 34, 38 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 38, 42 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 42, 46 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 46, 48 ], model: sometimesanotion/finalize-slerp }
- ```
+ ![Lamarck.webp](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7/resolve/main/Lamarck.webp)
+ ---
+
+ > [!TIP]
+ > This update pushes the merge techniques behind [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6) further, with notably better prose and underlying improvements in IFEVAL, MATH, and MUSR.
+
+ Lamarck 14B v0.7 is a generalist merge focused on multi-step reasoning, prose, and multi-language ability. It is based on components that have punched above their weight in the 14-billion-parameter class, and it uses a custom toolchain to create and apply multiple sequences of complex merges:
+
+ - **Extracted LoRA adapters from special-purpose merges**
+ - **Custom base models and model_stocks of original models, with LoRAs from [huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2](https://huggingface.co/huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2) to minimize the IFEVAL loss seen in model_stock merges**
+ - **Separate branches for aggressive breadcrumbs and conservative DELLA merges**
+ - **Highly targeted weight/density gradients for every 2-4 layers**
+ - **Finalization through SLERP merges recombining the separate branches where most stable**
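The branch-and-gradient workflow above can be sketched as a mergekit configuration. This fragment is a hypothetical illustration only: the model names, densities, and weights are placeholders, not the published Lamarck recipe.

```yaml
# Hypothetical sketch of one "aggressive breadcrumbs" branch with a
# per-slice density/weight gradient. Names and values are illustrative.
name: breadcrumbs-branch-example
merge_method: breadcrumbs
base_model: Qwen/Qwen2.5-14B        # placeholder base
dtype: bfloat16
parameters:
  int8_mask: true
slices:
  - sources:
      - model: arcee-ai/Virtuoso-Small
        layer_range: [ 0, 4 ]
        parameters: { density: 0.90, weight: 0.60 }
      - model: Qwen/Qwen2.5-14B
        layer_range: [ 0, 4 ]
        parameters: { density: 0.90, weight: 0.40 }
  # ...further 2-4 layer slices, each with its own density/weight values,
  # form the "highly targeted gradients" the list above describes...
```

A conservative DELLA branch would look similar with `merge_method: della` and gentler densities; the finished branches are then recombined with SLERP.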
+
+ Lamarck's performance comes from an ancestry that goes back through careful merges to select finetuning work, upcycled and combined. Through intermediate merges, [arcee-ai/Virtuoso-Small](https://huggingface.co/arcee-ai/Virtuoso-Small), [sthenno-com/miscii-14b-1225](https://huggingface.co/sthenno-com/miscii-14b-1225), and [VAGOsolutions/SauerkrautLM-v2-14b-DPO](https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-DPO) are emphasized in early layers for extra BBH; later layers add synergistic influence from [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B), [EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2), and [CultriX/Qwen2.5-14B-Wernicke](https://huggingface.co/CultriX/Qwen2.5-14B-Wernicke).
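One way such early-versus-late layer emphasis can be expressed in mergekit is a SLERP gradient over `t`, which is interpolated across the model's depth. This fragment is a hypothetical illustration; the branch names are invented and the `t` values are placeholders, not Lamarck's actual configuration.

```yaml
# Hypothetical layer-wise emphasis via a SLERP t-gradient: low t keeps
# early layers close to the base branch, higher t lets later layers lean
# toward the reasoning-heavy branch. All names and values are placeholders.
name: gradient-slerp-example
merge_method: slerp
base_model: sometimesanotion/early-layer-branch    # hypothetical name
dtype: bfloat16
slices:
  - sources:
      - model: sometimesanotion/early-layer-branch # hypothetical name
        layer_range: [ 0, 48 ]
      - model: sometimesanotion/reasoning-branch   # hypothetical name
        layer_range: [ 0, 48 ]
parameters:
  t:
    - value: [ 0.1, 0.2, 0.4, 0.6, 0.7 ]  # interpolated across the depth
```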
+
+ Kudos to @arcee-ai, @deepseek-ai, @Krystalan, @underwoods, @VAGOsolutions, @CultriX, @sthenno-com, and @rombodawg, whose models had the most influence. [Vimarckoso v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3) has the model card which documents its extended lineage.