sometimesanotion committed on
Commit e9db55f · verified · 1 Parent(s): ec64395

Update README.md

Files changed (1): README.md +11 -56
README.md CHANGED
@@ -15,64 +15,19 @@ language:
  - en
  pipeline_tag: text-generation
  ---
- # output
-
- This is an experimental merge which pushes the merge techniques behind [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6) further, adding a merge of DeepSeek's R1 distillation to its mid-to-upper layers alongside modifications that yielded promising results in previous 0.7 release candidates. How this will interact with Lamarck's merge from reasoning-heavy Qwenvergence models is unknown.
-
- Initial tests and sample output are very promising and suggest potential synergy. If this model lives up to expectations, I will document its lineage and design.
-
- ## Merge Details
-
- ### Merge Method
-
- This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method with sometimesanotion/finalize-slerp as the base.
-
- ### Models Merged
-
- The following models were included in the merge:
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- name: Lamarck-14B-v0.7-rc4-r1
- merge_method: ties
- base_model: sometimesanotion/finalize-slerp
- tokenizer_source: base
- dtype: float32
- out_dtype: bfloat16
- parameters:
-   density: 1.00
-   weight: 1.00
-   int8_mask: true
-   normalize: true
-   rescale: false
- slices:
-   - sources:
-     - { layer_range: [ 0, 2 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 2, 6 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 6, 10 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 10, 14 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 14, 18 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 18, 22 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 22, 26 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 26, 30 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 30, 34 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 34, 38 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 38, 42 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 42, 46 ], model: sometimesanotion/finalize-slerp }
-   - sources:
-     - { layer_range: [ 46, 48 ], model: sometimesanotion/finalize-slerp }
- ```
+ ![Lamarck.webp](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7/resolve/main/Lamarck.webp)
+ ---
+
+ > [!TIP]
+ > This update pushes the merge techniques behind [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6) further, with notably better prose and underlying improvements in IFEVAL, MATH, and MUSR.
+
+ Lamarck 14B v0.7 is a generalist merge focused on multi-step reasoning, prose, and multi-language ability. It is based on components that have punched above their weight in the 14-billion-parameter class, and it uses a custom toolchain to create and apply multiple sequences of complex merges:
+
+ - **Extracted LoRA adapters from special-purpose merges**
+ - **Custom base models and model_stocks of original models, with LoRAs from [huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2](https://huggingface.co/huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2) to minimize the IFEVAL loss seen in model_stock merges**
+ - **Separate branches for aggressive breadcrumbs and conservative DELLA merges**
+ - **Highly targeted weight/density gradients for every 2-4 layers**
+ - **Finalization through SLERP merges recombining the separate branches where most stable**
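The branch-and-gradient workflow above can be sketched as a mergekit configuration. This fragment is a hypothetical illustration only: the model names, densities, and weights are placeholders, not the published Lamarck recipe.

```yaml
# Hypothetical sketch of one "aggressive breadcrumbs" branch with a
# per-slice density/weight gradient. Names and values are illustrative.
name: breadcrumbs-branch-example
merge_method: breadcrumbs
base_model: Qwen/Qwen2.5-14B        # placeholder base
dtype: bfloat16
parameters:
  int8_mask: true
slices:
  - sources:
      - model: arcee-ai/Virtuoso-Small
        layer_range: [ 0, 4 ]
        parameters: { density: 0.90, weight: 0.60 }
      - model: Qwen/Qwen2.5-14B
        layer_range: [ 0, 4 ]
        parameters: { density: 0.90, weight: 0.40 }
  # ...further 2-4 layer slices, each with its own density/weight values,
  # form the "highly targeted gradients" the list above describes...
```

A conservative DELLA branch would look similar with `merge_method: della` and gentler densities; the finished branches are then recombined with SLERP.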
+
+ Lamarck's performance comes from an ancestry that goes back through careful merges to select finetuning work, upcycled and combined. Through intermediate merges, [arcee-ai/Virtuoso-Small](https://huggingface.co/arcee-ai/Virtuoso-Small), [sthenno-com/miscii-14b-1225](https://huggingface.co/sthenno-com/miscii-14b-1225), and [VAGOsolutions/SauerkrautLM-v2-14b-DPO](https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-DPO) are emphasized in early layers for extra BBH; later layers add synergistic influence from [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B), [EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2), and [CultriX/Qwen2.5-14B-Wernicke](https://huggingface.co/CultriX/Qwen2.5-14B-Wernicke).
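One way such early-versus-late layer emphasis can be expressed in mergekit is a SLERP gradient over `t`, which is interpolated across the model's depth. This fragment is a hypothetical illustration; the branch names are invented and the `t` values are placeholders, not Lamarck's actual configuration.

```yaml
# Hypothetical layer-wise emphasis via a SLERP t-gradient: low t keeps
# early layers close to the base branch, higher t lets later layers lean
# toward the reasoning-heavy branch. All names and values are placeholders.
name: gradient-slerp-example
merge_method: slerp
base_model: sometimesanotion/early-layer-branch    # hypothetical name
dtype: bfloat16
slices:
  - sources:
      - model: sometimesanotion/early-layer-branch # hypothetical name
        layer_range: [ 0, 48 ]
      - model: sometimesanotion/reasoning-branch   # hypothetical name
        layer_range: [ 0, 48 ]
parameters:
  t:
    - value: [ 0.1, 0.2, 0.4, 0.6, 0.7 ]  # interpolated across the depth
```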
+
+ Kudos to @arcee-ai, @deepseek-ai, @Krystalan, @underwoods, @VAGOsolutions, @CultriX, @sthenno-com, and @rombodawg, whose models had the most influence. [Vimarckoso v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3) has the model card which documents its extended lineage.