---
base_model:
  - sometimesanotion/Lamarck-14B-v0.6
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - sometimesanotion/Lamarck-14B-v0.3
  - sometimesanotion/Qwenvergence-14B-v9
  - sometimesanotion/Qwenvergence-14B-v3-Prose
  - arcee-ai/Virtuoso-Small
library_name: transformers
tags:
  - mergekit
  - merge
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
metrics:
  - accuracy
---

![Lamarck.webp](Lamarck.webp)

This version of the model has broken the 41.0 average ceiling for 14B parameter models and, as of this writing, ranks #8 among models under 70B parameters on the Open LLM Leaderboard. Given performance that holds up against models in the 32B range, I think Lamarck deserves his shades. A little layer analysis in the 14B range goes a long, long way.

Lamarck 14B v0.7 is a generalist merge focused on multi-step reasoning, prose, and multi-language ability. It is built from components that have punched above their weight in the 14 billion parameter class, and it uses a custom toolchain to create and apply multiple sequences of complex merges (see the sketches after the list below):

- Extracted LoRA adapters from special-purpose merges
- Custom base models and model_stock merges of original models, with LoRAs from huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2 to minimize the IFEval loss often seen in model_stock merges
- Separate branches for aggressive breadcrumbs and conservative DELLA merges
- Highly targeted weight/density gradients for every 2-4 layers
- Finalization through SLERP merges that recombine the separate branches for stable performance
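
To picture the first two steps, here is a minimal mergekit-style model_stock sketch. It is not the published recipe: the model names are hypothetical placeholders, and the assumed LoRA path stands in for an adapter extracted from huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2. Mergekit's `model+lora` syntax applies an adapter on the fly.

```yaml
# Hypothetical model_stock stage -- placeholder names, not the real recipe.
# The "+" syntax applies an extracted LoRA on the fly, which is the trick used
# here to limit the IFEval regression common in plain model_stock merges.
merge_method: model_stock
base_model: custom-base-14b                          # hypothetical custom base
models:
  - model: donor-reason-14b+extracted-ifeval-lora    # hypothetical donor + LoRA
  - model: donor-prose-14b+extracted-ifeval-lora     # hypothetical donor + LoRA
  - model: donor-multilang-14b+extracted-ifeval-lora # hypothetical donor + LoRA
dtype: bfloat16
```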
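The conservative DELLA branch can be sketched as below, again with hypothetical names and assumed values. Mergekit interpolates list-valued parameters as a gradient across layers, which is how weight/density can be steered every few layers; the aggressive breadcrumbs branch would look similar with `merge_method: breadcrumbs` and bolder values.

```yaml
# Hypothetical conservative DELLA branch -- placeholder names, assumed values.
# List-valued parameters are interpolated across layers, giving the targeted
# weight/density gradients described above.
merge_method: della
base_model: custom-base-14b              # hypothetical custom base
models:
  - model: donor-reason-14b              # hypothetical donor
    parameters:
      weight:  [0.20, 0.35, 0.50, 0.35, 0.20]   # gradient across layer groups
      density: [0.40, 0.55, 0.70, 0.55, 0.40]
  - model: donor-prose-14b               # hypothetical donor
    parameters:
      weight:  [0.30, 0.25, 0.20, 0.25, 0.30]
      density: [0.50, 0.45, 0.40, 0.45, 0.50]
dtype: bfloat16
```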
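Finally, a sketch of the SLERP finalization that recombines the two branches (hypothetical branch names; the 48-layer range matches Qwen2.5-14B). A gradient on `t` lets early, middle, and late layer groups lean toward different branches.

```yaml
# Hypothetical final SLERP stage -- placeholder branch names, assumed t values.
merge_method: slerp
base_model: lamarck-branch-della              # hypothetical DELLA branch
slices:
  - sources:
      - model: lamarck-branch-della           # hypothetical DELLA branch
        layer_range: [0, 48]
      - model: lamarck-branch-breadcrumbs     # hypothetical breadcrumbs branch
        layer_range: [0, 48]
parameters:
  t:
    - value: [0.1, 0.3, 0.5, 0.3, 0.1]        # interpolated across layers
dtype: bfloat16
```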

Lamarck's performance comes from an ancestry that goes back through careful merges to select finetuning work, upcycled and combined. Through intermediate merges, arcee-ai/Virtuoso-Small, sthenno-com/miscii-14b-1225, and VAGOsolutions/SauerkrautLM-v2-14b-DPO are emphasized in early layers for extra BBH; later layers add synergistic influence from deepseek-ai/DeepSeek-R1-Distill-Qwen-14B, Krystalan/DRT-o1-14B, EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2, and CultriX/Qwen2.5-14B-Wernicke.

More subjectively, its prose and translation abilities are boosted by repeated re-emphasis of Krystalan/DRT-o1-14B and underwoods/medius-erebus-magnum-14b. Other models found in sometimesanotion/Qwenvergence-14B-v3-Prose also leave their mark on prose quality, and bring a surprising synergy to its reasoning.

Kudos to @arcee-ai, @deepseek-ai, @Krystalan, @underwoods, @VAGOsolutions, @CultriX, @sthenno-com, and @rombodawg, whose models had the most influence. The Vimarckoso v3 model card documents its extended lineage.