sometimesanotion
AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Posts
I'd like to draw your attention to a Lamarck-based experiment which uses Arcee AI's newly published arcee_fusion merge method for three out of its four merges. Yes, just four. This is a simple one, and its recipe is fully open:

sometimesanotion/Lamarck-14B-v0.7-Fusion

It unifies three branches, each built on models that bring Lamarck-14B-v0.7 and Qwenvergence-14B-v12-Prose together. One side features @jpacifico's jpacifico/Chocolatine-2-14B-Instruct-v2.0.3, and the other features @suayptalha's suayptalha/Lamarckvergence-14B paired with my models that were their merge ancestors.

A fusion merge - of a fusion merge and a SLERP of a fusion and an older merge - should demonstrate the new merge method's behavior in interesting ways, especially in the first quarter of the model's layers, where the SLERP has less impact. A sketch of what such a stage looks like follows.
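For readers new to mergekit, here is a minimal sketch of what an arcee_fusion stage can look like. This is illustrative only, not the published Lamarck-14B-v0.7-Fusion recipe (see the model card for that); it assumes mergekit's arcee_fusion support, which fuses exactly two models: a base and one donor.

```yaml
# Hypothetical arcee_fusion stage, for illustration only.
# arcee_fusion adaptively selects which parameters to fuse from the
# donor model into the base, rather than interpolating everything.
merge_method: arcee_fusion
base_model: sometimesanotion/Lamarck-14B-v0.7
models:
  - model: sometimesanotion/Qwenvergence-14B-v12-Prose
dtype: bfloat16
```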

I welcome you to kick the tires and learn from it. Its prose quality is near Qwenvergence v12's, as you'd expect.

Thank you, @mradermacher and @MaziyarPanahi, for the first-day quantizations! Your work helped get me started. https://huggingface.co/models?other=base_model:quantized:sometimesanotion/Lamarck-14B-v0.7-Fusion

I am really pleased to see jpacifico/Chocolatine-2-14B-Instruct-v2.0.3 take #4 in the 14B segment of the Open LLM Leaderboard. It is a fine-tune of a merge of Arcee's arcee-ai/Virtuoso-Small-v2 with my sometimesanotion/Lamarck-14B-v0.7 and sometimesanotion/Qwenvergence-14B-v12-Prose-DS. Don't let the numbers fool you: in its element, it's quite smooth. I really enjoy merges of Lamarck with near siblings like this one.

Don't be surprised when it's challenging to bring the full reasoning strength of a reasoning-heavy prose model like Qwenvergence v12-DS into a high-IFEVAL model like Lamarck or Virtuoso Small v2. That's a lot of work to get right, because IFEVAL, precise reasoning, and prose quality are often in tension with one another. Gaining as much as this merge did is really respectable, and fine-tuning it makes it a more stable base for coming iterations. The sketch below shows one way such trade-offs can be expressed in a merge recipe.
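To make the trade-off concrete: one common lever in mergekit is a SLERP with a layer-wise gradient on t, keeping early layers close to the high-IFEVAL base and letting later layers lean toward the prose and reasoning donor. The values below are hypothetical, not a recipe from any model named here:

```yaml
# Hypothetical layer-gradient SLERP; t=0 keeps the base model and
# t=1 takes the donor. The ramp holds the first quarter of the
# 48 layers near the instruction-following base.
slices:
  - sources:
      - model: sometimesanotion/Lamarck-14B-v0.7
        layer_range: [0, 48]
      - model: sometimesanotion/Qwenvergence-14B-v12-Prose-DS
        layer_range: [0, 48]
merge_method: slerp
base_model: sometimesanotion/Lamarck-14B-v0.7
parameters:
  t: [0.0, 0.1, 0.3, 0.5, 0.5]
dtype: bfloat16
```

Gradient lists like this are interpolated across the layer stack, which is one way to bias only part of the model toward one parent instead of blending every layer equally.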
