There is no one best model for everyone, regardless of these rankings. I aim to make Lamarck good at coding, translating, and rigorously critiquing rhetoric and logic. Always check out the authors' notes on models to see if their intent is close to your use case!
The list has been updated, and your Lamarck still holds the top spot, which is a surprise.
At the same time, we can also see the great potential of Virtuoso-Small-v2 and Qwenvergence-14B-v12-Prose-DS.
However, I don't need strictly formatted output, so from my perspective Qwenvergence-14B-v12-Prose-DS is truly the number one.
Unfortunately, although I have quantized Qwenvergence-14B-v12-Prose-DS, I had to give up on uploading it because of my network situation.
Wow! I'll have to recheck what I saw in the comparator and based my estimates on. There's no doubt that Virtuoso-Small-v2 is a great model, and I'm already working on a Qwenvergence based on it. It's as awesome at IFEVAL and BBH as I'd thought.
Qwenvergence is the model_stock merge that produces the bases which get blended in varying proportions across Lamarck's layers. Yet it's not mere raw material: I'm getting really outstanding results out of Qwenvergence-14B-v12-Prose-DS's successor, which includes Virtuoso-Small-v2. It's playing very nicely with the other components!
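For anyone curious about the general shape of that stage, a model_stock merge in mergekit looks roughly like the sketch below. This is illustrative only, not my actual Qwenvergence recipe; the base and model names are placeholders drawn from this thread.

```yaml
# Illustrative model_stock stage - not the real Qwenvergence config.
# model_stock averages the listed fine-tunes relative to the base model,
# producing a stable foundation to feed into later merge stages.
merge_method: model_stock
base_model: Qwen/Qwen2.5-14B
models:
  - model: arcee-ai/Virtuoso-Small-v2    # one of the components discussed above
  - model: some-org/another-finetune     # placeholder for the other ingredients
dtype: bfloat16
```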
@Inschrift-Spruch-Raum is right. My estimates based on the comparator were only partially correct. Are the percentile calculations on the leaderboard changing? Regardless, this graph shows why nobody needs to give up on their models, especially when each one is making a specialized contribution. Diversity is a benefit to us all.
I really like how this class of models proves that MUSR isn't just what you get when you throw IFEVAL into a blender.
Would you mind doing a writeup about your customized mergekit workflow, or do you prefer to keep some of the secret sauce to yourself? ;)
I've spilled some of the beans in little separate doses, because I've hoped to prompt people to fill in the blanks with unique ideas rather than inspire a lot of copypasta. There's a lot of stuff that is just unique to my own workflow, but there's also some reaaaaally long and detailed YAML.
I do feel that what happens between the model_stock + LoRA stage and the SLERP + TIES stages has only been loosely described. It really is just a bit of general knowledge about which layers influence which metric, like multiple gradients overlaid. I tend to keep densities under 0.40 or even 0.30, because if there's a strong core model, each extra model needs to leave headroom for the others. There's a rough sketch of what that can look like below.
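To make those two ideas concrete, low densities plus per-layer gradients, here's an illustrative TIES stage in mergekit YAML. The model names, weights, and densities are placeholders rather than my actual config; mergekit interpolates a list-valued parameter across the layer stack, which is what I mean by overlaid gradients.

```yaml
# Illustrative TIES stage - not the real Lamarck config.
# List-valued weights are interpolated across layers, so each component
# contributes most where its strengths matter; low densities on the extra
# models leave headroom for the others.
merge_method: ties
base_model: Qwen/Qwen2.5-14B
models:
  - model: placeholder/strong-core-model
    parameters:
      weight: 1.0
      density: 0.9
  - model: placeholder/reasoning-specialist
    parameters:
      weight: [0.1, 0.4, 0.6, 0.4, 0.1]   # strongest influence in the middle layers
      density: 0.3
  - model: placeholder/prose-specialist
    parameters:
      weight: [0.5, 0.2, 0.1, 0.2, 0.5]   # strongest at the ends of the stack
      density: 0.4
dtype: bfloat16
```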
Hit me up, though; I'm particularly grateful for your contributions!