There is no one best model for everyone, regardless of these rankings. I aim to make Lamarck good at coding, translating, and rigorously critiquing rhetoric and logic. Always check out the authors' notes on models to see if their intent is close to your use case!
The list has been updated, and your Lamarck still holds the top spot, which is a surprise.
At the same time, we can also see the great potential of Virtuoso-Small-v2 and Qwenvergence-14B-v12-Prose-DS.
However, I don't need strictly formatted output, so from my perspective Qwenvergence-14B-v12-Prose-DS is truly the number one.
Unfortunately, although I have quantized Qwenvergence-14B-v12-Prose-DS, I had to give up on uploading it because of my network situation.
Wow! I'll have to recheck what I saw in the comparator and based my estimates on. There's no doubt that Virtuoso-Small-v2 is a great model, and I'm already working on a Qwenvergence based on it. It's as awesome at IFEVAL and BBH as I'd thought.
Qwenvergence is the model_stock merge that produces the bases which get blended in varying proportions across Lamarck's layers. Yet it's not mere raw material: I'm getting really outstanding results out of Qwenvergence-14B-v12-Prose-DS's successor, which includes Virtuoso-Small-v2. It's playing very nicely with the other components!
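For anyone curious about the general shape of that stage, a model_stock merge in mergekit looks roughly like the sketch below. This is illustrative only, not my actual Qwenvergence recipe; the base and model names are placeholders drawn from this thread.

```yaml
# Illustrative model_stock stage - not the real Qwenvergence config.
# model_stock averages the listed fine-tunes relative to the base model,
# producing a stable foundation to feed into later merge stages.
merge_method: model_stock
base_model: Qwen/Qwen2.5-14B
models:
  - model: arcee-ai/Virtuoso-Small-v2    # one of the components discussed above
  - model: some-org/another-finetune     # placeholder for the other ingredients
dtype: bfloat16
```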
@Inschrift-Spruch-Raum is right. My estimates based on the comparator were only partially correct. Are the percentile calculations on the leaderboard changing? Regardless, this graph shows why nobody needs to give up on their models, especially when each one is making a specialized contribution. Diversity is a benefit to us all.
I really like how this class of models proves that MUSR isn't just what you get when you throw IFEVAL into a blender.
Would you mind doing a writeup about your customized mergekit workflow, or do you prefer to keep some of the secret sauce to yourself? ;)
I've spilled some of the beans in little separate doses, because I've hoped to prompt people to fill in the blanks with unique ideas rather than inspire a lot of copypasta. There's a lot of stuff that is just unique to my own workflow, but there's also some reaaaaally long and detailed YAML.
I do feel that what happens between the model_stock + LoRA stage and the SLERP + TIES stages has only been loosely described. It really is just a bit of general knowledge about which layers influence which metric, like multiple gradients overlaid. I tend to keep densities under 0.40 or even 0.30, because if there's a strong core model, each extra model needs to leave headroom for the others. There's a rough sketch of what that can look like below.
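To make those two ideas concrete, low densities plus per-layer gradients, here's an illustrative TIES stage in mergekit YAML. The model names, weights, and densities are placeholders rather than my actual config; mergekit interpolates a list-valued parameter across the layer stack, which is what I mean by overlaid gradients.

```yaml
# Illustrative TIES stage - not the real Lamarck config.
# List-valued weights are interpolated across layers, so each component
# contributes most where its strengths matter; low densities on the extra
# models leave headroom for the others.
merge_method: ties
base_model: Qwen/Qwen2.5-14B
models:
  - model: placeholder/strong-core-model
    parameters:
      weight: 1.0
      density: 0.9
  - model: placeholder/reasoning-specialist
    parameters:
      weight: [0.1, 0.4, 0.6, 0.4, 0.1]   # strongest influence in the middle layers
      density: 0.3
  - model: placeholder/prose-specialist
    parameters:
      weight: [0.5, 0.2, 0.1, 0.2, 0.5]   # strongest at the ends of the stack
      density: 0.4
dtype: bfloat16
```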
Hit me up, though; I'm particularly grateful for your contributions!