sometimesanotion PRO

sometimesanotion

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Recent Activity

replied to their post about 4 hours ago

I'm just saving today's 14B parameter chart, because big things are about to hit. Lamarck v0.7 has been surpassed by at least two models I know of, and in ways that promise good things to come for the whole scene. I am taking my time to enjoy the progress, and Lamarck v0.8 will come when it's clearly keeping up and keeping its flavor. There is no one best model for everyone, regardless of these rankings. I aim to make Lamarck good at coding, translating, and rigorously critiquing rhetoric and logic. Always check out the authors' notes on models to see if their intent is close to your use case!

sometimesanotion's activity

New activity in sometimesanotion/Lamarck-14B-v0.7 about 4 hours ago

Censored

#2 opened about 12 hours ago by jongames

reacted to CultriX's post with 🔥 about 4 hours ago

Post

# Multi-Agent Collaboration for Coding Tasks - Updated Space!

This version does not rely on AutoGen.
The user simply enters his OPENAI_API_KEY and a task, and the Space goes to work, employing:
- 1. a prompt-enhancer agent,
- 2. an orchestrator agent,
- 3. a coder agent,
- 4. a code-reviewing agent, and
- 5. a code documentation generator agent.

See below image for an example workflow:

CultriX/MultiAgent-CodeTask

1 reply

replied to their post about 4 hours ago

Okay, this has become a major component of how I build model_stocks that keep IFEVAL high even while merging distantly related models, and this is the reason for some TIES merges to "qwenvergify" models you might have seen.

Here's the basic idea:
https://www.arcee.ai/blog/use-mergekit-to-extract-lora-adapters-from-any-fine-tuned-model

But not as many models are intercompatible for LoRAs as you'd expect, because there are minor variations in size among some important finetunes. I get the train tracks to a standard width, as it were, with the "qwenvergify" TIES merges between two models: weight 1.0 for the model of interest, and weight 0.0 for any Qwenvergence or Lamarck model to supply the tiny bit of infill. Now all the models are intercompatible for what is akin to a super-high-precision DELLA merge of the most significant, most IFEVAL-preserving parts of the model.

A rank 512 adapter extracts around 30% of the most defining aspects of the model, but captures around 90% of its distinct performance. A rank 128 adapter captures around 8% of the model, but about 70% of its distinct performance.
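
As a rough illustration of the "qwenvergify" step, here is a minimal Python sketch that builds a two-model TIES config in the shape mergekit reads. Only the 1.0 and 0.0 weights come from the post. Everything else is an assumption made for illustration: the function name, the `density` values, the `dtype`, the base model `Qwen/Qwen2.5-14B`, and the placeholder finetune name. It is not the actual merge YAML behind any released model.

```python
import json


def qwenvergify_config(model_of_interest, infill_model, base_model):
    """Build a two-model TIES merge config as a plain dict.

    Weights follow the post: 1.0 for the model of interest, 0.0 for the
    Qwenvergence/Lamarck model used for infill. Density and dtype are
    illustrative assumptions, not the author's settings.
    """
    return {
        "merge_method": "ties",
        "base_model": base_model,  # assumed shared ancestor of both models
        "models": [
            {
                "model": model_of_interest,
                "parameters": {"weight": 1.0, "density": 1.0},
            },
            {
                "model": infill_model,
                "parameters": {"weight": 0.0, "density": 1.0},
            },
        ],
        "dtype": "bfloat16",
    }


config = qwenvergify_config(
    model_of_interest="some-org/some-14B-finetune",  # hypothetical name
    infill_model="sometimesanotion/Lamarck-14B-v0.7",
    base_model="Qwen/Qwen2.5-14B",  # assumption
)

# JSON is also valid YAML, so this output can be saved as a config file.
print(json.dumps(config, indent=2))
```

The result of such a merge would then be the model you extract a LoRA from, following the Arcee article linked above.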

I arrived at this while thinking about the implications of @rombodawg's "Continuous Fine Tuning" strategy and reading an arXiv paper whose title I've forgotten and really need to find again. It's the opposite side of the coin from how rombodawg uses it: I use it at the beginning to get a large model_stock started, while he uses it at the end to extract most of a merge and apply it to a target model to avoid catastrophic forgetting.

There. Now you know the methodology behind my merge YAML that produced https://huggingface.co/sometimesanotion/Qwenvergence-14B-v13-Prose-DS - or, the model that calls itself "Qwenconceited-14B-v13-DeepSuffering". 😆

Adapters from a strong IFEVAL+BBH model applied to the majority of the models in the model_stock merge, in a mixture of rank sizes between 32 and 128, get them on the same page for core operation. Applying a Virtuoso or Chocolatine-based LoRA to just any model out there could cause instability, but the model_stock smooths many varying levels of adapter merges out.
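
To make that concrete, here is a hedged Python sketch of a model_stock config in which most members carry an adapter. It assumes mergekit's `model+lora` reference form for applying an adapter at load time, which is worth verifying against your installed version. The member names, adapter paths, base model, and the specific ranks chosen are all hypothetical. Only the idea of mixed ranks between 32 and 128 comes from the post.

```python
import json


def model_stock_with_adapters(base_model, members):
    """Build a model_stock config from (model, adapter_or_None) pairs.

    A member with an adapter is written as "model+adapter", the form
    mergekit is assumed to accept for model-plus-LoRA references.
    """
    models = []
    for model, adapter in members:
        reference = f"{model}+{adapter}" if adapter else model
        models.append({"model": reference})
    return {
        "merge_method": "model_stock",
        "base_model": base_model,
        "models": models,
        "dtype": "bfloat16",  # assumption
    }


# Hypothetical members and adapter paths, with ranks between 32 and 128.
members = [
    ("some-org/finetune-a-14B", "./adapters/ifeval-bbh-rank32"),
    ("some-org/finetune-b-14B", "./adapters/ifeval-bbh-rank64"),
    ("some-org/finetune-c-14B", "./adapters/ifeval-bbh-rank128"),
    ("some-org/finetune-d-14B", None),  # left without an adapter
]

stock_config = model_stock_with_adapters("Qwen/Qwen2.5-14B", members)
print(json.dumps(stock_config, indent=2))
```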

That's enough for you to digest for now, and @rombodawg might be interested to know he inspired such a different strategy from anything he's shared.

updated a model about 8 hours ago

sometimesanotion/Lamarck-14B-v0.7

Text Generation • Updated about 8 hours ago • 1.02k • 23

liked a model about 9 hours ago

sthenno/tempesthenno-icy-0130

Text Generation • Updated 3 days ago • 52 • 8

liked a model about 10 hours ago

FINGU-AI/Chocolatine-Fusion-14B

Text Generation • Updated 2 days ago • 27 • 1

replied to their post about 11 hours ago

You can reach me on Discord, my username is as you'd expect.

Once I show you how Qwentinuum broke the barrier and finally got stabilized, and made Vimarckoso v3, you'll see why I'm being a little careful. It takes multiple steps to reliably tame weighty breadcrumbs merges, and I'm using Makefiles to make sure nothing gets skipped. That's not so easily posted to a modelcard! If people misuse parts of my recipe, especially with more CoT models out there, we'll get spammed with a lot of unstable models.

But the rewards of getting it right!

upvoted an article about 11 hours ago

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

6 days ago • 23

updated a model about 12 hours ago

sometimesanotion/Qwenvergence-14B-v13-Prose-DS

Text Generation • Updated about 12 hours ago • 6 • 2

New activity in sometimesanotion/Qwenvergence-14B-v11 about 12 hours ago

What is the instruct template?

#1 opened about 12 hours ago by Poro7

New activity in CultriX/Qwen2.5-14B-Qwentangledv2 about 12 hours ago

This is promising

#1 opened about 13 hours ago by sometimesanotion

liked a model about 14 hours ago

CultriX/Qwen2.5-14B-Qwentangledv2

Text Generation • Updated about 14 hours ago • 3 • 2

replied to their post about 15 hours ago

@Inschrift-Spruch-Raum , I have a treat for you. I'm glad you liked v12 of Qwenvergence Prose - but I've struck gold. 13 just might be your lucky number!
https://huggingface.co/sometimesanotion/Qwenvergence-14B-v13-Prose-DS

replied to their post about 15 hours ago

I've really been pondering that, and it's almost certainly because of the blend of R1 and Krystalan/DRT-o1-14B. We have two different CoT lineages feeding into one model - wonderful, until it's not! DRT is a bit hard to give up. I think this is where we finally have done all we can do with merging, however fancy, and get down to fine-tuning, because if DRT and DS's influences sync up, it'll be magic.

published a model about 15 hours ago

sometimesanotion/Qwenvergence-14B-v13-Prose-DS

Text Generation • Updated about 12 hours ago • 6 • 2

replied to their post about 15 hours ago

I've spilled some of the beans in little separate doses, because I've hoped to prompt people to fill in the blanks with unique ideas rather than inspire a lot of copypasta. There's a lot of stuff that is just unique to my own workflow, but there's also some reaaaaally long and detailed YAML.

I do feel that what happens between the model_stock + LoRAs and the SLERP+TIES has been loosely described. It really is just a bit of general info about which layers influence what metric, like multiple gradients overlaid. I tend to keep densities under 40 or even 30, because if there's a strong core model, each extra model needs to leave headroom for the others.
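
For readers unfamiliar with density, here is a toy Python sketch of magnitude trimming in the style of TIES: for each model's difference from the base, only the largest-magnitude fraction of entries is kept and the rest are zeroed. It assumes "30" in the paragraph above means a density of 0.30. The function name and the sample numbers are made up for illustration, and real merges apply this per tensor rather than to a short list.

```python
def trim_by_magnitude(delta, density):
    """Keep the largest-magnitude `density` fraction of entries, zero the rest."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be between 0 and 1")
    keep = int(round(density * len(delta)))
    # Indices of the `keep` entries with the largest absolute value.
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    kept = set(ranked[:keep])
    return [value if i in kept else 0.0 for i, value in enumerate(delta)]


# Made-up differences between a finetune and its base model.
delta = [0.02, -0.90, 0.10, 0.45, -0.03, 0.01, -0.60, 0.05, 0.00, 0.07]
trimmed = trim_by_magnitude(delta, density=0.3)
print(trimmed)
```

At a density of 0.3, seven of the ten entries are zeroed, which is the sense in which a lower density leaves headroom for the other models in the merge.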

Hit me up, though, I'm particularly grateful for your contribution!

updated a model about 17 hours ago

sometimesanotion/Qwenvergence-14B-v12-Prose-DS

Text Generation • Updated about 17 hours ago • 30 • 4

updated a model about 18 hours ago

sometimesanotion/Qwenvergence-14B-v11

Text Generation • Updated about 18 hours ago • 28 • 3

replied to their post about 19 hours ago

While Arcee beats Lamarck 0.7 and tempesthenno-ppo-ckpt40 for IFEVAL, BBH, and MATH, you score 23.55% higher on GPQA, 1.96% higher on MUSR, and 2.49% higher on MUSR than Virtuoso Small v2.

Plus, I'm thinking you fine-tune for use cases Arcee and I don't.

replied to their post about 22 hours ago

@Inschrift-Spruch-Raum is right. My estimates based off of comparator are only partially correct. Are percentile calculations on the leaderboard changing? Regardless, this graph shows why nobody needs to give up on their models, especially when each one's making a specialized contribution. Diversity is a benefit to us all.

I really like how this class of models proves that MUSR isn't just what you get when you throw IFEVAL into a blender. 😆