sometimesanotion (sometimesanotion)

replied to csabakecskemeti's post about 13 hours ago

I've been doing all my LoRA work on AMD hardware with Linux; I'm looking forward to your notes! I sometimes still do it on CPU because it's easy to renice the task priority so the foreground tasks stay snappy.

The main challenge I have is keeping a solid ROCm bitsandbytes install when other packages want updates.

reacted to csabakecskemeti's post with 🚀 about 13 hours ago

Post

2296

Testing Training on AMD/ROCm the first time!

I've got my hands on an AMD Instinct MI100. It's about the same price used as a V100 but on paper has more TOPS (V100 14TOPS vs MI100 23TOPS) also the HBM has faster clock so the memory bandwidth is 1.2TB/s.
For quantized inference it's a beast (MI50 was also surprisingly fast)

For LORA training with this quick test I could not make the bnb config works so I'm running the FT on the fill size model.

Will share all the install, setup and setting I've learned in a blog post, together with the cooling shroud 3D design.

8 replies

·

replied to their post about 13 hours ago

Thank you! If you liked this, and want 14B models with even smoother command of French, I'm a huge fan of https://huggingface.co/jpacifico/Chocolatine-2-14B-Instruct-v2.0.3!

posted an update 2 days ago

Post

4047

I'd like to draw your attention to a Lamarck-based experiment which uses Arcee AI's newly published arcee_fusion merge method for three out of its four merges. Yes, just four. This is a simple one, and its recipe is fully open:

sometimesanotion/Lamarck-14B-v0.7-Fusion

It unifies three branches, all of which feature models which bring Lamarck-14B-v0.7 and Qwenvergence-14B-v12-Prose together. One side features @jpacifico 's jpacifico/Chocolatine-2-14B-Instruct-v2.0.3 and the other features @suayptalha 's suayptalha/Lamarckvergence-14B paired with my models which were their merge ancestors.

A fusion merge - of a fusion merge and a SLERP of a fusion and older merge - should demonstrate the new merge method's behavior in interesting ways, especially in the first 1/4th of the model where the SLERP has less impact.

I welcome you to kick the tires and learn from it. It has prose quality near Qwenvergence v12's - as you'd expect.

Thank you, @mradermacher and @MaziyarPanahi , for the first-day quantizations! Your work helped get me started. https://huggingface.co/models?other=base_model:quantized:sometimesanotion/Lamarck-14B-v0.7-Fusion

4 replies

·

replied to jjokah's post 2 days ago

Right-sizing language models is something I'm really here for. I find that a 1.5B parameter model fronting simple questions from a backing RAG source that a larger model gradually works on is more scalable. Classic information sources and stores can be QA'd, and they don't have such huge energy footprints.

AI will work out better if we give humans, classic code, SLMs, and frontier LLMs the roles they're right-sized for, and ensure data privacy and individual dignity at every stage of the contract.

reacted to jjokah's post with 👍 2 days ago

Post

4519

The past few years have been a blast for artificial intelligence, with large language models (LLMs) stunning everyone with their capabilities and powering everything from chatbots to code assistants. However, not all applications demand the massive size and complexity of LLMs, the computational power required makes them impractical for many use cases. This is why Small Language Models (SLMs) entered the scene to make powerful AI models more accessible by shrinking in size.

In this article we went through what SLMs are, how they are made small, their benefits and limitations, real-world use cases, and how they can be used on mobile and desktop devices.
https://huggingface.co/blog/jjokah/small-language-model

2 replies

·

replied to their post 13 days ago

This comment has been hidden

posted an update 18 days ago

Post

755

I am really pleased to see jpacifico/Chocolatine-2-14B-Instruct-v2.0.3 take #4 on the 14B segment of the Open LLM leaderboard. It is a fine-tune of a merge of Arcee's arcee-ai/Virtuoso-Small-v2, and my sometimesanotion/Lamarck-14B-v0.7 and sometimesanotion/Qwenvergence-14B-v12-Prose-DS. Don't let the numbers fool you, in its element, it's quite smooth. I really enjoy merges of Lamarck with near siblings like this one.

Don't be surprised when it's challenging to bring in the full reasoning strength of a reason-heavy prose model like Qwenvergence v12-DS into a high IFEVAL model like Lamarck or Virtuoso Small v2. That's a lot of work to get right, because IFEVAL, precise reasoning, and prose quality are often in tension against each other. Gaining as much as this did is really respectable, and fine-tuning it makes it a more stable base for the coming iterations.

1 reply

·

reacted to sequelbox's post with ➕ 21 days ago

Post

1899

New sneak preview of my next release! Raiden is a deepseek-ai/DeepSeek-R1 synthetic dataset that uses creative-reasoning and analytic-reasoning prompts!

This preview release has the first 5.8k rows, all responses generated using DeepSeek's 685b parameter R1 model: sequelbox/Raiden-DSR1-PREVIEW

Enjoy this look at R1's reasoning skills! Full dataset coming soon.

reacted to CultriX's post with 🔥 21 days ago

Post

1757

# Multi-Agent Collaboration for Coding Tasks - Updated Space!

This version does not rely on AutoGen.
The user simply enters his OPENAI_API_KEY and a task and the Space goes to work, employing a
- 1. prompt-enhancer agent,
- 2. an orchestrator agent,
- 3. a coder agent,
- 4. a code-reviewing agent and
-5. a code documentation generator agent.

See below image for an example workflow:

CultriX/MultiAgent-CodeTask

1 reply

·

replied to their post 21 days ago

Okay, this has become a major component of how I build model_stocks that keep IFEVAL high even while merging distantly related models, and this is the reason for some TIES merges to "qwenvergify" models you might have seen.

Here's the basic idea:
https://www.arcee.ai/blog/use-mergekit-to-extract-lora-adapters-from-any-fine-tuned-model

But not as many models are inter-compatible for LoRAs as you'd expect, because there are minor variations in size among some important finetunes. I get the train tracks to a standard width, as it were, and make them intercompatible with the "qwenvergify" TIES merges between two models, weight 1.0 for the model of interest and weight 0.0 for any Qwenvergence or Lamarck model for the tiny bit of infill. You now have all models intercompatible for what is akin to a super-high-precision DELLA merge of the most significant parts of the model, the most IFEVAL-preserving parts of the model. A rank 512 adapter extracts around 30% of the most defining aspects of the model, but captures around 90% of its distinct performance. A rank 128 adapter captures around 8% of the model, but about 70% of its distinct performance.

I arrived at this while thinking about the implication of @rombodawg 's "Continuous Fine Tuning" strategy, and reading I-forget-which-arxiv-paper and I really need to find that again. It's like the opposite side of the coin from how rombodawg uses it. I use it at the beginning to get a large model_stock started. He uses it to extract most of your merge at the end and apply it to a target model to avoid catastrophic forgetting.

There. Now you know the methodology behind my merge YAML that produced https://huggingface.co/sometimesanotion/Qwenvergence-14B-v13-Prose-DS - or, the model that calls itself "Qwenconceited-14B-v13-DeepSuffering". 😆

Adapters from a strong IFEVAL+BBH model applied to the majority of the models in the model_stock merge, in a mixture of rank sizes between 32 and 128, get them on the same page for core operation. Applying a Virtuoso or Chocolatine-based LoRA to just any model out there could cause instability, but the model_stock smooths many varying levels of adapter merges out.

That's enough for you to digest for now, and @rombodawg might be interested to know he inspired such a different strategy from anything he's shared.

replied to their post 21 days ago

You can reach me on Discord, my username is as you'd expect.

Once I show you how Qwentinuum broke the barrier and finally got stabilized, and made Vimarckoso v3, you'll see why I'm being a little careful. It takes multiple steps to reliably tame weighty breadcrumbs merges, and I'm using Makefiles to make sure nothing gets skipped. That's not so easily posted to a modelcard! If people misuse parts of my recipe, especially with more CoT models out there, we'll get spammed with a lot of unstable models.

But the rewards of getting it right!