Stoked to release the latest iteration of our MilkDropLM project! This release builds on the powerful Qwen2.5-Coder-32B-Instruct model, trained on the same great dataset that powered our 7B model.
What's new?
- Genome Unlocked: Deeper understanding of preset relationships for more accurate and creative generations.
- Preset Revival: Breathe new life into old presets with our upgraded model!
- Loop-B-Gone: Say goodbye to pesky loops and hello to smooth generation.
- Natural Chats: Engage in more natural-sounding conversations with our LLM than ever before.
Released under Apache 2.0, because sharing is caring!
Shoutout to @superwatermelon for his invaluable insights and collaboration, and to all the courageous community members who tested and gave feedback along the way!
🦾 Experience faster, lighter, and smarter language models! The new FastLlama makes Meta's LLaMA models work with smaller file sizes, lower system requirements, and higher performance. The model supports 8 languages, including English, German, and Spanish.
🤗 Built on the LLaMA 3.2-1B-Instruct model, fine-tuned with Hugging Face's SmolTalk and MetaMathQA-50k datasets, and powered by LoRA (Low-Rank Adaptation) for groundbreaking mathematical reasoning.
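For readers curious what such a LoRA fine-tune looks like in practice, here's a minimal sketch using the peft and transformers libraries; the rank, alpha, and target modules are illustrative assumptions, not FastLlama's actual training configuration.

```python
# Minimal LoRA fine-tuning setup (hyperparameters are assumptions,
# not FastLlama's actual configuration).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA injects small trainable low-rank matrices into selected
# projections, so only a tiny fraction of the weights are updated.
lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # assumed attention targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```

The frozen base plus a small adapter is what keeps file sizes and system requirements low: only the adapter weights need to be trained and shipped.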
Introducing **FineMath**: the best public math pre-training dataset with 50B+ tokens! HuggingFaceTB/finemath
Math remains challenging for LLMs; training on FineMath yields considerable gains over other math datasets, especially on GSM8K and MATH.
We built the dataset by: 🛠️ carefully extracting math data from Common Crawl; then iteratively filtering and recalling high-quality math pages using a classifier trained on synthetic annotations to identify mathematical reasoning and deduction.
We ran a series of ablations comparing Llama-3.2-3B-Base after continued pre-training on FineMath and observed notable gains over the baseline model and other public math datasets.
We hope this helps advance the performance of LLMs on math and reasoning! We're also releasing all the ablation models as well as the evaluation code.
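If you want to poke at the data yourself, it loads like any other Hub dataset via the datasets library. A minimal sketch; the config name and the "text" field are assumptions, so check the dataset card for the exact names:

```python
# Stream a few FineMath samples without downloading the full 50B+ tokens.
# The config name "finemath-4plus" and the "text" field are assumptions;
# see the HuggingFaceTB/finemath dataset card for the exact names.
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceTB/finemath", "finemath-4plus",
    split="train", streaming=True,
)
for sample in ds.take(3):
    print(sample["text"][:200], "\n---")
```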
🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years of single-GPU compute.
👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates; this shall delay the building of your computing temple by many moons."
🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months. This required parallelizing across 4 dimensions: data, tensor, context, and pipeline. It is infamously hard to do, making for bloated code repos that hold together only by magic.
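The back-of-the-envelope numbers check out, as a quick sketch shows:

```python
# Sanity-checking the figures above (pure arithmetic, no assumptions
# beyond the post's own numbers).
gpu_hours = 39_000_000               # Llama-3.1-405B total GPU-hours
hours_per_year = 24 * 365.25

print(gpu_hours / hours_per_year)    # ~4450 years on a single GPU
print(gpu_hours / 24_000 / 24)       # ~68 days of wall-clock on 24k GPUs
```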
🤗 **But now we don't need huge repos anymore!** Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D-parallelism libs. One team built Nanotron, already widely used in industry. And now a team releases Picotron, a radical approach that codes 4D parallelism in just a few hundred lines, a real feat of engineering that makes it much easier to understand what's actually happening!
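To give a concrete flavor of what "4D" means here: every GPU rank gets one coordinate along each of the data, tensor, context, and pipeline axes. A toy sketch of that decomposition (an illustration of the idea, not Picotron's actual code):

```python
# Toy 4D-parallel rank layout (illustration only, not Picotron's code).
# The degrees below are assumed; their product must equal the world size.
DP, TP, CP, PP = 4, 8, 2, 4           # data, tensor, context, pipeline
world_size = DP * TP * CP * PP        # 256 GPUs in this toy layout

def rank_to_coords(rank: int) -> tuple[int, int, int, int]:
    """Decompose a flat rank into (data, tensor, context, pipeline)."""
    tp = rank % TP                    # fastest-varying axis
    cp = (rank // TP) % CP
    pp = (rank // (TP * CP)) % PP
    dp = rank // (TP * CP * PP)       # slowest-varying axis
    return dp, tp, cp, pp

print(rank_to_coords(0))              # (0, 0, 0, 0)
print(rank_to_coords(77))             # (1, 5, 1, 0)
```

Each axis then gets its own communication group: gradient all-reduce across data ranks, sharded matmuls across tensor ranks, and so on.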
⚡ **It's tiny, yet powerful:** counting in MFU (Model FLOPs Utilization, the fraction of the hardware's theoretical compute that training actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, really close to what the huge libs reach. (Caution: the team is running further benchmarks to verify this.)
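For reference, here's how an MFU figure like that is typically computed, using the common 6 × parameters × tokens/sec approximation for training FLOPs; the throughput number below is made up purely to illustrate the formula:

```python
# MFU estimate via the standard 6 * N * tokens/sec training-FLOPs
# approximation. The throughput figure is illustrative, not a
# measured Picotron number.
n_params = 1.7e9                  # SmolLM-1.7B
n_gpus = 8
peak_flops_per_gpu = 989e12       # H100 SXM dense BF16 peak (~989 TFLOPs)

tokens_per_sec = 390_000          # assumed aggregate training throughput
model_flops_per_sec = 6 * n_params * tokens_per_sec
mfu = model_flops_per_sec / (n_gpus * peak_flops_per_gpu)
print(f"MFU ~= {mfu:.0%}")        # ~50% with these example numbers
```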
A new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with its thoughts visible), can solve complex problems at Flash speeds, and more.