onekq (Yi Cui)

posted an update 1 day ago

Post

1272

So 🐋DeepSeek🐋 hits the mainstream media. But it has been a star in our little cult for at least 6 months. Its meteoric success is not overnight, but two years in the making.

To learn their history, just look at their 🤗 repo https://huggingface.co/deepseek-ai

* End of 2023, they launched the first model (pretrained by themselves) following Llama 2 architecture
* June 2024, v2 (MoE architecture) surpassed Gemini 1.5, but behind Mistral
* September, v2.5 surpassed GPT 4o mini
* December, v3 surpassed GPT 4o
* Now R1 surpassed o1

Most importantly, if you think DeepSeek success is singular and unrivaled, that's WRONG. The following models are also near or equal the o1 bar.

* Minimax-01
* Kimi k1.5
* Doubao 1.5 pro

reacted to clem's post with 🔥 1 day ago

Post

1616

The 🐳 just crossed 10,000 followers on HF

https://huggingface.co/deepseek-ai

replied to their post 2 days ago

My conclusion is the same. The R1 paper already reported lower success rate of the distilled models. This is not surprising since we cannot expect the same outcomes out of a much smaller model.

Here is the problem. The small models released by frontier labs are always generic, i.e. decent but lower performance than the flagship model on every benchmark. But we GPU deplorables often want a specialized model which is excellent on only one thing, hence the disappointment.

I guess we will have to help ourselves on this one. Distill an opinionated dataset from the flagship model to a small model of your choice, then hill climb the benchmark you care about.

replied to their post 3 days ago

1000% agree.

Also reasoning models sure spit out lots of tokens. The same benchmark cost 4x or 5x the money and time to run than regular LLMs. Exciting time for inference players.

Have you tried the distilled models of R1(Qwen and Llama)?

replied to their post 4 days ago

+1

Also the velocity of progress. I have wanted to learn Monte Carlo Tree Search and process rewards etc. and haven't got the time. I guess now I can skip them 🤗

posted an update 5 days ago

Post

2560

This is historical. 🎉

DeepSeek 🐋R1🐋 surpassed OpenAI 🍓o1🍓 on the dual leaderboard. What a year for the open source!

onekq-ai/WebApp1K-models-leaderboard

posted an update 6 days ago

Post

4556

🐋DeepSeek 🐋 is the real OpenAI 😯

6 replies

·

posted an update 12 days ago

Post

1643

Qwen 2.5 Coder 32b is a dime among nickels. Amazing performance for its size, so much so it earns a spot in the duo leaderboard. The day of small models is here.

onekq-ai/WebApp1K-models-leaderboard
Qwen/Qwen2.5-Coder-32B-Instruct

posted an update 30 days ago

Post

3046

🐋 DeepSeek 🐋v3 achieves a solid 7 point jump than v2.5, surpassing GPT-4o, but is still behind 🍓 o1 🍓and Claude 3.5.

onekq-ai/WebApp1K-models-leaderboard

posted an update 3 months ago

Post

596

October version of Claude 3.5 lifts SOTA (set by its June version) by 7 points.
onekq-ai/WebApp1K-models-leaderboard

Closed sourced models are widening the gap again.

Note: Our frontier leaderboard now uses double test scenarios because the single-scenario test suit has been saturated.

posted an update 3 months ago

Post

1859

I'm now working on finetuning of coding models. If you are GPU-hungry like me, you will find quantized models very helpful. But quantization for finetuning and inference are different and incompatible. So I made two collections here.

Inference (GGUF, via Ollama, CPU is enough)
onekq-ai/ollama-ready-coding-models-67118c3cfa1af2cf04a926d6

Finetuning (Bitsandbytes, QLora, GPU is needed)
onekq-ai/qlora-ready-coding-models-67118771ce001b8f4cf946b2

For quantization, the inference models are far more popular on HF than finetuning models. I use https://huggingface.co/QuantFactory to generate inference models (GGUF), and there are a few other choices.

But there hasn't been such a service for finetuning models. DIY isn't too hard though. I made a few myself and you can find the script in the model cards. If the original model is small enough, you can even do it on a free T4 (available via Google Colab).

If you know a (small) coding model worthy of quantization, please let me know and I'd love to add it to the collections.

reacted to fdaudens's post with 🔥 4 months ago

Post

1806

Exciting news in AI: Molmo, a groundbreaking family of open-source multimodal models, has just been announced! 🚀

Key points:
- Closes the gap with proprietary systems on benchmarks & human evals
- Trained on high-quality data (< 1M image-text pairs vs billions)
- Introduces pointing capability for rich interactions
- Fully open weights, data, and training code

The 72B model outperforms several proprietary systems, while the 1B model nearly matches GPT-4V. Small is indeed the new big in AI!

There's an interactive demo available using Molmo-7B-D. Definitely worth checking out to see its capabilities firsthand.

All model weights, data, and code will be released soon. This is a significant step towards truly open, cutting-edge multimodal AI.
The future of AI research and applications is looking brighter than ever! 🤖🖼️

👉 Demo: https://molmo.allenai.org/
👉 Models: allenai/molmo-66f379e6fe3b8ef090a8ca19

#AI #MachineLearning #OpenSource #ComputerVision

reacted to victor's post with 👍🤗 4 months ago

Post

5659

🙋 Calling all Hugging Face users! We want to hear from YOU!

What feature or improvement would make the biggest impact on Hugging Face?

Whether it's the Hub, better documentation, new integrations, or something completely different – we're all ears!

Your feedback shapes the future of Hugging Face. Drop your ideas in the comments below! 👇

171 replies

·

reacted to their post with 🧠 4 months ago

Post

2568

Here is my latest study on OpenAI🍓o1🍓.
A Case Study of Web App Coding with OpenAI Reasoning Models (2409.13773)

I wrote an easy-to-read blogpost to explain finding.
https://huggingface.co/blog/onekq/daily-software-engineering-work-reasoning-models

INSTRUCTION FOLLOWING is the key.

100% instruction following + Reasoning = new SOTA

But if the model misses or misunderstands one instruction, it can perform far worse than non-reasoning models.

posted an update 4 months ago

Post

2568

Here is my latest study on OpenAI🍓o1🍓.
A Case Study of Web App Coding with OpenAI Reasoning Models (2409.13773)

I wrote an easy-to-read blogpost to explain finding.
https://huggingface.co/blog/onekq/daily-software-engineering-work-reasoning-models

INSTRUCTION FOLLOWING is the key.

100% instruction following + Reasoning = new SOTA

But if the model misses or misunderstands one instruction, it can perform far worse than non-reasoning models.

posted an update 4 months ago

Post

430

Announce 🎉 WebApp1K-Duo 🎉
onekq-ai/WebApp1K-Duo-React

This is to keep up the challenge after OpenAI o1 models saturated the WebApp1K benchmark. The new benchmark brings SOTA to 67%. Let the hill climbing commence!
onekq-ai/WebApp1K-models-leaderboard

PS: I will publish more findings soon.

reacted to KingNish's post with 👍 4 months ago

Post

3583

Mistral Nemo is better than many models in 1st grader level reasoning.

replied to zhabotorabi's post 4 months ago

the Mistral API? the model name is probably diffrent. I used mistral-large-2 but had to use the name mistral-large-latest. The team will help you via chat.

posted an update 4 months ago

Post

555

🐋 DeepSeek 🐋2.5 is hands-down the best open-source model, leaving its peers way behind. It even beats GPT-4o mini.

onekq-ai/WebApp1K-models-leaderboard

The inference of the official API is painfully slow though. I heard the team is short on GPUs (well, who isn't).

Yi Cui

AI & ML interests

Recent Activity

Articles

Does Daily Software Engineering Work Need Reasoning Models?

All LLMs Write Great Code, But Some Make (A Lot) Fewer Mistakes

Organizations

onekq's activity