So 🐋DeepSeek🐋 hits the mainstream media. But it has been a star in our little cult for at least 6 months. Its meteoric success is not overnight, but two years in the making.
* End of 2023, they launched the first model (pretrained by themselves) following Llama 2 architecture * June 2024, v2 (MoE architecture) surpassed Gemini 1.5, but behind Mistral * September, v2.5 surpassed GPT 4o mini * December, v3 surpassed GPT 4o * Now R1 surpassed o1
Most importantly, if you think DeepSeek success is singular and unrivaled, that's WRONG. The following models are also near or equal the o1 bar.
Qwen 2.5 Coder 32b is a dime among nickels. Amazing performance for its size, so much so it earns a spot in the duo leaderboard. The day of small models is here.