Anthonny OLIME

Citaman

AI & ML interests

None yet

Citaman's activity

liked a model 13 minutes ago

unsloth/DeepSeek-R1-GGUF

Text Generation • Updated 37 minutes ago • 27.2k • 88

liked a model 22 minutes ago

deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated 41 minutes ago • 81

liked a model 34 minutes ago

deepseek-ai/Janus-Pro-1B

Any-to-Any • Updated 41 minutes ago • 22

liked a model 37 minutes ago

THUDM/glm-4-9b-chat-1m-hf

Text Generation • Updated about 10 hours ago • 78 • 5

liked a dataset 38 minutes ago

THUDM/T1

Viewer • Updated 7 days ago • 10k • 17 • 1

upvoted a paper about 1 hour ago

Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

Paper • 2501.11651 • Published 7 days ago • 1

liked a model about 18 hours ago

baichuan-inc/Baichuan-M1-14B-Instruct

Updated 2 days ago • 4.05k • 23

liked a model about 19 hours ago

baichuan-inc/Baichuan-Omni-1d5

Updated 1 day ago • 37 • 18

reacted to merve's post with 🔥 3 days ago

Oof, what a week! 🥵 So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM -- the tiniest VLMs, which come in 256M and 500M, with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging MM benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, a new model for 3D asset generation from images