LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published 17 days ago • 160
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published 17 days ago • 13
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 21 days ago • 141
Article What is test-time compute and how to scale it? By Kseniase and 1 other • about 1 month ago • 53
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 110
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published Jan 30 • 19
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 26
Article **Topic 24: What is Cosmos World Foundation Model Platform?** By Kseniase and 1 other • Jan 23 • 6
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 48
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published Jan 10 • 61
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 260
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 92
Article 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • Jan 2 • 40
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published Dec 26, 2024 • 18
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 37