DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published 1 day ago • 39
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published 5 days ago • 132
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024 • 27
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? Paper • 2502.15657 • Published 4 days ago • 4
Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence Paper • 2502.14905 • Published 7 days ago • 8
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published 6 days ago • 33
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 5 days ago • 161
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 5 days ago • 115
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published 5 days ago • 13
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 5 days ago • 91
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 8 days ago • 89
Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google 7 days ago • 59
Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge Paper • 2502.12501 • Published 8 days ago • 6
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation Paper • 2502.09838 • Published 12 days ago • 9
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm Paper • 2502.12513 • Published 8 days ago • 15