Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam • Paper • 2502.17055 • Published 1 day ago • 11
Can Community Notes Replace Professional Fact-Checkers? • Paper • 2502.14132 • Published 6 days ago • 5
Forecasting Open-Weight AI Model Growth on Hugging Face • Paper • 2502.15987 • Published 4 days ago • 8
Slamming: Training a Speech Language Model on One GPU in a Day • Paper • 2502.15814 • Published 6 days ago • 37
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks • Paper • 2502.17157 • Published 1 day ago • 39
GCC: Generative Color Constancy via Diffusing a Color Checker • Paper • 2502.17435 • Published 1 day ago • 19
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models • Paper • 2502.16614 • Published 2 days ago • 19
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment • Paper • 2502.16894 • Published 1 day ago • 16
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning • Paper • 2502.17407 • Published 1 day ago • 14
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models • Paper • 2502.16033 • Published 4 days ago • 14
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration • Paper • 2502.17110 • Published 1 day ago • 10
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers • Paper • 2502.15894 • Published 4 days ago • 11
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation • Paper • 2502.16707 • Published 2 days ago • 8
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing • Paper • 2502.17258 • Published 1 day ago • 44