Submitted by akhaliq 43 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing · 4 authors 3
Submitted by Canyu 39 DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks · 8 authors 2
Submitted by gallilmaimon 37 Slamming: Training a Speech Language Model on One GPU in a Day · 3 authors 1
Submitted by CheeryLJH 19 CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models · 18 authors 1
Submitted by Facico 16 Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment · 7 authors 2
Submitted by amphora 14 Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning · 4 authors 1
Submitted by xw-eric 14 Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models · 8 authors 1
Submitted by TianjinHuang 11 Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam · 11 authors 1
Submitted by akhaliq 11 RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers · 6 authors 2
Submitted by xhyandwyy 10 Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration · 7 authors 1
Submitted by jianlanluo 8 Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation · 6 authors 1
Submitted by irenesolaiman 8 Beyond Release: Access Considerations for Generative AI Systems · 7 authors 1
Submitted by callanwu 7 Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties · 5 authors 3
Submitted by GPaolo 6 TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning · 5 authors 1
Submitted by dalime 3 Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models · 6 authors 1
Submitted by zouharvi 2 Early-Exit and Instant Confidence Translation Quality Estimation · 5 authors 2
Submitted by ludolara 1 Diagnosing COVID-19 Severity from Chest X-Ray Images Using ViT and CNN Architectures · 4 authors 1
Submitted by nielsr 1 M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment · 6 authors 1