DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published 6 days ago • 26
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 6 days ago • 65
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation Paper • 2502.20583 • Published 10 days ago • 11
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published 7 days ago • 51
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published 17 days ago • 160
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 17 days ago • 177
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 17 days ago • 128
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published 19 days ago • 37
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 19 days ago • 76
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published 20 days ago • 42
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published 27 days ago • 126
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 27 days ago • 142
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 30 days ago • 121
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 199
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 61