Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Paper • 2502.15499 • Published 13 days ago • 13
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Paper • 2502.17422 • Published 10 days ago • 7
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? Paper • 2502.17535 • Published 10 days ago • 8
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published 14 days ago • 157
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published 4 days ago • 49