Large Language Models Think Too Fast To Explore Effectively (arXiv:2501.18009, published Jan 29, 2025)
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (arXiv:2501.17161, published Jan 28, 2025)
Intuitive physics understanding emerges from self-supervised pretraining on natural videos (arXiv:2502.11831, Feb 2025)
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity (arXiv:2502.13063, Feb 2025)
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers (arXiv:2502.15007, Feb 2025)