Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published 17 days ago • 44
WebGames: Challenging General-Purpose Web-Browsing AI Agents Paper • 2502.18356 • Published 12 days ago • 11
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO Paper • 2502.14669 • Published 17 days ago • 11
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 191
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published 24 days ago • 32
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance Paper • 2502.08127 • Published 26 days ago • 50
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published 26 days ago • 29
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published 26 days ago • 46
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models Paper • 2502.04404 • Published Feb 6 • 23
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 27 days ago • 142
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 199