DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning • Paper • 2411.04983 • Published Nov 7, 2024 • 6
Reward-Guided Speculative Decoding for Efficient LLM Reasoning • Paper • 2501.19324 • Published 3 days ago • 23
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training • Paper • 2501.18511 • Published 4 days ago • 15
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation • Paper • 2501.16609 • Published 7 days ago • 5
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding • Paper • 2501.16411 • Published 7 days ago • 17
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs • Paper • 2501.18585 • Published 4 days ago • 39
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published 6 days ago • 88
Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 7 days ago • 23
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published 12 days ago • 284
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model • Paper • 2501.12368 • Published 13 days ago • 39