DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning Paper • 2411.04983 • Published Nov 7, 2024 • 6
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published 3 days ago • 23
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published 4 days ago • 15
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation Paper • 2501.16609 • Published 7 days ago • 5
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Paper • 2501.16411 • Published 7 days ago • 17
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published 4 days ago • 39
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published 6 days ago • 88
Towards General-Purpose Model-Free Reinforcement Learning Paper • 2501.16142 • Published 7 days ago • 23
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 12 days ago • 284
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published 13 days ago • 39
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published 16 days ago • 23
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 14 days ago • 89
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6, 2024 • 33
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 79