Scaling test-time compute 📈 Enhance math problem solving by scaling test-time compute • Running • 523
UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B Text Generation • Updated 11 days ago • 1.31k • 46
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 332
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38