Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • 18 days ago • 43
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 173
Running 1.59k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
SimpleRL Collection The collection for the Project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated 7 days ago • 4
CodeI/O Collection Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated 13 days ago • 6
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated 15 days ago • 75