Article Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial By open-r1 • 4 days ago • 25
Article 🦸🏻#9: Does AI Remember? The Role of Memory in Agentic Workflows By Kseniase • 2 days ago • 4
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 13 days ago • 288
Article Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) By ariG23498 • 16 days ago • 13
Article Hugging Face and FriendliAI partner to supercharge model deployment on the Hub 14 days ago • 30
Article Finetuning Falcon 7b in a hybrid distributed fashion By Neo111x • Dec 31, 2024 • 5
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 13
Article Building a MusicGen API to Generate Custom Music Tracks Locally By theeseus-ai • Dec 4, 2024 • 2
Article Improving performance with Arena Learning in post training By satpalsr • Sep 11, 2024 • 5
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 124
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14, 2024 • 57
Article Outperforming Claude 3.5 Sonnet with Phi-3-mini-4k for graph entity relationship extraction tasks By rcaulk • Aug 19, 2024 • 7