ariG23498 posted an update 5 days ago
Tried my hand at simplifying the derivation of Direct Preference Optimization (DPO).

I cover how one can reformulate the RLHF objective into the DPO loss. The idea of implicit reward modeling is chef's kiss.

Blog: https://huggingface.co/blog/ariG23498/rlhf-to-dpo
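
For readers who want the gist before clicking through, here is a compact sketch of the reformulation in standard notation from the DPO paper (Rafailov et al., 2023): \(\pi_{\mathrm{ref}}\) is the frozen reference policy, \(\beta\) the KL coefficient, and \((y_w, y_l)\) a preferred/dispreferred completion pair. The blog's exact presentation may differ.

```latex
% RLHF objective: maximize reward under a KL penalty to the reference policy
\max_{\pi_\theta}\;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
    \bigl[ r(x, y) \bigr]
  - \beta\, \mathbb{D}_{\mathrm{KL}}
    \bigl[ \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \bigr]

% Its closed-form optimum can be inverted to express the reward in terms of
% the policy itself -- the "implicit reward" (Z(x) is a partition function):
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
          + \beta \log Z(x)

% Substituting this into the Bradley-Terry preference model, Z(x) cancels,
% leaving a loss over preference pairs with no explicit reward model:
\mathcal{L}_{\mathrm{DPO}} =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right]
```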