ariG23498 
posted an update Jan 19
Tried my hand at simplifying the derivations of Direct Preference Optimization (DPO).

I cover how one can reformulate the RLHF objective into DPO. The idea of implicit reward modeling is chef's kiss.

Blog: https://huggingface.co/blog/ariG23498/rlhf-to-dpo
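For context, a sketch of where the reformulation lands: the standard DPO objective from the original paper (Rafailov et al., 2023). Here $\beta$ is the KL-penalty strength, $\pi_{\text{ref}}$ the frozen reference policy, and $(x, y_w, y_l)$ a prompt with preferred and dispreferred completions:

```latex
% Standard DPO objective (Rafailov et al., 2023).
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}})
  = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
    \right) \right]

% The "implicit reward": the optimal RLHF policy implies
%   r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}
%             + \beta \log Z(x),
% and the intractable partition term Z(x) cancels inside the
% Bradley-Terry preference difference, giving the loss above.
```

The trick is that no explicit reward model ever needs to be trained: the policy's own log-probability ratio against the reference acts as the reward.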