hba123 
posted an update 1 day ago
Blindly applying algorithms without understanding the math behind them is not a good idea, in my view. So I am on a quest to fix this!

I wrote my first Hugging Face article on how to derive closed-form solutions for KL-regularised reinforcement learning problems, the kind of objective that DPO is built on.
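
For readers who want the gist before clicking through, here is a rough sketch of the standard result in this setting. The notation is an assumption on my part and may differ from the article: $\pi_{\text{ref}}$ is a reference policy, $r(x, y)$ a reward, $\beta > 0$ the regularisation strength, and $Z(x)$ a normalising constant.

$$
\max_{\pi} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[r(x, y)\big] \;-\; \beta\, \mathrm{KL}\big(\pi(\cdot \mid x) \,\|\, \pi_{\text{ref}}(\cdot \mid x)\big)
$$

$$
\pi^{*}(y \mid x) = \frac{1}{Z(x)}\, \pi_{\text{ref}}(y \mid x)\, \exp\!\Big(\frac{r(x, y)}{\beta}\Big), \qquad Z(x) = \sum_{y} \pi_{\text{ref}}(y \mid x)\, \exp\!\Big(\frac{r(x, y)}{\beta}\Big)
$$

Rearranging the second line gives $r(x, y) = \beta \log \frac{\pi^{*}(y \mid x)}{\pi_{\text{ref}}(y \mid x)} + \beta \log Z(x)$, which is the identity DPO uses to express the reward through the policy itself.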


Check it out: https://huggingface.co/blog/hba123/derivingdpo