@hba123 on Hugging Face


Blindly applying algorithms without understanding the math behind them is not a good idea, from my point of view. So I am on a quest to fix this!

I wrote my first Hugging Face article on how you would derive closed-form solutions for KL-regularised reinforcement learning problems, which is the result DPO is built on.
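For a quick taste, here is the standard form of that result as it is usually stated in the DPO literature. The notation is my assumption and may differ from the article: r is the reward, \beta > 0 is the KL weight, \pi_{ref} is the reference policy, and Z(x) is the normalising constant.

```latex
% KL-regularised objective (notation assumed, not taken from the article)
\max_{\pi}\;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big]
  \;-\; \beta\, \mathrm{KL}\big( \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)

% Closed-form maximiser
\pi^{*}(y \mid x) = \frac{1}{Z(x)}\, \pi_{\mathrm{ref}}(y \mid x)\,
  \exp\!\Big( \tfrac{1}{\beta}\, r(x, y) \Big),
\qquad
Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)\, \exp\!\Big( \tfrac{1}{\beta}\, r(x, y) \Big)

% Rearranged for the reward, the form DPO plugs into a preference model
r(x, y) = \beta \log \frac{\pi^{*}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} + \beta \log Z(x)
```

The \beta \log Z(x) term depends only on x, so it cancels when you compare two responses to the same prompt.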

Check it out: https://huggingface.co/blog/hba123/derivingdpo
