Is this DPO RLHF?

#1
by SaffalPoosh - opened

Hi, did you train it with TRL-RLHF using the DPO strategy, or is it just SFT tuning?
