Hi, did you train it with TRL-RLHF using the DPO strategy, or just with SFT fine-tuning?