Training with DPO?
#11 · opened by blanchon
Hello, you're doing awesome work!
If I understand correctly, this was trained in an SFT fashion rather than with DPO.
Did you experiment with DPO training on the new preference dataset you published recently?
Hi @blanchon. Thank you for the kind words. We have only fine-tuned in an SFT fashion and have not experimented with preference alignment techniques. We encourage the community to test those, though, and are happy to help publicise the results afterwards.
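For anyone who wants to try this: a minimal sketch of the DPO objective in plain Python (not the actual training code from this repo; the function name and values are illustrative). DPO minimises `-log sigmoid(beta * (policy log-ratio - reference log-ratio))` over (chosen, rejected) pairs from a preference dataset, so you only need per-response log-probabilities from the policy and a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of summed token log-probs.

    beta scales the implicit reward; the loss is
    -log(sigmoid(beta * (policy_logratio - ref_logratio))).
    """
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_logratio - ref_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)); fine for moderate logits
    return math.log1p(math.exp(-logits))

# If the policy prefers the chosen response more strongly than the
# reference does, the loss falls below log(2) (the value at logits == 0):
print(dpo_loss(-10.0, -20.0, -12.0, -15.0))
```

In practice you would use a preference-optimisation trainer (e.g. TRL's `DPOTrainer`) rather than hand-rolling this loop, but the loss above is the quantity being optimised.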