Text-to-Image
Diffusers
diffusers-training
lora
flux
flux-diffusers
template:sd-lora

Training with DPO?

#11
by blanchon - opened

Hello, you're doing awesome work!

If I understand correctly, this was trained in an SFT fashion rather than with DPO.
Did you experiment with DPO training on the new preference dataset you published recently?

Data Is Better Together org

Hi @blanchon. Thank you for the kind words. We have only fine-tuned in an SFT fashion and have not experimented with preference alignment techniques. We encourage the community to test those, though, and are happy to help publicise the results afterwards.
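For anyone who wants to try preference alignment on a pairwise preference dataset like the one mentioned above, the Diffusion-DPO objective (Wallace et al., 2023) reduces each (preferred, rejected) image pair to four per-sample denoising errors: the MSE of the trainable model and of a frozen reference copy on each image. Below is a minimal, stdlib-only sketch of that loss for a single pair; the function name and inputs are hypothetical (in a real run the `err_*` values would come from noise-prediction MSEs inside the training loop), so treat it as an illustration of the objective, not a drop-in trainer.

```python
import math


def diffusion_dpo_loss(err_w, err_w_ref, err_l, err_l_ref, beta=5000.0):
    """Diffusion-DPO loss for one (preferred, rejected) pair.

    err_w / err_l: denoising MSEs ||eps_pred - eps||^2 of the trainable
    model on the preferred (winner) and rejected (loser) image.
    err_w_ref / err_l_ref: the same MSEs from a frozen reference model.
    beta: preference temperature (the paper uses large values, e.g. 5000).

    The loss is -log sigmoid(-beta * ((err_w - err_w_ref)
                                      - (err_l - err_l_ref))),
    so it decreases when the trainable model fits the preferred image
    better than the reference does, relative to the rejected image.
    """
    inside = -beta * ((err_w - err_w_ref) - (err_l - err_l_ref))
    # Numerically stable -log(sigmoid(inside)) = log(1 + exp(-inside)).
    if inside >= 0:
        return math.log1p(math.exp(-inside))
    return -inside + math.log1p(math.exp(inside))
```

When both models fit both images equally well, the loss sits at log 2; improving the trainable model on the preferred image (lower `err_w`) pushes it below that baseline. In a full training script this scalar would be averaged over the batch and backpropagated through the trainable model's noise predictions only.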
