Training with DPO?
#11 · opened by blanchon
Hello, you're doing awesome work!
If I understand correctly, this was trained in an SFT fashion rather than with DPO.
Did you experiment with DPO training on the new preference dataset you published recently?
Hi @blanchon. Thank you for the kind words. We have only fine-tuned in an SFT fashion and have not experimented with preference alignment techniques. We encourage the community to test those, though, and are happy to help publicise the results afterwards.
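For anyone who wants to try this: a minimal sketch of the DPO objective in plain Python (not the actual training code from this repo; the function name and values are illustrative). DPO minimises `-log sigmoid(beta * (policy log-ratio - reference log-ratio))` over (chosen, rejected) pairs from a preference dataset, so you only need per-response log-probabilities from the policy and a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of summed token log-probs.

    beta scales the implicit reward; the loss is
    -log(sigmoid(beta * (policy_logratio - ref_logratio))).
    """
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_logratio - ref_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)); fine for moderate logits
    return math.log1p(math.exp(-logits))

# If the policy prefers the chosen response more strongly than the
# reference does, the loss falls below log(2) (the value at logits == 0):
print(dpo_loss(-10.0, -20.0, -12.0, -15.0))
```

In practice you would use a preference-optimisation trainer (e.g. TRL's `DPOTrainer`) rather than hand-rolling this loop, but the loss above is the quantity being optimised.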