Training/finetuning code?
Can you share the DPO fine-tuning code you used, or an ETA on when the code will be available?
Hello @milsunone we'll be releasing the DPO training code soon in the Alignment Handbook we're working on: https://github.com/huggingface/alignment-handbook
In the meantime, you can adapt the script from TRL which is quite similar to what we'll release: https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py
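For anyone adapting that script, here is a minimal sketch of the per-example DPO objective, mainly to show where `beta` enters. It assumes you already have the summed log-probabilities of the chosen and rejected completions under both the policy and the frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions, not values taken from the TRL script or from the upcoming handbook release.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities (hypothetical helper).

    beta scales how strongly the policy is pushed away from the reference;
    0.1 is only an illustrative default.
    """
    # Log-ratio of policy to reference for each completion.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favours the chosen
    # completion more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-3.0, -3.0, -3.0, -3.0))
    # Policy prefers the chosen completion more than the reference does.
    print(dpo_loss(-1.0, -5.0, -2.0, -4.0))
```

A larger `beta` makes the same log-ratio gap count for more, so the loss saturates sooner. A smaller `beta` lets the policy drift further from the reference before the loss flattens out.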
Great, could you also share which datasets were used during fine-tuning? It would be a great reference for learning how to fine-tune :)
How did you set the beta in DPO?