---
license: apache-2.0
datasets:
- adamo1139/rawrr_v1
tags:
- dpo
- qlora
- unsloth
---
Another QLoRA DPO training of Yi-34B-200K, this time with sequence length 500, lora_r 16 and lora_alpha 32. I was able to squeeze that in using Unsloth; the script I used is in this repo. It has a much stronger effect than my previous run, which used lora_r 4, lora_alpha 8 and sequence length 200, but I am not sure whether I overcooked it. I will try training this on AEZAKMI v2 next.
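Below is a minimal sketch of how such a run can be set up with Unsloth and TRL's DPOTrainer. Only lora_r=16, lora_alpha=32, the sequence length of 500, and the rawrr_v1 dataset come from this card; the base checkpoint name, training hyperparameters, and dataset column layout are assumptions, not copied from the actual script in this repo.

```python
# Sketch of a QLoRA DPO run with Unsloth + TRL. Values other than
# r=16, lora_alpha=32 and max_seq_length=500 are illustrative assumptions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

max_seq_length = 500

# Load the base model in 4-bit (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="larryvrh/Yi-34B-200K-Llamafied",  # hypothetical; substitute your base checkpoint
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters with the ranks described above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing=True,
)

# Assumes the dataset exposes the prompt/chosen/rejected columns DPOTrainer expects;
# in practice some preparation/mapping may be needed, as mentioned in the credits.
dataset = load_dataset("adamo1139/rawrr_v1", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with PEFT adapters, the frozen base weights serve as the reference
    args=TrainingArguments(
        per_device_train_batch_size=1,   # assumption
        gradient_accumulation_steps=16,  # assumption
        learning_rate=5e-5,              # assumption
        num_train_epochs=1,              # assumption
        optim="paged_adamw_8bit",
        output_dir="outputs",
    ),
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=max_seq_length,
    max_prompt_length=max_seq_length // 2,
)
trainer.train()
```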
Credits to mlabonne (I used pieces of his Mistral fine-tuning script for dataset preparation) and to Daniel Han and Michael Han of the Unsloth AI team.