Another QLoRA DPO training run of Yi-34B-200K, this time with sequence length 500, lora_r 16 and lora_alpha 32. I was able to squeeze it in using Unsloth; the script I used is in this repo. It has a noticeably stronger effect than my previous run (lora_r 4, lora_alpha 8, sequence length 200), but I'm not sure whether I overcooked it. I will try training this on AEZAKMI v2 next.
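For reference, a minimal sketch of what a QLoRA DPO setup with Unsloth and TRL looks like with the hyperparameters above (lora_r 16, lora_alpha 32, max length 500). This is not the exact script from this repo: the base checkpoint name, dataset name, batch size, learning rate and beta are placeholders, and the exact DPOTrainer arguments depend on your TRL version.

```python
# Hedged sketch of a QLoRA DPO run with Unsloth + TRL.
# Placeholders: base model path, dataset name, optimizer/LR/beta values.
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import DPOTrainer
from datasets import load_dataset

# Load the base model in 4-bit (QLoRA) with the sequence length used here.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Yi-34B-200K",   # placeholder: use a llamafied Yi-34B-200K checkpoint
    max_seq_length=500,
    load_in_4bit=True,
)

# Attach LoRA adapters with the ranks mentioned above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                       # lora_r
    lora_alpha=32,              # lora_alpha
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumed DPO dataset with prompt/chosen/rejected columns (placeholder name).
dataset = load_dataset("adamo1139/rawrr_v1", split="train")

# DPOTrainer signature shown here matches older TRL releases;
# newer releases use DPOConfig instead of TrainingArguments + beta.
trainer = DPOTrainer(
    model=model,
    ref_model=None,             # Unsloth/PEFT handles the implicit reference model
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=5e-5,     # placeholder
        num_train_epochs=1,
        optim="paged_adamw_8bit",
        output_dir="outputs",
    ),
    beta=0.1,                   # placeholder DPO temperature
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=500,
    max_prompt_length=250,
)
trainer.train()
```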

Credits to mlabonne (I used pieces of his Mistral fine-tuning script for dataset preparation) and to Daniel Han and Michael Han of the Unsloth AI team.

made with Unsloth
