Another QLoRA DPO training run of Yi-34B-200K, this time with sequence length 500, lora_r 16 and lora_alpha 32. I was able to squeeze it in using Unsloth; the script I used is in this repo. It has a noticeably stronger effect than my previous run (lora_r 4, lora_alpha 8, sequence length 200), but I'm not sure whether I overcooked it. I will try training this on AEZAKMI v2 next.
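For reference, a minimal sketch of what a QLoRA DPO setup with Unsloth and TRL looks like with the hyperparameters above (lora_r 16, lora_alpha 32, max length 500). This is not the exact script from this repo: the base checkpoint name, dataset name, batch size, learning rate and beta are placeholders, and the exact DPOTrainer arguments depend on your TRL version.

```python
# Hedged sketch of a QLoRA DPO run with Unsloth + TRL.
# Placeholders: base model path, dataset name, optimizer/LR/beta values.
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import DPOTrainer
from datasets import load_dataset

# Load the base model in 4-bit (QLoRA) with the sequence length used here.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Yi-34B-200K",   # placeholder: use a llamafied Yi-34B-200K checkpoint
    max_seq_length=500,
    load_in_4bit=True,
)

# Attach LoRA adapters with the ranks mentioned above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                       # lora_r
    lora_alpha=32,              # lora_alpha
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumed DPO dataset with prompt/chosen/rejected columns (placeholder name).
dataset = load_dataset("adamo1139/rawrr_v1", split="train")

# DPOTrainer signature shown here matches older TRL releases;
# newer releases use DPOConfig instead of TrainingArguments + beta.
trainer = DPOTrainer(
    model=model,
    ref_model=None,             # Unsloth/PEFT handles the implicit reference model
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=5e-5,     # placeholder
        num_train_epochs=1,
        optim="paged_adamw_8bit",
        output_dir="outputs",
    ),
    beta=0.1,                   # placeholder DPO temperature
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=500,
    max_prompt_length=250,
)
trainer.train()
```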

Credits to mlabonne (I used pieces of his Mistral fine-tuning script for dataset preparation) and to Daniel Han and Michael Han of the Unsloth AI team.

made with Unsloth
