---
license: apache-2.0
datasets:
- adamo1139/rawrr_v1
tags:
- dpo
- qlora
- unsloth
---
Another QLoRA DPO training of Yi-34B-200K, this time with sequence length 500, lora_r 16 and lora_alpha 32. I was able to squeeze that in using Unsloth; the script I used is in this repo. It has a much stronger effect than my previous run, which used lora_r 4, lora_alpha 8 and sequence length 200, but I am not sure whether I overcooked it. I will try training this on AEZAKMI v2 next.
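Below is a minimal sketch of how such a run can be set up with Unsloth and TRL's DPOTrainer. Only lora_r=16, lora_alpha=32, the sequence length of 500, and the rawrr_v1 dataset come from this card; the base checkpoint name, training hyperparameters, and dataset column layout are assumptions, not copied from the actual script in this repo.

```python
# Sketch of a QLoRA DPO run with Unsloth + TRL. Values other than
# r=16, lora_alpha=32 and max_seq_length=500 are illustrative assumptions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

max_seq_length = 500

# Load the base model in 4-bit (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="larryvrh/Yi-34B-200K-Llamafied",  # hypothetical; substitute your base checkpoint
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters with the ranks described above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing=True,
)

# Assumes the dataset exposes the prompt/chosen/rejected columns DPOTrainer expects;
# in practice some preparation/mapping may be needed, as mentioned in the credits.
dataset = load_dataset("adamo1139/rawrr_v1", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with PEFT adapters, the frozen base weights serve as the reference
    args=TrainingArguments(
        per_device_train_batch_size=1,   # assumption
        gradient_accumulation_steps=16,  # assumption
        learning_rate=5e-5,              # assumption
        num_train_epochs=1,              # assumption
        optim="paged_adamw_8bit",
        output_dir="outputs",
    ),
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=max_seq_length,
    max_prompt_length=max_seq_length // 2,
)
trainer.train()
```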
Credits to mlabonne (I used pieces of his Mistral fine-tuning script for dataset preparation) and to Daniel Han and Michael Han of the Unsloth AI team.