---
library_name: transformers
license: apache-2.0
datasets:
- Atsunori/HelpSteer2-DPO
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- flammenai/Date-DPO-NoAsterisks
base_model:
- nbeerbower/Nemoties-ChatML-12B
---

# CaptainNemo-ChatML-12B
Nemoties-ChatML-12B finetuned on:
- Atsunori/HelpSteer2-DPO
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- flammenai/Date-DPO-NoAsterisks
### Method
ORPO-tuned with QLoRA on 1x RTX A6000 for 2 epochs, using a rank-64 LoRA adapter and a 2e-5 learning rate.
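A minimal sketch of what this setup could look like with TRL's `ORPOConfig` and PEFT's `LoraConfig`. Only the rank (64), learning rate (2e-5), and epoch count (2) come from the card; every other argument (alpha, dropout, target modules, batch size, quantization settings) is an assumption, not the script actually used.

```python
# Hypothetical ORPO + QLoRA configuration; values not stated in the
# card above are illustrative assumptions.
from peft import LoraConfig
from trl import ORPOConfig

peft_config = LoraConfig(
    r=64,                  # rank-64 LoRA, per the card
    lora_alpha=64,         # assumption: alpha matched to rank
    lora_dropout=0.05,     # assumption
    bias="none",
    task_type="CAUSAL_LM",
)

orpo_config = ORPOConfig(
    output_dir="CaptainNemo-ChatML-12B",
    learning_rate=2e-5,              # per the card
    num_train_epochs=2,              # per the card
    per_device_train_batch_size=1,   # assumption: fits a single A6000
    gradient_accumulation_steps=8,   # assumption
    bf16=True,                       # assumption
)
```

These configs would then be passed to `trl.ORPOTrainer` along with the base model and a preference dataset in the chosen/rejected format used by the DPO datasets listed above.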