Trained on a different random sample of the same datasets used by loyal-piano-m7, then fine-tuned with cDPO on a blend of RLHF datasets.
Several intermediate checkpoints from the cDPO training run are available on separate branches of this repository.
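As a rough illustration of what cDPO optimizes: conservative DPO assumes each preference label may be flipped with probability ε and mixes the DPO loss for both orderings of the pair. The sketch below computes that per-pair loss from summed sequence log-probabilities under the policy and the frozen reference model. `cdpo_loss` is a hypothetical helper, not this model's training code, and the `beta` and `epsilon` defaults are placeholders, since this card does not state the hyperparameters actually used.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,     # placeholder: KL-strength hyperparameter
    epsilon: float = 0.1,  # placeholder: assumed label-flip probability
) -> float:
    # Implicit reward margin: how much more the policy (relative to the
    # reference model) prefers the chosen response over the rejected one.
    h = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # Mix the DPO loss for the given label (weight 1 - epsilon) with the
    # DPO loss for the flipped label (weight epsilon).
    return -(1 - epsilon) * log_sigmoid(beta * h) - epsilon * log_sigmoid(-beta * h)
```

With `epsilon=0.0` this reduces to the standard DPO loss; with `epsilon > 0` the loss stops rewarding ever-larger margins, which is the "conservative" part.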
Uses the Alpaca prompt format.
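A minimal sketch of building a prompt in this format, assuming the standard Stanford Alpaca template (this card does not spell out the exact preamble wording, so treat it as an assumption). `build_alpaca_prompt` is a hypothetical helper; the model's reply is expected to follow the trailing `### Response:` header.

```python
from typing import Optional

# Assumed preambles from the standard Stanford Alpaca template.
PREAMBLE_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
)
PREAMBLE_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
)


def build_alpaca_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    # Use the "with input" variant only when extra context is supplied.
    if input_text:
        return (
            PREAMBLE_WITH_INPUT
            + f"### Instruction:\n{instruction}\n\n"
            + f"### Input:\n{input_text}\n\n"
            + "### Response:\n"
        )
    return PREAMBLE_NO_INPUT + f"### Instruction:\n{instruction}\n\n### Response:\n"


print(build_alpaca_prompt("Summarize the text.", "cDPO is a variant of DPO."))
```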