Trained on a different random sampling of the same datasets used by loyal-piano-m7, then further tuned with cDPO on a blend of RLHF datasets.

Several intermediate checkpoints from the cDPO training run are available on branches of this repository.

Uses the Alpaca prompt format.
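
As a rough illustration of what that format looks like, here is a minimal sketch of a prompt builder. It assumes the widely used Stanford Alpaca template; the exact wording used during this model's training is not stated here, and the helper name `build_alpaca_prompt` is hypothetical.

```python
from typing import Optional

# Assumption: these are the standard Stanford Alpaca templates,
# not strings confirmed by this model card.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Format an instruction (and optional input) as an Alpaca-style prompt."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    # The model's completion is expected to follow the "### Response:" marker.
    print(build_alpaca_prompt("List three uses for a harpsichord."))
```

The resulting string can be passed to whatever generation pipeline is in use; the model's answer is whatever it generates after the final `### Response:` line.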

Format: Safetensors
Model size: 7.24B params
Tensor type: BF16