Description

Gemma-2-9b-it model finetuned by off-policy WPO. Details in WPO: Enhancing RLHF with Weighted Preference Optimization.

License

This model is licensed under the Zoom software license and is permitted for use only for noncommercial, educational, or academic research purposes.

Downloads last month
4
Safetensors
Model size
9.24B params
Tensor type
F32
·
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.

Collection including wzhouad/gemma-2-9b-it-WPO-FP