Model Details

This is the official release of the ODIN-ppo-L230-7B model, a chat assistant trained by fine-tuning LLaMA on the Open-Assistant dataset with PPO. "L230" indicates that the model's average output length on the LIMA test set is ~230 tokens. ODIN is the reward model used for the PPO training.
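A minimal usage sketch with the Hugging Face `transformers` library. The repo id `Lichang-Chen/ODIN-ppo-L230-best` is taken from this model's Hub page; the prompt format and generation settings below are illustrative assumptions, not the authors' documented configuration.

```python
# Hypothetical usage sketch for this checkpoint. The repo id comes from
# the Hub page; generation hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Lichang-Chen/ODIN-ppo-L230-best"

def chat(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a single-turn reply from the fine-tuned model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.9,
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

if __name__ == "__main__":
    print(chat("Explain PPO in one paragraph."))
```

Loading a 7B checkpoint requires roughly 14 GB of memory in fp16; `device_map="auto"` lets `accelerate` place the weights across available devices.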

Model Description

Model Sources
