lvkaokao/mistral-7b-finetuned-orca-dpo-v2

Is this DPO RLHF?

#1

by SaffalPoosh - opened Nov 15, 2023

Hi, did you train this model with TRL-RLHF using the DPO strategy, or is it just SFT fine-tuning?
