Online RLHF - a RLHFlow Collection

Online RLHF

Updated Jun 12, 2024

Datasets, code, and models for online RLHF (i.e., iterative DPO)

RLHFlow/prompt-collection-v0.1
Viewer • Updated May 8, 2024 • 179k • 53 • 9

RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation • Updated Oct 14, 2024 • 1.89k • 38

sfairXC/FsfairX-LLaMA3-RM-v0.1
Text Classification • Updated Oct 14, 2024 • 5.7k • 54

RLHFlow/SFT-OpenHermes-2.5-Standard
Viewer • Updated Apr 24, 2024 • 1M • 39 • 2

RLHFlow/iterative-prompt-v1-iter2-20K
Viewer • Updated May 3, 2024 • 20k • 38 • 2

RLHFlow/iterative-prompt-v1-iter3-20K
Viewer • Updated May 3, 2024 • 20k • 28 • 3

RLHFlow/iterative-prompt-v1-iter1-20K
Viewer • Updated May 3, 2024 • 20k • 57 • 2

Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R
Text Generation • Updated 14 days ago • 127 • 78

RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published May 13, 2024 • 67

Salesforce/LLaMA-3-8B-SFR-SFT-R
Text Generation • Updated 14 days ago • 26 • 8

RLHFlow/LLaMA3-SFT
Text Generation • Updated Nov 3, 2024 • 8.51k • 10

RLHFlow/LLaMA3-iterative-DPO-final
Text Generation • Updated Oct 14, 2024 • 6.63k • 40

RLHFlow/iterative-prompt-v1-iter4-20K
Viewer • Updated Jun 12, 2024 • 20k • 36

RLHFlow/iterative-prompt-v1-iter5-20K
Viewer • Updated Jun 12, 2024 • 20k • 31

RLHFlow/iterative-prompt-v1-iter6-20K
Viewer • Updated Jun 12, 2024 • 20k • 32

RLHFlow/iterative-prompt-v1-iter7-20K
Viewer • Updated Jun 12, 2024 • 20k • 33

RLHFlow/iterative-prompt-v1-iter8-20K
Viewer • Updated Jun 12, 2024 • 20k • 32

RLHFlow/iterative-prompt-v1-iter9-20K
Viewer • Updated Jun 12, 2024 • 19.9k • 37 • 1