RLHFlow's Collections
Online RLHF
Updated Jun 12, 2024
Datasets, code, and models for online RLHF (i.e., iterative DPO)
RLHFlow/prompt-collection-v0.1 • Viewer • Updated May 8, 2024 • 179k • 53 • 9
RLHFlow/pair-preference-model-LLaMA3-8B • Text Generation • Updated Oct 14, 2024 • 1.89k • 38
sfairXC/FsfairX-LLaMA3-RM-v0.1 • Text Classification • Updated Oct 14, 2024 • 5.7k • 54
RLHFlow/SFT-OpenHermes-2.5-Standard • Viewer • Updated Apr 24, 2024 • 1M • 39 • 2
RLHFlow/iterative-prompt-v1-iter2-20K • Viewer • Updated May 3, 2024 • 20k • 38 • 2
RLHFlow/iterative-prompt-v1-iter3-20K • Viewer • Updated May 3, 2024 • 20k • 28 • 3
RLHFlow/iterative-prompt-v1-iter1-20K • Viewer • Updated May 3, 2024 • 20k • 57 • 2
Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R • Text Generation • Updated 14 days ago • 127 • 78
RLHF Workflow: From Reward Modeling to Online RLHF • Paper • 2405.07863 • Published May 13, 2024 • 67
Salesforce/LLaMA-3-8B-SFR-SFT-R • Text Generation • Updated 14 days ago • 26 • 8
RLHFlow/LLaMA3-SFT • Text Generation • Updated Nov 3, 2024 • 8.51k • 10
RLHFlow/LLaMA3-iterative-DPO-final • Text Generation • Updated Oct 14, 2024 • 6.63k • 40
RLHFlow/iterative-prompt-v1-iter4-20K • Viewer • Updated Jun 12, 2024 • 20k • 36
RLHFlow/iterative-prompt-v1-iter5-20K • Viewer • Updated Jun 12, 2024 • 20k • 31
RLHFlow/iterative-prompt-v1-iter6-20K • Viewer • Updated Jun 12, 2024 • 20k • 32
RLHFlow/iterative-prompt-v1-iter7-20K • Viewer • Updated Jun 12, 2024 • 20k • 33
RLHFlow/iterative-prompt-v1-iter8-20K • Viewer • Updated Jun 12, 2024 • 20k • 32
RLHFlow/iterative-prompt-v1-iter9-20K • Viewer • Updated Jun 12, 2024 • 19.9k • 37 • 1