Update README.md
README.md
@@ -14,8 +14,9 @@ SAMPLE =[
 
 The template is the same as `mistralai/Mistral-7B-Instruct-v0.2`.
 
-The reward model can be used for iterative SFT/DPO
+The reward model can be used for iterative SFT/DPO.
 
+Please cite them if you found this RM helpful,
 ```
 @article{dong2023raft,
 title={Raft: Reward ranked finetuning for generative foundation model alignment},