Update README.md
README.md
CHANGED
@@ -54,7 +54,7 @@ Fine-tuning datasets for this model are based on [Stack Exchange Paired](https:/
 **DPO Training:** [https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl)
 
 ### Training Procedure
-The model was first fine-tuned on the Stack Exchange question and answer pairs and then fine-tuned via the DPO training procedure using
+The model was first fine-tuned on the Stack Exchange question and answer pairs and then fine-tuned via the DPO training procedure using the SFT model as the reference model.
 It is trained to respond to prompts with the following template:
 
 ```