revert naming paths
README.md
CHANGED
@@ -11,18 +11,18 @@ datasets:
 - tatsu-lab/alpaca
 ---

-# StableVicuna-13B

 ## Model Description

-StableVicuna-13B is a [Vicuna-13B](https://vicuna.lmsys.org/) model fine-tuned using reinforcement learning from human feedback (RLHF) via Proximal Policy Optimization (PPO) on various conversational and instructional datasets.

 ### Apply Delta Weights

-StableVicuna-13B cannot be used from the `

 ```sh
-python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta
 ```
@@ -81,7 +81,7 @@ The reward model used during RLHF was also trained on [OpenAssistant Conversatio

 ### Training Procedure

-`

 | Hyperparameter | Value |
 |-------------------|---------|
@@ -118,7 +118,7 @@ The base LLaMA model is trained on various data, some of which may contain offen

 ## Acknowledgements

-This work would not have been possible without the support of [

 ## Citations
@@ -11,18 +11,18 @@ datasets:
 - tatsu-lab/alpaca
 ---

+# StableVicuna-13B

 ## Model Description

+StableVicuna-13B is a [Vicuna-13B v1.0](https://vicuna.lmsys.org/) model fine-tuned using reinforcement learning from human feedback (RLHF) via Proximal Policy Optimization (PPO) on various conversational and instructional datasets.

 ### Apply Delta Weights

+StableVicuna-13B cannot be used from the `CarperAI/stable-vicuna-13b-delta` weights alone. To obtain the correct model, one must add back the difference between LLaMA 13B and `CarperAI/stable-vicuna-13b-delta` weights. We provide the [`apply_delta.py`](https://huggingface.co/CarperAI/stable-vicuna-13b-delta/raw/main/apply_delta.py) script to automate the conversion, which you can run as:

 ```sh
+python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta CarperAI/stable-vicuna-13b-delta
 ```
@@ -81,7 +81,7 @@ The reward model used during RLHF was also trained on [OpenAssistant Conversatio

 ### Training Procedure

+`CarperAI/stable-vicuna-13b-delta` was trained using PPO as implemented in [`trlX`](https://github.com/CarperAI/trlx/blob/main/trlx/trainer/accelerate_ppo_trainer.py) with the following configuration:

 | Hyperparameter | Value |
 |-------------------|---------|
@@ -118,7 +118,7 @@ The base LLaMA model is trained on various data, some of which may contain offen

 ## Acknowledgements

+This work would not have been possible without the support of [Stability AI](https://stability.ai/).

 ## Citations
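
The "Apply Delta Weights" text in this change says the usable model is obtained by adding the delta weights back onto the LLaMA 13B weights. As a rough illustration of that idea only, the sketch below adds two weight sets parameter by parameter. It is not the contents of `apply_delta.py`: the function name `apply_delta_sketch`, the `StateDict` alias, and the use of plain Python lists in place of real tensors are all assumptions made for the example, and the real script may do more than this.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat values.
# Real checkpoints hold tensors; lists keep this sketch dependency-free.
StateDict = Dict[str, List[float]]


def apply_delta_sketch(base: StateDict, delta: StateDict) -> StateDict:
    """Return weights where target[name] = base[name] + delta[name]."""
    target: StateDict = {}
    for name, delta_values in delta.items():
        if name not in base:
            raise KeyError(f"parameter {name!r} missing from base weights")
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise addition of the delta onto the base parameter.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Toy values, invented for illustration only.
base = {"layer.weight": [1.0, 2.0, 3.0]}
delta = {"layer.weight": [0.5, -1.0, 0.25]}
target = apply_delta_sketch(base, delta)
print(target)
```

The sketch raises on a missing or mismatched parameter rather than skipping it, on the assumption that a silent partial merge would be harder to diagnose than an early failure.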