Update README.md
README.md CHANGED
@@ -25,7 +25,34 @@ This model requires the following prompt template:
<|prompter|> prompt goes here
<|assistant|>:
```
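For example, a filled-in prompt following this template (the question itself is only an illustration, not part of this README) would look like:
```
<|prompter|> What is the difference between the compat and act-order files?
<|assistant|>:
```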
-
+
+## CHOICE OF MODELS
+
+Two sets of models are provided:
+
+* Groupsize = 1024
+  * Should work reliably in 24GB VRAM
+* Groupsize = 128
+  * May require more than 24GB VRAM, depending on response length
+  * In my testing it ran out of VRAM on a 24GB card around 1500 tokens returned.
+
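If you are unsure how much VRAM your card has, you can check with standard NVIDIA tooling (a general tip, not something this README specifies):
```
nvidia-smi --query-gpu=name,memory.total --format=csv
```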
+For each model, two versions are available:
+* `compat.no-act-order.safetensors`
+  * Works with all versions of GPTQ-for-LLaMa, including the version in text-generation-webui one-click-installers
+* `latest.act-order.safetensors`
+  * uses `--act-order` for higher inference quality
+  * requires more recent GPTQ-for-LLaMa code, therefore will not currently work with one-click-installers
+
+## HOW TO CHOOSE YOUR MODEL
+
+I have used branches to separate the models:
+
+* Branch: **main** = groupsize 1024, `compat.no-act-order.safetensors` file
+* Branch: **1024-latest** = groupsize 1024, `latest.act-order.safetensors` file
+* Branch: **128-compat** = groupsize 128, `compat.no-act-order.safetensors` file
+* Branch: **128-latest** = groupsize 128, `latest.act-order.safetensors` file
+
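If you would rather fetch a particular branch manually instead of through the UI, a plain git clone of that branch works. This is only a sketch: the repository URL below is a placeholder for this repo's actual Hugging Face URL, and `128-compat` can be swapped for any branch in the list above.
```
# <user>/<this-repo> is a placeholder - use this repository's actual path
git lfs install
git clone --single-branch --branch 128-compat https://huggingface.co/<user>/<this-repo>
```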
+## How to easily download and use the 1024g compat model in text-generation-webui

Load text-generation-webui as you normally do.

@@ -38,51 +65,14 @@ Load text-generation-webui as you normally do.
7. In the **Model drop-down**: choose this model: `stable-vicuna-13B-GPTQ`.
8. Click **Reload the Model** in the top right.
9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
-
-## Provided files
-
-I have uploaded two versions of the GPTQ.
-
-**Compatible file - stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors**
-
-In the `main` branch - the default one - you will find `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.
-
-This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility.
-
-It was created without the `--act-order` parameter. It may have slightly lower inference quality compared to the other file, but is guaranteed to work on all versions of GPTQ-for-LLaMa and text-generation-webui.
-
-* `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`
-  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
-  * Works with text-generation-webui one-click-installers
-  * Parameters: Groupsize = 128g. No act-order.
-  * Command used to create the GPTQ:
-```
-CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
-```
-
-**Latest file - stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors**
-
-Created for more recent versions of GPTQ-for-LLaMa, and uses the `--act-order` flag for maximum theoretical performance.
-
-To access this file, please switch to the `latest` branch of this repo and download from there.
-
-* `stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors`
-  * Only works with recent GPTQ-for-LLaMa code
-  * **Does not** work with text-generation-webui one-click-installers
-  * Parameters: Groupsize = 128g. **act-order**.
-  * Offers highest quality quantisation, but requires recent GPTQ-for-LLaMa code
-  * Command used to create the GPTQ:
-```
-CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.act-order.safetensors
-```

## Manual instructions for `text-generation-webui`

-
+The `compat.no-act-order.safetensors` files can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).

[Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)).

-The
+The `latest.act-order.safetensors` files were created using `--act-order` to give the maximum possible quantisation quality, but this means they require the latest GPTQ-for-LLaMa to be used inside the UI.

If you want to use the act-order `safetensors` files and need to update the Triton branch of GPTQ-for-LLaMa, here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:
```
@@ -98,12 +88,26 @@ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
Then install this model into `text-generation-webui/models` and launch the UI as follows:
```
cd text-generation-webui
-python server.py --model
+python server.py --model OpenAssistant-SFT-7-Llama-30B-GPTQ --wbits 4 --groupsize 128 --model_type Llama # add any other command line args you want
+```
+
+To update the CUDA branch of GPTQ-for-LLaMa, you can do the following. **This requires a C/C++ compiler and the CUDA toolkit installed!**
+```
+# Clone text-generation-webui, if you don't already have it
+git clone https://github.com/oobabooga/text-generation-webui
+# Make a repositories directory
+mkdir text-generation-webui/repositories
+cd text-generation-webui/repositories
+# Clone the latest GPTQ-for-LLaMa code inside text-generation-webui
+git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa
+cd GPTQ-for-LLaMa
+pip uninstall quant-cuda # uninstall existing CUDA version
+python setup_cuda.py install # install latest version
```

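For the 1024g branches, the earlier `python server.py` launch line should work with the groupsize argument changed to match the file; this is an assumption based on the groupsize values listed above, and the model directory name is a placeholder:
```
# Assumption: --groupsize must match the file's groupsize; <model-directory-name> is whatever folder you placed in text-generation-webui/models
python server.py --model <model-directory-name> --wbits 4 --groupsize 1024 --model_type Llama
```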
The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.

-If you can't update GPTQ-for-LLaMa or don't want to,
+If you can't update GPTQ-for-LLaMa or don't want to, please use a `compat.no-act-order.safetensors` file.
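If you are not sure which GPTQ-for-LLaMa commit is currently installed inside text-generation-webui, one quick way to check (plain git, not a command from this README) is:
```
# Shows the most recent commit of the GPTQ-for-LLaMa checkout used by the UI
git -C text-generation-webui/repositories/GPTQ-for-LLaMa log -1 --oneline
```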

# Original model card