TheBloke committed
Commit 05c51c6 · 1 Parent(s): 6d3dd74

Update README.md

Files changed (1):
  1. README.md +46 -42

README.md CHANGED
@@ -25,7 +25,34 @@ This model requires the following prompt template:
  <|prompter|> prompt goes here
  <|assistant|>:
  ```
- ## How to easily download and use this model in text-generation-webui
+
+ ## CHOICE OF MODELS
+
+ Two sets of models are provided:
+
+ * Groupsize = 1024
+   * Should work reliably in 24GB VRAM
+ * Groupsize = 128
+   * May require more than 24GB VRAM, depending on response length
+   * In my testing it ran out of VRAM on a 24GB card at around 1500 tokens returned.
+
+ For each model, two versions are available:
+ * `compat.no-act-order.safetensors`
+   * Works with all versions of GPTQ-for-LLaMa, including the version in the text-generation-webui one-click-installers
+ * `latest.act-order.safetensors`
+   * Uses `--act-order` for higher inference quality
+   * Requires more recent GPTQ-for-LLaMa code, and therefore will not currently work with the one-click-installers
+
+ ## HOW TO CHOOSE YOUR MODEL
+
+ I have used branches to separate the models:
+
+ * Branch: **main** = groupsize 1024, `compat.no-act-order.safetensors` file
+ * Branch: **1024-latest** = groupsize 1024, `latest.act-order.safetensors` file
+ * Branch: **128-compat** = groupsize 128, `compat.no-act-order.safetensors` file
+ * Branch: **128-latest** = groupsize 128, `latest.act-order.safetensors` file
+
+ ## How to easily download and use the 1024g compat model in text-generation-webui
 
  Load text-generation-webui as you normally do.
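The prompt template above is literal text that must wrap every request. As a minimal shell sketch of templating a message (the example question and variable name are illustrative):

```
# Wrap a user message in the required prompt template
USER_MSG="What is the capital of France?"
printf '<|prompter|> %s\n<|assistant|>:' "$USER_MSG"
```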
 
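Because each quantisation lives on its own branch, a single branch can also be fetched from the command line. A minimal sketch, assuming `git-lfs` is installed; the repository URL is illustrative and should be replaced with this repo's actual Hugging Face URL:

```
# Download only the 128-compat branch (URL is illustrative)
git lfs install
git clone -b 128-compat --single-branch https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ
```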
@@ -38,51 +65,14 @@ Load text-generation-webui as you normally do.
  7. In the **Model drop-down**: choose this model: `stable-vicuna-13B-GPTQ`.
  8. Click **Reload the Model** in the top right.
  9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
-
- ## Provided files
-
- I have uploaded two versions of the GPTQ.
-
- **Compatible file - stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors**
-
- In the `main` branch - the default one - you will find `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.
-
- This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility.
-
- It was created without the `--act-order` parameter. It may have slightly lower inference quality compared to the other file, but is guaranteed to work on all versions of GPTQ-for-LLaMa and text-generation-webui.
-
- * `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`
-   * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
-   * Works with text-generation-webui one-click-installers
-   * Parameters: Groupsize = 128g. No act-order.
-   * Command used to create the GPTQ:
- ```
- CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
- ```
-
- **Latest file - stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors**
-
- Created for more recent versions of GPTQ-for-LLaMa, and uses the `--act-order` flag for maximum theoretical performance.
-
- To access this file, please switch to the `latest` branch of this repo and download from there.
-
- * `stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors`
-   * Only works with recent GPTQ-for-LLaMa code
-   * **Does not** work with text-generation-webui one-click-installers
-   * Parameters: Groupsize = 128g. **act-order**.
-   * Offers highest quality quantisation, but requires recent GPTQ-for-LLaMa code
-   * Command used to create the GPTQ:
- ```
- CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.act-order.safetensors
- ```

  ## Manual instructions for `text-generation-webui`

- File `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+ The `compat.no-act-order.safetensors` files can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).

  [Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)).

- The other `safetensors` model file was created using `--act-order` to give the maximum possible quantisation quality, but this means it requires that the latest GPTQ-for-LLaMa is used inside the UI.
+ The `latest.act-order.safetensors` files were created using `--act-order` to give the maximum possible quantisation quality, but this means they require the latest GPTQ-for-LLaMa inside the UI.

  If you want to use the act-order `safetensors` files and need to update the Triton branch of GPTQ-for-LLaMa, here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:
  ```
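The Triton-branch commands themselves fall between these two hunks and are not shown here. If you are unsure which GPTQ-for-LLaMa code the UI is currently using, a quick check is possible, assuming the standard `repositories` layout described below:

```
# Inspect the GPTQ-for-LLaMa checkout that text-generation-webui loads
cd text-generation-webui/repositories/GPTQ-for-LLaMa
git branch --show-current    # e.g. triton or cuda
git log -1 --oneline         # commit currently checked out
```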
@@ -98,12 +88,26 @@ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
  Then install this model into `text-generation-webui/models` and launch the UI as follows:
  ```
  cd text-generation-webui
- python server.py --model stable-vicuna-13B-GPTQ --wbits 4 --groupsize 128 --model_type Llama # add any other command line args you want
+ python server.py --model OpenAssistant-SFT-7-Llama-30B-GPTQ --wbits 4 --groupsize 128 --model_type Llama # add any other command line args you want
+ ```
+
+ To update the CUDA branch of GPTQ-for-LLaMa, you can do the following. **This requires a C/C++ compiler and the CUDA toolkit installed!**
+ ```
+ # Clone text-generation-webui, if you don't already have it
+ git clone https://github.com/oobabooga/text-generation-webui
+ # Make a repositories directory
+ mkdir text-generation-webui/repositories
+ cd text-generation-webui/repositories
+ # Clone the latest GPTQ-for-LLaMa code inside text-generation-webui
+ git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa
+ cd GPTQ-for-LLaMa
+ pip uninstall quant-cuda # uninstall the existing CUDA version
+ python setup_cuda.py install # install the latest version
  ```

  The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.

- If you can't update GPTQ-for-LLaMa or don't want to, you can use `stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors` as mentioned above, which should work without any upgrades to text-generation-webui.
+ If you can't update GPTQ-for-LLaMa or don't want to, please use a `compat.no-act-order.safetensors` file.
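Note that `--groupsize` must match the file being loaded. For one of the 1024g branches the launch line changes accordingly; a sketch, with an illustrative model directory name:

```
cd text-generation-webui
# For a groupsize-1024 quantisation, pass --groupsize 1024 instead of 128
python server.py --model stable-vicuna-13B-GPTQ --wbits 4 --groupsize 1024 --model_type Llama
```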

  # Original model card
 