Update README.md
README.md CHANGED
@@ -25,7 +25,34 @@ This model requires the following prompt template:
<|prompter|> prompt goes here
<|assistant|>:
```
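For example, a filled-in prompt following this template (the question itself is only an illustration, not part of this README) would look like:
```
<|prompter|> What is the difference between the compat and act-order files?
<|assistant|>:
```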
-
+
+## CHOICE OF MODELS
+
+Two sets of models are provided:
+
+* Groupsize = 1024
+  * Should work reliably in 24GB VRAM
+* Groupsize = 128
+  * May require more than 24GB VRAM, depending on response length
+  * In my testing it ran out of VRAM on a 24GB card around 1500 tokens returned.
+
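If you are unsure how much VRAM your card has, you can check with standard NVIDIA tooling (a general tip, not something this README specifies):
```
nvidia-smi --query-gpu=name,memory.total --format=csv
```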
+For each model, two versions are available:
+* `compat.no-act-order.safetensors`
+  * Works with all versions of GPTQ-for-LLaMa, including the version in text-generation-webui one-click-installers
+* `latest.act-order.safetensors`
+  * uses `--act-order` for higher inference quality
+  * requires more recent GPTQ-for-LLaMa code, therefore will not currently work with one-click-installers
+
+## HOW TO CHOOSE YOUR MODEL
+
+I have used branches to separate the models:
+
+* Branch: **main** = groupsize 1024, `compat.no-act-order.safetensors` file
+* Branch: **1024-latest** = groupsize 1024, `latest.act-order.safetensors` file
+* Branch: **128-compat** = groupsize 128, `compat.no-act-order.safetensors` file
+* Branch: **128-latest** = groupsize 128, `latest.act-order.safetensors` file
+
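If you would rather fetch a particular branch manually instead of through the UI, a plain git clone of that branch works. This is only a sketch: the repository URL below is a placeholder for this repo's actual Hugging Face URL, and `128-compat` can be swapped for any branch in the list above.
```
# <user>/<this-repo> is a placeholder - use this repository's actual path
git lfs install
git clone --single-branch --branch 128-compat https://huggingface.co/<user>/<this-repo>
```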
+## How to easily download and use the 1024g compat model in text-generation-webui

Load text-generation-webui as you normally do.

@@ -38,51 +65,14 @@ Load text-generation-webui as you normally do.
7. In the **Model drop-down**: choose this model: `stable-vicuna-13B-GPTQ`.
8. Click **Reload the Model** in the top right.
9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
-
-## Provided files
-
-I have uploaded two versions of the GPTQ.
-
-**Compatible file - stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors**
-
-In the `main` branch - the default one - you will find `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.
-
-This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility.
-
-It was created without the `--act-order` parameter. It may have slightly lower inference quality compared to the other file, but is guaranteed to work on all versions of GPTQ-for-LLaMa and text-generation-webui.
-
-* `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`
-  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
-  * Works with text-generation-webui one-click-installers
-  * Parameters: Groupsize = 128g. No act-order.
-  * Command used to create the GPTQ:
-```
-CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
-```
-
-**Latest file - stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors**
-
-Created for more recent versions of GPTQ-for-LLaMa, and uses the `--act-order` flag for maximum theoretical performance.
-
-To access this file, please switch to the `latest` branch of this repo and download from there.
-
-* `stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors`
-  * Only works with recent GPTQ-for-LLaMa code
-  * **Does not** work with text-generation-webui one-click-installers
-  * Parameters: Groupsize = 128g. **act-order**.
-  * Offers highest quality quantisation, but requires recent GPTQ-for-LLaMa code
-  * Command used to create the GPTQ:
-```
-CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.act-order.safetensors
-```

## Manual instructions for `text-generation-webui`

-
+The `compat.no-act-order.safetensors` files can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).

[Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)).

-The
+The `latest.act-order.safetensors` files were created using `--act-order` to give the maximum possible quantisation quality, but this means they require the latest GPTQ-for-LLaMa to be used inside the UI.

If you want to use the act-order `safetensors` files and need to update the Triton branch of GPTQ-for-LLaMa, here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:
```
@@ -98,12 +88,26 @@ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
Then install this model into `text-generation-webui/models` and launch the UI as follows:
```
cd text-generation-webui
-python server.py --model
+python server.py --model OpenAssistant-SFT-7-Llama-30B-GPTQ --wbits 4 --groupsize 128 --model_type Llama # add any other command line args you want
+```
+
+To update the CUDA branch of GPTQ-for-LLaMa, you can do the following. **This requires a C/C++ compiler and the CUDA toolkit installed!**
+```
+# Clone text-generation-webui, if you don't already have it
+git clone https://github.com/oobabooga/text-generation-webui
+# Make a repositories directory
+mkdir text-generation-webui/repositories
+cd text-generation-webui/repositories
+# Clone the latest GPTQ-for-LLaMa code inside text-generation-webui
+git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa
+cd GPTQ-for-LLaMa
+pip uninstall quant-cuda # uninstall existing CUDA version
+python setup_cuda.py install # install latest version
```

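For the 1024g branches, the earlier `python server.py` launch line should work with the groupsize argument changed to match the file; this is an assumption based on the groupsize values listed above, and the model directory name is a placeholder:
```
# Assumption: --groupsize must match the file's groupsize; <model-directory-name> is whatever folder you placed in text-generation-webui/models
python server.py --model <model-directory-name> --wbits 4 --groupsize 1024 --model_type Llama
```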
The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.

-If you can't update GPTQ-for-LLaMa or don't want to,
+If you can't update GPTQ-for-LLaMa or don't want to, please use a `compat.no-act-order.safetensors` file.
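If you are not sure which GPTQ-for-LLaMa commit is currently installed inside text-generation-webui, one quick way to check (plain git, not a command from this README) is:
```
# Shows the most recent commit of the GPTQ-for-LLaMa checkout used by the UI
git -C text-generation-webui/repositories/GPTQ-for-LLaMa log -1 --oneline
```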

# Original model card