TheBloke committed on
Commit 2180bec
1 Parent(s): 17cde00

Update README.md

Files changed (1): README.md +20 -6
README.md CHANGED
@@ -18,19 +18,33 @@ This repo contains 4bit GPTQ models for GPU inference, quantised using [GPTQ-for
* [4bit GGML models for CPU inference](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
* [Unquantised model in HF format](https://huggingface.co/TheBloke/wizardLM-7B-HF)
 
+ ## How to easily download and use this model in text-generation-webui
+
+ Load text-generation-webui as you normally do.
+
+ 1. Click the **Model tab**.
+ 2. Under **Download custom model or LoRA**, enter this repo name: `TheBloke/wizardLM-7B-GPTQ`.
+ 3. Click **Download**.
+ 4. Wait until it says it's finished downloading.
+ 5. As this is a GPTQ model, fill in the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`.
+ 6. Now click the **Refresh** icon next to **Model** in the top left.
+ 7. In the **Model drop-down**, choose this model: `wizardLM-7B-GPTQ`.
+ 8. Click **Reload the Model** in the top right.
+ 9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
+
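If you'd rather script the download than use the **Download** button, the repo can be cloned directly. A minimal sketch, assuming `git` and `git-lfs` are installed and that models live under text-generation-webui's `models/` directory (the path is an assumption; match your own install):

```
# Manual download sketch: the .safetensors weights are stored in Git LFS.
cd text-generation-webui/models
git lfs install
git clone https://huggingface.co/TheBloke/wizardLM-7B-GPTQ
# Confirm the weights downloaded fully (they should be several GB):
ls -lh wizardLM-7B-GPTQ/*.safetensors
```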
## GIBBERISH OUTPUT IN `text-generation-webui`?

- Please read the Provided Files section below. You should use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.
+ Please read the Provided Files section below. You should use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.

- If you're using a text-generation-webui one click installer, you MUST use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`.
+ If you're using a text-generation-webui one-click installer, you MUST use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`.
 
## Provided files

- Two files are provided. **The second file will not work unless you use a recent version of GPTQ-for-LLaMa**
+ Two files are provided. **The 'latest' file will not work unless you use a recent version of GPTQ-for-LLaMa.**

- Specifically, the second file uses `--act-order` for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore at this time it will also not work with `text-generation-webui` one-click installers.
+ Specifically, the 'latest' file uses `--act-order` for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore at this time it will also not work with `text-generation-webui` one-click installers.

- Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`.
+ The 'compat' file will be used by default in text-generation-webui, so you don't need to do anything special to use it. If you want to use the 'latest' file instead, remove the 'compat' file, but only do this if you are able to use the latest GPTQ-for-LLaMa code.
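Switching to the 'latest' file thus comes down to deleting the 'compat' file from the model directory. A sketch, assuming the default `models/` layout (the path is an assumption):

```
# Only do this if your GPTQ-for-LLaMa is recent enough for --act-order files.
cd text-generation-webui/models/wizardLM-7B-GPTQ
rm wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors
```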
 
* `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
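Once a file is in place, the `GPTQ parameters` from step 5 can also be passed at launch so the model always loads with the right settings. A sketch, assuming a text-generation-webui checkout from around this commit (these flags were renamed in later versions):

```
# Launch the webui with GPTQ settings preset for this model.
python server.py --model wizardLM-7B-GPTQ --wbits 4 --groupsize 128 --model_type llama
```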
 
@@ -50,7 +64,7 @@ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-
CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.act-order.safetensors
```
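The command above produces the 'latest' act-order file. The commit doesn't show how the 'compat' file was made; presumably the same invocation without `--act-order`, saving to the compat filename. An assumption, reconstructed from the command above:

```
# Assumed (not confirmed by this commit): the no-act-order 'compat' build.
CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors
```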

- ## How to run in `text-generation-webui`
+ ## How to install manually in `text-generation-webui` and update GPTQ-for-LLaMa if necessary

File `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
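For the manual route the new heading describes, the usual setup at the time looked roughly like this. A sketch, assuming the CUDA branch of qwopqwop200's GPTQ-for-LLaMa; branch names and build steps varied between forks:

```
# Manual install sketch: webui plus an up-to-date GPTQ-for-LLaMa.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

# GPTQ-for-LLaMa is expected under repositories/ in a standard checkout.
mkdir -p repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip install -r requirements.txt
python setup_cuda.py install  # compiles the CUDA quantisation kernel
```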