TheBloke committed on
Commit 2180bec
1 Parent(s): 17cde00

Update README.md

Files changed (1): README.md +20 -6
README.md CHANGED
@@ -18,19 +18,33 @@ This repo contains 4bit GPTQ models for GPU inference, quantised using [GPTQ-for
* [4bit GGML models for CPU inference](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
* [Unquantised model in HF format](https://huggingface.co/TheBloke/wizardLM-7B-HF)
 
+ ## How to easily download and use this model in text-generation-webui
+
+ Load text-generation-webui as you normally do.
+
+ 1. Click the **Model tab**.
+ 2. Under **Download custom model or LoRA**, enter this repo name: `TheBloke/wizardLM-7B-GPTQ`.
+ 3. Click **Download**.
+ 4. Wait until it says it's finished downloading.
+ 5. As this is a GPTQ model, fill in the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`.
+ 6. Now click the **Refresh** icon next to **Model** in the top left.
+ 7. In the **Model drop-down**, choose this model: `wizardLM-7B-GPTQ`.
+ 8. Click **Reload the Model** in the top right.
+ 9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
+
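If you'd rather script the download than use the **Download** button, the repo can be cloned directly. A minimal sketch, assuming `git` and `git-lfs` are installed and that models live under text-generation-webui's `models/` directory (the path is an assumption; match your own install):

```
# Manual download sketch: the .safetensors weights are stored in Git LFS.
cd text-generation-webui/models
git lfs install
git clone https://huggingface.co/TheBloke/wizardLM-7B-GPTQ
# Confirm the weights downloaded fully (they should be several GB):
ls -lh wizardLM-7B-GPTQ/*.safetensors
```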
## GIBBERISH OUTPUT IN `text-generation-webui`?

- Please read the Provided Files section below. You should use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.
+ Please read the Provided Files section below. You should use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.

- If you're using a text-generation-webui one click installer, you MUST use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`.
+ If you're using a text-generation-webui one-click installer, you MUST use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`.
 
## Provided files

- Two files are provided. **The second file will not work unless you use a recent version of GPTQ-for-LLaMa**
+ Two files are provided. **The 'latest' file will not work unless you use a recent version of GPTQ-for-LLaMa.**

- Specifically, the second file uses `--act-order` for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore at this time it will also not work with `text-generation-webui` one-click installers.
+ Specifically, the 'latest' file uses `--act-order` for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore at this time it will also not work with `text-generation-webui` one-click installers.

- Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`.
+ The 'compat' file will be used by default in text-generation-webui, so you don't need to do anything special to use it. If you want to use the 'latest' file instead, remove the 'compat' file, but only do this if you are able to use the latest GPTQ-for-LLaMa code.
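Switching to the 'latest' file thus comes down to deleting the 'compat' file from the model directory. A sketch, assuming the default `models/` layout (the path is an assumption):

```
# Only do this if your GPTQ-for-LLaMa is recent enough for --act-order files.
cd text-generation-webui/models/wizardLM-7B-GPTQ
rm wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors
```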
 
* `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
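Once a file is in place, the `GPTQ parameters` from step 5 can also be passed at launch so the model always loads with the right settings. A sketch, assuming a text-generation-webui checkout from around this commit (these flags were renamed in later versions):

```
# Launch the webui with GPTQ settings preset for this model.
python server.py --model wizardLM-7B-GPTQ --wbits 4 --groupsize 128 --model_type llama
```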
 
@@ -50,7 +64,7 @@ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-
CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.act-order.safetensors
```
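The command above produces the 'latest' act-order file. The commit doesn't show how the 'compat' file was made; presumably the same invocation without `--act-order`, saving to the compat filename. An assumption, reconstructed from the command above:

```
# Assumed (not confirmed by this commit): the no-act-order 'compat' build.
CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors
```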

- ## How to run in `text-generation-webui`
+ ## How to install manually in `text-generation-webui` and update GPTQ-for-LLaMa if necessary

File `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
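For the manual route the new heading describes, the usual setup at the time looked roughly like this. A sketch, assuming the CUDA branch of qwopqwop200's GPTQ-for-LLaMa; branch names and build steps varied between forks:

```
# Manual install sketch: webui plus an up-to-date GPTQ-for-LLaMa.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

# GPTQ-for-LLaMa is expected under repositories/ in a standard checkout.
mkdir -p repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip install -r requirements.txt
python setup_cuda.py install  # compiles the CUDA quantisation kernel
```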