Text Generation
Transformers
Safetensors
English
llama
causal-lm
text-generation-inference
4-bit precision
gptq
TheBloke committed
Commit e202309
1 Parent(s): b2e5792

Update README.md

Files changed (1)
  1. README.md +24 -8
README.md CHANGED
@@ -26,19 +26,21 @@ This model works best with the following prompt template:

  ## GIBBERISH OUTPUT IN `text-generation-webui`?

- Please read the Provided Files section below. You should use `stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.
+ Please read the Provided Files section below. You should use `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.

- If you're using a text-generation-webui one click installer, you MUST use `stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors`.
+ If you're using a text-generation-webui one click installer, you MUST use `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.

  ## Provided files

- Two files are provided. **The second file will not work unless you use a recent version of GPTQ-for-LLaMa**
+ Two files are provided. **The 'latest' file will not work unless you use a recent version of GPTQ-for-LLaMa**
+
+ If you do an automatic download with `text-generation-webui` it will pick the 'compat' file which should work for everyone.

  Specifically, the second file uses `--act-order` for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore at this time it will also not work with `text-generation-webui` one-click installers.

- Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`.
+ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.

- * `stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors`
+ * `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
  * Works with text-generation-webui one-click-installers
  * Works on Windows
@@ -47,7 +49,7 @@ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-
  ```
  CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
  ```
- * `stable-vicuna-13B-GPTQ-4bit.act-order.safetensors`
+ * `stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors`
  * Only works with recent GPTQ-for-LLaMa code
  * **Does not** work with text-generation-webui one-click-installers
  * Parameters: Groupsize = 128g. act-order.
@@ -57,9 +59,23 @@ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-
  CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.act-order.safetensors
  ```

- ## How to run in `text-generation-webui`
+ ## How to easily download and use a model in text-generation-webui
+
+ Load text-generation-webui as you normally do.
+
+ 1. Click the **Model tab**.
+ 2. Under **Download custom model or LoRA**, enter the repo name to download: `TheBloke/stable-vicuna-13B-GPTQ`.
+ 3. Click **Download**.
+ 4. Wait until it says it's finished downloading.
+ 5. As this is a GPTQ model, fill in the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`
+ 6. Now click the **Refresh** icon next to **Model** in the top left.
+ 7. In the **Model drop-down**: choose the model you just downloaded, eg `stable-vicuna-13B-GPTQ`.
+ 8. Click **Reload the Model** in the top right.
+ 9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
+
+ ## Manual instructions for `text-generation-webui`

- File `stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobaboogas text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+ File `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobaboogas text-generation-webui](https://github.com/oobabooga/text-generation-webui).

  [Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)).
 
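The download steps added in this commit drive everything through the webui UI. For readers who prefer a script, here is a minimal sketch using `huggingface_hub`; the repo id and 'compat' filename are taken from the README above, while the library choice and surrounding code are an assumption, not part of this commit:

```python
# Minimal sketch: fetch just the 'compat' weights file named in the README.
# hf_hub_download caches the file locally and returns its path.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/stable-vicuna-13B-GPTQ",
    filename="stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors",
)
print(f"Weights cached at: {path}")
```

To mirror what the webui's automatic download does (config, tokenizer and weights together), `snapshot_download(repo_id="TheBloke/stable-vicuna-13B-GPTQ")` from the same library fetches the whole repo instead of a single file.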
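Step 5's `GPTQ parameters` (`Bits = 4`, `Groupsize = 128`, `model_type = Llama`) correspond to the `--wbits 4 --groupsize 128` flags shown in the quantisation commands. As a rough sketch of loading the same file outside the webui, here is one way with the AutoGPTQ library; AutoGPTQ is not mentioned in this commit, so the call below, the device string, and loading straight from the Hub are all assumptions:

```python
# Hedged sketch (not from the README): load the quantised model with AutoGPTQ.
# model_basename is the safetensors filename without its extension.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

repo = "TheBloke/stable-vicuna-13B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename="stable-vicuna-13B-GPTQ-4bit.compat.no-act-order",
    use_safetensors=True,
    device="cuda:0",
)

# Use the prompt template recommended in the README (not shown in this hunk).
prompt = "Tell me about GPTQ quantisation."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0]))
```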