Text Generation
Transformers
Safetensors
English
llama
causal-lm
text-generation-inference
4-bit precision
gptq
TheBloke committed
Commit 2ec34de
1 Parent(s): b149ae1

Update README.md

Files changed (1)
  1. README.md +14 -17
README.md CHANGED
@@ -38,28 +38,18 @@ Load text-generation-webui as you normally do.
 8. Click **Reload the Model** in the top right.
 9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!

- Note that the automatic download will currently download two model files, and you only need one.
-
- Feel free to delete `stable-vicuna-13B-GPTQ-4bit.latest.no-act-order.safetensors` after it's downloaded (unless you know you can use latest GPTQ-for-LLaMa code as described below, in which case delete the 'compat' file instead.)
-
- I will soon improve this repo so only one file is downloaded.
-
- ## GIBBERISH OUTPUT IN `text-generation-webui`?
-
- If you're installing the model files manually, please read the Provided Files section below. You should use `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.
-
- If you're using a text-generation-webui one click installer, you MUST use `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.
-
- ## Provided files
-
- Two files are provided. **The 'latest' file will not work unless you use a recent version of GPTQ-for-LLaMa**
-
- If you do an automatic download with `text-generation-webui` as described above it will pick the 'compat' file which should work for everyone.
-
- The 'latest' file uses `--act-order` for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore at this time it will also not work with `text-generation-webui` one-click installers.
-
- Unless you are able to use the latest GPTQ-for-LLaMa code, please use `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.
-
+ ## Provided files
+
+ I have uploaded two versions of the GPTQ.
+
+ **Compatible file - stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors**
+
+ In the `main` branch - the default one - you will find `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`.
+
+ This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility.
+
+ It was created without the `--act-order` parameter. It may have slightly lower inference quality compared to the other file, but is guaranteed to work on all versions of GPTQ-for-LLaMa and text-generation-webui.
+
 * `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors`
   * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
   * Works with text-generation-webui one-click-installers
@@ -69,6 +59,13 @@ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `stable-vi
 ```
 CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.no-act-order.safetensors
 ```
+
+ **Latest file - stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors**
+
+ Created for more recent versions of GPTQ-for-LLaMa, and uses the `--act-order` flag for maximum theoretical performance.
+
+ To access this file, please switch to the `latest` branch of this repo and download from there.
+
 * `stable-vicuna-13B-GPTQ-4bit.latest.act-order.safetensors`
   * Only works with recent GPTQ-for-LLaMa code
   * **Does not** work with text-generation-webui one-click-installers
@@ -78,7 +75,7 @@ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `stable-vi
 ```
 CUDA_VISIBLE_DEVICES=0 python3 llama.py stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors stable-vicuna-13B-GPTQ-4bit.act-order.safetensors
 ```
-
+
 ## Manual instructions for `text-generation-webui`

 File `stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
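A note on the `latest` branch referenced in the added text: Hugging Face model repos are plain git repositories, so one way to fetch that branch's file is a branch-specific clone. This is a minimal sketch; the repo URL is inferred from the uploader and filenames, not stated in the diff, and `git-lfs` must be installed for the `.safetensors` file to download in full.

```
# Sketch only: repo URL inferred from context, not stated in the diff.
# git-lfs is required so the large .safetensors file is actually fetched.
git lfs install
git clone --branch latest https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ
```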
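For the manual `text-generation-webui` route mentioned at the end of the diff, a hedged sketch of a launch command. The GPTQ flags below (`--wbits`, `--groupsize`, `--model_type`) were how contemporary versions of the webui selected GPTQ parameters, but they should be verified against `python server.py --help` for the version actually installed.

```
# Sketch, assuming the model directory is named stable-vicuna-13B-GPTQ and
# that this webui version supports these GPTQ flags; verify with --help.
CUDA_VISIBLE_DEVICES=0 python server.py --model stable-vicuna-13B-GPTQ --wbits 4 --groupsize 128 --model_type llama
```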