Update README.md
README.md CHANGED
@@ -18,16 +18,6 @@ This repo contains 4bit GPTQ models for GPU inference, quantised using [GPTQ-for
* [4bit GGML models for CPU inference](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
* [Unquantised model in HF format](https://huggingface.co/TheBloke/wizardLM-7B-HF)

-## PERFORMANCE ISSUES?
-
-We were having significant performance problems with these GPTQs until **ionizedTexasMan** figured out the problem.
-
-The first upload of this model had `use_cache: false` in config.json. This caused significantly lower performance, especially with CUDA GPTQ-for-LLaMa.
-
-This is now fixed in the repo. If you already downloaded `config.json`, please re-download it or manually edit it to `use_cache: true`.
-
-With that change these GPTQs will now perform as well as any other 7B.
-
## GIBBERISH OUTPUT IN `text-generation-webui`?

Please read the Provided Files section below. You should use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.
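The removed performance note above amounts to one config check: `use_cache` must be `true` in the repo's `config.json`. As a minimal sketch (the local download path is an assumption), a previously downloaded copy can be verified like this:

```
# Inspect the downloaded config.json; the key should read "use_cache": true
grep '"use_cache"' ./wizardLM-7B-GPTQ/config.json
# If it still shows false, re-download config.json from the repo or edit that one value by hand
```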
@@ -42,16 +32,15 @@ Specifically, the second file uses `--act-order` for maximum quantisation qualit

Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`.

-* `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`
+* `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
  * Works with text-generation-webui one-click-installers
-  * Works on Windows
  * Parameters: Groupsize = 128g. No act-order.
  * Command used to create the GPTQ:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors
    ```
-* `wizardLM-7B-GPTQ-4bit-128g.act-order.safetensors`
+* `wizardLM-7B-GPTQ-4bit-128g.latest.act-order.safetensors`
  * Only works with recent GPTQ-for-LLaMa code
  * **Does not** work with text-generation-webui one-click-installers
  * Parameters: Groupsize = 128g. act-order.
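The `.latest.act-order` file in the list above needs a GPTQ-for-LLaMa checkout recent enough to support act-order. The following is only a rough sketch of refreshing the copy bundled with text-generation-webui; the directory layout, branch name and build steps are assumptions from memory, so follow the GPTQ-for-LLaMa repository's own instructions if they differ:

```
# Replace the GPTQ-for-LLaMa checkout that text-generation-webui uses (paths and branch are assumptions)
cd text-generation-webui/repositories
rm -rf GPTQ-for-LLaMa
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
pip install -r requirements.txt
python setup_cuda.py install   # builds the CUDA kernel; the Triton branch does not need this step
```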
@@ -63,7 +52,7 @@ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-

## How to run in `text-generation-webui`

-File `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+File `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).

[Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)).

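As a sketch of the workflow this section describes, assuming the usual text-generation-webui layout where models live under `models/<model-name>`, and that the config and tokenizer files were downloaded alongside the safetensors file:

```
# Place the model files where text-generation-webui expects them (directory layout is an assumption)
mkdir -p text-generation-webui/models/wizardLM-7B-GPTQ
cp wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors config.json tokenizer.model text-generation-webui/models/wizardLM-7B-GPTQ/

# Launch with 4-bit GPTQ settings
cd text-generation-webui
python server.py --model wizardLM-7B-GPTQ --wbits 4 --groupsize 128 --model_type llama
```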
@@ -88,7 +77,7 @@ python server.py --model wizardLM-7B-GPTQ --wbits 4 --groupsize 128 --model_type

The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.

-If you can't update GPTQ-for-LLaMa or don't want to, you can use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` as mentioned above, which should work without any upgrades to text-generation-webui.
+If you can't update GPTQ-for-LLaMa or don't want to, you can use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` as mentioned above, which should work without any upgrades to text-generation-webui.

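If you only want that compat file rather than a full clone of the model repo, one hedged option is to fetch the individual files directly; the repo id is assumed from the sibling GGML/HF links above, and the exact file list should be checked on the repo's Files page:

```
# Download just the compat safetensors file plus the config and tokenizer it needs (file list is an assumption)
wget https://huggingface.co/TheBloke/wizardLM-7B-GPTQ/resolve/main/wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors
wget https://huggingface.co/TheBloke/wizardLM-7B-GPTQ/resolve/main/config.json
wget https://huggingface.co/TheBloke/wizardLM-7B-GPTQ/resolve/main/tokenizer.model
```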
# Original model info