TheBloke committed
Commit c5c455d · 1 Parent(s): e5d377f

Update README.md

Files changed (1): README.md (+4, -15)
README.md CHANGED
@@ -18,16 +18,6 @@ This repo contains 4bit GPTQ models for GPU inference, quantised using [GPTQ-for
  * [4bit GGML models for CPU inference](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
  * [Unquantised model in HF format](https://huggingface.co/TheBloke/wizardLM-7B-HF)
  
- ## PERFORMANCE ISSUES?
-
- We were having significant performance problems with these GPTQs until **ionizedTexasMan** figured out the problem.
-
- The first upload of this model had `use_cache: false` in config.json. This caused significantly lower performance, especially with CUDA GPTQ-for-LLaMa.
-
- This is now fixed in the repo. If you already downloaded `config.json`, please re-download it or manually edit it to `use_cache: true`.
-
- With that change these GPTQs will now perform as well as any other 7B.
-
  ## GIBBERISH OUTPUT IN `text-generation-webui`?
  
  Please read the Provided Files section below. You should use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.
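For anyone who still has the first upload, a minimal sketch of the manual fix described in the removed section above. The local path is hypothetical, and `sed -i` edits the file in place; re-downloading `config.json` achieves the same thing.

```
# Hypothetical local path; point this at your downloaded copy of config.json.
# Flips the flag discussed above from false to true, in place.
sed -i 's/"use_cache": false/"use_cache": true/' ./wizardLM-7B-GPTQ/config.json
```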
@@ -42,16 +32,15 @@ Specifically, the second file uses `--act-order` for maximum quantisation qualit
  
  Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`.
  
- * `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`
+ * `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`
    * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
    * Works with text-generation-webui one-click-installers
-   * Works on Windows
    * Parameters: Groupsize = 128g. No act-order.
    * Command used to create the GPTQ:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors
    ```
- * `wizardLM-7B-GPTQ-4bit-128g.act-order.safetensors`
+ * `wizardLM-7B-GPTQ-4bit-128g.latest.act-order.safetensors`
    * Only works with recent GPTQ-for-LLaMa code
    * **Does not** work with text-generation-webui one-click-installers
    * Parameters: Groupsize = 128g. act-order.
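The commit only records the command for the no-act-order file. As a sketch of how the act-order file was presumably produced, assuming the same GPTQ-for-LLaMa `llama.py` interface; the `--act-order` flag and output filename here are assumptions, not taken from this diff:

```
# Assumed act-order equivalent of the command above; --act-order and the
# output filename are guesses based on the file descriptions, not the diff.
CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.latest.act-order.safetensors
```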
@@ -63,7 +52,7 @@ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-
  
  ## How to run in `text-generation-webui`
  
- File `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobaboogas text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+ File `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
  
  [Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)).
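The next hunk's header shows the README's launch command truncated at `--model_type`. A sketch of the likely full invocation, run from the text-generation-webui directory; `llama` as the model type is an assumption based on WizardLM-7B being a LLaMa-family model:

```
# Presumed full launch command; --model_type llama is an assumption
# (WizardLM-7B derives from LLaMa), not confirmed by the truncated header.
python server.py --model wizardLM-7B-GPTQ --wbits 4 --groupsize 128 --model_type llama
```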
 
@@ -88,7 +77,7 @@ python server.py --model wizardLM-7B-GPTQ --wbits 4 --groupsize 128 --model_type
  
  The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.
  
- If you can't update GPTQ-for-LLaMa or don't want to, you can use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` as mentioned above, which should work without any upgrades to text-generation-webui.
+ If you can't update GPTQ-for-LLaMa or don't want to, you can use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` as mentioned above, which should work without any upgrades to text-generation-webui.
  
  # Original model info
 
 