Update README.md
README.md CHANGED
@@ -18,16 +18,6 @@ This repo contains 4bit GPTQ models for GPU inference, quantised using [GPTQ-for
* [4bit GGML models for CPU inference](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
* [Unquantised model in HF format](https://huggingface.co/TheBloke/wizardLM-7B-HF)

-## PERFORMANCE ISSUES?
-
-We were having significant performance problems with these GPTQs until **ionizedTexasMan** figured out the problem.
-
-The first upload of this model had `use_cache: false` in config.json. This caused significantly lower performance, especially with CUDA GPTQ-for-LLaMa.
-
-This is now fixed in the repo. If you already downloaded `config.json`, please re-download it or manually edit it to `use_cache: true`.
-
-With that change these GPTQs will now perform as well as any other 7B.
-
## GIBBERISH OUTPUT IN `text-generation-webui`?

Please read the Provided Files section below. You should use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.
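The removed performance note above amounts to one config check: `use_cache` must be `true` in the repo's `config.json`. As a minimal sketch (the local download path is an assumption), a previously downloaded copy can be verified like this:

```
# Inspect the downloaded config.json; the key should read "use_cache": true
grep '"use_cache"' ./wizardLM-7B-GPTQ/config.json
# If it still shows false, re-download config.json from the repo or edit that one value by hand
```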
@@ -42,16 +32,15 @@ Specifically, the second file uses `--act-order` for maximum quantisation qualit

Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`.

-* `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors`
+* `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
  * Works with text-generation-webui one-click-installers
-  * Works on Windows
  * Parameters: Groupsize = 128g. No act-order.
  * Command used to create the GPTQ:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors
    ```
-* `wizardLM-7B-GPTQ-4bit-128g.act-order.safetensors`
+* `wizardLM-7B-GPTQ-4bit-128g.latest.act-order.safetensors`
  * Only works with recent GPTQ-for-LLaMa code
  * **Does not** work with text-generation-webui one-click-installers
  * Parameters: Groupsize = 128g. act-order.
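The `.latest.act-order` file in the list above needs a GPTQ-for-LLaMa checkout recent enough to support act-order. The following is only a rough sketch of refreshing the copy bundled with text-generation-webui; the directory layout, branch name and build steps are assumptions from memory, so follow the GPTQ-for-LLaMa repository's own instructions if they differ:

```
# Replace the GPTQ-for-LLaMa checkout that text-generation-webui uses (paths and branch are assumptions)
cd text-generation-webui/repositories
rm -rf GPTQ-for-LLaMa
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
pip install -r requirements.txt
python setup_cuda.py install   # builds the CUDA kernel; the Triton branch does not need this step
```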
@@ -63,7 +52,7 @@ Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-

## How to run in `text-generation-webui`

-File `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+File `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).

[Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)).

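As a sketch of the workflow this section describes, assuming the usual text-generation-webui layout where models live under `models/<model-name>`, and that the config and tokenizer files were downloaded alongside the safetensors file:

```
# Place the model files where text-generation-webui expects them (directory layout is an assumption)
mkdir -p text-generation-webui/models/wizardLM-7B-GPTQ
cp wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors config.json tokenizer.model text-generation-webui/models/wizardLM-7B-GPTQ/

# Launch with 4-bit GPTQ settings
cd text-generation-webui
python server.py --model wizardLM-7B-GPTQ --wbits 4 --groupsize 128 --model_type llama
```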
@@ -88,7 +77,7 @@ python server.py --model wizardLM-7B-GPTQ --wbits 4 --groupsize 128 --model_type

The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.

-If you can't update GPTQ-for-LLaMa or don't want to, you can use `wizardLM-7B-GPTQ-4bit-128g.no-act-order.safetensors` as mentioned above, which should work without any upgrades to text-generation-webui.
+If you can't update GPTQ-for-LLaMa or don't want to, you can use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` as mentioned above, which should work without any upgrades to text-generation-webui.

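If you only want that compat file rather than a full clone of the model repo, one hedged option is to fetch the individual files directly; the repo id is assumed from the sibling GGML/HF links above, and the exact file list should be checked on the repo's Files page:

```
# Download just the compat safetensors file plus the config and tokenizer it needs (file list is an assumption)
wget https://huggingface.co/TheBloke/wizardLM-7B-GPTQ/resolve/main/wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors
wget https://huggingface.co/TheBloke/wizardLM-7B-GPTQ/resolve/main/config.json
wget https://huggingface.co/TheBloke/wizardLM-7B-GPTQ/resolve/main/tokenizer.model
```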
# Original model info