Update README.md
README.md
CHANGED
@@ -17,17 +17,17 @@ license: other
 </div>
 <!-- header end -->
 
-# OptimalScale's Robin 33B GPTQ
+# OptimalScale's Robin 33B v2 GPTQ
 
-These files are GPTQ 4bit model files for [OptimalScale's Robin 33B](https://huggingface.co/OptimalScale/robin-33b-v2-delta).
+These files are GPTQ 4bit model files for [OptimalScale's Robin 33B v2](https://huggingface.co/OptimalScale/robin-33b-v2-delta).
 
 It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
 
 ## Repositories available
 
-* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/robin-33B-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/robin-33B-GGML)
-* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/robin-33B-fp16)
+* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/robin-33B-v2-GPTQ)
+* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/robin-33B-v2-GGML)
+* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/robin-33B-v2-fp16)
 
 ## Prompt template
 
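The hunk above ends at the `## Prompt template` heading. The template itself is not shown in this diff beyond the opening line quoted in the next hunk's header; a minimal Python sketch of building a prompt in the assumed Robin/Vicuna style follows, where the `###Human:`/`###Assistant:` turn markers are an assumption from Robin's Vicuna lineage rather than anything visible here:

```python
# Sketch only. The system line is completed from the truncated context quote
# in the next hunk's header; the ###Human:/###Assistant: markers are assumed.
system = ("A chat between a curious human and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the "
          "human's questions.")

def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the assumed Robin/Vicuna style."""
    return f"{system}\n\n###Human: {user_message}\n###Assistant:"

print(build_prompt("Tell me about llamas."))
```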
@@ -42,11 +42,11 @@ A chat between a curious human and an artificial intelligence assistant. The ass
 Please make sure you're using the latest version of text-generation-webui
 
 1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `TheBloke/robin-33B-GPTQ`.
+2. Under **Download custom model or LoRA**, enter `TheBloke/robin-33B-v2-GPTQ`.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done"
 5. In the top left, click the refresh icon next to **Model**.
-6. In the **Model** dropdown, choose the model you just downloaded: `robin-33B-GPTQ`
+6. In the **Model** dropdown, choose the model you just downloaded: `robin-33B-v2-GPTQ`
 7. The model will automatically load, and is now ready for use!
 8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
 * Note that you do not need to and should not set manual GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
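The numbered webui steps above download the repo through the UI. As a hypothetical alternative that is not part of the original README, the same files can be fetched programmatically with `huggingface_hub` (the `local_dir` path is an assumption about where text-generation-webui keeps its models):

```python
# Hypothetical alternative to the manual webui download steps above.
from huggingface_hub import snapshot_download

# Fetch every file in the GPTQ repo into the webui models directory;
# the destination path is an assumption about your local install.
snapshot_download(repo_id="TheBloke/robin-33B-v2-GPTQ",
                  local_dir="text-generation-webui/models/robin-33B-v2-GPTQ")
```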
@@ -65,7 +65,7 @@ from transformers import AutoTokenizer, pipeline, logging
 from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 
-model_name_or_path = "TheBloke/robin-33B-GPTQ"
+model_name_or_path = "TheBloke/robin-33B-v2-GPTQ"
 model_basename = "robin-33b-GPTQ-4bit--1g.act.order"
 
 use_triton = False
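Only the head of the README's Python example is visible in this hunk. For context, `model_name_or_path`, `model_basename` and `use_triton` are the standard inputs to AutoGPTQ's `AutoGPTQForCausalLM.from_quantized`; a self-contained sketch of how such a snippet typically continues is below, where every argument not visible in the diff is an assumption:

```python
# Self-contained sketch: only model_name_or_path, model_basename and use_triton
# appear in the diff; all other arguments and the generate call are assumptions.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/robin-33B-v2-GPTQ"
model_basename = "robin-33b-GPTQ-4bit--1g.act.order"
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# from_quantized loads the 4-bit weights named by model_basename.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

prompt = "Tell me about llamas."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7,
                        max_new_tokens=128)
print(tokenizer.decode(output[0]))
```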
@@ -152,6 +152,6 @@ Thank you to all my generous patrons and donaters!
 
 <!-- footer end -->
 
-# Original model card: OptimalScale's Robin 33B
+# Original model card: OptimalScale's Robin 33B v2
 
 No model card provided in source repository.