Update README.md
README.md
CHANGED
@@ -17,17 +17,17 @@ license: other
 </div>
 <!-- header end -->
 
-# OptimalScale's Robin 33B GPTQ
+# OptimalScale's Robin 33B v2 GPTQ
 
-These files are GPTQ 4bit model files for [OptimalScale's Robin 33B](https://huggingface.co/OptimalScale/robin-33b-v2-delta).
+These files are GPTQ 4bit model files for [OptimalScale's Robin 33B v2](https://huggingface.co/OptimalScale/robin-33b-v2-delta).
 
 It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
 
 ## Repositories available
 
-* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/robin-33B-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/robin-33B-GGML)
-* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/robin-33B-fp16)
+* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/robin-33B-v2-GPTQ)
+* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/robin-33B-v2-GGML)
+* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/robin-33B-v2-fp16)
 
 ## Prompt template
 
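The hunk above ends at the `## Prompt template` heading. The template itself is not shown in this diff beyond the opening line quoted in the next hunk's header; a minimal Python sketch of building a prompt in the assumed Robin/Vicuna style follows, where the `###Human:`/`###Assistant:` turn markers are an assumption from Robin's Vicuna lineage rather than anything visible here:

```python
# Sketch only. The system line is completed from the truncated context quote
# in the next hunk's header; the ###Human:/###Assistant: markers are assumed.
system = ("A chat between a curious human and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the "
          "human's questions.")

def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the assumed Robin/Vicuna style."""
    return f"{system}\n\n###Human: {user_message}\n###Assistant:"

print(build_prompt("Tell me about llamas."))
```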
@@ -42,11 +42,11 @@ A chat between a curious human and an artificial intelligence assistant. The ass
 Please make sure you're using the latest version of text-generation-webui
 
 1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `TheBloke/robin-33B-GPTQ`.
+2. Under **Download custom model or LoRA**, enter `TheBloke/robin-33B-v2-GPTQ`.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done"
 5. In the top left, click the refresh icon next to **Model**.
-6. In the **Model** dropdown, choose the model you just downloaded: `robin-33B-GPTQ`
+6. In the **Model** dropdown, choose the model you just downloaded: `robin-33B-v2-GPTQ`
 7. The model will automatically load, and is now ready for use!
 8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
 * Note that you do not need to and should not set manual GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
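The numbered webui steps above download the repo through the UI. As a hypothetical alternative that is not part of the original README, the same files can be fetched programmatically with `huggingface_hub` (the `local_dir` path is an assumption about where text-generation-webui keeps its models):

```python
# Hypothetical alternative to the manual webui download steps above.
from huggingface_hub import snapshot_download

# Fetch every file in the GPTQ repo into the webui models directory;
# the destination path is an assumption about your local install.
snapshot_download(repo_id="TheBloke/robin-33B-v2-GPTQ",
                  local_dir="text-generation-webui/models/robin-33B-v2-GPTQ")
```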
@@ -65,7 +65,7 @@ from transformers import AutoTokenizer, pipeline, logging
 from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 
-model_name_or_path = "TheBloke/robin-33B-GPTQ"
+model_name_or_path = "TheBloke/robin-33B-v2-GPTQ"
 model_basename = "robin-33b-GPTQ-4bit--1g.act.order"
 
 use_triton = False
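Only the head of the README's Python example is visible in this hunk. For context, `model_name_or_path`, `model_basename` and `use_triton` are the standard inputs to AutoGPTQ's `AutoGPTQForCausalLM.from_quantized`; a self-contained sketch of how such a snippet typically continues is below, where every argument not visible in the diff is an assumption:

```python
# Self-contained sketch: only model_name_or_path, model_basename and use_triton
# appear in the diff; all other arguments and the generate call are assumptions.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/robin-33B-v2-GPTQ"
model_basename = "robin-33b-GPTQ-4bit--1g.act.order"
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# from_quantized loads the 4-bit weights named by model_basename.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

prompt = "Tell me about llamas."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7,
                        max_new_tokens=128)
print(tokenizer.decode(output[0]))
```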
@@ -152,6 +152,6 @@ Thank you to all my generous patrons and donaters!
 
 <!-- footer end -->
 
-# Original model card: OptimalScale's Robin 33B
+# Original model card: OptimalScale's Robin 33B v2
 
 No model card provided in source repository.