Update README.md

---
datasets:
- tiiuae/falcon-refinedweb
license: apache-2.0
language:
- en
inference: false
---

<!-- header start -->
<div style="width: 100%;">
<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>
<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
</div>
</div>
<!-- header end -->

# Falcon-40B-Instruct GPTQ

It is the result of quantising [Falcon-40B-Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).

Please note this is an experimental GPTQ model. Support for it is currently quite limited.

It is also expected to be **VERY SLOW**. This is currently unavoidable, but is being looked at.

## AutoGPTQ

AutoGPTQ is required: `pip install auto-gptq`

AutoGPTQ provides pre-compiled wheels for Windows and Linux, with CUDA toolkit 11.7 or 11.8.

If you are running CUDA toolkit 12.x, you will need to compile your own by following these instructions:

```
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .
```

These manual steps will require that you have the [Nvidia CUDA toolkit](https://developer.nvidia.com/cuda-12-0-1-download-archive) installed.
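
Whichever route you take, a quick way to confirm the install worked - a minimal sketch, assuming a CUDA-capable GPU is present - is to check that `auto_gptq` imports cleanly and PyTorch can see the GPU:

```
import torch
from auto_gptq import AutoGPTQForCausalLM  # raises ImportError if the build/install failed

print(torch.cuda.is_available())        # should print True
print(torch.cuda.get_device_name(0))    # name of the GPU AutoGPTQ will run on
```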

## text-generation-webui

There is provisional AutoGPTQ support in text-generation-webui.

This requires text-generation-webui as of commit 204731952ae59d79ea3805a425c73dd171d943c3.

In this repo you can see two `.py` files - these are the files that get executed when the model is loaded with `trust_remote_code=True`.
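As a concrete sketch (the model folder name below is illustrative - point `--model` at whichever directory under `models/` you placed the model in):

```
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
git checkout 204731952ae59d79ea3805a425c73dd171d943c3

python server.py --model falcon-40b-instruct-GPTQ --autogptq --trust_remote_code
```
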
## Simple Python example code

To run this code you need to install AutoGPTQ and einops:

```
pip install auto-gptq
pip install einops
```

Then run this example code (the model loading and generation calls after the tokenizer line are a sketch to make the fragment runnable - adjust parameters to taste):

```
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Download the model from HF and store it locally, then reference its location here:
quantized_model_dir = "/path/to/falcon40b-instruct-gptq"

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=False)

# use_triton=False: this model does not currently work with AutoGPTQ Triton (see "Provided files" below)
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0", use_triton=False, use_safetensors=True, trust_remote_code=True)

prompt = "Tell me about AI"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```
## Provided files

**gptq_model-4bit-64g.safetensors**

This will work with AutoGPTQ as of commit `3cb1bf5` (`3cb1bf5a6d43a06dc34c6442287965d1838303d3`).

It was created with groupsize 64 to give higher inference quality, and without `desc_act` (act-order) to increase inference speed.
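
For reference, those parameters correspond to an AutoGPTQ quantize config along these lines (a sketch using `BaseQuantizeConfig`; the `quantize_config.json` shipped in the repo is authoritative):

```
from auto_gptq import BaseQuantizeConfig

# 4-bit, groupsize 64, no act-order - the parameters described above
quantize_config = BaseQuantizeConfig(bits=4, group_size=64, desc_act=False)
```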

* `gptq_model-4bit-64g.safetensors`
  * Works only with latest AutoGPTQ CUDA, compiled from source as of commit `3cb1bf5`
  * At this time it does not work with AutoGPTQ Triton, but support will hopefully be added in time.
  * Works with text-generation-webui using `--autogptq --trust_remote_code`
  * Does not work with any version of GPTQ-for-LLaMa
  * Parameters: Groupsize = 64. No act-order.
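
If AutoGPTQ does not pick the file up automatically, you can name it explicitly via `model_basename` (the filename minus the `.safetensors` extension) - a sketch, reusing `quantized_model_dir` from the example above:

```
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    model_basename="gptq_model-4bit-64g",  # the provided file, without its extension
    use_safetensors=True,
    use_triton=False,
    device="cuda:0",
    trust_remote_code=True)
```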

<!-- footer start -->
## Discord

For further support, and discussions on these models and AI in general, join us at: [TheBloke AI's Discord server](https://discord.gg/UBgz4VXf)

Donaters will get priority support on any and all AI/LLM/model questions, plus other benefits.

* Patreon: https://patreon.com/TheBlokeAI
* Ko-Fi: https://ko-fi.com/TheBlokeAI

**Patreon special mentions**: Aemon Algiz; Johann-Peter Hartmann; Talal Aujan; Jonathan Leane; Illia Dulskyi; Khalefa Al-Ahmad; senxiiz; Sebastain Graf; Eugene Pentland; Nikolai Manek; Luke Pendergrass.

Thank you to all my generous patrons and donaters.
<!-- footer end -->

# ✨ Original model card: Falcon-40B-Instruct

# ✨ Falcon-40B-Instruct