Update README.md
README.md
CHANGED
@@ -13,7 +13,7 @@ inference: false
 </div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p><a href="https://discord.gg/
+<p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
 <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
@@ -32,7 +32,7 @@ It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQi
 * [4-bit GPTQ model for GPU inference](https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ)
 * [3-bit GPTQ model for GPU inference](https://huggingface.co/TheBloke/falcon-40b-instruct-3bit-GPTQ)
 * [Unquantised bf16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/tiiuae/falcon-40b-instruct)
-
+
 ## EXPERIMENTAL

 Please note this is an experimental GPTQ model. Support for it is currently quite limited.
@@ -126,7 +126,7 @@ It was created without groupsize to reduce VRAM requirements, and with `desc_act

 * `gptq_model-4bit--1g.safetensors`
 * Works only with latest AutoGPTQ CUDA, compiled from source as of commit `3cb1bf5`
-* At this time it does not work with AutoGPTQ Triton, but support will hopefully be added in time.
+* At this time it does not work with AutoGPTQ Triton, but support will hopefully be added in time.
 * Works with text-generation-webui using `--autogptq --trust_remote_code`
 * At this time it does NOT work with one-click-installers
 * Does not work with any version of GPTQ-for-LLaMa
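The hunk above lists the compatibility requirements for the quantised file. As a rough, non-authoritative sketch of what Python-side loading with AutoGPTQ (CUDA build, compiled from source) might look like for this repo — assuming the `AutoGPTQForCausalLM.from_quantized` API of that era, with the caveat that exact argument names vary between AutoGPTQ versions:

```python
# Minimal sketch only; not taken from the diff. Assumes AutoGPTQ (CUDA, built from
# source around commit 3cb1bf5) and transformers are installed, and that the
# quantised weights gptq_model-4bit--1g.safetensors are present in the repo.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/falcon-40b-instruct-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    model_basename="gptq_model-4bit--1g",  # matches the .safetensors name above
    use_safetensors=True,
    trust_remote_code=True,  # Falcon ships custom modelling code
    use_triton=False,        # Triton kernels are noted as unsupported here
    device="cuda:0",
)

prompt = "Write a short story about llamas."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```

For text-generation-webui, the diff itself only documents the `--autogptq --trust_remote_code` flags.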
@@ -135,7 +135,9 @@ It was created without groupsize to reduce VRAM requirements, and with `desc_act
 <!-- footer start -->
 ## Discord

-For further support, and discussions on these models and AI in general, join us at:
+For further support, and discussions on these models and AI in general, join us at:
+
+[TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)

 ## Thanks, and how to contribute.

@@ -143,18 +145,18 @@ Thanks to the [chirper.ai](https://chirper.ai) team!

 I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

-If you're able and willing to contribute
+If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

-Donaters will get priority support on any and all AI/LLM/model questions, plus other benefits.
+Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

 * Patreon: https://patreon.com/TheBlokeAI
 * Ko-Fi: https://ko-fi.com/TheBlokeAI

-**Patreon special mentions**: Aemon Algiz
+**Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.

-Thank you to all my generous patrons and donaters
+Thank you to all my generous patrons and donaters!
 <!-- footer end -->
-
+
 # ✨ Original model card: Falcon-40B-Instruct

 # ✨ Falcon-40B-Instruct
@@ -167,9 +169,9 @@ Thank you to all my generous patrons and donaters.

 * **You are looking for a ready-to-use chat/instruct model based on [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b).**
 * **Falcon-40B is the best open-source model available.** It outperforms [LLaMA](https://github.com/facebookresearch/llama), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1), [MPT](https://huggingface.co/mosaicml/mpt-7b), etc. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-* **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).
+* **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).

-💬 **This is an instruct model, which may not be ideal for further finetuning.** If you are interested in building your own instruct/chat model, we recommend starting from [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b).
+💬 **This is an instruct model, which may not be ideal for further finetuning.** If you are interested in building your own instruct/chat model, we recommend starting from [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b).

 💸 **Looking for a smaller, less expensive model?** [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) is Falcon-40B-Instruct's small brother!

@@ -228,7 +230,7 @@ Falcon-40B-Instruct has been finetuned on a chat dataset.

 ### Out-of-Scope Use

-Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
+Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

 ## Bias, Risks, and Limitations

@@ -274,7 +276,7 @@ for seq in sequences:

 ### Training Data

-Falcon-40B-Instruct was finetuned on a 150M tokens from [Bai ze](https://github.com/project-baize/baize-chatbot) mixed with 5% of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) data.
+Falcon-40B-Instruct was finetuned on a 150M tokens from [Bai ze](https://github.com/project-baize/baize-chatbot) mixed with 5% of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) data.


 The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b) tokenizer.
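The hunk context line `for seq in sequences:` is the tail of the model card's transformers `pipeline` example, which sits just before the Training Data section. A hedged sketch of that usage pattern (reconstructed, not quoted from this diff; the prompt and generation settings are placeholders):

```python
# Sketch of the pipeline-style usage the hunk context refers to; assumes
# transformers, torch and accelerate are installed and sufficient GPU memory.
import torch
import transformers
from transformers import AutoTokenizer

model = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon ships custom modelling code
    device_map="auto",
)

sequences = pipeline(
    "Tell me about quantisation of large language models.",  # placeholder prompt
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```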
@@ -287,7 +289,7 @@ The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon
 See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) for early results.


-## Technical Specifications
+## Technical Specifications

 For more information about pretraining, see [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b).

@@ -315,7 +317,7 @@ For multiquery, we are using an internal variant which uses independent key and

 #### Hardware

-Falcon-40B-Instruct was trained on AWS SageMaker, on 64 A100 40GB GPUs in P4d instances.
+Falcon-40B-Instruct was trained on AWS SageMaker, on 64 A100 40GB GPUs in P4d instances.

 #### Software
