Update README.md
README.md
CHANGED
@@ -13,7 +13,7 @@ inference: false
 </div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p><a href="https://discord.gg/
+<p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
 <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
@@ -32,7 +32,7 @@ It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQi
 * [4-bit GPTQ model for GPU inference](https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ)
 * [3-bit GPTQ model for GPU inference](https://huggingface.co/TheBloke/falcon-40b-instruct-3bit-GPTQ)
 * [Unquantised bf16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/tiiuae/falcon-40b-instruct)
-
+
 ## EXPERIMENTAL

 Please note this is an experimental GPTQ model. Support for it is currently quite limited.
@@ -126,7 +126,7 @@ It was created without groupsize to reduce VRAM requirements, and with `desc_act

 * `gptq_model-4bit--1g.safetensors`
 * Works only with latest AutoGPTQ CUDA, compiled from source as of commit `3cb1bf5`
-* At this time it does not work with AutoGPTQ Triton, but support will hopefully be added in time.
+* At this time it does not work with AutoGPTQ Triton, but support will hopefully be added in time.
 * Works with text-generation-webui using `--autogptq --trust_remote_code`
 * At this time it does NOT work with one-click-installers
 * Does not work with any version of GPTQ-for-LLaMa
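The hunk above lists the compatibility requirements for the quantised file. As a rough, non-authoritative sketch of what Python-side loading with AutoGPTQ (CUDA build, compiled from source) might look like for this repo — assuming the `AutoGPTQForCausalLM.from_quantized` API of that era, with the caveat that exact argument names vary between AutoGPTQ versions:

```python
# Minimal sketch only; not taken from the diff. Assumes AutoGPTQ (CUDA, built from
# source around commit 3cb1bf5) and transformers are installed, and that the
# quantised weights gptq_model-4bit--1g.safetensors are present in the repo.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/falcon-40b-instruct-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    model_basename="gptq_model-4bit--1g",  # matches the .safetensors name above
    use_safetensors=True,
    trust_remote_code=True,  # Falcon ships custom modelling code
    use_triton=False,        # Triton kernels are noted as unsupported here
    device="cuda:0",
)

prompt = "Write a short story about llamas."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```

For text-generation-webui, the diff itself only documents the `--autogptq --trust_remote_code` flags.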
@@ -135,7 +135,9 @@ It was created without groupsize to reduce VRAM requirements, and with `desc_act
 <!-- footer start -->
 ## Discord

-For further support, and discussions on these models and AI in general, join us at:
+For further support, and discussions on these models and AI in general, join us at:
+
+[TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD)

 ## Thanks, and how to contribute.

@@ -143,18 +145,18 @@ Thanks to the [chirper.ai](https://chirper.ai) team!

 I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

-If you're able and willing to contribute
+If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

-Donaters will get priority support on any and all AI/LLM/model questions, plus other benefits.
+Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

 * Patreon: https://patreon.com/TheBlokeAI
 * Ko-Fi: https://ko-fi.com/TheBlokeAI

-**Patreon special mentions**: Aemon Algiz
+**Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.

-Thank you to all my generous patrons and donaters
+Thank you to all my generous patrons and donaters!
 <!-- footer end -->
-
+
 # ✨ Original model card: Falcon-40B-Instruct

 # ✨ Falcon-40B-Instruct
@@ -167,9 +169,9 @@ Thank you to all my generous patrons and donaters.

 * **You are looking for a ready-to-use chat/instruct model based on [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b).**
 * **Falcon-40B is the best open-source model available.** It outperforms [LLaMA](https://github.com/facebookresearch/llama), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1), [MPT](https://huggingface.co/mosaicml/mpt-7b), etc. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-* **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).
+* **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).

-💬 **This is an instruct model, which may not be ideal for further finetuning.** If you are interested in building your own instruct/chat model, we recommend starting from [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b).
+💬 **This is an instruct model, which may not be ideal for further finetuning.** If you are interested in building your own instruct/chat model, we recommend starting from [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b).

 💸 **Looking for a smaller, less expensive model?** [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) is Falcon-40B-Instruct's small brother!

@@ -228,7 +230,7 @@ Falcon-40B-Instruct has been finetuned on a chat dataset.

 ### Out-of-Scope Use

-Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
+Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

 ## Bias, Risks, and Limitations

@@ -274,7 +276,7 @@ for seq in sequences:

 ### Training Data

-Falcon-40B-Instruct was finetuned on a 150M tokens from [Bai ze](https://github.com/project-baize/baize-chatbot) mixed with 5% of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) data.
+Falcon-40B-Instruct was finetuned on a 150M tokens from [Bai ze](https://github.com/project-baize/baize-chatbot) mixed with 5% of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) data.


 The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b) tokenizer.
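The hunk context line `for seq in sequences:` is the tail of the model card's transformers `pipeline` example, which sits just before the Training Data section. A hedged sketch of that usage pattern (reconstructed, not quoted from this diff; the prompt and generation settings are placeholders):

```python
# Sketch of the pipeline-style usage the hunk context refers to; assumes
# transformers, torch and accelerate are installed and sufficient GPU memory.
import torch
import transformers
from transformers import AutoTokenizer

model = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon ships custom modelling code
    device_map="auto",
)

sequences = pipeline(
    "Tell me about quantisation of large language models.",  # placeholder prompt
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```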
@@ -287,7 +289,7 @@ The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon
 See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) for early results.


-## Technical Specifications
+## Technical Specifications

 For more information about pretraining, see [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b).

@@ -315,7 +317,7 @@ For multiquery, we are using an internal variant which uses independent key and

 #### Hardware

-Falcon-40B-Instruct was trained on AWS SageMaker, on 64 A100 40GB GPUs in P4d instances.
+Falcon-40B-Instruct was trained on AWS SageMaker, on 64 A100 40GB GPUs in P4d instances.

 #### Software
