TheBloke/falcon-40b-instruct-GPTQ

Text Generation

text-generation-inference

4-bit precision


TheBloke committed on Jun 2, 2023

Commit b621e6b • 1 Parent(s): 47832eb

Update README.md

Files changed (1)

README.md +1 -1


@@ -39,7 +39,7 @@ Please note this is an experimental GPTQ model. Support for it is currently quit
 It is also expected to be **VERY SLOW**. This is currently unavoidable, but is being looked at.
-This is 4bit model requires at least 35GB VRAM to load. It can be used on 40GB or 48GB cards, but not less.
+This 4bit model requires at least 35GB VRAM to load. It can be used on 40GB or 48GB cards, but not less.
 Please be aware that you should currently expect around 0.7 tokens/s on 40B Falcon GPTQ.