---
license: other
---
2.5-bit quantization of airoboros 70b 1.4.1 (https://huggingface.co/jondurbin/airoboros-l2-70b-gpt4-1.4.1), using exllama2.

Updated as of September 21, 2023, which should fix the bad perplexity (ppl) results.

If you are using Ubuntu, I suggest running it with flash-attn. It reduces VRAM usage by a good margin, which is especially useful in this case (a 70B model on a single 24 GB VRAM GPU).
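A rough back-of-the-envelope estimate (illustrative figures, not measured values from this card) shows why 24 GB is tight and why saving VRAM on attention matters here:

```python
# Approximate VRAM needed for the quantized weights alone:
# ~70e9 parameters at ~2.5 bits per weight (illustrative assumption).
params = 70e9
bits_per_weight = 2.5

weight_bytes = params * bits_per_weight / 8  # bits -> bytes
weight_gib = weight_bytes / 2**30            # bytes -> GiB

print(f"~{weight_gib:.1f} GiB for weights")  # ~20.4 GiB
```

That leaves only a few GiB of a 24 GB card for activations and the KV cache, which is why a memory-efficient attention implementation such as flash-attn helps.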