Update README.md
README.md CHANGED
@@ -12,5 +12,5 @@ Converted to HF with `transformers 4.30.0.dev0`, then quantized to 4 bit with GPTQ
 PPL should be marginally better than group size 128 at the cost of more VRAM. An A6000 should still be able to fit it all at full 2048 context.
 
 ---
-Note that this model
+Note that this model was quantized under GPTQ's `cuda` branch, which means it should work with 0cc4m's KoboldAI fork:
 https://github.com/0cc4m/KoboldAI
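Outside KoboldAI, a checkpoint like this can usually be loaded from Python as well. Below is a minimal sketch using the AutoGPTQ library, which the card does not mention and is purely an assumption; the repo id, file basename, and group size are placeholders (the card states 4 bit and implies a grouping finer than 128, but never gives the exact quantization flags).

```python
# Minimal sketch: loading a 4-bit GPTQ checkpoint with AutoGPTQ.
# AutoGPTQ itself is an assumption -- the card only confirms GPTQ's `cuda`
# branch and 0cc4m's KoboldAI fork. All names below are placeholders.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

repo = "your-user/your-model-4bit-gptq"   # placeholder repo id

# The card says 4 bit; the group size here is a guess (the card only implies
# finer grouping than 128), so adjust group_size to match the checkpoint.
quantize_config = BaseQuantizeConfig(bits=4, group_size=32)

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",
    use_safetensors=True,            # assumption: weights saved as .safetensors
    model_basename="model-4bit",     # placeholder: checkpoint file name, no extension
    quantize_config=quantize_config,
)

inputs = tokenizer("Hello,", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```

KoboldAI remains the path the card actually vouches for; the sketch above is only for quick programmatic checks under the stated assumptions.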