Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ Converted to HF with `transformers 4.30.0.dev0`, then quantized to 4 bit with GPTQ
 
 `python llama.py ../llama-65b-hf c4 --wbits 4 --true-sequential --act-order --groupsize 32 --save_safetensors 4bit-32g.safetensors`
 
-PPL should be marginally better than group size 128
+PPL should be marginally better than group size 128 at the cost of more VRAM. An A6000 should still be able to fit it all at full 2048 context.
 
 ---
 Note that this model is quantized under GPTQ's `cuda` branch, which means this model should work with 0cc4m's KoboldAI fork:
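A quick way to sanity-check the `4bit-32g.safetensors` file produced by the command above is to list its tensors with the `safetensors` Python API. This is a minimal sketch, assuming the `safetensors` and `torch` packages are installed; the exact tensor names and shapes depend on the GPTQ branch used and are not taken from this repo.

```python
# Minimal sketch: list the tensors in the quantized checkpoint written by llama.py.
# Assumes `safetensors` and `torch` are installed; tensor names and shapes depend
# on the GPTQ branch used and are not taken from this repo.
from safetensors import safe_open

with safe_open("4bit-32g.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        print(name, tuple(t.shape), t.dtype)
```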
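The added VRAM note can also be checked with a back-of-envelope estimate. The sketch below assumes the commonly cited LLaMA-65B architecture figures (80 layers, hidden size 8192) and an fp16 KV cache; none of these numbers come from this repo, and real usage will be somewhat higher once activations and allocator overhead are included.

```python
# Back-of-envelope VRAM estimate for 4-bit, group-size-32 LLaMA-65B.
# The architecture numbers (80 layers, hidden size 8192) are the commonly cited
# LLaMA-65B figures and are assumptions here, not read from the checkpoint.

params = 65e9          # total weights being quantized
wbits = 4              # --wbits 4
groupsize = 32         # --groupsize 32
n_layers = 80          # assumed LLaMA-65B depth
hidden = 8192          # assumed LLaMA-65B hidden size
ctx = 2048             # full context length
gib = 1024 ** 3

weights = params * wbits / 8                    # packed 4-bit weights
# per group of 32 weights: one fp16 scale (2 bytes) + one packed 4-bit zero (0.5 bytes)
group_overhead = params / groupsize * (2 + 0.5)
# fp16 KV cache: 2 tensors (K and V) * 2 bytes * ctx tokens * hidden dim * layers
kv_cache = 2 * 2 * ctx * hidden * n_layers

print(f"weights        ~{weights / gib:5.1f} GiB")
print(f"group overhead ~{group_overhead / gib:5.1f} GiB")
print(f"kv cache @2048 ~{kv_cache / gib:5.1f} GiB")
print(f"total          ~{(weights + group_overhead + kv_cache) / gib:5.1f} GiB (A6000: 48 GiB)")
```

Under these assumptions the total lands around 40 GiB, which is consistent with the claim that an A6000 can hold the model at full 2048 context.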