m33393 committed on
Commit
074723c
1 Parent(s): ab574cb

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -9,7 +9,7 @@ Converted to HF with `transformers 4.30.0.dev0`, then quantized to 4 bit with GP

 `python llama.py ../llama-65b-hf c4 --wbits 4 --true-sequential --act-order --groupsize 32 --save_safetensors 4bit-32g.safetensors`

-PPL should be marginally better than group size 128
+PPL should be marginally better than group size 128 at the cost of more VRAM. An A6000 should still be able to fit it all at full 2048 context.

 ---
 Note that this model is quantized under GPTQ's `cuda` branch. Which means this model should work with 0cc4m's KoboldAI fork:
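
A minimal sketch, assuming the `4bit-32g.safetensors` file produced by the `--save_safetensors` argument above sits in the working directory, of how one might list its tensors with the `safetensors` library to verify the quantized checkpoint:

```python
# Inspect the quantized checkpoint without loading every tensor into memory.
from safetensors import safe_open

# "4bit-32g.safetensors" is the output name used in the llama.py command above.
with safe_open("4bit-32g.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        # GPTQ-packed layers typically store qweight/qzeros/scales tensors,
        # with group structure following the --groupsize 32 setting.
        print(name, tuple(t.shape), t.dtype)
```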