---
license: other
library_name: transformers
tags:
- safetensors
- llama
---

Converted to HF with `transformers 4.30.0.dev0`, then quantized to 4-bit with GPTQ (group size `32`):

`python llama.py ../llama-65b-hf c4 --wbits 4 --true-sequential --act-order --groupsize 32 --save_safetensors 4bit-32g.safetensors`

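For reference, the HF conversion step itself can be reproduced with the conversion script that ships with `transformers` (the input and output directories below are placeholders):

`python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /path/to/original/llama --model_size 65B --output_dir ../llama-65b-hf`
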
Perplexity (PPL) should be marginally better than with group size 128, at the cost of more VRAM. An A6000 (48 GB) should still be able to fit the whole model at the full 2048-token context.
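The VRAM difference comes from the per-group quantization metadata: smaller groups mean more scale/zero-point pairs. A rough back-of-envelope sketch of the effect on packed weight size (the storage layout here is an assumption, and runtime VRAM additionally includes activations and the KV cache):

```python
# Back-of-envelope size estimate for 4-bit GPTQ LLaMA-65B at two group sizes.
# Assumptions (not measured): ~65e9 quantized weights, one fp16 scale and one
# packed 4-bit zero-point stored per group; embeddings, norms, activations and
# the KV cache are ignored.
PARAMS = 65e9
WEIGHT_BITS = 4

def approx_weight_gib(group_size, scale_bits=16, zero_bits=4):
    weight_bits = PARAMS * WEIGHT_BITS
    overhead_bits = PARAMS / group_size * (scale_bits + zero_bits)
    return (weight_bits + overhead_bits) / 8 / 2**30

for g in (128, 32):
    print(f"group size {g:>3}: ~{approx_weight_gib(g):.1f} GiB of packed weights")
# Prints roughly 31.4 GiB for group size 128 vs 35.0 GiB for group size 32:
# a few extra GiB of scales/zeros, which is the extra VRAM mentioned above.
```
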
---

Note that this model was quantized under GPTQ's `cuda` branch, which means it should work with 0cc4m's KoboldAI fork:

https://github.com/0cc4m/KoboldAI |
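
Outside KoboldAI, the checkpoint can also be smoke-tested with the inference script from the same GPTQ-for-LLaMa `cuda` branch; something along these lines should work (flag names follow that repository's README, so double-check them against the branch you have):

`python llama_inference.py ../llama-65b-hf --wbits 4 --groupsize 32 --load 4bit-32g.safetensors --text "The meaning of life is"`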