---
license: other
library_name: transformers
tags:
- safetensors
- llama
---
Converted to HF with `transformers 4.30.0.dev0`, then quantized to 4-bit with GPTQ (group size `32`):

`python llama.py ../llama-65b-hf c4 --wbits 4 --true-sequential --act-order --groupsize 32 --save_safetensors 4bit-32g.safetensors`

Perplexity (PPL) should be marginally better than with group size 128, at the cost of more VRAM. An A6000 should still be able to fit the whole model at the full 2048-token context.
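
For a quick sanity check of the quantized file, the `llama_inference.py` script from the same GPTQ-for-LLaMa checkout can be used. The paths and prompt below are placeholders for illustration, so adjust them to your setup:

`python llama_inference.py ../llama-65b-hf --wbits 4 --groupsize 32 --load 4bit-32g.safetensors --text "Hello, my name is"`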

---
Note that this model was quantized under GPTQ's `cuda` branch, which means it should work with 0cc4m's KoboldAI fork:
https://github.com/0cc4m/KoboldAI
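
A rough, untested sketch of trying it there (the setup script names and folder layout below are assumptions on my part; the fork's README is the authority):

```sh
# Untested sketch -- script names and model-folder layout are assumptions; follow the fork's README.
git clone https://github.com/0cc4m/KoboldAI
cd KoboldAI
./install_requirements.sh                 # assumed Linux setup script
mkdir -p models/llama-65b-4bit-32g
# Copy the HF config/tokenizer files plus 4bit-32g.safetensors into that folder,
# then launch and select the folder from the web UI:
./play.sh
```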