---
license: mit
datasets:
- wikitext
---

[pythia-70m](https://huggingface.co/EleutherAI/pythia-70m) quantized to 4-bit using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).

To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```

Then load the model from the Hub:

```python
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/pythia-70m-AutoGPTQ-4bit-128g"
# Load the pre-quantized 4-bit weights directly from the Hub.
model = AutoGPTQForCausalLM.from_quantized(model_name)
```


|Model|4-Bit Perplexity|16-Bit Perplexity|Delta (4-Bit minus 16-Bit)|
|--|--|--|--|
|smpanaro/pythia-70m-AutoGPTQ-4bit-128g|49.125|-|-|
|[smpanaro/pythia-160m-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-160m-AutoGPTQ-4bit-128g)|33.4375|23.3024|10.1351|
|[smpanaro/pythia-410m-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-410m-AutoGPTQ-4bit-128g)|21.4688|13.9838|7.485|
|[smpanaro/pythia-1b-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-1b-AutoGPTQ-4bit-128g)|12.0391|11.6178|0.4213|
|[smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g)|10.9609|10.4391|0.5218|
|[smpanaro/pythia-2.8b-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-2.8b-AutoGPTQ-4bit-128g)|9.8281|9.0028|0.8253|


<sub>Wikitext perplexity, measured as described in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity). Lower is better.</sub>
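
For readers unfamiliar with the metric, here is a minimal sketch of how the numbers in the table relate to each other. It is not the evaluation script used for this model card: the function names and the example negative log-likelihood values are hypothetical, and it only illustrates that perplexity is the exponential of the mean per-token negative log-likelihood, and that the delta column is the 4-bit perplexity minus the 16-bit perplexity.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood) over a list of per-token NLLs."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


def quantization_delta(ppl_4bit, ppl_16bit):
    """Return how much perplexity increased after quantization (lower is better)."""
    return ppl_4bit - ppl_16bit


# Hypothetical per-token NLLs in nats, purely illustrative (not measured values).
example_nlls = [2.0, 3.0, 4.0]
example_ppl = perplexity(example_nlls)  # mean NLL is 3.0, so this equals exp(3.0)

# Delta for the pythia-160m row, using the two perplexities from the table.
delta_160m = quantization_delta(33.4375, 23.3024)
```

In a real evaluation, the per-token NLLs would come from the model's loss over the Wikitext test set, as the linked docs describe.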