---
license: mit
datasets:
- wikitext
---
[gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) quantized to 4-bit using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).
To use, first install AutoGPTQ:
```shell
pip install auto-gptq
```
Then load the model from the hub:
```python
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/gpt2-xl-AutoGPTQ-4bit-128g"
# Note: although this model was quantized with grouping only (desc_act=False),
# Triton still seems to be required.
model = AutoGPTQForCausalLM.from_quantized(model_name, use_triton=True)
```
|Model|4-Bit Perplexity|16-Bit Perplexity|Delta|
|--|--|--|--|
|[smpanaro/gpt2-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-AutoGPTQ-4bit-128g)|26.5000|25.1875|1.3125|
|[smpanaro/gpt2-medium-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-medium-AutoGPTQ-4bit-128g)|19.1719|18.4739|0.698|
|[smpanaro/gpt2-large-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-large-AutoGPTQ-4bit-128g)|16.6875|16.4541|0.2334|
|smpanaro/gpt2-xl-AutoGPTQ-4bit-128g|14.9297|14.7951|0.1346|
<sub>Wikitext perplexity measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity); lower is better.</sub>
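
For reference, here is a simplified sketch of the strided sliding-window bookkeeping that the linked perplexity guide describes. It is not the exact script used to produce the numbers above. The helper names `sliding_windows` and `perplexity` are hypothetical, and the example lengths are illustrative assumptions (for GPT-2 the context length would be 1024, and the guide uses a stride of 512 in its example). In a real run, each window's negative log-likelihood would come from the model's loss over the scored tokens.

```python
import math


def sliding_windows(seq_len, max_length, stride):
    """Yield (begin, end, target_len) spans over a token sequence.

    Each window covers tokens [begin, end), but only the last
    `target_len` tokens are scored; the earlier ones act as context.
    """
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        target_len = end - prev_end  # may differ from stride on the first/last window
        yield begin, end, target_len
        prev_end = end
        if end == seq_len:
            break


def perplexity(total_nll, n_tokens):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(total_nll / n_tokens)


# Toy example: 10 tokens, context of 4, stride of 2 (illustrative values only).
windows = list(sliding_windows(seq_len=10, max_length=4, stride=2))
scored = sum(target_len for _, _, target_len in windows)
print(windows, scored)

# If every scored token had probability 1/4, the perplexity would be 4.
print(perplexity(total_nll=scored * math.log(4), n_tokens=scored))
```

A smaller stride gives each scored token more context and usually a lower (better) perplexity, at the cost of more forward passes.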