pythia-410m quantized to 4-bit using AutoGPTQ.
To use, first install AutoGPTQ:
pip install auto-gptq
Then load the model from the hub:
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/pythia-410m-AutoGPTQ-4bit-128g"
# Download the 4-bit quantized weights from the hub and load them.
model = AutoGPTQForCausalLM.from_quantized(model_name)
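Once the model is loaded, a typical next step is generating text. The following is a minimal sketch, not part of the original card: the `generate` helper, the prompt, and the `cuda:0` default are illustrative, and it assumes the repository also ships tokenizer files (if it does not, the tokenizer of the base Pythia model should be compatible).

```python
MODEL_NAME = "smpanaro/pythia-410m-AutoGPTQ-4bit-128g"


def generate(prompt, model_name=MODEL_NAME, max_new_tokens=32, device="cuda:0"):
    """Hypothetical helper: load the quantized model and complete `prompt`."""
    # Imported lazily so the module can be imported without the heavy dependencies.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    # Assumption: tokenizer files are available under the same repo id.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoGPTQForCausalLM.from_quantized(model_name, device=device)

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(generate("The capital of France is"))
```

The model is reloaded on every call here for brevity; in practice you would load it once and reuse it.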
| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|---|---|---|---|
| smpanaro/pythia-160m-AutoGPTQ-4bit-128g | 33.4375 | 23.3024 | 10.1351 |
| smpanaro/pythia-410m-AutoGPTQ-4bit-128g | 21.4688 | 13.9838 | 7.4850 |
Wikitext perplexity, measured as described in the Hugging Face docs. Lower is better.
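For readers unfamiliar with that procedure, the sketch below outlines the strided sliding-window approach from the Hugging Face perplexity guide in plain Python. It is an illustration, not the exact script behind the numbers above: `window_nll` is a hypothetical callback standing in for the model forward pass, and the `max_length=2048` and `stride=512` defaults are assumptions rather than the settings used for this table.

```python
import math


def sliding_windows(seq_len, max_length, stride):
    """Yield (begin, end, target_len) for each evaluation window."""
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        # Only tokens not already scored by an earlier window count as targets;
        # the rest of the window serves as context.
        target_len = end - prev_end
        yield begin, end, target_len
        prev_end = end
        if end == seq_len:
            break


def perplexity(total_nll, n_tokens):
    """Perplexity is exp of the mean negative log-likelihood per token."""
    return math.exp(total_nll / n_tokens)


def evaluate(window_nll, seq_len, max_length=2048, stride=512):
    """Accumulate summed NLL over all windows and convert to perplexity.

    `window_nll(begin, end, target_len)` is a hypothetical callback that
    returns the summed NLL of the last `target_len` tokens in tokens[begin:end].
    """
    total_nll, n_tokens = 0.0, 0
    for begin, end, target_len in sliding_windows(seq_len, max_length, stride):
        total_nll += window_nll(begin, end, target_len)
        n_tokens += target_len
    return perplexity(total_nll, n_tokens)
```

With a real model, `window_nll` would run the model on `input_ids[:, begin:end]`, mask all but the last `target_len` labels, and return the summed loss over those tokens.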