Please tell me about GPTQ quantisation.

#1
by mjw98 - opened

Thanks for your great work. I would like to ask about your process of quantisation using GPTQ. I used AutoGPTQ for quantisation, but I can't load the result with AutoModelForCausalLM.from_pretrained(); it can only be loaded with AutoGPTQForCausalLM.from_quantized(). Can you tell me where your modification is?

Probably just a typo, correct me if I'm wrong, but HF Transformers does not support GPTQ, since HF has bitsandbytes 4-bit integrated.

Hugging Face Transformers has supported GPTQ for a while now - at least six weeks. All my GPTQ examples use Transformers directly now. It uses AutoGPTQ for the kernels, so AutoGPTQ is still a required install.
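For reference, loading a GPTQ model through Transformers now looks like this. A minimal sketch: the repo name is only an example, and it assumes the usual Transformers GPTQ dependencies (optimum and auto-gptq) are installed.

```python
# Minimal sketch: load a GPTQ model directly with Transformers.
# The repo name is only an example; any GPTQ model whose weights are
# saved as model.safetensors should load the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # example repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```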

@mjw98 Your issue is likely the model name. By default, AutoGPTQ saves the model with a name like gptq_model-4bit-128g.safetensors. Transformers cannot load a file with that name; it requires the model file to be called model.safetensors.

When making your GPTQ model with AutoGPTQ, pass model_basename="model" and it will work.

Check out my simple AutoGPTQ wrapper script: https://github.com/TheBlokeAI/AIScripts/blob/main/quant_autogptq.py - it will set the basename to model automatically, so the output will be compatible with Transformers.
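If you'd rather call AutoGPTQ directly instead of using the script, the equivalent is roughly the following. This is a minimal sketch: it assumes the basename is set through the model_file_base_name field of BaseQuantizeConfig (the same key that appears in quantize_config.json), and the paths and calibration text are placeholders.

```python
# Minimal sketch: quantise with AutoGPTQ so the output is model.safetensors.
# Paths and calibration text are placeholders; a real run needs a proper
# calibration dataset such as c4.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

pretrained_dir = "/content/llama-1B"       # source model (example path)
quantized_dir = "/content/llama-1B-GPTQ"   # where to write the quantised model

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
    model_file_base_name="model",  # assumed field; makes the output model.safetensors
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, use_fast=True)
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_dir, use_safetensors=True)
tokenizer.save_pretrained(quantized_dir)
```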

Thank you very much for your suggestion, but I still don't understand where to change model_basename. I tried the AutoGPTQ wrapper script you provided, but the output is still gptq_model-4bit-128g.safetensors. I quantised with the following command: python quant_autogptq.py "/content/llama-1B" "777" "c4", and got the result shown below.
[Screenshot: output files after quantisation]
My guess is to rename the safetensors file and change "model_file_base_name": "gptq_model-4bit-128g" to "model_file_base_name": "model" in quantize_config.json. I hope to get your guidance.
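Concretely, my plan is something like this (the directory path is a placeholder, and I'm assuming the filenames from my screenshot above):

```python
# Rename the quantised weights and update quantize_config.json so that
# Transformers can load the folder. Directory path is a placeholder.
import json
import os

quantized_dir = "/content/llama-1B-gptq"  # placeholder output directory

os.rename(
    os.path.join(quantized_dir, "gptq_model-4bit-128g.safetensors"),
    os.path.join(quantized_dir, "model.safetensors"),
)

config_path = os.path.join(quantized_dir, "quantize_config.json")
with open(config_path) as f:
    cfg = json.load(f)
cfg["model_file_base_name"] = "model"
with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
```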

mjw98 changed discussion status to closed
