Please tell me about GPTQ quantisation.
Thanks for your great work. I would like to ask about your GPTQ quantisation process. I quantised a model with AutoGPTQ, but afterwards I can't load it with AutoModelForCausalLM.from_pretrained(); it can only be loaded with AutoGPTQForCausalLM.from_quantized(). Can you tell me what you modified?
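For reference, this is roughly what I'm seeing; the quantised model directory path here is just a placeholder:

```python
from transformers import AutoModelForCausalLM
from auto_gptq import AutoGPTQForCausalLM

quantized_dir = "/content/llama-1B-gptq"  # placeholder path to my AutoGPTQ output

# Loading through plain Transformers fails for me after quantising with AutoGPTQ:
try:
    model = AutoModelForCausalLM.from_pretrained(quantized_dir, device_map="auto")
except Exception as e:
    print(f"Transformers load failed: {e}")

# Loading through AutoGPTQ works:
model = AutoGPTQForCausalLM.from_quantized(quantized_dir, device="cuda:0", use_safetensors=True)
```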
Probably just a typo (correct me if I'm wrong), but HF Transformers does not support GPTQ, since HF has bitsandbytes 4-bit integrated instead?
Hugging Face Transformers has supported GPTQ for a while now, at least six weeks. All my GPTQ examples now use Transformers directly. Transformers uses AutoGPTQ for the kernels, so AutoGPTQ is still a required install.
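For example, loading a Transformers-compatible GPTQ model looks roughly like this; the repo ID is just an example, and depending on your versions you may also need optimum installed alongside auto-gptq:

```python
# pip install transformers optimum auto-gptq   (exact requirements depend on your versions)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # example GPTQ repo; any Transformers-compatible GPTQ model works

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Transformers reads the quantization_config from config.json and uses the AutoGPTQ kernels.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```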
@mjw98
Your issue is likely the model name. By default, AutoGPTQ saves the model with a name like gptq-4bit-128g.safetensors. This name cannot be loaded by Transformers; Transformers requires the model file to be called model.safetensors.
When making your GPTQ model with AutoGPTQ, pass model_basename="model" and it will work.
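In plain AutoGPTQ code the idea looks roughly like this; depending on your AutoGPTQ version, the saved filename comes from the model_file_base_name field of the quantize config (the same key you see in quantize_config.json). The paths and the tiny calibration example below are placeholders only:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_dir = "/content/llama-1B"   # unquantised model (placeholder path)
output_dir = "/content/llama-1B-gptq"  # where the quantised model is written (placeholder path)

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
    # In the AutoGPTQ versions I've used, this field controls the saved filename,
    # so the output becomes model.safetensors instead of gptq_model-4bit-128g.safetensors.
    model_file_base_name="model",
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, use_fast=True)

# A single toy calibration example just to keep the sketch self-contained;
# use real calibration data (e.g. samples from c4) for an actual quantisation.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")]

model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config)
model.quantize(examples)

model.save_quantized(output_dir, use_safetensors=True)
tokenizer.save_pretrained(output_dir)
```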
Check out my simple AutoGPTQ wrapper script: https://github.com/TheBlokeAI/AIScripts/blob/main/quant_autogptq.py - it will set the basename to model automatically, so the output will be compatible with Transformers.
Thank you very much for your suggestion, but I still don't understand where to change model_basename. I tried the AutoGPTQ wrapper script you provided, but the output is still gptq-4bit-128g.safetensors. I quantised with the following command: python quant_autogptq.py "/content/llama-1B" "777" "c4", and got the result shown below.
My guess is to rename the safetensors file and change "model_file_base_name": "gptq_model-4bit-128g" to "model_file_base_name": "model" in quantize_config.json. I hope to get your guidance.
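In case manually renaming is a valid workaround, this is roughly what I have in mind (the directory path is a placeholder for my output folder):

```python
import glob
import json
import os

quantized_dir = "/content/llama-1B-gptq"  # placeholder for my AutoGPTQ output directory

# Rename the existing safetensors file to the name Transformers expects.
(existing,) = glob.glob(os.path.join(quantized_dir, "*.safetensors"))
os.rename(existing, os.path.join(quantized_dir, "model.safetensors"))

# Point model_file_base_name in quantize_config.json at the new name.
config_path = os.path.join(quantized_dir, "quantize_config.json")
with open(config_path) as f:
    quantize_config = json.load(f)
quantize_config["model_file_base_name"] = "model"
with open(config_path, "w") as f:
    json.dump(quantize_config, f, indent=2)
```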