Rename quantize_config.json to quantization_config.json

#19
by aarnphm - opened

It seems like optimum.gptq.load_quantized_model loads quantization_config from quantization_config.json

[screenshot: Screenshot 2023-09-04 at 13.38.33.png]

No, it loads it from config.json: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ/blob/main/config.json#L23-L32

quantize_config.json is for AutoGPTQ. The files have been tested with Transformers and Optimum and are fine.
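For reference, a minimal sketch (assuming the repo id from this discussion) showing that Transformers reads the quantization settings embedded in config.json rather than quantize_config.json:

from transformers import AutoConfig

# the GPTQ settings live under the "quantization_config" key of config.json
config = AutoConfig.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")
print(config.quantization_config)  # e.g. bits, group_size, desc_act, ...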

TheBloke changed pull request status to closed

Hmm, I don't think you can load this from a custom path. Loading the model into memory works fine, but after saving it to a path with model.save_pretrained(), the following:

import torch
from accelerate import init_empty_weights
from optimum.gptq import load_quantized_model
from transformers import AutoModelForCausalLM

auto_class = AutoModelForCausalLM  # e.g.; resolved dynamically in the original code
# disable exllama kernels if the GPTQ model is loaded on CPU
disable_exllama = not torch.cuda.is_available()
# build an empty (meta-device) skeleton to attach the quantized weights to
with init_empty_weights():
  # llm.model_id is the original model repo id
  empty = auto_class.from_pretrained(llm.model_id, torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32, device_map='auto')
empty.tie_weights()
model = load_quantized_model(empty, save_folder="/path/to/saved", device_map='auto', disable_exllama=disable_exllama)

runs into the following issue:

    model = load_quantized_model(empty, save_folder="/home/ubuntu/gptq-13b-local", device_map='auto', disable_exllama=disable_exllama)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.pyenv/versions/3.11.4/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 614, in load_quantized_model
    with open(os.path.join(save_folder, quant_config_name), "r", encoding="utf-8") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/gptq-13b-local/quantization_config.json'

While I do believe this should also be fixed in Optimum's load_quantized_model (it could fall back to the quantization_config in config.json), I don't know the Optimum team's release schedule, so it would be nice to also have a quantization_config.json here in the meantime.
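As a stopgap, here is a hedged workaround sketch (not an official fix): copy the quantization_config block from the saved config.json into the quantization_config.json file that load_quantized_model currently looks for in save_folder:

import json, os

save_folder = "/path/to/saved"  # same folder passed to load_quantized_model
with open(os.path.join(save_folder, "config.json"), encoding="utf-8") as f:
  quant = json.load(f)["quantization_config"]
with open(os.path.join(save_folder, "quantization_config.json"), "w", encoding="utf-8") as f:
  json.dump(quant, f, indent=2)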

Could you raise this as an issue on the Optimum GitHub? They're doing a release soon to fix another GPTQ-related issue, so maybe they'll look at this soon, or may have already fixed it.

Yes, I have opened an issue with them about this.
