Rename quantize_config.json to quantization_config.json

#19
by aarnphm - opened

It seems like optimum.gptq.load_quantized_model loads quantization_config from quantization_config.json

[screenshot: Screenshot 2023-09-04 at 13.38.33.png]

No, it loads it from config.json: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ/blob/main/config.json#L23-L32

quantize_config.json is for AutoGPTQ. The files have been tested with Transformers and Optimum and are fine.
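For reference, a minimal sketch (assuming the repo id from this discussion) showing that Transformers reads the quantization settings embedded in config.json rather than quantize_config.json:

from transformers import AutoConfig

# the GPTQ settings live under the "quantization_config" key of config.json
config = AutoConfig.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")
print(config.quantization_config)  # e.g. bits, group_size, desc_act, ...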

TheBloke changed pull request status to closed

Hmm, I don't think you can load this from a custom path. Loading the model into memory works fine, but after saving it to a path with model.save_pretrained(), the following:

import torch
from accelerate import init_empty_weights
from optimum.gptq import load_quantized_model
from transformers import AutoModelForCausalLM

auto_class = AutoModelForCausalLM  # e.g.; resolved dynamically in the original code
# disable exllama kernels if the GPTQ model is loaded on CPU
disable_exllama = not torch.cuda.is_available()
# build an empty (meta-device) skeleton to attach the quantized weights to
with init_empty_weights():
  # llm.model_id is the original model repo id
  empty = auto_class.from_pretrained(llm.model_id, torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32, device_map='auto')
empty.tie_weights()
model = load_quantized_model(empty, save_folder="/path/to/saved", device_map='auto', disable_exllama=disable_exllama)

runs into the following issue:

    model = load_quantized_model(empty, save_folder="/home/ubuntu/gptq-13b-local", device_map='auto', disable_exllama=disable_exllama)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.pyenv/versions/3.11.4/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 614, in load_quantized_model
    with open(os.path.join(save_folder, quant_config_name), "r", encoding="utf-8") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/gptq-13b-local/quantization_config.json'

While I do believe this should also be fixed in Optimum's load_quantized_model (it could fall back to the quantization_config in config.json), I don't know the Optimum team's release schedule, so it would be nice to also have a quantization_config.json here in the meantime.
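As a stopgap, here is a hedged workaround sketch (not an official fix): copy the quantization_config block from the saved config.json into the quantization_config.json file that load_quantized_model currently looks for in save_folder:

import json, os

save_folder = "/path/to/saved"  # same folder passed to load_quantized_model
with open(os.path.join(save_folder, "config.json"), encoding="utf-8") as f:
  quant = json.load(f)["quantization_config"]
with open(os.path.join(save_folder, "quantization_config.json"), "w", encoding="utf-8") as f:
  json.dump(quant, f, indent=2)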

Could you raise this as an issue on the Optimum GitHub? They're doing a release soon to fix another GPTQ-related issue, so maybe they'll look at this soon, or may have already fixed it.

Yes, I have opened an issue with them about this.
