How to load this model?

#1
by Frz614 - opened

It seems that I can only load this quantized model with vLLM. I want to modify quantize.py, so I tried to load the quantized checkpoint with "AutoFP8ForCausalLM.from_pretrained(local_model_path, quantize_config=quantize_config, local_files_only=True)", but it fails with: "ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq']". It looks like the "BaseQuantizeConfig" class is not accepted. Is there a way to load the model so I can modify the model file?
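
For reference, the failing call described above looks roughly like this (a minimal sketch; the local path and the activation_scheme value are assumptions not shown in the original post, and the error most likely comes from transformers rejecting the fp8 quantization_config it finds in the checkpoint):

```python
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

# Hypothetical local copy of the already-quantized FP8 checkpoint
local_model_path = "./Meta-Llama-3-8B-Instruct-FP8-KV"

quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="static",  # assumption; the original post does not show the config
)

# AutoFP8 delegates model loading to transformers, which sees the checkpoint's
# fp8 quantization_config and raises:
#   ValueError: Unknown quantization type, got fp8 - supported types are: [...]
model = AutoFP8ForCausalLM.from_pretrained(
    local_model_path,
    quantize_config=quantize_config,
    local_files_only=True,
)
```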

Neural Magic org

Hi @Frz614, there is no way to load already-quantized checkpoints back into AutoFP8 at the moment. vLLM is the intended place for inference.
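
In practice that means changes to quantize.py are exercised by re-quantizing from the original BF16 model rather than by re-opening the FP8 checkpoint. A minimal sketch of that producer flow, assuming the auto_fp8 package's AutoFP8ForCausalLM / BaseQuantizeConfig interface and a dynamic activation scheme (directory names are illustrative):

```python
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

# Start from the original, unquantized model (assumed source checkpoint)
pretrained_model_dir = "meta-llama/Meta-Llama-3-8B-Instruct"
quantized_model_dir = "Meta-Llama-3-8B-Instruct-FP8"

# Dynamic activation scales: no calibration examples are required
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="dynamic")

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize([])  # pass tokenized calibration samples here when using static scales
model.save_quantized(quantized_model_dir)
```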

Neural Magic org

Use it in vLLM with: vllm serve neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV --kv-cache-dtype fp8
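
For offline inference the same thing can be done through vLLM's Python API; a minimal sketch, assuming the LLM constructor's model and kv_cache_dtype arguments (prompt and sampling settings are illustrative):

```python
from vllm import LLM, SamplingParams

# Load the FP8 checkpoint and keep the KV cache in fp8 as well,
# mirroring the --kv-cache-dtype fp8 flag of the serve command above.
llm = LLM(
    model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV",
    kv_cache_dtype="fp8",
)

outputs = llm.generate(
    ["What does FP8 quantization change about inference?"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```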

mgoin changed discussion status to closed
