How to load this model?
#1 opened by Frz614
It seems that I can only load this quantized model with vLLM. I want to modify quantize.py, so I need to load the quantized checkpoint with `AutoFP8ForCausalLM.from_pretrained(local_model_path, quantize_config=quantize_config, local_files_only=True)`, but it fails with: "ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq']". It looks like the `BaseQuantizeConfig` class is not accepted. Is there a way to load the model so I can modify the model file?
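For reference, a minimal sketch of the loading attempt described above, assuming the AutoFP8 package; the local path and the `BaseQuantizeConfig` settings are illustrative, since they are not shown in the original post:

```python
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

# Hypothetical local path to the already-quantized checkpoint
local_model_path = "./Meta-Llama-3-8B-Instruct-FP8-KV"

# Assumed config values; the original post does not show them
quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="static",
)

# Raises: ValueError: Unknown quantization type, got fp8 ...
# The checkpoint's config.json already carries an fp8 quantization_config,
# which the underlying transformers loader does not recognize.
model = AutoFP8ForCausalLM.from_pretrained(
    local_model_path,
    quantize_config=quantize_config,
    local_files_only=True,
)
```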
Use it in vLLM with `vllm serve neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV --kv-cache-dtype fp8`
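If you prefer loading it from Python rather than through the server, here is a minimal sketch using vLLM's offline `LLM` API; the prompt and sampling settings are illustrative:

```python
from vllm import LLM, SamplingParams

# Load the FP8 checkpoint with an FP8 KV cache, mirroring the
# `--kv-cache-dtype fp8` flag from the serve command above.
llm = LLM(
    model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV",
    kv_cache_dtype="fp8",
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is FP8 quantization?"], sampling_params)
print(outputs[0].outputs[0].text)
```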
mgoin changed discussion status to closed