torch.bfloat16 is not supported for quantization method awq
Hey, I tried the vLLM example in the model card (just copied and pasted it) and I'm running into this error:
ValueError: torch.bfloat16 is not supported for quantization method awq. Supported dtypes: [torch.float16]
Is there a fix to be able to use the AWQ model with vLLM instead of AutoAWQ?
What version of vLLM are you using? I thought the latest version supported bfloat16 with AWQ. 0.2.0, the first release with AWQ support, definitely did not, but I thought support was added later.
Either way, you should specify dtype="auto", either in Python code or as a command-line parameter. That will load the model in bfloat16 if it can, otherwise float16.
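For reference, here's a rough sketch of both forms (the model ID below is just a placeholder for whichever AWQ repo you're loading):

```python
from vllm import LLM, SamplingParams

# Command-line equivalent (API server):
#   python -m vllm.entrypoints.api_server --model TheBloke/some-model-AWQ --quantization awq --dtype auto

llm = LLM(
    model="TheBloke/some-model-AWQ",  # placeholder AWQ model ID
    quantization="awq",
    dtype="auto",  # or "float16" if "auto" still raises the bfloat16 error for you
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```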
This README hasn't been updated in a while - my newer README template includes the dtype="auto" parameter in the examples.
All my AWQ READMEs are going to be updated later today anyway when I update them for Transformers AWQ support, so that will get changed then.
I'm using version 0.2.1.post1. I also did a reinstall in case something got messed up during installation, but the bfloat16 issue still persisted.
I'll definitely specify the dtype in my Python code! :)
Thank you so much for your help, you're a legend. <3
Hi, you can apply the following workaround: edit config.json and change
"torch_dtype": "bfloat16" --> "torch_dtype": "float16",
Yeah, but it's easier just to pass --dtype auto on the command line or dtype="auto" in Python.
For me, specifying auto didn't work; I still got the same error. But specifying dtype="float16" did work.
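In case it helps anyone else, the explicit form I mean is roughly this (model ID is a placeholder):

```python
from vllm import LLM

llm = LLM(
    model="TheBloke/some-model-AWQ",  # placeholder AWQ model ID
    quantization="awq",
    dtype="float16",  # request float16 explicitly instead of relying on "auto"
)
```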