runtime error
model.safetensors: 100%|██████████| 8.91G/8.91G [01:07<00:00, 132MB/s]

GPTBigCodeGPTQForCausalLM hasn't fused attention module yet, will skip inject fused attention.
GPTBigCodeGPTQForCausalLM hasn't fused mlp module yet, will skip inject fused mlp.

Traceback (most recent call last):
  File "/home/user/app/app.py", line 22, in <module>
    model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 119, in from_quantized
    return quant_func(
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 1036, in from_quantized
    model = autogptq_post_init(model, use_act_order=quantize_config.desc_act)
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 380, in autogptq_post_init
    submodule.post_init(temp_dq = model.device_tensors[device])
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_exllamav2.py", line 140, in post_init
    assert self.qweight.device.type == "cuda"
AssertionError
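The traceback shows why the container crashes: AutoGPTQ selected its ExLlamaV2 kernel, whose post_init asserts that the quantized weights already sit on a CUDA device, but the model ended up on the CPU. This is the typical failure when the Space runs on CPU-only hardware, or when from_quantized is called without an explicit device. Below is a minimal sketch of the usual workaround, assuming an auto-gptq release that exposes the disable_exllama/disable_exllamav2 flags (flag names vary slightly across versions) and reusing the model_name_or_path variable already defined in app.py:

import torch
from auto_gptq import AutoGPTQForCausalLM

if torch.cuda.is_available():
    # GPU available: load the quantized weights directly onto CUDA so the
    # ExLlamaV2 kernel's post_init assertion (qweight.device.type == "cuda") passes.
    model = AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        device="cuda:0",
        use_safetensors=True,
    )
else:
    # CPU-only hardware: the ExLlama kernels cannot run at all, so disable
    # them and fall back to the plain (much slower) GPTQ dequant path.
    model = AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        device="cpu",
        use_safetensors=True,
        disable_exllama=True,
        disable_exllamav2=True,  # assumption: present in auto-gptq >= 0.5
    )

If the Space is meant to serve this GPTBigCode (StarCoder-family) model at interactive speed, the practical fix is to switch the Space to GPU hardware rather than to run the 8.91 GB quantized checkpoint on CPU.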