Running with ExLlama and GPTQ-for-LLaMa in text-generation-webui gives errors

#3 opened by perelmanych

Hi, thanks for your work! In my case only AutoGPTQ works; the other two loaders give the following errors.

With ExLlama:
```
Traceback (most recent call last):
  File "C:\AI\oobabooga_windows\text-generation-webui\server.py", line 68, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "C:\AI\oobabooga_windows\text-generation-webui\modules\models.py", line 78, in load_model
    output = load_func_map[loader](model_name)
  File "C:\AI\oobabooga_windows\text-generation-webui\modules\models.py", line 293, in ExLlama_loader
    model, tokenizer = ExllamaModel.from_pretrained(model_name)
  File "C:\AI\oobabooga_windows\text-generation-webui\modules\exllama.py", line 49, in from_pretrained
    config = ExLlamaConfig(str(model_config_path))
  File "C:\AI\oobabooga_windows\installer_files\env\lib\site-packages\exllama\model.py", line 52, in __init__
    self.pad_token_id = read_config["pad_token_id"]
KeyError: 'pad_token_id'
```

I tried adding 'pad_token_id' to config.json manually, but then I got errors about other missing token IDs.
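For anyone hitting the same thing, a quick check like the sketch below shows which Llama-style fields the config is missing and what architecture the model actually declares. The model path is a placeholder and the key list is only illustrative, not the exact set ExLlamaConfig reads:

```python
# Illustrative sketch: see which Llama-style config keys are absent.
# "models/your-model" is a placeholder path; the key list is not exhaustive.
import json
from pathlib import Path

model_dir = Path("models/your-model")
config = json.loads((model_dir / "config.json").read_text())

expected = ["pad_token_id", "bos_token_id", "eos_token_id",
            "hidden_size", "num_attention_heads", "num_hidden_layers",
            "rms_norm_eps", "vocab_size"]
print("architectures:", config.get("architectures"))
print("missing keys:", [k for k in expected if k not in config])
```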

With GPTQ-for-LLaMa:
```
Traceback (most recent call last):
  File "C:\AI\oobabooga_windows\text-generation-webui\server.py", line 68, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "C:\AI\oobabooga_windows\text-generation-webui\modules\models.py", line 78, in load_model
    output = load_func_map[loader](model_name)
  File "C:\AI\oobabooga_windows\text-generation-webui\modules\models.py", line 279, in GPTQ_loader
    model = modules.GPTQ_loader.load_quantized(model_name)
  File "C:\AI\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 177, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "C:\AI\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 77, in _load_quant
    make_quant(**make_quant_kwargs)
  File "C:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 446, in make_quant
    make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
  File "C:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 446, in make_quant
    make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
  File "C:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 446, in make_quant
    make_quant(child, names, bits, groupsize, faster, name + '.' + name1 if name != '' else name1, kernel_switch_threshold=kernel_switch_threshold)
  [Previous line repeated 1 more time]
  File "C:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 443, in make_quant
    setattr(module, attr, QuantLinear(bits, groupsize, tmp.in_features, tmp.out_features, faster=faster, kernel_switch_threshold=kernel_switch_threshold))
  File "C:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 142, in __init__
    raise NotImplementedError("Only 2,3,4,8 bits are supported.")
NotImplementedError: Only 2,3,4,8 bits are supported.
```
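For what it's worth, the bit width GPTQ-for-LLaMa complains about appears to come from the wbits value the webui passes in (visible in the load_quant call above), not from the model file itself. The model's own quantize_config.json records what it was actually quantized with; a minimal sketch, assuming the conventional filename and a placeholder path:

```python
# Sketch: read the quantization parameters shipped with a GPTQ model,
# assuming the conventional quantize_config.json filename.
# "models/your-model" is a placeholder path.
import json
from pathlib import Path

qcfg = json.loads((Path("models/your-model") / "quantize_config.json").read_text())
print("bits:", qcfg.get("bits"))            # GPTQ-for-LLaMa only accepts 2, 3, 4 or 8
print("group_size:", qcfg.get("group_size"))
```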

Yeah, ExLlama only works with 4-bit Llama models, and this is not a Llama model. Please use AutoGPTQ for this model.

In general, please check the "Provided Files" table; it has a column indicating whether each file is compatible with ExLlama or not.
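For reference, loading a GPTQ model through AutoGPTQ directly looks roughly like the sketch below. The model path and flags are placeholders to adapt to your setup, not a confirmed recipe for this particular repo:

```python
# Rough sketch of loading a GPTQ model with AutoGPTQ directly.
# "path/to/model-GPTQ" is a placeholder; adjust flags to your setup.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_dir = "path/to/model-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,  # may be needed for non-Llama architectures
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```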
