The Q8_0 Quant seems to be broken!
Greetings! Once again, thank you for quantizing my models.
I just tried running the Q8_0 GGUF Quant via OobaBooga. The Q6_K Quant runs just fine, without any errors. The Q8_0 Quant, on the other hand, returned the following error:
```
21:45:02-103604 ERROR    Failed to load the model.

Traceback (most recent call last):
  File "/home/redrix/Applications/text-generation-webui/modules/ui_model_menu.py", line 222, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redrix/Applications/text-generation-webui/modules/models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redrix/Applications/text-generation-webui/modules/models.py", line 278, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redrix/Applications/text-generation-webui/modules/llamacpp_model.py", line 111, in from_pretrained
    result.model = Llama(**params)
                   ^^^^^^^^^^^^^^^
  File "/home/redrix/Applications/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda_tensorcores/llama.py", line 369, in __init__
    internals.LlamaModel(
  File "/home/redrix/Applications/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda_tensorcores/_internals.py", line 56, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models/GodSlayer-12B-ABYSS.Q8_0.gguf
Exception ignored in: <function LlamaCppModel.__del__ at 0x7fb68bc0eca0>
Traceback (most recent call last):
  File "/home/redrix/Applications/text-generation-webui/modules/llamacpp_model.py", line 62, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
```
Have you verified that you downloaded the full file? The file should have a SHA-256 of `675cfffa46a9e2ff5dd95981fcb505b8fde1bed5ff9670c3d6afd823e77b410f`.
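If you want to double-check locally, here's a minimal Python sketch that streams the file through `hashlib.sha256`; the `sha256_of` helper is just illustrative, and the path is the one from your traceback:

```python
import hashlib

# Published digest for the Q8_0 file (from the comment above).
EXPECTED = "675cfffa46a9e2ff5dd95981fcb505b8fde1bed5ff9670c3d6afd823e77b410f"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a multi-GB GGUF never sits in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

digest = sha256_of("models/GodSlayer-12B-ABYSS.Q8_0.gguf")
print(digest)
print("match" if digest == EXPECTED else "MISMATCH: re-download the file")
```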
Indeed, they are identical. I've downloaded the file twice already, and both copies have the exact same checksum.
I've downloaded the Q8_0 and it works just fine with llama-cli, so this is either some local/usage problem (out of memory?) or an issue with OobaBooga. Try with llama.cpp and see if that works.
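To take Ooba's loader out of the loop entirely, a minimal sketch using the upstream llama-cpp-python bindings (the `n_ctx` and `n_gpu_layers` values here are illustrative, not your actual settings):

```python
from llama_cpp import Llama

# Load the quant directly, bypassing text-generation-webui.
# n_gpu_layers=0 keeps everything on the CPU: if this loads,
# the file is fine and the suspect is VRAM or the CUDA build.
llm = Llama(
    model_path="models/GodSlayer-12B-ABYSS.Q8_0.gguf",
    n_ctx=2048,
    n_gpu_layers=0,
)

out = llm("Hello, ", max_tokens=16)
print(out["choices"][0]["text"])
```

If this works but Ooba still fails, that points at the `llama_cpp_cuda_tensorcores` build Ooba ships rather than at the quant itself.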
Most likely it's something with Ooba, or perhaps Linux is interfering somewhere. My hardware runs your other Q8_0 Quants just fine.
Thanks though!