Trying to use llama-2-7b-chat.Q4_K_M.gguf with/without tensorflow weights
n00bie question:
The library thinks this repo has TensorFlow weights, but passing `from_tf=True` doesn't resolve it.
What am I doing wrong here?
```
from transformers import AutoModelForCausalLM

model_file = "llama-2-7b-chat.Q4_K_M.gguf"
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GGUF", model_file=model_file, model_type="llama", gpu_layers=50, from_tf=True)
```
This gives me (on Google Colab):
```
OSError Traceback (most recent call last)
in <cell line: 3>()
1 from transformers import LlamaForCausalLM, LlamaTokenizer, AutoModelForCausalLM
2 model_file = "llama-2-7b-chat.Q4_K_M.gguf"
----> 3 model = AutoModelForCausalLM.from_pretrained(
4 "TheBloke/Llama-2-7b-Chat-GGUF", model_file=model_file, model_type="llama", gpu_layers=50, from_tf=True)
5
1 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3384 }
3385 if has_file(pretrained_model_name_or_path, TF2_WEIGHTS_NAME, **has_file_kwargs):
-> 3386 raise EnvironmentError(
3387 f"{pretrained_model_name_or_path} does not appear to have a file named"
3388 f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file for TensorFlow weights."
OSError: TheBloke/Llama-2-7b-Chat-GGUF does not appear to have a file named pytorch_model.bin but there is a file for TensorFlow weights. Use from_tf=True to load this model from those weights.
```
I get this error with or without `from_tf=True`. Did this parameter name change without an update to the EnvironmentError message?
@cgthayer yeah, the problem is that Hugging Face transformers does not support GGUF models. I'd also not recommend using Llama 2 7B, since a MUCH better Llama 3 8B came out. It's at least 2-3x better and not as censored. For GGUF files, just search "llama 3 8b gguf" on Hugging Face.
To use GGUF models, you can use llama.cpp or anything built on it (text-generation-webui, llama-cpp-python, LM Studio, and much more), e.g. the sketch below.
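For example, here is a minimal sketch with llama-cpp-python, assuming you've already downloaded the .gguf file locally (the path and prompt are just placeholders):

```
from llama_cpp import Llama

# Load a local GGUF file (placeholder path -- download the file first,
# e.g. via the Hugging Face website or huggingface-cli)
llm = Llama(
    model_path="llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=50,  # offload layers to GPU if the wheel was built with GPU support
    n_ctx=2048,       # context window size
)

# Run a simple completion and print the generated text
output = llm("Q: What is the capital of France? A:", max_tokens=32)
print(output["choices"][0]["text"])
```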
> I get this error with or without the "from_tf=True", did this parameter name change without an update to the EnvironmentError?
I hope you were able to resolve this error by now, but in case someone faces the same issue in the future, here is the solution:
The model can be loaded with the ctransformers library, which is different from the standard Hugging Face transformers library.
ctransformers provides Python bindings for transformer models implemented in C/C++ (on top of GGML), so it is compatible with quantized models, including those in GGUF format.
The following code should work:
```
from ctransformers import AutoModelForCausalLM

# Load the model (note the exact filename, matching the repo: Q4_K_M)
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GGUF",
    model_file="llama-2-7b-chat.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50,
)
```
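Once loaded, the model object is callable for text generation; the prompt below is just an example:

```
# Generate a completion for an example prompt
print(llm("AI is going to"))
```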