Trying to use llama-2-7b-chat.Q4_K_M.gguf with/without tensorflow weights
n00bie question:
The library thinks this repo has TensorFlow weights, but passing `from_tf=True` doesn't resolve it.
What am I doing wrong here?
```
from transformers import AutoModelForCausalLM

model_file = "llama-2-7b-chat.Q4_K_M.gguf"
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GGUF", model_file=model_file, model_type="llama", gpu_layers=50, from_tf=True)
```
This gives me (on Google Colab):
```
OSError Traceback (most recent call last)
in <cell line: 3>()
1 from transformers import LlamaForCausalLM, LlamaTokenizer, AutoModelForCausalLM
2 model_file = "llama-2-7b-chat.Q4_K_M.gguf"
----> 3 model = AutoModelForCausalLM.from_pretrained(
4 "TheBloke/Llama-2-7b-Chat-GGUF", model_file=model_file, model_type="llama", gpu_layers=50, from_tf=True)
5
1 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3384 }
3385 if has_file(pretrained_model_name_or_path, TF2_WEIGHTS_NAME, **has_file_kwargs):
-> 3386 raise EnvironmentError(
3387 f"{pretrained_model_name_or_path} does not appear to have a file named"
3388 f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file for TensorFlow weights."
OSError: TheBloke/Llama-2-7b-Chat-GGUF does not appear to have a file named pytorch_model.bin but there is a file for TensorFlow weights. Use from_tf=True to load this model from those weights.
```
I get this error with or without `from_tf=True`. Did this parameter name change without an update to the EnvironmentError message?
@cgthayer yeah, the problem is that Hugging Face transformers does not support GGUF models. I'd also not recommend using Llama 2 7B, since a MUCH better Llama 3 8B came out. It's at least 2-3x better and not as censored. For GGUF files, just search "llama 3 8b gguf" on Hugging Face.
To use GGUF models, you can use llama.cpp or anything built on it (text-generation-webui, llama-cpp-python, LM Studio, and much more), e.g. the sketch below.
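For example, here is a minimal sketch with llama-cpp-python, assuming you've already downloaded the .gguf file locally (the path and prompt are just placeholders):

```
from llama_cpp import Llama

# Load a local GGUF file (placeholder path -- download the file first,
# e.g. via the Hugging Face website or huggingface-cli)
llm = Llama(
    model_path="llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=50,  # offload layers to GPU if the wheel was built with GPU support
    n_ctx=2048,       # context window size
)

# Run a simple completion and print the generated text
output = llm("Q: What is the capital of France? A:", max_tokens=32)
print(output["choices"][0]["text"])
```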
> I get this error with or without the "from_tf=True", did this parameter name change without an update to the EnvironmentError?
I hope you were able to resolve this error by now, but in case someone faces the same issue in the future, here is the solution:
The model can be loaded with the ctransformers library, which is different from the standard Hugging Face transformers library.
ctransformers provides Python bindings for transformer models implemented in C/C++ (on top of GGML), so it is compatible with quantized models, including those in GGUF format.
The following code should work:
```
from ctransformers import AutoModelForCausalLM

# Load the model (note the exact filename, matching the repo: Q4_K_M)
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GGUF",
    model_file="llama-2-7b-chat.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50,
)
```
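Once loaded, the model object is callable for text generation; the prompt below is just an example:

```
# Generate a completion for an example prompt
print(llm("AI is going to"))
```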