Unable to load the model via transformers module

#1 opened by Human420

Hi,
I am trying to load the model via the transformers module, but the following error occurs:
OSError: Could not locate model-00001-of-00008.safetensors inside brucethemoose/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction.

If I am not mistaken, the module is trying to load an incorrect number of shards (8), even though the model only has 5. I was not able to find any solution online. Am I doing something wrong?
I am using this code:

from transformers import AutoModelForCausalLM, AutoTokenizer

which_model = 'brucethemoose/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction'

tokenizer = AutoTokenizer.from_pretrained(which_model)
model = AutoModelForCausalLM.from_pretrained(which_model, device_map='auto', low_cpu_mem_usage=True)

Thank you!

This is not a transformers model, but an exllamav2 (exl2) one. If you want to load it in transformers, the original model is here:

https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity

However, I would highly recommend using a quantization of this model with an optimized runtime like exllamav2 on any hardware. Vanilla transformers is extremely inefficient at huge context sizes, even on an A100; prompt processing in particular will take forever.
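
For example, loading this 4bpw exl2 quant directly with the exllamav2 Python package looks roughly like the sketch below (adapted from the example scripts in the exllamav2 repo; exact class and method names can shift between versions, and the model_dir path is a placeholder for wherever you download the files):

# pip install exllamav2, then download this exl2 repo locally (e.g. with huggingface-cli)
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = '/path/to/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction'  # placeholder local path
config.prepare()
config.max_seq_len = 8192  # optional: shrink the 200K context window to fit your VRAM

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate the KV cache as layers are loaded
model.load_autosplit(cache)               # split the weights automatically across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple('Once upon a time', settings, 200))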
