Unable to load the model via transformers module
Hi,
I am trying to load the model via the transformers module, but the following error occurs: OSError: Could not locate model-00001-of-00008.safetensors inside brucethemoose/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction.
If I am not mistaken, the module is trying to load an incorrect number of shards (8), even though the model only has 5. I was not able to find any solution online. Am I doing something wrong?
I am using this code:
from transformers import AutoModelForCausalLM, AutoTokenizer
which_model = 'brucethemoose/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction'
tokenizer = AutoTokenizer.from_pretrained(which_model)
model = AutoModelForCausalLM.from_pretrained(which_model, device_map='auto', low_cpu_mem_usage=True)
Thank you!
This is not a transformers model, but an exllamav2 one. If you want to load it in transformers, the original model is here:
https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity
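Loading that full-precision repo should look roughly like your original snippet, just pointed at the other model ID. Untested sketch; the torch_dtype argument is my suggestion so the ~34B parameters stay in bf16 instead of being upcast to fp32:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

which_model = 'brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity'
tokenizer = AutoTokenizer.from_pretrained(which_model)
# keep weights in bf16 and spread them across available devices
model = AutoModelForCausalLM.from_pretrained(
    which_model,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    low_cpu_mem_usage=True,
)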
However, I would highly recommend a quantization of this model + an optimized runtime like exllamav2 on any hardware. Vanilla transformers is extremely inefficient at huge context sizes, even on an A100. Prompt processing in particular will take forever.
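If you go the exllamav2 route, the gist is: download the exl2 quant to a local directory and load it with exllamav2's own classes rather than transformers. Rough sketch following the exllamav2 example scripts (the sampler value and prompt are just placeholders, and the API can shift between versions):

from huggingface_hub import snapshot_download
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# download the exl2 quant to a local directory first
model_dir = snapshot_download('brucethemoose/CaPlatTessDolXaBoros-34B-200K-exl2-4bpw-fiction')

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # splits layers across available GPUs while loading

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# prompt and token count are placeholders
print(generator.generate_simple('Once upon a time', settings, 200))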