
Model failing to load

#2
by KrishnaBabu - opened

I am getting the following error when trying to load the 'Llama-2-7B-32K-Instruct-GGML' model. The same code works fine with other 7B and 13B models but fails specifically with 7B-32K-Instruct.

Code:
from langchain.llms import CTransformers

llm = CTransformers(
    model="models/llama-2-7b-32k-instruct.ggmlv3.q8_0.bin",
    model_type="llama",
    config={"max_new_tokens": 256, "temperature": 0.01},
)

Error:
RuntimeError: Failed to create LLM 'llama' from 'llama-2-7b-32k-instruct.ggmlv3.q8_0.bin'
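For what it's worth, this error can also be raised when the model file itself is missing, misnamed, or truncated from an incomplete download, so it may help to rule that out first. A minimal sanity check (the helper name and the relative path are just illustrative, taken from the snippet above):

```python
from pathlib import Path


def check_model_file(path_str: str) -> bool:
    """Return True only if the file exists and is non-empty.

    ctransformers raises 'Failed to create LLM' both for unsupported
    model formats and for missing/truncated files, so checking the
    file first narrows down the cause.
    """
    p = Path(path_str)
    return p.is_file() and p.stat().st_size > 0


# Same path as used in the CTransformers call above:
print(check_model_file("models/llama-2-7b-32k-instruct.ggmlv3.q8_0.bin"))
```

If this prints False, the problem is the file on disk, not the model format.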

Please help!
