Please support phi-3.5-mini-instruct in llama.cpp

#20 opened by ThiloteE

See the llama.cpp bug report: phi 3.5 mini produces garbage past 4096 context (llama.cpp#9127).
If you want people to use your model, please support llama.cpp.
Many users rely on GGUF files, as they are small and fast.

As it stands right now, I cannot recommend this model to anybody. As an alternative, I can only point to the older Phi-3 model series, not Phi-3.5.

I ran into an issue as well, though I'm not sure whether it is related to llama.cpp.

I used llama.cpp's conversion script to convert Phi-3.5-mini to GGUF, but when I integrate it with LangChain, it reports an error:

llama_model_load: error loading model: check_tensor_dims: tensor 'token_embd.weight' not found
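
For context, a minimal sketch of the steps involved, assuming the conversion was done with llama.cpp's convert_hf_to_gguf.py (older checkouts name it convert-hf-to-gguf.py) and the GGUF is loaded through LangChain's community LlamaCpp wrapper; paths and filenames below are illustrative:

# Conversion step (shell), run from a llama.cpp checkout:
#   python convert_hf_to_gguf.py ./Phi-3.5-mini-instruct --outfile phi-3.5-mini.gguf
#
# Loading step (Python):
#   pip install langchain-community llama-cpp-python
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="phi-3.5-mini.gguf",  # illustrative filename for the converted model
    n_ctx=4096,                      # context length; see llama.cpp#9127 for behavior past 4096
)
print(llm.invoke("Hello"))

One common cause of a "tensor ... not found" error at load time is a version mismatch: if the converter or the llama.cpp build backing the loader predates Phi-3.5 support, the file and the loader can disagree about which tensors the architecture should contain.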


Hi @ThiloteE, I hit your issue too when I first used Ollama with Phi-3.5-mini.
I think it might be caused by the prompt template.
I solved it by changing the prompt template to the following (from the Ollama phi-3.5 model card):

{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>
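
For reference, here is a minimal Python sketch of the prompt string that Go template expands to for a single turn; the helper name is my own illustration, not part of Ollama or llama.cpp:

from typing import Optional

def format_phi35_prompt(prompt: str, system: Optional[str] = None) -> str:
    """Render one turn in the Phi-3.5 chat format produced by the template above."""
    parts = []
    if system:  # corresponds to the {{ if .System }} branch
        parts.append(f"<|system|>\n{system}<|end|>\n")
    parts.append(f"<|user|>\n{prompt}<|end|>\n")  # the {{ if .Prompt }} branch
    parts.append("<|assistant|>\n")  # the model generates the reply here, terminated by <|end|>
    return "".join(parts)

print(format_phi35_prompt("Why is the sky blue?", system="You are a helpful assistant."))

If the template the runtime uses omits or misplaces the <|system|>, <|user|>, <|assistant|>, or <|end|> tokens, the model can degenerate into garbage output, which matches the symptom described above.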
