Can't reproduce
How were the gguf versions made, given that Phi3ForCausalLM is not yet supported by llama.cpp?
Architecture 'Phi3ForCausalLM' not supported
You can use convert-hf-to-gguf.py from llama.cpp and then just quantize it the way you want.
I am able to create a custom fine-tune and convert it to a gguf file via convert-hf-to-gguf.py, but I am not able to quantize it. llama.cpp returns:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'phi3'
I am on the latest llama.cpp commit, which should include the phi3 architecture.
Can you please point me in the right direction on how to solve this?
How about:
Save the safetensors and configs in a models subdirectory
./convert-hf-to-gguf.py models/Phi-3
./quantize models/Phi-3/ggml-model-f16.gguf models/Phi-3/Phi-3-model-Q4_K_M.gguf Q4_K_M
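Once the quantization finishes, a quick sanity check is to run the quantized model directly. A minimal sketch, assuming the paths above; the prompt and token count are placeholders:
./main -m models/Phi-3/Phi-3-model-Q4_K_M.gguf -p "Hello, Phi-3" -n 64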
It works, but the issue was somewhere else: I was not using the right quantize binary.
I rebuilt llama.cpp from source via make, and it works!
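For reference, the rebuild amounts to something like the following (a sketch; the checkout directory and the make clean step are assumptions):
cd llama.cpp
git pull
make clean
make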
llama_model_quantize_internal: model size = 7288.51 MB
llama_model_quantize_internal: quant size = 2281.66 MB
Please ensure that you are using a llama.cpp build later than 2717, which has support for Phi-3.
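If you built from a git checkout, one way to confirm which release your build corresponds to (a sketch; the exact output depends on your commit) is:
git describe --tags
# prints the nearest release tag, e.g. b2717-<n>-g<hash>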