Can't reproduce
How were the gguf versions made, given that Phi3ForCausalLM is not yet supported by llama.cpp?
Architecture 'Phi3ForCausalLM' not supported
You can use convert-hf-to-gguf.py from llama.cpp and then just quantize it the way you want.
I am able to create a custom fine-tune and convert it to a gguf file via convert-hf-to-gguf.py, but I am not able to quantize it. llama.cpp returns:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'phi3'
I am on the latest llama.cpp commit, which should include the phi3 architecture.
Can you please point me in the right direction on how to solve this?
How about:
Save the safetensors and configs in a models subdirectory
./convert-hf-to-gguf.py models/Phi-3
./quantize models/Phi-3/ggml-model-f16.gguf models/Phi-3/Phi-3-model-Q4_K_M.gguf Q4_K_M
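Once the quantization finishes, a quick sanity check is to run the quantized model directly. A minimal sketch, assuming the paths above; the prompt and token count are placeholders:
./main -m models/Phi-3/Phi-3-model-Q4_K_M.gguf -p "Hello, Phi-3" -n 64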
It works, but the issue was somewhere else: I was not using the right quantize binary.
I rebuilt llama.cpp from source via make, and it works!
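For reference, the rebuild amounts to something like the following (a sketch; the checkout directory and the make clean step are assumptions):
cd llama.cpp
git pull
make clean
make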
llama_model_quantize_internal: model size = 7288.51 MB
llama_model_quantize_internal: quant size = 2281.66 MB
Please ensure that you are using a llama.cpp build later than 2717, which has support for Phi-3.
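If you built from a git checkout, one way to confirm which release your build corresponds to (a sketch; the exact output depends on your commit) is:
git describe --tags
# prints the nearest release tag, e.g. b2717-<n>-g<hash>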