Can't reproduce

by ayyylol - opened

How were the GGUF versions made, given that Phi3ForCausalLM is not yet supported by llama.cpp?

Architecture 'Phi3ForCausalLM' not supported

You can use the conversion script from llama.cpp and then just quantize it the way you want.

I am able to create a custom fine-tune and convert it to gguf file via the
However, I am not able to quantize it. llama.cpp returns: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'phi3'
I am on the latest llama.cpp commit, which should include the phi3 architecture.
Could you please point me in the right direction on how to solve this?

How about:

Save the safetensors and config files in a models subdirectory

./ models/Phi-3
./quantize models/Phi-3/ggml-model-f16.gguf models/Phi-3/Phi-3-model-Q4_K_M.gguf Q4_K_M

It works, but the issue was somewhere else: I was not using the right quantize script.
I rebuilt llama.cpp from source via make, and now it works!

llama_model_quantize_internal: model size  =  7288.51 MB
llama_model_quantize_internal: quant size  =  2281.66 MB
Microsoft org

Please ensure that you are using a llama.cpp build later than 2717, which has support for Phi-3.
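
A rough way to check this from the command line is sketched below. It assumes your llama.cpp binary prints a line of the form "version: <build> (<commit>)" when run with --version, which may differ between builds. The helper names build_number and supports_phi3 are made up for this illustration and are not part of llama.cpp.

```shell
#!/bin/sh
# Minimum build quoted in this thread; Phi-3 needs a build later than this.
MIN_PHI3_BUILD=2717

# Hypothetical helper: print the build number found in a version line of the
# assumed form "version: 2717 (0123abc)". Prints nothing if no such line exists.
build_number() {
  printf '%s\n' "$1" \
    | sed -n 's/^version:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}

# Hypothetical helper: succeed only if the build is strictly later than 2717.
supports_phi3() {
  phi3_build="$(build_number "$1")"
  [ -n "$phi3_build" ] && [ "$phi3_build" -gt "$MIN_PHI3_BUILD" ]
}

# Example use (assumes the binary is ./main and accepts --version):
#   supports_phi3 "$(./main --version 2>&1)" && echo "build is new enough"
```

If the check fails even after pulling the latest commit, the binaries on disk are probably stale, and rebuilding from source (as described earlier in this thread) is the likely fix.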

gugarosa changed discussion status to closed
