---
tags:
- gguf
- llama.cpp
- quantized
- microsoft/phi-4
license: apache-2.0
---
# yasserrmd/phi-4-gguf

This model was converted to GGUF format from [`microsoft/phi-4`](https://huggingface.co/microsoft/phi-4) using llama.cpp via
[Convert Model to GGUF](https://github.com/ruslanmv/convert-model-to-GGUF).

**Key Features:**

* Quantized for reduced file size (GGUF format)
* Optimized for use with llama.cpp
* Compatible with llama-server for efficient serving

Refer to the [original model card](https://huggingface.co/microsoft/phi-4) for more details on the base model.
## Usage with llama.cpp

**1. Install llama.cpp:**

```bash
brew install llama.cpp  # For macOS/Linux (via Homebrew)
```
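
If Homebrew is not available, building from source is an alternative. A minimal sketch following the llama.cpp build instructions (assumes git and CMake are installed; see the repository README for platform-specific options such as GPU backends):

```bash
# Clone and build llama.cpp from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```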
**2. Run Inference:**

**CLI:**

```bash
llama-cli --hf-repo yasserrmd/phi-4-gguf --hf-file phi-4.q2_k.gguf -p "Your prompt here"
```
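
Since Phi-4 is an instruction-tuned model, interactive chat may be more convenient than one-shot prompting. A sketch using llama-cli's conversation mode, which applies the model's built-in chat template (flag availability depends on your llama.cpp version):

```bash
# Interactive chat session; -cnv enables conversation mode
llama-cli --hf-repo yasserrmd/phi-4-gguf --hf-file phi-4.q2_k.gguf -cnv
```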
**Server:**

```bash
llama-server --hf-repo yasserrmd/phi-4-gguf --hf-file phi-4.q2_k.gguf -c 2048
```
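
Here `-c 2048` sets the context window to 2048 tokens. Once running, llama-server exposes an OpenAI-compatible HTTP API. A minimal query, assuming the default address `127.0.0.1:8080`:

```bash
# Send a chat request to the server's OpenAI-compatible endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Explain GGUF in one sentence."}],
        "max_tokens": 128
      }'
```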
For more advanced usage, refer to the [llama.cpp repository](https://github.com/ggerganov/llama.cpp).