Phi-4 Magpie Reasoning GGUF v3
This is a GGUF-format version of the Phi-4 model, fine-tuned to increase test-time compute via chain-of-thought reasoning. It was trained on 5,200 examples from the Magpie dataset, which were generated by DeepSeek-R1. This build is the Q8_0 (8-bit) quantization.
This is v3, trained on four RTX 3090 GPUs for about 55 hours.
To produce reasoning traces, you must instruct the model in the system prompt to use them. The format is the same as DeepSeek's. Use the following system prompt to elicit the chain of thought:
Think through your responses carefully, always break it down. Always think before you answer by using the thinking tags: <think> response </think> response. When you think, make sure you decompose the question and consider multiple answers always before you stop thinking.
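As a minimal sketch of wiring this up programmatically, here is how the system prompt could be passed via llama-cpp-python. The local filename, context size, and the example question are assumptions for illustration, not fixed by this card:

```python
from llama_cpp import Llama

# Hypothetical local filename; point this at wherever the GGUF was downloaded.
llm = Llama(model_path="phi4-magpie-reasoning.gguf", n_ctx=4096)

SYSTEM_PROMPT = (
    "Think through your responses carefully, always break it down. "
    "Always think before you answer by using the thinking tags: "
    "<think> response </think> response. When you think, make sure you "
    "decompose the question and consider multiple answers always before "
    "you stop thinking."
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How many primes are there between 10 and 30?"},
    ],
    max_tokens=512,
)

# If the prompt takes effect, the reply should open with a <think> block.
print(out["choices"][0]["message"]["content"])
```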
Model Details
- Base Model: Microsoft Phi-4 (14B parameters)
- Format: GGUF (8-bit quantization)
- Fine-tuning: LoRA trained with 4-bit double quantization and Flash Attention v1, with the adapter weights merged back into the base model
- Training Dataset: Magpie Reasoning Dataset
- Version: 3
Training Data
- 2,200 excellent quality examples
- 3,000 good quality examples
- Total training samples: 5,200
Evaluation Dataset
- 5 very hard + excellent quality examples
- 5 medium + excellent quality examples
- 5 very easy + excellent quality examples
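The quality and difficulty labels above correspond to annotation fields shipped with the Magpie datasets. As an illustrative sketch, a split like this could be selected with the datasets library; note that the dataset ID and column names here are assumptions and may differ across Magpie releases:

```python
from datasets import load_dataset

# Hypothetical dataset ID and column names; check the actual Magpie release used.
ds = load_dataset("Magpie-Align/Magpie-Reasoning-V2", split="train")

# Training pool: excellent- and good-quality examples.
train = ds.filter(lambda ex: ex["input_quality"] in ("excellent", "good"))

# Evaluation slice: very hard questions with excellent quality.
eval_hard = ds.filter(
    lambda ex: ex["difficulty"] == "very hard" and ex["input_quality"] == "excellent"
)
```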
Technical Details
LoRA Parameters:
- Rank (r): 24
- Alpha: 48
- Target Modules: q_proj, k_proj, v_proj, o_proj
- Dropout: 0.05
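For reference, these hyperparameters map onto a peft LoraConfig along the following lines. This is a sketch of an equivalent configuration, not the author's actual training script:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=24,                  # LoRA rank
    lora_alpha=48,         # alpha = 2 * r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```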
Training Configuration:
- Epochs: 5
- Learning Rate: 3e-5
- Batch Size: 1 with gradient accumulation steps of 16
- Optimizer: AdamW (Fused)
- Precision: BFloat16 during training
- Final Format: 8-bit quantized GGUF
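Similarly, the training configuration corresponds to Hugging Face TrainingArguments roughly as below, assuming the transformers Trainer was used; the output path is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi4-magpie-reasoning-v3",  # placeholder path
    num_train_epochs=5,
    learning_rate=3e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch size of 16 per device
    optim="adamw_torch_fused",
    bf16=True,
)
```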
Usage with llama.cpp
For CPU inference, use the following command:
main -m phi4-magpie-reasoning.gguf -n 512 --repeat_penalty 1.1 --color -i -r "User:"
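On newer llama.cpp builds, the main binary has been renamed to llama-cli, so the equivalent invocation is:

llama-cli -m phi4-magpie-reasoning.gguf -n 512 --repeat-penalty 1.1 --color -i -r "User:"

Remember to supply the system prompt above (for example via an interactive session or a chat template) or the model will not emit <think> traces.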
Model Size
- GGUF format (8-bit): ~14GB (roughly one byte per parameter at 8-bit)
- Original model: Microsoft Phi-4, 14B parameters
License
This model inherits the license terms from Microsoft Phi-4 and the Magpie dataset.