Phi-4 Magpie Reasoning GGUF v3
This is a GGUF-format version of the Phi-4 model, fine-tuned to increase test-time compute via chain-of-thought reasoning. It was trained on 5,200 examples from the Magpie dataset, which were generated by DeepSeek-R1. This build is the Q8_0 (8-bit) quantization.
This is v3, trained on four RTX 3090 GPUs for about 55 hours.
To produce reasoning traces, you must instruct the model in the system prompt to use them. The format is the same as DeepSeek's. Use the following system prompt to elicit the chain of thought:
Think through your responses carefully, always break it down. Always think before you answer by using the thinking tags: <think> response </think> response. When you think, make sure you decompose the question and consider multiple answers always before you stop thinking.
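As a minimal sketch of wiring this up programmatically, here is how the system prompt could be passed via llama-cpp-python. The local filename, context size, and the example question are assumptions for illustration, not fixed by this card:

```python
from llama_cpp import Llama

# Hypothetical local filename; point this at wherever the GGUF was downloaded.
llm = Llama(model_path="phi4-magpie-reasoning.gguf", n_ctx=4096)

SYSTEM_PROMPT = (
    "Think through your responses carefully, always break it down. "
    "Always think before you answer by using the thinking tags: "
    "<think> response </think> response. When you think, make sure you "
    "decompose the question and consider multiple answers always before "
    "you stop thinking."
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How many primes are there between 10 and 30?"},
    ],
    max_tokens=512,
)

# If the prompt takes effect, the reply should open with a <think> block.
print(out["choices"][0]["message"]["content"])
```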
Model Details
- Base Model: Microsoft Phi-4 (14B parameters)
- Format: GGUF (8-bit quantization)
- Fine-tuning: LoRA trained with 4-bit double quantization and Flash Attention v1, with the adapter weights merged back into the base model
- Training Dataset: Magpie Reasoning Dataset
- Version: 3
Training Data
- 2,200 excellent quality examples
- 3,000 good quality examples
- Total training samples: 5,200
Evaluation Dataset
- 5 very hard + excellent quality examples
- 5 medium + excellent quality examples
- 5 very easy + excellent quality examples
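The quality and difficulty labels above correspond to annotation fields shipped with the Magpie datasets. As an illustrative sketch, a split like this could be selected with the datasets library; note that the dataset ID and column names here are assumptions and may differ across Magpie releases:

```python
from datasets import load_dataset

# Hypothetical dataset ID and column names; check the actual Magpie release used.
ds = load_dataset("Magpie-Align/Magpie-Reasoning-V2", split="train")

# Training pool: excellent- and good-quality examples.
train = ds.filter(lambda ex: ex["input_quality"] in ("excellent", "good"))

# Evaluation slice: very hard questions with excellent quality.
eval_hard = ds.filter(
    lambda ex: ex["difficulty"] == "very hard" and ex["input_quality"] == "excellent"
)
```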
Technical Details
LoRA Parameters:
- Rank (r): 24
- Alpha: 48
- Target Modules: q_proj, k_proj, v_proj, o_proj
- Dropout: 0.05
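For reference, these hyperparameters map onto a peft LoraConfig along the following lines. This is a sketch of an equivalent configuration, not the author's actual training script:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=24,                  # LoRA rank
    lora_alpha=48,         # alpha = 2 * r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```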
Training Configuration:
- Epochs: 5
- Learning Rate: 3e-5
- Batch Size: 1 with gradient accumulation steps of 16
- Optimizer: AdamW (Fused)
- Precision: BFloat16 during training
- Final Format: 8-bit quantized GGUF
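Similarly, the training configuration corresponds to Hugging Face TrainingArguments roughly as below, assuming the transformers Trainer was used; the output path is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi4-magpie-reasoning-v3",  # placeholder path
    num_train_epochs=5,
    learning_rate=3e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch size of 16 per device
    optim="adamw_torch_fused",
    bf16=True,
)
```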
Usage with llama.cpp
For CPU inference, use the following command:
main -m phi4-magpie-reasoning.gguf -n 512 --repeat_penalty 1.1 --color -i -r "User:"
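On newer llama.cpp builds, the main binary has been renamed to llama-cli, so the equivalent invocation is:

llama-cli -m phi4-magpie-reasoning.gguf -n 512 --repeat-penalty 1.1 --color -i -r "User:"

Remember to supply the system prompt above (for example via an interactive session or a chat template) or the model will not emit <think> traces.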
Model Size
- GGUF format (8-bit): ~14GB (roughly one byte per parameter at 8-bit)
- Original model: Microsoft Phi-4, 14B parameters
License
This model inherits the license terms from Microsoft Phi-4 and the Magpie dataset.