
Phi-4 – small models, big results
The Phi-4 family is Microsoft's latest advancement in small language models (SLMs), designed to excel at complex reasoning tasks while remaining efficient. The series includes three key models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. All three are built with a clear focus: delivering advanced reasoning performance without the infrastructure demands of trillion-parameter models. They strike a balance between size and performance using techniques such as distillation, reinforcement learning, and carefully curated training data.
Phi-4-reasoning is a 14-billion-parameter model with a 32k-token context window, fine-tuned on high-quality web data and curated prompts paired with reasoning demonstrations from OpenAI's o3-mini. It excels at tasks requiring detailed, multi-step reasoning such as mathematics, coding, and algorithmic problem solving.
Phi-4-reasoning-plus builds on this with additional fine-tuning via reinforcement learning. It generates reasoning traces roughly 1.5x longer, trading extra inference-time compute for higher accuracy.
Phi-4-mini-reasoning, with just 3.8 billion parameters, was trained on one million synthetic math problems generated by DeepSeek-R1. It targets use cases like educational tools and mobile apps, and proves capable of step-by-step problem solving in resource-constrained environments.
What sets Phi-4 apart is not just efficiency, but sheer capability. On benchmarks like HumanEval+ and MATH-500:
- Phi-4-reasoning-plus outperforms DeepSeek-R1 (671B parameters) on some tasks, demonstrating that smarter training can beat brute force.
- It also rivals OpenAI’s o3-mini and exceeds DeepSeek-R1-Distill-Llama-70B on complex reasoning and planning tasks.
- Phi-4-mini-reasoning performs competitively with much larger models and even tops some in math-specific benchmarks.
True to Microsoft’s Responsible AI framework, all Phi-4 models are trained with strong safety protocols. Post-training involves supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF). Microsoft uses public datasets focused on safety, helpfulness, and fairness – ensuring broad usability while minimizing risks.
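Of these post-training stages, DPO is the simplest to illustrate. The sketch below implements the standard DPO objective for a single preference pair (the published formulation, not Microsoft's actual training code): the loss rewards the policy for preferring the chosen response over the rejected one more strongly than a frozen reference model does.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen (preferred)
    and rejected responses under the trained policy and a frozen
    reference model. beta controls deviation from the reference.
    """
    # Implicit reward of each response: how much more the policy
    # likes it than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Logistic loss on the reward gap: -log(sigmoid(beta * gap)).
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy already favors the chosen response relative to the reference, the loss drops below log 2 (the value at indifference); gradient descent on this quantity is what nudges preferences in the desired direction.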
All three models are freely available via Hugging Face and Azure AI Foundry, allowing researchers, startups, and educators to integrate high-performance reasoning into their own applications.
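As a minimal sketch of how one of these models might be queried through the Hugging Face `transformers` library (the model ID, system prompt, and chat format below are assumptions; check the model card on Hugging Face for the exact usage):

```python
def build_messages(question: str) -> list[dict]:
    """Build a standard chat-format request for a reasoning model."""
    return [
        # Hypothetical system prompt; reasoning models are typically
        # steered toward explicit step-by-step solutions.
        {"role": "system",
         "content": "Reason step by step, then state the final answer."},
        {"role": "user", "content": question},
    ]

# Actual inference (requires a model download; model ID assumed from
# the family's naming convention):
# from transformers import pipeline
# pipe = pipeline("text-generation",
#                 model="microsoft/Phi-4-mini-reasoning")
# result = pipe(build_messages("What is 17 * 24?"), max_new_tokens=512)
```

The same chat-message format works against the Azure AI Foundry endpoints, so a prompt built this way is portable across both hosting options mentioned above.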