Introducing MAI-Code-1-Flash
Today we’re introducing MAI-Code-1-Flash, a new Microsoft coding model built for fast, efficient assistance in everyday developer workflows. It is built end-to-end by Microsoft using clean and appropriately licensed data. The model is rolling out to individual GitHub Copilot users in Visual Studio Code, both in the model picker and through the default Auto picker.
Features and capabilities
- Agentic coding in real developer environments: trained and designed for the GitHub Copilot harness so that the model and the harness work better together.
- Adaptive thinking: stays concise for simple requests and spends more reasoning budget on complex tasks.
- Strong instruction-following across single-turn and multi-turn scenarios.
MAI-Code-1-Flash is designed around a simple goal: delivering high-quality coding help with better efficiency. It outperforms Claude Haiku 4.5 across the coding benchmarks we tested, with better price-to-performance.
Built for developers, not benchmarks
Coding models are most useful when they perform well in the same environment developers use every day. That is why we built MAI-Code-1-Flash with production workflows at the center, rather than optimizing only for benchmarks. The model was trained directly with GitHub Copilot harnesses used in production. This allows it to learn how to interact with surrounding tools and systems in agentic coding tasks, making it uniquely well suited to real-world Copilot workflows compared to other available models.
During training, we evaluated checkpoints across core software engineering tasks, repository question answering, refactoring, and telemetry-grounded tasks adapted from real GitHub Copilot usage. This alignment between training, evaluation, and production helps offline improvements translate into real-world developer quality.
Designed to maximize value per token
MAI-Code-1-Flash was trained with adaptive solution length control, which helps the model adjust the depth of its response to the task. It can stay concise for simpler requests and spend more reasoning budget when a problem requires deeper analysis or broader code changes. In practice, this means developers start seeing useful output sooner. We see MAI-Code-1-Flash solving harder problems with up to 60% fewer tokens. This helps reduce latency, lower cost, improve the return on each token, and make interactive workflows feel smoother.
Benchmark results in the production harness
To understand both quality and efficiency, we evaluated MAI-Code-1-Flash against Claude Haiku 4.5 on SWE-Bench Verified, SWE-Bench Pro, SWE-Bench Multilingual, and Terminal Bench 2 using the same production harness that developers use for their everyday coding tasks. We measured task success and the average number of solution tokens required to complete each task.
MAI-Code-1-Flash outperforms Claude Haiku 4.5 across all core coding benchmarks tested, with higher pass rates on all four evaluations, including a +16-point lead on the diverse, real-world tasks of SWE-Bench Pro (51.2% vs. 35.2%). It’s not just smarter; it’s leaner, solving harder problems with up to 60% fewer tokens on SWE-Bench Verified, showing that higher accuracy and greater efficiency need not be a trade-off.
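The results above mix two kinds of comparison: an absolute gap in pass rate (percentage points) and a relative saving in solution tokens. The sketch below shows one way to compute both from per-task results. It is illustrative only: the `TaskResult` record and the helper functions are hypothetical, not part of the Copilot harness, and we are assuming (not stating) that the token figure is a relative reduction in average solution tokens.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record: did the task pass, and how many
    # solution tokens did the model produce for it?
    passed: bool
    solution_tokens: int


def pass_rate(results: list[TaskResult]) -> float:
    """Share of tasks solved, as a percentage."""
    return 100.0 * sum(r.passed for r in results) / len(results)


def mean_tokens(results: list[TaskResult]) -> float:
    """Average number of solution tokens per task."""
    return sum(r.solution_tokens for r in results) / len(results)


def point_lead(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two pass rates, in percentage points."""
    return rate_a - rate_b


def token_reduction(candidate_avg: float, baseline_avg: float) -> float:
    """Relative reduction (percent) in the candidate's tokens vs. the baseline."""
    return 100.0 * (baseline_avg - candidate_avg) / baseline_avg


if __name__ == "__main__":
    # SWE-Bench Pro pass rates quoted in the text: a 16-point absolute lead.
    print(round(point_lead(51.2, 35.2), 1))
    # Made-up token averages, only to show what "60% fewer" means.
    print(token_reduction(4_000, 10_000))
```

A 16-point lead is an absolute difference; relative to the 35.2% baseline it would be roughly a 45% improvement, which is why the two framings should not be mixed.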
Math, science, instruction following, and agentic coding tasks
MAI-Code-1-Flash comes out ahead on every benchmark in the table, with the widest margin on IF Bench precise instruction following (+28.9) and the narrowest on rubric-based Advanced IF (+14.5). The strong instruction-following carries over to agentic tool use.
MAI-Code-1-Flash also outperforms Claude Haiku 4.5 on core reasoning capabilities in math, science, and visual generation coding.
Standard benchmarks reward memorization as much as reasoning. For example, a model that has seen the Monty Hall problem will answer it correctly, but invert the prizes and it fails. We built a 186-question, 34-category benchmark around adversarial traps such as inverted classics, impossible tasks, and underdetermined scenarios to see whether models were actually reasoning or just pattern-matching. MAI-Code-1-Flash surpasses Claude Haiku 4.5 overall, reaching 85.8% adjusted accuracy, with especially strong performance in reasoning, instruction following, and recognizing impossible problems. We also see room for the model to grow: core adversarial categories such as Einstellung traps remain below 50% accuracy.
Try it out
MAI-Code-1-Flash is now rolling out to individual GitHub Copilot users in VS Code. No additional setup is required. As the rollout progresses, you may see GitHub Copilot route tasks to MAI-Code-1-Flash through the Auto picker, or see the model available directly in the model picker.
Here are a few fun sample apps we built with MAI-Code-1-Flash in VS Code:
We would love to hear from you! Please join the GitHub Community to share your feedback.
Build the Future With Us
We’re a lean, fast-moving lab made up of some of the world’s most talented minds. We have an exciting compute roadmap at MAI, with our next-generation GB200 cluster now operational. And we have an ambitious mission we truly believe in. We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!