MAI-Image-2-Efficient: Flagship Quality, 41% Lower Cost

April 14, 2026

Models

MAI Superintelligence Team

A collage featuring a beige knitted sweater against a blue sky, close-up texture shots, and a person wearing the sweater, all set on a brown background with curved white lines.

Available now in Microsoft Foundry and MAI Playground

We built MAI-Image-2 to be our best text-to-image model — photorealistic, expressive, with reliable in-image text.

Today we’re making all that faster and cheaper.

Meet MAI-Image-2-Efficient.

Production-ready quality. Built for speed and scale. 22% faster and 4x more efficient¹. And priced nearly 41% lower — $5 per 1M text input tokens, $19.50 per 1M image output tokens.
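The "nearly 41%" figure can be checked against the MAI-Image-2 list price of $33 per 1M image output tokens quoted in the April 2 announcement. The snippet below is a minimal sketch of that arithmetic; the function name is illustrative and not part of any MAI or Foundry API.

```python
def percent_reduction(old_price: float, new_price: float) -> float:
    """Return the percentage by which new_price is lower than old_price."""
    return (old_price - new_price) / old_price * 100


# Image output prices in USD per 1M tokens, as quoted in the announcements.
MAI_IMAGE_2_OUTPUT = 33.00
MAI_IMAGE_2_EFFICIENT_OUTPUT = 19.50

reduction = percent_reduction(MAI_IMAGE_2_OUTPUT, MAI_IMAGE_2_EFFICIENT_OUTPUT)
print(f"Image output price reduction: {reduction:.1f}%")  # prints 40.9%
```

Both models list text input at $5 per 1M tokens, so the saving appears to come entirely from the image output rate.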

That’s not just faster than our own flagship. It’s 40% faster on average than other leading text-to-image models².

Two models, two jobs

MAI-Image-2-Efficient is your production workhorse. Use it when you need volume, speed, and tight cost control — product shots, marketing creatives, UI mockups, branded assets, batch pipelines. It handles short-form text like headlines and labels cleanly, and it’s built to run in real-time, interactive workflows without breaking a sweat.

MAI-Image-2 is your precision tool. Reach for it when the brief demands the highest fidelity — portraits, photorealistic scenes, stylized looks like anime or illustration, and longer or more complex in-image text. This is the model for final deliverables where every detail matters.

Start building now

MAI-Image-2-Efficient is available today in Microsoft Foundry and MAI Playground³. No waitlist, no preview — just plug it in and go. It’s also rolling out across Copilot and Bing, with more surfaces like PowerPoint coming soon.

Partners like Shutterstock are already testing with promising results:

“MAI-Image-2-Efficient shows strong progress in prompt fidelity and creative usability across a range of workflows. In our evaluation work, we look closely at how well models translate intent into consistent, production-ready outputs, and this model is trending in the right direction. That level of reliability is what ultimately matters when teams move from experimentation into real-world use.” – Vanessa Salvo, Principal Product Manager, Shutterstock

This is just the beginning. More models ahead — stay tuned.

A collage of six sections: clothing labels, orange slices with bottles, close-up tomatoes, skin care products with sky background, bottles with figs, and abstract orange and white graphic with the words "THE FUTURE CAN WAIT."

Download Model Card

¹ As tested on April 13, 2026. Compared to MAI-Image-2 when normalized by latency and GPU usage. Throughput per GPU vs MAI-Image-2 on NVIDIA H100 at 1024×1024; measured with optimized batch sizes and matched latency targets. Results vary with batch size, concurrency, and latency constraints.
² As tested on April 13, 2026. Compared to Gemini 3.1 Flash (high reasoning), Gemini 3.1 Flash Image and Gemini 3 Pro Image: Measured at p50 latency via AI Studio API (1:1, 1K images; minimal reasoning unless noted; web search disabled). MAI-Image-2, MAI-Image-2e, GPT-Image-1.5-High: Measured at p50 latency via Foundry API.
³ MAI Playground is available in select markets including the US. Coming soon to EU countries.

Today we’re announcing 3 new world class MAI models, available in Foundry

April 2, 2026

Models

Mustafa Suleyman

Introducing MAI-Transcribe-1, alongside MAI-Voice-1 and MAI-Image-2. World-class quality at lightning speeds, now available at the most competitive prices.

Available now in Microsoft Foundry and MAI Playground.

MAI-Transcribe-1 delivers state-of-the-art speech-to-text transcription across the top 25 most-used languages¹ according to the industry-standard FLEURS benchmark². Built to deliver world-class quality in messy, real-world environments, it runs batch transcription at 2.5x the speed of the existing Microsoft Azure Fast offering. It’s also incredibly efficient, making MAI-Transcribe-1 not just the most accurate, but also lightning fast. It’s now available in Foundry at the best price-performance of any large cloud provider.

Lower is better.

MAI-Voice-1 is our top-tier voice generation model. It is built to generate natural, realistic speech, rich with nuance, emotional range, and expression, and it preserves speaker identity even across long-form content.

Today we’re adding the ability to safely and securely create your own custom voice in Microsoft Foundry with just a few seconds of audio. MAI-Voice-1 can transform how easily developers can build voice experiences and voice agents – at high quality and high speed.

The model can generate 60 seconds of audio in just a single second, and highly efficient GPU usage delivers that quality and speed affordably. Hearing is believing, so experience it for yourself with Copilot Audio Expressions or Copilot Podcasts.

The text "MAI-Transcribe-1" appears in bold, brown letters centered on a plain, light beige background.

MAI-Image-2 has turbocharged image generation performance and speed on Copilot after debuting as a top 3 model family on the Arena.ai leaderboard. Users experience at least 2x faster generation times on Foundry and Copilot with similar quality, based on real-world production traffic data. Phased rollouts are also underway in Bing and PowerPoint.

MAI-Image-2 was created with photographers, designers, and visual storytellers who demand natural lighting, accurate skin tones and texture, and clear in-image text for diagrams, layouts, and graphics. Once again, speed and quality don’t come at higher costs: MAI-Image-2 is offered at competitive price-to-performance.

Customers are already embracing MAI-Image-2 for creative work. WPP, one of the world’s largest marketing and communications groups, is among the first enterprise partners building with MAI-Image-2 at scale.

“MAI-Image-2 is a genuine game-changer. It’s a platform that not only responds to the intricate nuance of creative direction, but deeply respects the sheer craft involved in generating real-world, campaign-ready images,” said Rob Reilly, Global Chief Creative Officer, WPP. “WPP has some of the best creative talent in the world and MAI-Image-2 is making them even better.”

A woman stands holding an orange umbrella, photographed from a low angle. She wears a white blouse and blue jeans, with the background featuring a plain white wall.

A bottle labeled "Sofily" sits on a wooden table decorated with peach flowers and a vase of orange blooms, bathed in warm sunlight with a shadowy background.

Images created by WPP using MAI-Image-2

MAI Models: Better, faster, and cheaper than our competitors.

We are rapidly deploying these top-tier models to power our own consumer and commercial products. We’re excited to share the quality, speed, and efficiency gains with our Microsoft Foundry customers at very competitive prices.

· MAI-Transcribe-1 starts at $0.36 per hour.

· MAI-Voice-1 starts at $22 per 1M characters.

· MAI-Image-2 starts at $5 per 1M tokens for text input and $33 per 1M tokens for image output.
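For budgeting, the starting prices above can be turned into a rough cost estimate. The sketch below is only illustrative: the function names and workload volumes are hypothetical, and it assumes billing scales linearly at the quoted starting rates, which actual Foundry billing may not do.

```python
# Starting list prices quoted above (USD).
TRANSCRIBE_PER_HOUR = 0.36
VOICE_PER_MILLION_CHARS = 22.00
IMAGE_INPUT_PER_MILLION_TOKENS = 5.00
IMAGE_OUTPUT_PER_MILLION_TOKENS = 33.00


def transcribe_cost(audio_hours: float) -> float:
    """Estimated MAI-Transcribe-1 cost for a number of audio hours."""
    return audio_hours * TRANSCRIBE_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a number of synthesized characters."""
    return characters / 1_000_000 * VOICE_PER_MILLION_CHARS


def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated MAI-Image-2 cost for text input and image output tokens."""
    return (
        input_tokens / 1_000_000 * IMAGE_INPUT_PER_MILLION_TOKENS
        + output_tokens / 1_000_000 * IMAGE_OUTPUT_PER_MILLION_TOKENS
    )


# Hypothetical monthly workload; the volumes are made up for illustration.
print(f"100 hours of transcription: ${transcribe_cost(100):.2f}")
print(f"2.5M characters of speech: ${voice_cost(2_500_000):.2f}")
print(f"200K input + 4M output image tokens: ${image_cost(200_000, 4_000_000):.2f}")
```

How many output tokens a single image consumes is not stated in the announcement, so token counts would need to come from actual usage data.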

Available now on Microsoft Foundry and MAI Playground.

Starting today, every developer can build with MAI models, including MAI-Transcribe-1, through Microsoft Foundry. You can also try them in the MAI Playground (US only).

Interested in MAI models but don’t have Foundry access?
Fill out this form and we’ll be in touch.

Models that are built to be better from the inside out.

At Microsoft AI, we’re building Humanist AI. We have a distinct view when creating our AI models — putting humans at the center, optimizing for how people actually communicate, training for practical use. You’ll see more models from us soon in Foundry and directly in Microsoft products and experiences.

Consistent with our commitment to safe and responsible AI, these MAI models were developed, tested, and rigorously red-teamed. Through Microsoft Foundry, developers get built-in guardrails, governance, and enterprise-grade controls designed to support safe, compliant deployment at scale.

Model Cards

Download Model Card for MAI-Transcribe-1

Download Model Card for MAI-Voice-1

Download Model Card for MAI-Image-2

^1. Top 25 languages by Microsoft product usage.

^2. Out of the top 25 global languages, MAI-Transcribe-1 ranks 1st by FLEURS in 11 core languages. It wins against Whisper-large-v3 on the remaining 14 and Gemini 3.1 Flash on 11 of those 14.

Introducing MAI-Image-2: for limitless creativity

March 19, 2026

Models

MSI team

Imagery generated with MAI-Image-2

Ranked the #3 model family on the Arena.ai leaderboard.

Today, we’re announcing MAI-Image-2 — pushing MAI into the top three text-to-image labs in the world on the Arena.ai leaderboard.

You can try it now in the MAI Playground, where you can experiment with the latest available MAI models and share feedback directly with our teams.

Built with creatives, for creative work

For MAI-Image-2 we spoke with photographers, designers, and visual storytellers who made it clear where we could make the biggest difference for everyday creative work.

Enhanced photorealism

MAI-Image-2 is built for creatives who want images that feel like they exist in the world, with natural light, accurate skin tones, and environments that feel lived-in. Creatives can now spend less time fixing in post-production and more time making.

A close-up of a person's face with closed eyes, soft sunlight illuminating their skin. Shadows from nearby branches or leaves create intricate patterns across their face.

Close-up of a human eye's iris, showing detailed, radiating yellow and brown fibers around the dark black pupil. The intricate patterns and vivid colors create a dramatic, abstract effect.

A glacier wall towering like a cathedral interior, deep blue ice with light refracting through layers, tiny human figure at base for scale, cinematic, cold mist in air, hyper-real detail

A person in red winter gear stands inside a massive blue ice cave, surrounded by textured, translucent ice walls and illuminated by sunlight streaming through the arching entrance above.

Reliable in-image text generation

From poster type to the sign in the background of a scene, text can be a key part of imagery. MAI-Image-2 enables consistent creation of infographics, slides, diagrams, and more, with little lost between direction and creation.

Rich, detailed scene generation

Some of the most exciting creative work lives in the strange, the cinematic, the hyper-detailed. MAI-Image-2 is built for that space: surreal concepts, ornate compositions, and ambitious worlds, turning imagination into images.

Abstract modernist design with a red circle, beige vertical and diagonal lines on a black background. Bold text "MODERNISM" appears vertically on the right, with a brief definition in white text on the lower left.

Image of three vibrant oranges with green leaves. Overlaid text reads: "THE IAM MAI CAFÉ. Breakfast, Lunch. Open 9am to 3pm. 03.19.2026." Menu items with prices and bottomless mimosas are also listed.

Typographic layouts and posters can be created with specific prompts on style, imagery, fonts, colors, and more.

A rider on a galloping horse jumps over an obstacle, with an orange, green, and white background. Text below announces "Jumping International CSI 8*" on 13-15 April 2026 and highlights "SAINT FLASH."

Make something today with MAI-Image-2

Preview MAI-Image-2 today on MAI Playground and let us know what you think. We genuinely want to hear from you!

MAI-Image-2 is beginning to roll out on Copilot and Bing Image Creator. API access is available today for select Microsoft customers, like WPP, who need image generation at scale, and will be open to any developer on Microsoft Foundry soon. If you are interested in exploring MAI-Image-2 for commercial use, fill out an application and we’ll follow up with more details.

There’s much more to come from the Microsoft AI Superintelligence team, so stay tuned.

Try MAI-Image-2

A collage of 16 diverse images, including a ballerina, butterfly wing, person jumping, misty mountains, bubbles over water, seashell, jellyfish, snowflake, green hills, pleated skirt, sand dunes, animal eye, feather, water drop, and ocean waves.

Build the Future With Us

We’re a lean, fast-moving lab made up of some of the world’s most talented minds. We have an exciting roadmap of compute at MAI, with our next-generation GB200 cluster now operational. And we have an ambitious mission we truly believe in. We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!

Explore all jobs

Introducing MAI-Image-1, debuting in the top 10 on LMArena

October 13, 2025

Models

A collage featuring nature photos (a frog in water, a rabbit, mountains, fields, sunset, a tree) and food photos (pizza, sushi, a grapefruit slice), arranged in a grid on a light beige background.

Update – November 4, 2025:

We have begun launching MAI-Image-1 into select Microsoft products!

Try it in Bing Image Creator: Available at bing.com/create, in the Bing mobile app, or right from the Bing search bar, Bing Image Creator is built to meet people where they already search and create. MAI-Image-1 is now an option alongside DALL-E 3 and GPT-4o in the model menu, enabling you to experiment and pick the model that best matches your creative goals.

Try it in Copilot Audio Expressions: Now, when you select Story Mode, Audio Expressions will use MAI-Image-1 to visualize your story with a unique image.

MAI-Image-1 is currently available in all countries that can access Bing Image Creator and Copilot Labs.

Earlier Announcement – October 13, 2025:

Today, we’re announcing MAI-Image-1, our first image generation model developed entirely in-house, debuting in the top 10 text-to-image models on LMArena.

At Microsoft AI, we’re creating AI for everyone – a supportive, helpful presence always in the service of humanity. We’ve shared how purpose-built models are essential for this mission, and we announced our first two in-house models in August. MAI-Image-1 marks the next step on our journey and paves the way for more immersive, creative and dynamic experiences inside our products.

We trained this model with the goal of delivering genuine value for creators, and we put a lot of care into avoiding repetitive or generically-stylized outputs. For example, we prioritized rigorous data selection and nuanced evaluation focused on tasks that closely mirror real-world creative use cases – taking into account feedback from professionals in the creative industries. This model is designed to deliver real flexibility, visual diversity and practical value.

MAI-Image-1 excels at generating photorealistic imagery, such as lighting effects (e.g., bounce light, reflections) and landscapes, particularly when compared to many larger, slower models. Its combination of speed and quality means users can get their ideas on screen faster, iterate through them quickly, and then transfer their work to other tools to continue refining.

A roadrunner with brown and white streaked feathers runs across a sandy desert with sparse shrubs. A flat-topped mesa is visible in the background under a clear blue sky.

“MAI-Image-1” is written in the sand on a beach at sunset, with calm waves and a colorful sky in the background. The sun is low on the horizon, casting a warm glow over the scene.

[1] A roadrunner sprinting across sand [2] MAI-Image-1 written in the sand at sunset over the beach [3] A man crossing a city street

A young man in a coat and jeans walks across a city street at sunset, with buildings, a café, and a blurred cyclist in the background. Warm sunlight creates long shadows on the road.


Two in-house models in support of our mission

August 28, 2025

Models

At Microsoft AI (MAI) we believe AI should be used to empower every person on the planet. We are creating AI for everyone, a supportive, helpful presence always in the service of humanity. It will be the gateway to a universe of knowledge and a set of capabilities that enable people and organizations to achieve more. Responsible, reliable, filled with personality and expertise, we are focused on creating applied AI as a platform for category-defining and deeply trusted products that understand each of our unique needs.

Since last year, we’ve been focused on building the foundation for this vision, with a world-class team and infrastructure. To fully meet our goals, MAI requires purpose-built models. Today, we’re excited to preview the first steps to making this a reality.

First, we’re releasing MAI-Voice-1, our first highly expressive and natural speech generation model, which is available in Copilot Daily and Podcasts, and as a brand new Copilot Labs experience to try out here. Voice is the interface of the future for AI companions and MAI-Voice-1 delivers high-fidelity, expressive audio across both single and multi-speaker scenarios.

Second, we have begun public testing of MAI-1-preview on LMArena, a popular platform for community model evaluation. This represents MAI’s first foundation model trained end-to-end and offers a glimpse of future offerings inside Copilot. We are actively spinning the flywheel to deliver improved models. We’ll have much more to share in the coming months. Stay tuned!

We have big ambitions for where we go next. Not only will we pursue further advances here, but we believe that orchestrating a range of specialized models serving different user intents and use cases will unlock immense value. There will be a lot more to come from this team on both fronts in the near future. We’re excited by the work ahead as we aim to deliver leading models and put them into the hands of people globally.

Try MAI-Voice-1 in Copilot and Copilot Labs

MAI-Voice-1 is a lightning-fast speech generation model, with an ability to generate a full minute of audio in under a second on a single GPU, making it one of the most efficient speech systems available today.
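The "full minute of audio in under a second" claim amounts to a speedup of at least 60x over real time. The snippet below is a minimal sketch of that arithmetic; the helper name is illustrative, and the podcast extrapolation assumes generation time scales linearly with audio length, which the announcement does not state.

```python
def realtime_speedup(audio_seconds: float, generation_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# "Under a second" means 1.0 s is an upper bound on generation time,
# so the result is a lower bound on the speedup.
speedup = realtime_speedup(audio_seconds=60.0, generation_seconds=1.0)
print(f"At least {speedup:.0f}x faster than real time")

# Hypothetical extrapolation to a 30-minute episode, assuming linear scaling.
podcast_seconds = 30 * 60
print(f"30-minute episode: at most about {podcast_seconds / speedup:.0f} s of generation")
```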

MAI-Voice-1 is already powering our Copilot Daily and Podcasts features. We are also launching it in Copilot Labs where you can try our expressive speech and storytelling demos. Imagine creating a “choose your own adventure” story with just a simple prompt, or crafting a bespoke guided meditation to help you sleep. Give it a try!

On a sunny afternoon, a spirited four-year-old named Jamie approached a grizzled pirate who was lounging by the docks. Arr! What be ye wantin’, wee one? This crew ain’t fer the faint of heart! Jamie’s eyes sparkled with excitement as they replied, I wanna be a pirate! I wanna sail the seas and find treasure! Can I join your crew, please? The pirate scratched his beard, chuckling at the child’s enthusiasm. Aye, ye think ye can handle the salty sea air and the dangers of the deep? Jamie nodded vigorously, determination shining through. I can! I can! I’ll be the best pirate ever! The pirate leaned closer, intrigued by Jamie’s spirit. All right, but ye must prove your worth. What be our first task, young matey?

Under a sprawling Texas sky, a skeptical cowboy and an enthusiastic techie met outside a diner. I reckon this fancy AI voice model ain’t all it’s cracked up to be. Ain’t no machine gonna sound like a real human. The techie chuckled, shaking his head. Oh, come on, this thing can express emotions better than some folks I know. It’s like having a storyteller right in your pocket. The cowboy squinted, pondering the implications of such technology. Maybe so, but can it spin a yarn around a campfire? I ain’t convinced just yet. The techie grinned, undeterred by the cowboy’s skepticism. Just wait till you hear it. It might just surprise you, partner.

Try MAI-1-preview in LMArena

MAI-1-preview is an in-house mixture-of-experts model, pre-trained and post-trained on ~15,000 NVIDIA H100 GPUs. This model is designed to provide powerful capabilities to consumers seeking to benefit from models that specialize in following instructions and providing helpful responses to everyday queries.

We will be rolling MAI-1-preview out for certain text use cases within Copilot over the coming weeks to learn and improve from user feedback. We will continue to use the very best models from our team, our partners, and the latest innovations from the open-source community to power our products. This approach gives us the flexibility to deliver the best outcomes across millions of unique interactions every day.

In addition to LMArena, we are also making this model available to trusted testers – apply for API access here. We’re excited to collect early feedback to learn more about where the model performs well and how we can make it better. Stay tuned for more.

