Today we’re announcing 3 new world class MAI models, available in Foundry

April 2, 2026
Models
Mustafa Suleyman

Introducing MAI-Transcribe-1, alongside MAI-Voice-1 and MAI-Image-2. World-class quality at lightning speeds, now available at the most competitive prices.

Available now in Microsoft Foundry and MAI Playground.

MAI-Transcribe-1 delivers state-of-the-art speech-to-text transcription across the top 25 most-used languages [1], according to the industry-standard FLEURS benchmark [2]. Built to deliver world-class quality in messy, real-world environments, it transcribes batches at 2.5x the speed of the existing Microsoft Azure Fast offering. It’s also incredibly efficient, making MAI-Transcribe-1 not just the most accurate, but also lightning fast. It’s now available in Foundry at the best price-performance of any large cloud provider.

Lower is better.

MAI-Voice-1 is our top-tier voice generation model. It is built to generate natural, realistic speech, rich with nuance, emotional range, and expression, while preserving speaker identity even across long-form content.

Today we’re adding the ability to safely and securely create your own custom voice in Microsoft Foundry with just a few seconds of audio. MAI-Voice-1 can transform how easily developers can build voice experiences and voice agents – at high quality and high speed.

The model can generate 60 seconds of audio in just a single second, and highly efficient GPU usage delivers that quality and speed affordably. Hearing is believing, so experience it for yourself with Copilot Audio Expressions or Copilot Podcasts.
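
As a rough illustration of what that rate implies, the sketch below estimates how long a clip of a given length might take to generate. It assumes the quoted 60-seconds-per-second figure holds constant, and it ignores network latency, queuing, and hardware differences. The function and constant names are hypothetical and are not part of any MAI or Foundry API.

```python
# Throughput figure quoted in this post: 60 seconds of audio per second of compute.
# Assumption: the rate is constant regardless of clip length.
AUDIO_SECONDS_PER_WALL_SECOND = 60.0


def estimated_generation_seconds(audio_seconds: float) -> float:
    """Estimate wall-clock generation time for a clip of the given length."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / AUDIO_SECONDS_PER_WALL_SECOND


if __name__ == "__main__":
    podcast_seconds = 30 * 60  # a 30-minute episode
    estimate = estimated_generation_seconds(podcast_seconds)
    print(f"Estimated generation time: {estimate:.0f} s")
```

Under these assumptions, a 30-minute episode would take about 30 seconds to generate.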

MAI-Image-2 has turbocharged image generation performance and speed on Copilot after debuting as a top 3 model family on the Arena.ai leaderboard. Users experience at least 2x faster generation times on Foundry and Copilot with similar quality, based on real-world production traffic data. Phased rollouts are also underway in Bing and PowerPoint.

MAI-Image-2 was created with photographers, designers, and visual storytellers who demand natural lighting, accurate skin tones and texture, and clear in-image text for diagrams, layouts, and graphics. Once again, speed and quality don’t come at a higher cost: MAI-Image-2 is offered at competitive price-to-performance.

Customers are already embracing MAI-Image-2 for creative work. WPP, one of the world’s largest marketing and communications groups, is among the first enterprise partners building with MAI-Image-2 at scale.

“MAI-Image-2 is a genuine game-changer. It’s a platform that not only responds to the intricate nuance of creative direction, but deeply respects the sheer craft involved in generating real-world, campaign-ready images,” said Rob Reilly, Global Chief Creative Officer, WPP. “WPP has some of the best creative talent in the world and MAI-Image-2 is making them even better.”

A woman stands holding an orange umbrella, photographed from a low angle. She wears a white blouse and blue jeans, with the background featuring a plain white wall.
A bottle labeled "Sofily" sits on a wooden table decorated with peach flowers and a vase of orange blooms, bathed in warm sunlight with a shadowy background.
Images created by WPP using MAI-Image-2

MAI Models: Better, faster, and cheaper than our competitors.

We are rapidly deploying these top-tier models to power our own consumer and commercial products. We’re excited to share the quality, speed, and efficiency gains with our Microsoft Foundry customers at very competitive prices.

· MAI-Transcribe-1 starts at $0.36 per hour.

· MAI-Voice-1 pricing starts at $22 per 1M characters.

· MAI-Image-2 starts at $5 per 1M tokens for text input and $33 per 1M tokens for image output.
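
To make the starting prices above concrete, here is a minimal cost-estimation sketch. It treats each listed figure as a flat rate, which is an assumption: actual billing may involve tiers, minimums, or regional differences not covered in this post. All function and constant names are hypothetical and do not correspond to any Foundry API.

```python
# Starting prices quoted in this post, in USD. Treated as flat rates (assumption).
TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_TEXT_INPUT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_IMAGE_OUTPUT_TOKENS = 33.0

MILLION = 1_000_000


def transcribe_cost(audio_hours: float) -> float:
    """Estimated MAI-Transcribe-1 cost for a number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a number of input characters."""
    return characters / MILLION * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated MAI-Image-2 cost from text input and image output tokens."""
    input_cost = text_input_tokens / MILLION * IMAGE_USD_PER_MILLION_TEXT_INPUT_TOKENS
    output_cost = image_output_tokens / MILLION * IMAGE_USD_PER_MILLION_IMAGE_OUTPUT_TOKENS
    return input_cost + output_cost


if __name__ == "__main__":
    print(f"100 hours of transcription: ${transcribe_cost(100):.2f}")
    print(f"500,000 characters of speech: ${voice_cost(500_000):.2f}")
    print(f"1M input + 2M output image tokens: ${image_cost(MILLION, 2 * MILLION):.2f}")
```

With these flat-rate assumptions, 100 hours of transcription comes to about $36, 500,000 characters of speech to about $11, and 1M text input tokens plus 2M image output tokens to about $71.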

Available now on Microsoft Foundry and MAI Playground.

Starting today, every developer can build with MAI models, including MAI-Transcribe-1, through Microsoft Foundry. You can also try them in the MAI Playground (US only).

Interested in MAI models but don’t have Foundry access?
Fill out this form and we’ll be in touch.

Models that are built to be better from the inside out.

At Microsoft AI, we’re building Humanist AI. We have a distinct view when creating our AI models — putting humans at the center, optimizing for how people actually communicate, training for practical use. You’ll see more models from us soon in Foundry and directly in Microsoft products and experiences.

Consistent with our commitment to safe and responsible AI, these MAI models were developed, tested, and rigorously red-teamed. Through Microsoft Foundry, developers get built-in guardrails, governance, and enterprise-grade controls designed to support safe, compliant deployment at scale.

Model Cards

Download Model Card for MAI-Transcribe-1

Download Model Card for MAI-Voice-1

Download Model Card for MAI-Image-2

1. Top 25 languages by Microsoft product usage.

2. Out of the top 25 global languages, MAI-Transcribe-1 ranks 1st by FLEURS in 11 core languages. On the remaining 14, it wins against Whisper-large-v3 on all 14 and against Gemini 3.1 Flash on 11 of those 14.
