Introducing MAI-Transcribe-1.5

June 2, 2026

Models

Superintelligence team

Today we’re launching MAI-Transcribe-1.5, the most accurate multilingual speech-to-text model, with best-in-class Word Error Rate (WER) across 43 languages.

This latest model has expanded the range of languages available without compromising accuracy and quality.

It’s now being integrated into Copilot, Teams, GitHub, and Dynamics 365 Contact Center – and it’s also available in Foundry, where it’s the fastest, most efficient and most cost-effective transcription model offered by any hyperscaler.

Features and Capabilities

State-of-the-art (SOTA) accuracy on the FLEURS multilingual transcription benchmark, and #3 on the Artificial Analysis leaderboard.
Leading accuracy x speed on the Artificial Analysis leaderboard.
Expanded language coverage from 25 to 43 languages.
Can transcribe an hour of audio in under 15 seconds, and is up to five times faster on long audio than Gemini 3.1, Scribe v2, and GPT-4o-Transcribe.
Includes keyword biasing, which makes the model aware of domain-specific terminology and improves WER by up to 30% on FLEURS.
Optimized for real-world use cases, such as transcribing audio with noisy backgrounds.

Accuracy

We added 18 new languages without compromising accuracy. On FLEURS – the standard multilingual benchmark – we achieved best-in-class Word Error Rate across 43 languages, maintaining our position as the most accurate model on the benchmark.

[Figure: table of per-language word error rate percentages on FLEURS, comparing MAI-Transcribe with three other transcription models.]

On the Artificial Analysis leaderboard we achieved a Word Error Rate of 2.4%, placing #3 in a very competitive open benchmark.
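For readers new to the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the scoring code used by FLEURS or Artificial Analysis; the function name and sample sentences are made up for the example, and real benchmarks usually normalize casing and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences, not taken from any benchmark.
reference = "the quick brown fox jumps"
hypothesis = "the quick brown box jumps over"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.1%}")  # one substitution + one insertion over 5 words -> 40.0%
```

On this scale, a WER of 2.4% corresponds to roughly 24 word errors per 1,000 reference words.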

Speed

MAI-Transcribe-1.5 is now a leader in terms of accuracy x speed on the Artificial Analysis leaderboard, running up to 5x faster than models of comparable accuracy.

This is particularly impactful when transcribing long audio files, as the model can transcribe an hour of audio in under 15 seconds.
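To put that figure in perspective, the arithmetic can be written out as a short sketch. It uses only the numbers quoted above (one hour of audio, 15 seconds as the upper bound on processing time); the variable names are illustrative, and "real-time factor" here means processing time divided by audio duration.

```python
audio_seconds = 60 * 60      # one hour of audio
processing_seconds = 15      # upper bound quoted in this post

# Speed-up over real time: how many seconds of audio per second of compute.
speedup = audio_seconds / processing_seconds

# Real-time factor (RTF): processing time divided by audio duration.
rtf = processing_seconds / audio_seconds

print(f"at least {speedup:.0f}x faster than real time (RTF <= {rtf:.4f})")
# prints: at least 240x faster than real time (RTF <= 0.0042)
```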

Table comparing MAI-Transcribe-1 and MAI-Transcribe-1.5 on FLEURS (both ranked #1), overall WER (2.6% vs 2.4%), and transcription speed (both transcribe 1 hour of audio in 53 seconds).

Keyword biasing

A major challenge for many transcription models is that they fail on domain-specific words, which often matter the most to users. These include people and product names, medical terms, internal acronyms, and customer-specific vocabulary, all of which are critical for enterprises.

MAI-Transcribe-1.5 can now bias its predictions toward a list of domain-specific keywords provided by the user. The model does not blindly force matches; it uses the shared context to decide when keyword biasing should apply. This dramatically improves recognition of specialized vocabulary while maintaining accuracy on general speech.

With keyword biasing enabled, we observe a 30% reduction in Word Error Rate (WER) on the FLEURS multilingual benchmark.

English

Without keyword biasing
So, um, for the next phase, Sean will, uh, take care of the documentation. Oif, right, uh, she’ll handle the user testing sessions. Societal is, um, leading the workflow design. Soren will, uh, set up the analytics, and Niamh is going to coordinate the deployment timeline.

With keyword biasing
List of keywords: “Aisling, Shaun, Xochitl, Ljubiša, Søren, Siobhán, Jorge, Nguyễn Phúc, Aoife, Tadhg, Ghislaine, Niamh, Szczepan, Eoin, Kseniya, Wojciech, Xavier, Maoz”
So, um, for the next phase, Shaun will, uh, take care of the documentation. Aoife, right, uh, she’ll handle the user testing sessions. Xochitl is, um, leading the workflow design. Søren will, uh, set up the analytics, and Niamh is going to coordinate the deployment timeline.
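One rough way to quantify the difference between the two transcripts above is to count how many keywords from the list appear verbatim in each. The sketch below does exactly that; it is an illustrative check written for this example, not part of the model or its API. The helper name `keywords_found` is hypothetical, and the approach assumes that a keyword appearing in the biased transcript was in fact spoken.

```python
import re
import unicodedata

keywords = [
    "Aisling", "Shaun", "Xochitl", "Ljubiša", "Søren", "Siobhán", "Jorge",
    "Nguyễn Phúc", "Aoife", "Tadhg", "Ghislaine", "Niamh", "Szczepan",
    "Eoin", "Kseniya", "Wojciech", "Xavier", "Maoz",
]

without_biasing = (
    "So, um, for the next phase, Sean will, uh, take care of the documentation. "
    "Oif, right, uh, she’ll handle the user testing sessions. Societal is, um, "
    "leading the workflow design. Soren will, uh, set up the analytics, and "
    "Niamh is going to coordinate the deployment timeline."
)
with_biasing = (
    "So, um, for the next phase, Shaun will, uh, take care of the documentation. "
    "Aoife, right, uh, she’ll handle the user testing sessions. Xochitl is, um, "
    "leading the workflow design. Søren will, uh, set up the analytics, and "
    "Niamh is going to coordinate the deployment timeline."
)


def keywords_found(transcript: str, keyword_list: list[str]) -> list[str]:
    """Return the keywords that occur as whole words in the transcript."""
    # Normalize to NFC so accented characters compare consistently.
    text = unicodedata.normalize("NFC", transcript)
    found = []
    for keyword in keyword_list:
        kw = unicodedata.normalize("NFC", keyword)
        # Lookarounds act as Unicode-aware word boundaries.
        if re.search(rf"(?<!\w){re.escape(kw)}(?!\w)", text):
            found.append(keyword)
    return found


print(keywords_found(without_biasing, keywords))
# ['Niamh']
print(keywords_found(with_biasing, keywords))
# ['Shaun', 'Xochitl', 'Søren', 'Aoife', 'Niamh']
```

In this example, only one of the five names is spelled as listed without biasing, and all five are with biasing. The remaining keywords never appear in either transcript, which is consistent with the model not forcing matches.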

What’s next

Diarization: the ability to identify who said what in multi-speaker audio, which is essential for meetings, interviews, and call center analytics.
A native streaming API, enabling real-time transcription for live applications and voice agents, moving beyond the current batch-first approach.
Expanded language support – giving each new language the same depth of accuracy and robustness as the existing 43 languages.

Try it out

You can explore the model directly in the MAI Playground.

Learn more about MAI-Transcribe-1.5

Model card [Link]
Foundry API documentation [Link]
Cookbook [Link]

Build the Future With Us

We’re a lean, fast-moving lab made up of some of the world’s most talented minds. We have an exciting roadmap of compute at MAI, with our next-generation GB200 cluster now operational. And we have an ambitious mission we truly believe in. We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!

Explore all jobs
