MAI-Transcribe-1.5

Turn noisy audio into highly accurate, domain‑aware transcripts across 43 languages.

Features

MAI-Transcribe-1.5 delivers consistent, high-accuracy transcription across languages, accents, and challenging audio conditions.

Leading accuracy across 43 languages

State-of-the-art transcription quality across a globally diverse set of languages and speaking styles, with automatic language detection included.

Deutsch

Chinese 中文

Italiano

Hindi हिन्दी

English (Australian)

Français

Español

English

Japanese 日本語

Português

Adapts to your industry

Tailors transcription to domain-specific terminology, making it ready for captions, call analysis, accessibility, and content workflows out of the box.

Agent 00:04

Thank you for calling Pharmacy Support. My name is Jordan. How can I help you today?

HCP 00:10

Hi. I was charged twice for my Lisinopril refill last week and I haven’t heard anything about a refund.

Agent 00:19

I’m sorry about that. It looks like the auto-refill processed at the same time as your manual order. I’ll reverse the extra charge now — you should see it back within 3 to 5 business days.

HCP 00:31

That makes sense. Thank you for sorting that out.

Daniel 00:06

Let’s start with the Q3 budget review. I sent the updated spreadsheet this morning — did everyone get a chance to look at it?

Sara 00:14

I did. The travel line item looks high to me. Can we revisit that before we approve?

Daniel 00:21

Sure. I think two of those trips could be moved to virtual. That would bring it in line with last quarter.

Sara 00:29

Agreed. Let’s flag it for revision before the final sign-off.

Anton Johnson 00:05

This is Dr. Johnson. Dictating notes for a follow-up visit with Marcus T., 47-year-old male.

00:12

Patient presents with persistent lower back pain, three weeks in duration. Reports morning stiffness that improves with movement. No radiation to the legs.

00:24

Assessment: likely mechanical lower back pain. No red flag symptoms. Contributing factors include posture and sedentary work habits.

00:32

Plan: referral to physical therapy, six sessions. Ibuprofen 400 milligrams as needed with food. Follow up in four weeks or sooner if symptoms worsen. End note.

Handles anything you throw at it

Built to perform in imperfect conditions: background noise, variable audio quality, and everything in between.

Hey, so I was hoping to change my flight, if that’s at all possible.

Using the Model

Hey, so I was hoping to change my flight, if that’s at all possible. It’s currently set for 10 pm tonight, but I’m really trying to switch to something earlier, ideally sometime before 6 pm. Is that something we could maybe look into?

Performance

Industry-leading accuracy

MAI-Transcribe-1.5 achieved the lowest Word Error Rate (WER) among leading speech-to-text models. On FLEURS across 43 languages, it outperformed Scribe v2, Whisper-large-v3, GPT-4o-Transcribe, and Gemini 3.1 Flash.
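
For readers unfamiliar with the metric, Word Error Rate is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the benchmark's scoring code. The function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for illustration, and published benchmark numbers usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Normalization here (lowercasing, whitespace split) is an illustrative
    assumption, not the benchmark's actual normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "I was hoping to change my flight"
hypothesis = "I was hoping to change flight"  # one dropped word
print(f"{word_error_rate(reference, hypothesis):.3f}")  # prints 0.143
```

One dropped word out of seven reference words gives 1/7, or about 14.3%. A lower value means fewer word-level mistakes.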

FLEURS Benchmark

Languages in which MAI-Transcribe-1.5 outperformed each model on the FLEURS public ‘test’ dataset

  • Scribe v2: 29 / 43
  • Gemini 3.1 Flash: 34 / 43
  • GPT-Transcribe: 27 / 40

[Chart: overall average Word Error Rate (WER) by model, averaged across 43 languages]

Version Comparison

MAI-Transcribe-1.5

  • Avg WER on FLEURS: 4.9%
  • WER on Artificial Analysis: 2.4%
  • Languages: 43
  • Contextual biasing: Yes
  • Latency: 5.7X
  • Pricing: $0.36 per hour of audio
View docs

MAI-Transcribe-1

  • Avg WER on FLEURS: 3.9%
  • WER on Artificial Analysis: 2.6%
  • Languages: 25
  • Contextual biasing: No
  • Latency
  • Pricing: $0.36 per hour of audio
View docs
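
To put the listed price in context, the sketch below estimates cost from audio duration at $0.36 per hour of audio. It assumes billing scales linearly with duration, with no minimum charge or rounding increment; the page does not state the billing granularity, so treat that as an assumption. The function name `estimate_cost_usd` is hypothetical.

```python
PRICE_PER_AUDIO_HOUR_USD = 0.36  # price listed for both model versions


def estimate_cost_usd(
    audio_seconds: float,
    price_per_hour: float = PRICE_PER_AUDIO_HOUR_USD,
) -> float:
    """Estimate transcription cost, assuming purely linear billing.

    Assumption: no minimum charge and no rounding to billing increments.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * price_per_hour


print(f"${estimate_cost_usd(90 * 60):.2f}")      # 90 minutes, prints $0.54
print(f"${estimate_cost_usd(1000 * 3600):.2f}")  # 1,000 hours, prints $360.00
```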

Try MAI-Transcribe-1.5

MAI Playground

Experiment with all other MAI models.
Try in playground

Microsoft Foundry (Azure Speech)

Turn every spoken word into accurate, searchable text instantly.
Try it in Foundry