Introducing Copilot Health
At some point, we’ve all stared at a test result we didn’t understand. Worn a device that tracked everything but revealed little. Sat in a clinic waiting room with a list of questions we forgot the moment we sat down for a consultation. Felt that quiet, unsettling feeling that something is off – but had nowhere to take it.
The truth is that most people don’t need more information. They need help to make sense of what they already have.
That’s what Copilot Health is for.
Today, we’re launching Copilot Health, a separate, secure space within Copilot where medical intelligence makes sense of your information and delivers personalized health insights that you can act on.
Copilot Health doesn’t replace your doctor. It makes every minute you have with them count more. You arrive prepared, with the right questions, the right context, and the confidence that comes from better understanding your own body.
Copilot Health brings together your health records, wearable data, and health history into one place, then applies intelligence to turn them into a coherent story. Where the connection between your broken sleep and the reasons why becomes visible. Where you stop scrolling symptoms at midnight and start having better-informed conversations.
We’re making Copilot Health available through a careful, phased rollout. Today we’re opening a waitlist to join our early community shaping the experience.
Health Sources You Can Trust
Long waits, clinician shortages, and uneven access to medical care lead many people to turn to online sources for help. From understanding first‑time knee pain to finding an open urgent care clinic, our consumer products at Microsoft already respond to over 50 million consumer health questions a day. You can learn more about the health questions people bring to Copilot here.
We’ve improved the quality and reliability of answers by elevating information from credible health organizations across 50 countries, as verified by our clinical team using principles independently established by the National Academy of Medicine. Responses include clear citations with easy links to source material, alongside expert‑written answer cards from Harvard Health. We’ve also made it easier to find a doctor that accepts your insurance. Copilot Health connects to real‑time US provider directories so users can search for clinicians by specialty, location, languages spoken and insurance coverage.
All Your Health Data in One Place
A truly helpful health companion needs to do more than provide general answers – it needs to draw on your health history and goals. Copilot Health gives you a dedicated space to bring all your personal health data together into a comprehensive profile including your:
- Activity levels, sleep patterns, vital signs, and other trends from over 50 wearable devices including Apple Health, Oura, Fitbit and more.
- Health records from over 50,000 U.S. hospitals and provider organizations through HealthEx, including your visit summaries, medication lists, and test results.
- Comprehensive lab test results from Function.
Towards Medical Superintelligence
Copilot Health makes use of increasingly sophisticated AI to make sense of patterns in your health data, surfacing more proactive and actionable insights.
Initiatives such as our Microsoft AI Diagnostic Orchestrator (MAI‑DxO) have already demonstrated impressive results in research environments. Forthcoming publications will outline how our systems can be applied across a broader range of clinical cases and conditions.
This work paves the way to providing users with trusted access to medical superintelligence – health AI that can ultimately combine the wide-ranging knowledge of a general physician, with the depth of a specialist. At every step, new AI features drawing on these capabilities will only be released into Copilot Health after rigorous clinical evaluations and with clear labelling.
Safe and Secure by Design
We recognize that having access to your personal and sensitive health information is an important responsibility. Your Copilot Health conversations and data are isolated from general Copilot and kept under additional access, privacy, and safety controls. Data in Copilot Health is protected with industry leading safeguards, including encryption at rest and in transit, strict access controls, and the ability to manage and delete your information when you choose. You can disconnect your connectors to health data sources such as electronic health records or wearables instantaneously at any time. Your information in Copilot Health is not used for model training.
Copilot Health is developed with our internal clinical team and informed by an external panel of over 230 physicians from more than 24 countries, who contribute medical expertise, safety feedback, and real‑world perspective. Microsoft’s responsible AI principles guide how Copilot Health is designed, developed, and deployed, with a focus on fairness, transparency, and accountability. These principles shape our product decisions – from data handling and model development to monitoring and incident response.
Copilot Health has achieved ISO/IEC 42001 certification, the world’s first standard for AI management systems, meaning an independent third party has verified how we build, govern, and continuously improve the AI behind this service.
Building For All
We are designing Copilot Health with a diverse set of users, working in collaboration with organizations like the AARP, who serve the interests of 38 million older Americans, and the National Health Council, representing over 180 patient advocacy groups. Our goal is for everyone to be able to use Copilot Health confidently.
How to Access
Sign up to be one of the first to try Copilot Health and help shape the experience.
Copilot Health is launching first in English in the United States to adults aged 18 and older. We are actively developing additional language and voice options and will announce expanded support and new geographies when ready.
Copilot Health is not intended to diagnose, treat, or prevent diseases or other conditions and is not a substitute for professional medical advice.
Related Stories
Health Check: How People Use Copilot for Health
There’s nothing more important than your health.
Our 2025 Copilot Usage Report revealed that people talk about their health, and the conditions of their loved ones, more than any other topic on mobile.
Inspired by this finding, we decided to carry out an in-depth analysis of over half a million health and wellbeing-related conversations people had with Copilot over the course of January 2026.
This research shows not only the breadth and depth of people’s engagement with AI for their health, but how AI can show up through the growing cracks in our healthcare systems. It shows people changing topics over the course of the day, how AI supports squeezed family members, and helps cut through the complexity of navigating healthcare choices. In all this, it highlights the critical importance of accuracy, reliability, and trust.
As with all our usage reports and conversation analysis, we adopt a strict privacy-preserving approach. All conversations are de-identified at source, and we rely on an automated workflow that extracts topics and intents. No human reads user conversations as part of this process.
Although this research underscores the importance of health in AI, what we found challenged many assumptions – people aren’t just asking general health questions. In nearly 1 in 5 conversations, people describe their own symptoms, get help interpreting their own test results, or manage their own conditions. And people aren’t just asking for themselves, but for the people who depend on them. Here are some highlights:
What People Ask About
People go to Copilot above all for information. They want the facts, fast and tailored to them. Around 40% of questions focus on understanding symptoms, medical conditions, and treatments. Questions framed in general terms may well reflect a user’s own health concern rather than casual curiosity, and the true share of personal health questions may be higher. In a landscape where information asymmetry and health misinformation remain widespread, people want trusted and easy to understand explanations drawn from credible sources.
Meaningful interactions go far beyond general knowledge. One of the most common reasons people turn to Copilot (10.9% of health questions) is to interpret symptoms (often new or unexpected) and to understand laboratory or imaging results. While safe interpretation still relies on qualified clinicians, these are practical, often time‑sensitive questions where people feel they need clear, credible explanations before taking the next steps.
Personalized lifestyle and fitness coaching drive significant engagement (9% of queries), with nutrition and exercise the top two sub-categories. What stands out here is the shift from generic advice to tailored, ongoing guidance – the kind of personalized support that traditional internet search tools don’t provide.
People also use Copilot to navigate the healthcare system (5.8% of health questions touch on healthcare navigation, insurance, or benefits). Users want to find local clinicians matching their medical concerns, location, and insurance coverage. They want help understanding benefits, comparing care options, and managing medical paperwork. In these stressful moments, Copilot functions as a guide through an often-opaque system, helping people feel more prepared and confident in their decisions.
General health information dominates, but nearly 1 in 5 conversations involve personal symptom assessment or condition management.
When People Ask
Conversations change over the course of the day. While emotions and wellbeing represent a relatively small share of health queries overall, their proportion rises as the day goes on – from 3.4% of all health queries in the morning and daytime to 4.3% in the evening and 5.2% at night. We also found a nocturnal increase in questions related to understanding medical symptoms, suggesting that people turn to AI when they cannot easily reach a clinician, a pharmacist, or even friends and family.
Personal health topics rise steadily through the evening and into the night, while research and academic queries fall away.
Who People Ask For
Our users are asking for others, not just themselves. Across symptom and condition management questions, 1 in 7 conversations are on behalf of someone else. These queries often involve children’s wellbeing, aging parents’ medications, or a partner’s test results.
Growing numbers of people find themselves raising children, supporting aging parents, and managing others’ health decisions at once. This “sandwich generation” goes online to answer concerns, coordinate care, and prepare questions when time and access are limited. Proxy use changes the nature of queries – more requests involve summarizing histories, comparing treatment options, or translating clinical language for non‑medical caregivers. All of this requires clearer guidance around consent and privacy, and clear direction on escalation paths.
Mobile is where most personal health conversations happen. Symptom questions and emotional wellbeing queries are far more common on phones, while desktop skews heavily toward research and academic work.
Where People Ask
Depending on the device, people use Copilot very differently. On mobile, people ask about symptoms and condition management at twice the rate they do on desktop. Emotional wellbeing conversations are 75% more common. Mobile is where the most personal and immediate health conversations happen.
Desktop use, by contrast, skews toward work-adjacent tasks like health research and academic work (3x more common), likely reflecting more professional use by students, researchers, and clinicians.
Most symptom conversations are about users themselves, but one in seven are on behalf of someone else.
Why This Matters and How We Are Responding
As existing models of healthcare delivery struggle to keep pace with demand, more people are turning online and increasingly to AI. Until recently, many people relied on internet search for navigating health questions. The problem is this can offer limited help in distinguishing between simple explanations and alarming possibilities. With growing pressure on healthcare services, we believe people need better tools to make sense of health information when access is challenging.
Generative AI can step in to help. It delivers more tailored responses to user queries, asks specific follow up questions, and guides people towards a recommended next best action whatever the time of day. Done right, this has the potential to expand timely access to reliable guidance and make a difference at a time of need.
Across Microsoft AI’s consumer products, including Bing and Copilot, we already handle over 50 million health questions daily. We take this responsibility seriously. In November 2024, we formed a dedicated consumer health team to focus on areas that address users’ most pressing health questions including:
Credible health information
Copilot’s health answers are anchored upon thousands of credible sources, identified using principles independently published by the National Academy of Medicine. We provide clear citations for where information comes from with single-click links out to source material. Alongside generative responses we also surface expert‑written answer cards in partnership with respected organizations including Harvard Health.
Care navigation
In the US, Copilot now connects to real‑time provider directories, so users can find high quality providers by specialty, location, and personal preferences. Equipped with this information, users can book appointments and continue their health journeys. We’re actively working to expand this service globally.
Our usage research supports the importance of these areas. Getting the answer right really matters when it comes to your health and wellbeing. It’s why the Microsoft AI Health team is working to deliver richer clinical context and stronger clinical reasoning into conversations that will deepen our ability to give clear, relevant, and safer answers. Richer context means Copilot can understand patterns and explain what might be going on rather than responding in isolation. Stronger reasoning means Copilot can break down complex questions step by step, highlight what’s important, and help people prepare for more productive conversations with clinicians.
AI must deliver for health. We will keep working to ensure that it does.
Copilot is not intended to diagnose, treat, or prevent diseases or other conditions and is not a substitute for professional medical advice.
Making Sense of the Unknown
Support After a Diagnosis
Clarity in the Wait
Jacqueline, Sumbal, and Dylan describe how Copilot has helped them. Please note that they were not part of the study.
The Path to Medical Superintelligence
Benchmarked against real-world case records published each week in the New England Journal of Medicine, we show that the Microsoft AI Diagnostic Orchestrator (MAI-DxO) correctly diagnoses up to 85% of NEJM case proceedings, a rate more than four times higher than a group of experienced physicians. MAI-DxO also gets to the correct diagnosis more cost-effectively than physicians.
As demand for healthcare continues to grow, costs are rising at an unsustainable pace, and billions of people face multiple barriers to better health – including inaccurate and delayed diagnoses. Increasingly, people are turning to digital tools for medical advice and support. Across Microsoft’s AI consumer products like Bing and Copilot, we see over 50 million health-related sessions every day. From a first-time knee-pain query to a late-night search for an urgent-care clinic, search engines and AI companions are quickly becoming the new front line in healthcare.
We want to do more to help, and we believe generative AI can be transformational. That’s why, at the end of 2024, we launched a dedicated consumer health effort at Microsoft AI, led by clinicians, designers, engineers, and AI scientists. This effort complements Microsoft’s broader health initiatives and builds on our longstanding commitment to partnership and innovation. Existing solutions include RAD-DINO, which helps accelerate and improve radiology workflows, and Microsoft Dragon Copilot, our pioneering voice-first AI assistant for clinicians.
For AI to make a difference, clinicians and patients alike must be able to trust its performance. That’s where our new benchmarks and AI orchestrator come in.
Medical Case Challenges and Benchmarks
To practice medicine in the United States, physicians need to pass the United States Medical Licensing Examination (USMLE), a rigorous and standardized assessment of clinical knowledge and decision making. USMLE questions were among the earliest benchmarks used to evaluate AI systems in medicine, offering a structured way to compare model performance – both against each other and against human clinicians.
In just three years, generative AI has advanced to the point of achieving near-perfect scores on the USMLE and similar exams. But these tests primarily rely on multiple-choice questions, which favor memorization over deep understanding. By reducing medicine to one-shot answers on multiple-choice questions, such benchmarks overstate the apparent competence of AI systems and obscure their limitations.
At Microsoft AI, we’re working to advance and evaluate clinical reasoning capabilities. To move beyond the limitations of multiple-choice questions, we’ve focused on sequential diagnosis, a cornerstone of real-world medical decision making. In this process, a clinician begins with an initial patient presentation and then iteratively selects questions and diagnostic tests to arrive at a final diagnosis. For example, a patient presenting with cough and fever may lead the clinician to order and review blood tests and a chest X-ray before they feel confident about diagnosing pneumonia.
Each week, the New England Journal of Medicine (NEJM) – one of the world’s leading medical journals – publishes a Case Record of the Massachusetts General Hospital, presenting a patient’s care journey in a detailed, narrative format. These cases are among the most diagnostically complex and intellectually demanding in clinical medicine, often requiring multiple specialists and diagnostic tests to reach a definitive diagnosis.
How does AI perform? To answer this, we created interactive case challenges drawn from the NEJM case series – what we call the Sequential Diagnosis Benchmark (SD Bench). This benchmark transforms 304 recent NEJM cases into stepwise diagnostic encounters where models – or human physicians – can iteratively ask questions and order tests. As new information becomes available, the model or clinician updates their reasoning, gradually narrowing toward a final diagnosis. This diagnosis can then be compared to the gold-standard outcome published in the NEJM.
Each requested investigation also incurs a (virtual) cost, reflecting real-world healthcare expenditures. This allows us to evaluate performance across two key dimensions: diagnostic accuracy and resource expenditure. You can watch how an AI system progresses through one of these challenges in this short video.
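To make the two scoring dimensions concrete, here is a minimal sketch of a stepwise encounter. It is an illustration only, not the SD Bench implementation: the `Encounter` class, the test names, and the prices in `TEST_COSTS` are all hypothetical, and the exact string match used to judge correctness is a simplification of however the benchmark actually compares a proposed diagnosis with the published one.

```python
from dataclasses import dataclass, field

# Hypothetical per-test prices; the benchmark's real cost table is not given in this post.
TEST_COSTS = {"blood_panel": 50.0, "chest_xray": 120.0}


@dataclass
class Encounter:
    """One stepwise case: findings stay hidden until a test is requested."""
    presentation: str
    findings: dict          # test name -> result text
    final_diagnosis: str    # reference diagnosis used for scoring
    spent: float = 0.0
    log: list = field(default_factory=list)

    def order_test(self, name: str) -> str:
        # Each requested investigation adds its (virtual) cost to the running total.
        self.spent += TEST_COSTS[name]
        result = self.findings.get(name, "no significant findings")
        self.log.append((name, result))
        return result

    def score(self, diagnosis: str) -> tuple:
        # Two dimensions: was the diagnosis correct, and what did reaching it cost?
        correct = diagnosis.strip().lower() == self.final_diagnosis.lower()
        return correct, self.spent


# The cough-and-fever example from earlier in the post, with made-up findings.
case = Encounter(
    presentation="cough and fever",
    findings={
        "blood_panel": "raised white cell count",
        "chest_xray": "right lower lobe consolidation",
    },
    final_diagnosis="pneumonia",
)
case.order_test("blood_panel")
case.order_test("chest_xray")
print(case.score("Pneumonia"))  # (True, 170.0)
```

Keeping the findings hidden behind `order_test` is what separates this format from a multiple-choice question: the agent pays for each piece of information it chooses to reveal.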
Getting to a Correct Diagnosis
We evaluated a comprehensive suite of frontier generative AI models against the 304 NEJM cases. The foundation models tested included GPT, Llama, Claude, Gemini, Grok, and DeepSeek.
Beyond baseline benchmarking, we also developed the Microsoft AI Diagnostic Orchestrator (MAI-DxO), a system designed to emulate a virtual panel of physicians with diverse diagnostic approaches collaborating to solve diagnostic cases. We believe that orchestrating multiple language models will be critical to managing complex clinical workflows. Orchestrators can integrate diverse data sources more effectively than individual models, while also enhancing safety, transparency, and adaptability in response to evolving medical needs. This model-agnostic approach promotes auditability and resilience, key attributes in high-stakes, fast-evolving clinical environments.
Fig 1. The MAI-Dx Orchestrator turns any language model into a virtual panel of clinicians: it can ask follow-up questions, order tests, or deliver a diagnosis, then run a cost check and verify its own reasoning before deciding whether to proceed.
MAI-DxO boosted the diagnostic performance of every model we tested. The best performing setup was MAI-DxO paired with OpenAI’s o3, which correctly solved 85.5% of the NEJM benchmark cases. For comparison, we also evaluated 21 practicing physicians from the US and UK, each with 5-20 years of clinical experience. On the same tasks, these experts achieved a mean accuracy of 20% across completed cases.
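As a quick check on how these two figures relate, the ratio can be worked out directly. This is plain arithmetic on the numbers quoted above and assumes nothing about how the underlying study weighted individual cases.

```python
# Accuracy figures quoted in the post, in percent.
orchestrator_accuracy = 85.5   # MAI-DxO paired with o3
physician_accuracy = 20.0      # mean across the 21 physicians

ratio = orchestrator_accuracy / physician_accuracy
gap_points = orchestrator_accuracy - physician_accuracy

print(f"{ratio:.1f}x")               # 4.3x
print(f"{gap_points:.1f} points")    # 65.5 points
```

A ratio a little above 4 is consistent with the "more than four times higher" description of the result.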
MAI-DxO is configurable, enabling it to operate within defined cost constraints. This allows for explicit exploration of the cost-value trade-offs inherent in diagnostic decision making. Without such constraints, an AI system might otherwise default to ordering every possible test – regardless of cost, patient discomfort, or delays in care. Importantly, we found that MAI-DxO delivered both higher diagnostic accuracy and lower overall testing costs than physicians or any individual foundation model tested.
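One simple way to picture a cost constraint is as a gate that runs before each test is ordered. The sketch below is an assumption made for illustration, not a description of how MAI-DxO enforces its budget: the function name `run_with_budget`, the test names, and the prices are all hypothetical.

```python
def run_with_budget(proposed, costs, budget):
    """Order proposed tests in sequence, skipping any that would exceed the budget."""
    ordered, spent = [], 0.0
    for test in proposed:
        price = costs[test]
        if spent + price > budget:
            continue  # cost check fails: this test is not ordered
        ordered.append(test)
        spent += price
    return ordered, spent


# Hypothetical prices and a hypothetical 200-unit budget.
costs = {"blood_panel": 50.0, "chest_xray": 120.0, "ct_scan": 900.0}
plan = ["blood_panel", "chest_xray", "ct_scan"]

print(run_with_budget(plan, costs, budget=200.0))
# (['blood_panel', 'chest_xray'], 170.0)
```

Raising or lowering `budget` and re-running the same cases is one way to trace a cost-accuracy curve of the kind discussed here.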
Comparison of AI powered diagnostic agents by accuracy and average diagnostic test cost per case. Top performing agents appear toward the top left quadrant, reflecting higher accuracy and lower cost. The lower dotted line represents the performance range of the best individual foundation models. The purple line traces the performance of MAI-DxO across different configurations. The red cross indicates the average performance of 21 practicing physicians.
What’s Next?
Physicians are typically characterized by the breadth or depth of their expertise. Generalists, like family physicians, manage a wide array of conditions across ages and organ systems. Specialists, such as rheumatologists, focus deeply on a single system, disease area, or even condition. No single physician, however, can span the full complexity of the NEJM case series. AI, on the other hand, doesn’t face this trade-off. It can blend both breadth and depth of expertise, demonstrating capabilities that, across many aspects of clinical reasoning, exceed those of any individual physician.
This kind of reasoning has the potential to reshape healthcare. AI could empower patients to self-manage routine aspects of care and equip clinicians with advanced decision support for complex cases. Our findings also suggest that AI could reduce unnecessary healthcare costs. U.S. health spending is nearing 20% of GDP, with up to 25% of that estimated to be wasted, having little influence on patient outcomes.
Of course, our research has important limitations. Although MAI-DxO excels at tackling the most complex diagnostic challenges, further testing is needed to assess its performance on more common, everyday presentations. Clinicians in our study worked without access to colleagues, textbooks, or even generative AI, which may feature in their normal clinical practice. This was done to enable a fair comparison to raw human performance.
A novel aspect of this work is its attention to cost. While real-world health costs vary across geographies and systems, and include many downstream factors that we don’t account for, we apply a consistent methodology across all agents and physicians evaluated to help quantify high level trade-offs between diagnostic accuracy and resource use.
For us, this is just the first step. We’re energized by the opportunities ahead. Important challenges remain before generative AI can be safely and responsibly deployed across healthcare. We need evidence drawn from real clinical environments, alongside appropriate governance and regulatory frameworks to ensure reliability, safety, and efficacy. That’s why we’re partnering with leading health organizations to rigorously test and validate these approaches—an essential step before any broader roll out.
Together with our partners, we strongly believe that the future of healthcare will be shaped by augmenting human expertise and empathy with the power of machine intelligence. We are excited to take the next steps in making that vision a reality.
Further information
SD Bench and MAI-DxO are research demonstrations only and are not currently available as public benchmarks or orchestrators. You can find more detail on the underlying methodology and results in a pre-print paper published alongside this blog. We are in the process of submitting this work for external peer review and are actively working with partners to explore the potential to release SD Bench as a public benchmark.
Acknowledgments
We are grateful to NEJM Group for permission to use the NEJM cases in the research reported in this blog post. The research described here has benefited from the insights of many people. We are grateful to the authors named on the arXiv paper and the wider team at MAI. We also thank further colleagues both inside and outside of Microsoft for sharing their insights including Bryan Bunning, Nando de Freitas, Andrija Milicevic, Hoifung Poon, David Rhew, Karén Simonyan, Eric Topol, and Jim Weinstein. Gianluca Fontana and Kevin Hawkins (Prova Health) provided support on the health economics and outcomes section.
Q&A
Is this AI safe to use for healthcare?
The work presented here is not yet approved for clinical use and would only be approved after rigorous safety testing, clinical validation, and regulatory reviews. For now, this represents exciting initial research. At the heart of any plans to deploy this technology in the real world is our commitment to safety, trust, and quality ensuring that any healthcare solutions are clinically grounded, ethically designed, and transparently communicated.
Will AI replace doctors?
While AI is becoming a powerful tool in healthcare, our team of practicing clinicians believes AI represents a complement to doctors and other health professionals. While this technology is advancing rapidly, their clinical roles are much broader than simply making a diagnosis. They need to navigate ambiguity and build trust with patients and their families in a way that AI isn’t set up to do. Clinical roles will, we believe, evolve with AI giving clinicians the ability to automate routine tasks, identify diseases earlier, personalize treatment plans, and potentially prevent some diseases altogether. For consumers, AI will provide better tools for self-management and shared decision making.
What is an AI orchestrator?
In the context of generative AI, an orchestrator is like a digital conductor helping to coordinate multiple steps in achieving a complex task. In healthcare, the role of orchestration is crucial given the high stakes of each decision. Our orchestrator sits above underlying language models, making sure each step toward a diagnosis is handled systematically, reducing the future risk of errors and offering the stability, consistency, and transparency needed to build trust with users.
Why have you looked at costs?
We initially wanted to understand whether the AI was simply requesting excessive diagnostic workups to reach the right diagnosis. What we found was that our Orchestrator was able to reach the correct answer with much less money spent on testing. In some ways this is not a surprise as diagnostic over-testing is recognized as being a widespread challenge, accounting for millions of unnecessary tests annually in the US. This work suggests AI creates an opportunity for clinicians – and consumers – to reach a faster, more accurate diagnosis while reducing costs.
Build the Future With Us
We’re a lean, fast-moving lab made up of some of the world’s most talented minds. We have an exciting roadmap of compute at MAI, with our next-generation GB200 cluster now operational. And we have an ambitious mission we truly believe in. We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, and low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!