by Professor Shafi Ahmed
Surgeon | Futurist | Innovator | Entrepreneur | Humanitarian | International Keynote Speaker
"The moment three of the world's most powerful technology companies launch dedicated healthcare AI platforms within weeks of each other, we are no longer watching an experiment. We are witnessing a revolution, and the responsibility that comes with it falls on all of us."
A Personal Experience: Testing ChatGPT Health in Practice
I recently had the opportunity to test ChatGPT Health in a real world clinical context, and the experience was both illuminating and, in important respects, reassuring. I obtained early access at launch and used the platform to synthesise my own medical data: it produced a concise, actionable patient summary from uploaded clinical documents, including consultation summaries, lab reports, imaging PDFs, and scan reports. The "assistant" automatically analysed the embedded text, applied optical character recognition (OCR) where needed, extracted test names, values, and imaging impressions, and translated technical findings into plain English.
The workflow moved through a clear and logical sequence: it ingested the documents, analysed embedded text and applied OCR where needed, extracted key clinical information including test names, values, and imaging impressions, synthesised that information into a coherent clinical picture, prioritised the most actionable findings, and produced formatted outputs. The deliverables were practical and useful: a one page clinical summary in plain English, both as an aide memoire and for my own doctor; a table showing baseline to latest trends; and a prioritised action plan covering labs to repeat, referrals to consider, and lifestyle goals. For a clinician managing a complex patient with multiple documents spread across systems, this kind of structured synthesis is genuinely time saving.
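For readers curious what such a pipeline looks like in outline, the sequence above can be sketched in a few lines of Python. This is purely illustrative: the function names and the simple keyword-spotting logic are my own simplification for teaching purposes, not OpenAI's implementation.

```python
# Illustrative sketch of the document-synthesis workflow described above.
# All function names and the extraction logic are hypothetical simplifications;
# they do not represent ChatGPT Health's actual implementation.
import re

def extract_findings(document_text: str) -> list[dict]:
    """Pull test names and values out of plain text, e.g. 'HbA1c: 41 mmol/mol'."""
    pattern = re.compile(
        r"(?P<test>[A-Za-z][A-Za-z0-9 ]+?):\s*(?P<value>[\d.]+)\s*(?P<unit>[\w/%]*)"
    )
    return [m.groupdict() for m in pattern.finditer(document_text)]

def synthesise(documents: list[str]) -> dict:
    """Ingest documents, extract findings, and order them into a simple summary."""
    findings = []
    for doc in documents:
        findings.extend(extract_findings(doc))  # OCR would happen before this step
    # Prioritise: here we simply surface every finding; a real system would
    # rank by clinical significance and flag abnormal values for human review.
    return {
        "summary": [
            f"{f['test'].strip()}: {f['value']} {f['unit']}".strip() for f in findings
        ],
        "disclaimer": "Non-diagnostic. Review with a clinician before acting.",
    }

docs = ["Consultation note. HbA1c: 41 mmol/mol. LDL cholesterol: 3.1 mmol/L."]
result = synthesise(docs)
print(result["summary"])  # ['HbA1c: 41 mmol/mol', 'LDL cholesterol: 3.1 mmol/L']
```

The structure, not the regex, is the point: each stage of the sequence (ingest, extract, synthesise, prioritise, format) is a separable step that can be audited, which is precisely what makes such pipelines reviewable by a clinician.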
What impressed me most was not the technical capability itself, but the design philosophy behind it. The system was transparent about its own uncertainty, flagging where OCR might have introduced errors. It consistently framed all outputs as non-diagnostic, explicitly directing the user to seek a clinician's review before making any clinical decisions: the fail-safe fallback position.
The limits were equally instructive. OCR errors on scanned documents introduced uncertainty that required human review. High stakes interpretations, such as imaging impressions and abnormal lab results, demanded clinician validation before any action could be taken. And the privacy workflow, while thoughtfully designed within the tool, placed significant responsibility on the user: documented patient consent, careful data minimisation, and de-identification of any outputs intended for sharing were not automated, but required deliberate human judgment at every step. In other words, the tool was only as safe as the person using it.
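To make the de-identification step concrete, here is a minimal sketch of the kind of redaction a user must still perform or verify manually. The patterns and the sample record are illustrative assumptions of mine; a real workflow would rely on validated de-identification tooling and clinical governance, not a handful of regular expressions.

```python
# Minimal, illustrative de-identification pass over text intended for sharing.
# The patterns below are deliberately simplistic placeholders; real
# de-identification requires validated tools and human review, exactly as
# described above.
import re

# Hypothetical patterns for a UK-style record: NHS number, date of birth, name line.
PATTERNS = [
    (re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"), "[NHS-NUMBER]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DOB]"),
    (re.compile(r"(?m)^Name:.*$"), "Name: [REDACTED]"),
]

def deidentify(text: str) -> str:
    """Replace direct identifiers with placeholders. Human review is still required."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

record = "Name: Jane Doe\nDOB: 14/02/1975\nNHS no: 943 476 5919\nHbA1c stable."
print(deidentify(record))
```

The point of the sketch is the division of labour it exposes: the mechanical substitution is trivial to automate, but deciding what counts as identifying, and checking that nothing slipped through, remains deliberate human judgment, which is exactly where the tool's safety depends on its user.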
This single experience encapsulates both the genuine promise and the honest limits of where consumer health AI currently stands. ChatGPT Health certainly did not, and would never, claim to replace clinical judgment. But it did something that matters enormously: by quickly ingesting vast amounts of data and surfacing prompts and insights, it gave a clinician more time to think and gave me, as a patient, a clearer picture of my own health. That, done well and done safely, is no small feat.
Introduction: A Tectonic Shift in Healthcare AI
January 2026 will be remembered as the month the landscape of AI in healthcare changed permanently. Within days of each other, OpenAI launched ChatGPT Health, and Anthropic unveiled Claude for Healthcare and Life Sciences. By March 2026, Microsoft had followed with Copilot Health. Three of the most powerful technology companies on the planet, each with a different philosophy, strategy, and relationship with medicine, had simultaneously decided that healthcare was their next great frontier.
This is not a coincidence. It is convergence: the product of an unstoppable force meeting a system desperately in need of transformation. As a practising clinician, I believe this moment deserves more than headlines. It demands careful, critical analysis. What are these platforms offering? Where do they agree? Where do they diverge? What does the emerging research tell us? And most importantly, what does this mean for patients, clinicians, and the future of care?
In this edition of AI Horizons, I will examine all three platforms in depth, compare them across the dimensions that matter most, and share my own perspective as both a clinician and an advocate for responsible innovation.
The Race to Become Healthcare's AI Front Door
The scale of the opportunity is staggering. OpenAI reports that over 230 million people ask health and wellness questions on ChatGPT every week: roughly the combined populations of the UK, France, and Germany turning to a chatbot for medical guidance, every single week. Microsoft's own analysis of more than 500,000 de-identified Copilot conversations confirmed a similar pattern: health is already one of the most searched and discussed topics on its platform. And according to a 2025 American Medical Association survey, roughly 66% of physicians were already using AI tools in some form in their practice.
The message is clear: AI in healthcare is not a future phenomenon. It is already happening, at an enormous scale, with or without the guardrails that formal medical platforms provide. OpenAI, Microsoft, and Anthropic are not creating demand; they are responding to it and trying to shape it responsibly. Whether they will succeed is the defining question of the next decade.
OpenAI: Scale, Speed, and the Consumer First Vision
ChatGPT Health & ChatGPT for Healthcare
OpenAI moved first, launching ChatGPT Health on January 7, 2026, a dedicated space within the ChatGPT platform that allows users to connect their medical records, wearable data, and wellness apps for personalised, contextualised health conversations. The sheer ambition of the product is not in doubt. Over two years, OpenAI worked with more than 260 physicians across 60 countries and dozens of specialities, who reviewed model outputs over 600,000 times across 30 clinical focus areas. OpenAI's flagship benchmark, HealthBench, built in partnership with clinicians, was explicitly designed to move beyond exam style testing toward evaluating open ended clinical tasks: triage decisions, differential diagnosis, patient communication, and the critical ability to recognise when to escalate to a human clinician.
The data infrastructure behind ChatGPT Health is substantial. Through b.well Connected Health, users can access a network spanning more than 2.2 million providers, 320 health plans, and lab sources, all built on FHIR-based APIs. Integrations include Apple Health, Function, MyFitnessPal, Weight Watchers, AllTrails, Instacart, and Peloton, a suite that speaks to a vision of health as something woven through every aspect of daily life, not confined to hospital visits.
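Because the record connectivity described above is built on FHIR, the data these platforms consume has a predictable, machine-readable shape. The snippet below parses a lab result from a FHIR R4 Observation resource; the resource content is invented for illustration, but the field paths (code, valueQuantity, referenceRange) follow the published FHIR specification.

```python
# Reading a lab result from a FHIR R4 "Observation" resource, the standard
# underlying FHIR-based record exchange. The example resource is invented;
# the field paths follow the FHIR specification.
import json

observation_json = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {"coding": [{"system": "http://loinc.org", "code": "4548-4",
                       "display": "Hemoglobin A1c"}]},
  "valueQuantity": {"value": 41, "unit": "mmol/mol"},
  "referenceRange": [{"high": {"value": 42, "unit": "mmol/mol"}}]
}
"""

def summarise_observation(resource: dict) -> str:
    """Turn a FHIR Observation into a one-line, plain-English summary."""
    name = resource["code"]["coding"][0]["display"]
    value = resource["valueQuantity"]["value"]
    unit = resource["valueQuantity"]["unit"]
    high = resource["referenceRange"][0]["high"]["value"]
    # Simplified range check: a real summary would also handle low bounds,
    # missing reference ranges, and coded interpretation flags.
    status = "within range" if value <= high else "above range"
    return f"{name}: {value} {unit} ({status})"

obs = json.loads(observation_json)
print(summarise_observation(obs))  # Hemoglobin A1c: 41 mmol/mol (within range)
```

Standardised structure like this is what makes a 2.2 million provider network tractable: an AI assistant can read the same fields from any compliant source, rather than scraping thousands of bespoke formats.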
Simultaneously, OpenAI launched ChatGPT for Healthcare, an enterprise grade product powered by GPT-5 models, designed for hospitals, clinics, and health systems. Major institutions, including AdventHealth, HCA Healthcare, Boston Children's Hospital, Cedars-Sinai, and Stanford Medicine Children's Health, were among the early adopters. The enterprise product supports HIPAA compliance, role based access controls, prior authorisation assistance, ambient documentation, and integration with Microsoft SharePoint. The breadth of this rollout signals that OpenAI is not positioning itself as a niche tool; it is positioning itself as the infrastructure layer for AI across the entire healthcare enterprise.
What I find most compelling, and most revealing, about OpenAI's approach is the framing. Sam Altman has publicly described healthcare as 'maybe the area where there's the strongest improvement' from GPT5. This is an extraordinary claim, and one that carries extraordinary responsibility.
Anthropic: Safety First, Clinically Grounded, Enterprise Ready
Claude for Healthcare & Life Sciences
Anthropic took a different path. Launching Claude for Healthcare on January 11, 2026, Anthropic's approach is characterised by a more deliberate, enterprise first philosophy, one rooted in the principles of Constitutional AI, safety by design, and deep clinical integration.
The technical credentials are impressive. Claude Opus 4.5, Anthropic's flagship model, leads on medical benchmarks, including MedCalc and MedAgentBench, with particular strength in extended reasoning tasks and a measurably improved track record in reducing hallucinations compared to earlier models. For healthcare organisations, the product offers HIPAA-ready tools, connectors to the Centers for Medicare and Medicaid Services (CMS) Coverage Database, ICD-10 coding, the National Provider Identifier Registry, and access to PubMed's 35 million biomedical papers. For life sciences, connectors extend to Medidata, ClinicalTrials.gov, bioRxiv, medRxiv, Open Targets, and ChEMBL, spanning the entire drug development pipeline from hypothesis to regulatory submission.
The partner ecosystem speaks volumes about Anthropic's positioning. Novo Nordisk, Sanofi, Genmab, Viz.ai, Flatiron Health, Commure, and Carta Healthcare are among those who have publicly endorsed Claude, and critically, many of these endorsements centre not just on performance but on trust. As Banner Health's Chief Technology Officer noted, the attraction was Anthropic's focus on AI safety and its Constitutional AI approach. For Commure, which scales to tens of millions of appointments, Claude's precision in clinical documentation was described as 'the prerequisite for trust.' These are not merely marketing relationships; they are signals that safety conscious healthcare organisations see Anthropic as a fundamentally different kind of partner.
On the consumer side, Claude Pro and Max subscribers can now connect to Apple Health, Android Health Connect, HealthEx, and Function to receive personalised health summaries, plain language explanations of test results, and appointment-preparation support. The design ethos is explicit: Claude is built to acknowledge uncertainty, direct users to clinicians for personalised guidance, and avoid overreach, qualities that are as much a philosophical commitment as a technical feature.
The Life Sciences expansion is, in my view, where Anthropic may have its most transformative long term impact. The ability to draft clinical trial protocols that account for FDA and NIH requirements, to track trial enrolment in real time via Medidata, to identify gaps in regulatory submissions, and to analyse spatial biology data through Owkin's Pathology Explorer: these are capabilities that could meaningfully accelerate the journey from laboratory discovery to patient benefit.
Microsoft: The Ecosystem Play and the Quest for Medical Superintelligence
Copilot Health
Microsoft arrived last to the consumer health AI party, but arrived with formidable resources and a distinct strategic vision. Launched on March 12, 2026, Copilot Health is a dedicated, secure space within Microsoft's Copilot platform that integrates wearable data from over 50 devices, health records from more than 50,000 US hospitals via HealthEx, and lab results from Function. Responses are grounded in verified information from credible health organisations across 50 countries, with expert written answer cards from Harvard Health providing an additional layer of clinical credibility.
Microsoft's safety architecture is notable: Copilot Health has achieved ISO/IEC 42001 certification, the world's first standard for AI management systems, meaning that an independent third party has verified how the company builds, governs, and continuously improves the AI behind the service. An external panel of over 230 physicians from 24 countries contributes ongoing clinical review. Consumer health questions across Microsoft's platforms already exceed 50 million per day, a scale that gives the company unique insight into the breadth and depth of what people are actually asking about their health.
What distinguishes Microsoft's long term vision most strikingly is the explicit invocation of medical superintelligence. The MAI Diagnostic Orchestrator (MAI-DxO), Microsoft's AI diagnostic system, has already demonstrated 85.5% diagnostic accuracy on complex NEJM cases, compared to under 20% for experienced physicians in the same sequential diagnostic simulation. The path toward what Microsoft calls 'health AI that can ultimately combine the wide ranging knowledge of a general physician with the depth of a specialist' is clearly being laid. Copilot Health, in this light, is not merely a consumer product; it is the public facing expression of a much deeper research ambition.
Microsoft also benefits from an advantage neither OpenAI nor Anthropic can easily replicate: deep integration with existing hospital infrastructure. Microsoft's relationships with health systems through Azure, Teams, and the Microsoft 365 ecosystem give Copilot Health a natural pathway into clinical workflows that consumer first or API first competitors will need to build from scratch. But it is Microsoft's research ambition, more than its infrastructure, that invites the deepest reflection.
The philosophical and ethical dimensions of this ambition deserve serious attention. In my recent essay, The Physician's Replacement: Ethics and Philosophy in the Age of Medical Superintelligence, I explored what it truly means when AI systems begin to exceed human diagnostic capability. That essay was prompted by Daniel Nadler of OpenEvidence, who at the 2026 J.P. Morgan Healthcare Conference articulated a vision of networks of specialised AI agents, each functioning as a clinical subspecialist. Microsoft's MAI-DxO is, in many respects, the most concrete realisation of that vision to date.
The epistemological challenge this raises is profound. Medical knowledge today doubles every three months. No individual clinician, however brilliant, can hold even a fraction of it in working memory. An AI system like MAI-DxO that can maintain perfect recall of every clinical trial, recognise patterns across billions of patient encounters, and update its knowledge in near real time represents not merely a quantitative improvement over human physicians, but a qualitative transformation in the nature of medical expertise itself. As I wrote in my MSI essay, this is not just a better doctor. It is a different kind of knowing.
But with that power come the ethical questions I raised in that same essay, and which Microsoft must now confront in practice rather than theory. The black box problem is not merely a technical inconvenience: if we cannot understand why MAI-DxO recommends a particular treatment, how can patients meaningfully consent to it? How can we audit it for bias? The Enlightenment ideal of rational, explicable medical reasoning gives way to an oracular model where we trust the pronouncements of an intelligence we cannot fully comprehend.
There is also the justice question. If MAI-DxO or its successors become the standard of diagnostic care, access to them cannot be a luxury. As I argued in my MSI essay, there is a real risk of a two tier medicine emerging: urban academic centres deploying superintelligent diagnostic tools, while rural hospitals and low resource settings make do with older, less capable systems. Microsoft, with its global reach and its stated commitment to building for all, is better positioned than most to address this, but the commitment must be structural, not aspirational.
Finally, and perhaps most importantly, there is the question of what medicine is ultimately for. MAI-DxO may one day surpass every physician alive in diagnostic accuracy. But the covenant of care, the relationship of trust and vulnerability between healer and patient, the human capacity to bear witness to suffering and guide someone through the impossible decisions of serious illness: these are not optimisation problems. They are the irreducibly human core of medicine. Microsoft's most important challenge, as it pursues medical superintelligence, is to ensure that the pursuit of diagnostic excellence never becomes an excuse to forget this.
Comparing the Three: Where They Converge and Where They Diverge
What They Share
Despite their differences, OpenAI, Anthropic, and Microsoft share a remarkable degree of consensus on the foundational principles of responsible healthcare AI. All three platforms are explicitly not intended to replace clinical advice, diagnosis, or treatment, a statement that is simultaneously a legal necessity, a safety commitment, and, frankly, a meaningful acknowledgement of humility before the complexity of medicine. All three maintain separate, isolated spaces for health data. All three pledge that health conversations will not be used to train their foundational models. And all three have invested substantially in physician oversight, though the scale and structure of that involvement differ considerably.
Where They Differ
The differences, however, are as instructive as the similarities.
OpenAI is the most consumer facing, with the largest existing user base and the most aggressive integration with everyday wellness apps. Its strength is breadth, the ability to meet hundreds of millions of people where they already are and gently elevate the quality of health conversations they are already having. Its risk is equally its strength: at that scale, even small error rates translate into enormous numbers of potentially harmful interactions.
Anthropic is the most clinically grounded, with the strongest focus on enterprise healthcare and life sciences infrastructure. Its Constitutional AI approach, where safety, honesty, and avoiding harm are baked into the model's core values rather than layered on top, gives it a philosophical foundation that resonates particularly with safety critical healthcare settings. Its context window is the largest of the three, enabling whole document analysis of clinical guidelines, regulatory submissions, and trial protocols that shorter context models cannot handle reliably. Its current limitation is reach: Claude for Healthcare requires deliberate organisational adoption and does not yet have the mass market visibility of ChatGPT.
Microsoft occupies a uniquely powerful middle position. It has the enterprise relationships, the regulatory credibility (ISO/IEC 42001), and the existing infrastructure integration that gives it a natural path into hospital systems. Its consumer product, Copilot Health, is more conservative in its integrations than ChatGPT Health but more substantiated in its clinical review process. Its research ambition, medical superintelligence via MAI-DxO, may ultimately be the most consequential of all three, if it translates from research environments into real clinical deployment.
What Does the Research Tell Us?
The academic literature on these platforms is both inspiring and sobering, and clinicians must engage with it honestly.
The most encouraging evidence comes from Microsoft's own MAI-DxO research, published in early 2026. In a study using 304 complex cases drawn from the New England Journal of Medicine, the MAI Diagnostic Orchestrator achieved diagnostic accuracy of up to 85.5%, more than four times that of experienced physicians, while reducing diagnostic costs by over 50%. Separately, the AMIE conversational diagnostic system, in a randomised, double blind OSCE style trial across 159 simulated cases in Canada, the UK, and India, outperformed primary care physicians on 30 of 32 clinically meaningful axes, including diagnostic accuracy, empathy, and communication.
OpenAI's HealthBench, built with 262 physicians across 60 countries on 5,000 realistic clinical scenarios, represents a meaningful advance over multiple choice medical exam testing, specifically designed to evaluate safety, appropriate escalation, and clinical judgement in open ended tasks. A study published in NEJM AI found that Therabot, a generative AI chatbot for mental health, demonstrated clinically significant reductions in depressive and anxiety symptoms in an RCT, with therapeutic alliance ratings comparable to human therapists.
However, the picture is not uniformly positive. A stress test of ChatGPT Health's triage capabilities, published in Nature Medicine, is particularly important reading. Using 60 clinician authored vignettes across 21 clinical domains, researchers found that the system under triaged 52% of gold standard emergencies, directing patients with conditions such as diabetic ketoacidosis and impending respiratory failure to 24-48 hour evaluation rather than the emergency department. When family members or friends minimised symptoms, the system's recommendations shifted significantly, suggesting vulnerability to anchoring bias. The study also found inconsistent activation of suicide crisis safeguards. These findings do not invalidate the platform, but they do mandate vigilance, especially at consumer scale.
A broader systematic review across 83 studies in npj Digital Medicine found that while generative AI models, including GPT-4, Claude 3, and Gemini, perform comparably to or slightly better than non-expert physicians in diagnostic tasks, expert clinicians still significantly outperform AI. The Lancet Digital Health's STANDING Together consensus, developed by over 350 global experts, further reminds us that AI models trained predominantly on majority European datasets systematically underperform for underserved populations, a challenge that all three platforms must urgently address as they scale globally.
The Privacy and Regulatory Question
All three platforms face the same fundamental tension: health data is the most sensitive data that exists, and yet the most powerful AI requires access to it to provide genuinely personalised support.
OpenAI's Nate Gross acknowledged at the ChatGPT Health launch that, for consumer products, HIPAA does not apply, a statement that is technically accurate but will surprise many users who assume that health specific apps carry clinical grade privacy protections. Like all cloud stored data, information in ChatGPT Health could theoretically be obtained through legal processes such as subpoenas. OpenAI's assurance that health data will not be used for model training is important but remains, as I noted in my earlier analysis of this launch, a pledge that must be verified by ongoing independent audit rather than simply accepted at face value.
Anthropic's HIPAA ready enterprise products, designed specifically for provider and payer organisations, sit within a more established regulatory framework. The explicit opt-in design, with users controlling exactly what data Claude can access and disconnecting at any time, represents a more conservative and arguably more trustworthy approach to consent. Anthropic's Claude.ai consumer health integrations are similarly designed with user control at their core.
Microsoft's ISO/IEC 42001 certification provides perhaps the most rigorous independent verification of any of the three platforms, though certification covers governance processes rather than clinical outcomes and should not be confused with regulatory approval as a medical device. It is worth noting that none of these platforms is currently registered as a medical device with the MHRA in the UK, and none carries FDA clearance for clinical decision support in the US. As regulatory frameworks evolve (the EU AI Act, MHRA reforms, and FDA guidance on AI/ML medical devices are all advancing rapidly), the compliance requirements for these platforms will almost certainly increase.
My Perspective: A Surgeon's View from the Frontier
I have spent thirty years practising surgery and a decade advocating for the responsible integration of technology into medicine. I have seen technologies that were greeted with scepticism become indispensable, and others that were greeted with euphoria quietly fade. My view on these three platforms is neither uncritically enthusiastic nor reflexively dismissive; it is grounded in what I know medicine actually requires.
What excites me most about this convergence is not any individual feature. It is the signal it sends about where we are in the journey. When three of the world's most capable technology companies, each with a sophisticated understanding of risk, liability, and regulation, make simultaneous, significant bets on healthcare AI, they are telling us something important: the technology may now be good enough to be genuinely useful, and the unmet need is certainly large enough to justify the risk.
The use cases that I believe are most immediately valuable are not the most dramatic. They are the mundane but critical: helping a patient understand their blood test results before an appointment; automating prior authorisation so a patient gets the care they need three days sooner; helping a clinician in a rural clinic access evidence that was previously locked behind specialist consultations. These are not the stories of AI replacing doctors. They are stories of AI finally fulfilling a promise that has been made, and broken, many times before.
However, what concerns me is the gap between research environments and real world deployment. The Nature Medicine triage stress test should be required reading for anyone deploying ChatGPT Health at scale. The STANDING Together recommendations on algorithmic bias should inform every decision about training datasets across all three platforms. And the question of who bears legal and moral responsibility when AI generated health guidance contributes to patient harm, a question that is as unresolved today as it was three years ago, must be answered before these platforms reach their full scale.
I am also struck by what is absent from all three launches: a serious, sustained engagement with the healthcare systems of low and middle income countries. The combined health burden of sub Saharan Africa, South Asia, and Latin America dwarfs that of the US market these platforms are primarily designed to serve. If the true ambition of healthcare AI is to democratise access to medical expertise, and all three companies say it is, then the communities that stand to benefit most must be designed for from the beginning, not retrofitted after the fact.
Looking Forward: What 2026 Holds
The arrival of OpenAI, Microsoft, and Anthropic in consumer and enterprise healthcare AI simultaneously creates a competitive dynamic that will accelerate innovation in ways that individual company roadmaps alone never could. We can expect rapid iteration on safety benchmarks, increasingly sophisticated integration with clinical workflows, growing regulatory scrutiny, and, I hope, growing attention to equity and global access.
The question I ultimately return to is not which platform will win. It is whether the platform that wins will make medicine better, not just more efficient, not just more scalable, but genuinely more human. The evidence that AI can match or exceed human performance in specific diagnostic tasks is now compelling. The evidence that AI can replace the covenant of care between healer and patient is, and I believe will remain, absent.
What I hope for, and what I will continue to advocate for, is a future in which OpenAI, Microsoft, Anthropic, and the healthcare systems they serve hold each other to the highest possible standards: of accuracy, of equity, of transparency, and of humility before the extraordinary complexity of human suffering and human healing.
That future is nearer than it has ever been. But it is not yet guaranteed.
Final Thoughts
The arrival of three of the world's most powerful AI companies in healthcare within weeks of one another is a watershed moment. OpenAI brings scale, reach, and consumer trust. Anthropic brings clinical rigour, enterprise infrastructure, and a safety philosophy that is genuinely distinctive. Microsoft brings ecosystem integration, regulatory credibility, and a research ambition, medical superintelligence, that may ultimately reshape medicine more profoundly than any consumer product.
Each platform reflects a different answer to the same question: how should AI enter medicine? The answer, I believe, is not a competition between these three visions, but a conversation, one that must include clinicians, patients, regulators, and the communities most in need of better healthcare. We are, as I have often said, not at the end of this journey. We are at the most consequential beginning.


