A study published in the journal Science by researchers at Harvard Medical School and Beth Israel Deaconess Medical Center found that OpenAI’s o1 reasoning model identified the exact or a near-exact diagnosis in 67 percent of real emergency room cases, compared with 55 percent and 50 percent for two attending physicians working the same cases. The study, based on 76 real ER cases from a Boston hospital, is one of the most rigorous comparisons yet between AI diagnostic performance and human physicians using actual clinical data.

The results do not mean AI is ready to replace doctors. The authors were emphatic about that caveat. But the findings add to a rapidly growing body of evidence that AI reasoning models can match or exceed physician performance on text-based diagnostic tasks, raising fundamental questions about how AI will be integrated into medical practice.

Study Design and What Was Tested

The Harvard team tested OpenAI’s o1 model across six separate clinical reasoning tasks that mirror what emergency physicians do during a shift, including generating a differential diagnosis (a ranked list of possible conditions), selecting appropriate diagnostic tests, estimating disease probability, and deciding on a treatment plan.

According to Harvard Magazine, the 76 cases were drawn from real emergency room records, not curated textbook scenarios. The AI received only the electronic health record data that was available to the physicians, with no additional context. Both the physicians and the AI were blinded to the case outcomes.

The o1 model’s advantage was most pronounced in the initial diagnosis generation phase, where it produced more complete and more accurate differential diagnosis lists. This is the most consequential step: if the correct condition is missing from the differential, the tests needed to confirm it may never be ordered.

The Numbers

Task | AI (OpenAI o1) | Physician 1 | Physician 2
Correct triage diagnosis | 67% | 55% | 50%
Overall differential diagnosis accuracy | Met or exceeded physicians | Baseline | Baseline
Test selection | Generally matched or exceeded | Baseline | Baseline
Treatment plan quality | Generally matched or exceeded | Baseline | Baseline

Important Caveats the Authors Emphasized

The study’s co-senior author Arjun Manrai told NPR and other outlets: “We’re not saying that AI replaces physicians. We’re saying AI is very good at text-based diagnostic reasoning.” The critical limitation is that emergency medicine is not only text-based. Physicians observe patients directly: checking skin color, listening to breathing sounds, evaluating gait and affect, picking up distress signals that electronic records do not capture.

The 76-case sample is also relatively small for drawing sweeping conclusions. The cases were from a single Boston hospital and reflect that institution’s patient population, disease mix, and documentation practices. Generalization to other health systems, particularly those serving different demographics or lower-resource environments, requires additional study.
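
To give a rough sense of why 76 cases is a small sample, the sketch below computes approximate 95 percent Wilson score intervals around the three reported accuracy figures. It is an illustration only, not the authors' analysis. The case counts are an assumption, back-calculated by multiplying the rounded percentages by 76, and the calculation ignores the fact that the AI and the physicians worked the same cases.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


N_CASES = 76  # sample size reported in the article

# Accuracy figures as reported in the article (rounded percentages).
reported = {"OpenAI o1": 0.67, "Physician 1": 0.55, "Physician 2": 0.50}

for name, rate in reported.items():
    # ASSUMPTION: the count of correct cases is back-calculated from the
    # rounded percentage; the study's exact counts are not given here.
    correct = round(rate * N_CASES)
    low, high = wilson_interval(correct, N_CASES)
    print(f"{name}: ~{correct}/{N_CASES} correct, 95% interval {low:.0%} to {high:.0%}")
```

Under these assumptions each interval spans roughly ten percentage points on either side of the reported figure, and the AI's interval overlaps with those of both physicians. That is consistent with the caution that a single 76-case study cannot settle the size of the gap on its own.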

According to NPR, the authors described the result as hitting a ceiling: “We’re already at the ceiling” of what text-based diagnostic testing can show. Future studies, they said, need to test AI in real clinical environments with full sensory information, not just text records.

What This Means for AI in Healthcare

The study adds evidence to a debate that has accelerated significantly in 2025 and 2026 as reasoning models improved substantially over their predecessors. Earlier studies found AI performing well on medical licensing exams (USMLE) and radiology readings. This study pushes further by showing performance advantages on real-world emergency cases with real physician comparators.

The practical implications are more likely to play out as augmentation than as replacement. AI diagnostic tools can flag conditions a time-pressured physician might overlook, prompt consideration of rare diseases on the differential, and help junior physicians learn from high-quality AI-generated differentials. The likely role for AI is that of a consultant available at all hours, not a replacement for the clinician at the bedside.

Health systems are now investing heavily in AI-assisted clinical decision support. According to Fortune, the Harvard results accelerated conversations at several major academic medical centers about integrating o-series reasoning models into their clinical workflows within the next 12 to 24 months.

Frequently Asked Questions

Can AI diagnose patients better than doctors?

In text-based diagnostic tasks using electronic health record data, OpenAI’s o1 model identified the exact or a near-exact diagnosis in 67 percent of real ER cases, compared with 55 and 50 percent for two attending physicians, in a Harvard study published in Science in April 2026. The study authors cautioned that this does not mean AI replaces doctors: emergency medicine involves direct observation, physical examination, and clinical judgment that text-based AI cannot yet replicate.

What AI model was tested in the Harvard medical study?

The Harvard Medical School and Beth Israel Deaconess Medical Center study tested OpenAI’s o1 model, a reasoning model that uses step-by-step thinking to work through complex problems. The study was published in the journal Science in April 2026 and tested the model against two emergency room attending physicians on 76 real ER cases from a Boston hospital.

Will AI replace emergency room doctors?

Not according to the study’s authors or most medical experts. The Harvard study showed an AI advantage in text-based diagnostic reasoning but was explicit about the limitation: emergency physicians observe patients directly, picking up sensory information that electronic records do not capture. The most likely near-term integration is AI as a diagnostic consultant that flags conditions a busy physician might miss, not as a replacement for the clinician at the bedside.

Trust Post Desk

A journalist and editor at TrustPost.org covering world and national news, technology updates and human-interest stories. They check every fact, interview sources in person or online, and aim to deliver clear, accurate reporting. Their work ranges from breaking news to in-depth features and daily newsletters. Outside the newsroom, they follow emerging trends and engage with readers on social media.