Has AI Surpassed Emergency Physicians? The "Future of Diagnosis" Presented by Harvard Research

When a patient is rushed into the emergency room, physicians must make an initial judgment based on limited information.
That judgment, delayed by even minutes or seconds, can mean the difference between life and death.

In this high-pressure medical environment, AI has reportedly provided more accurate diagnoses than human doctors, news that has sent ripples not only through the medical community but also across the tech industry and social media.

As reported by TechCrunch, a new study by a research team from Harvard Medical School and Beth Israel Deaconess Medical Center has been making waves. The study compared OpenAI's o1 reasoning model and GPT-4o with human doctors across various medical tasks. The most notable comparison concerned diagnostic performance on actual emergency room cases.

The study involved 76 patients who visited the emergency room at Beth Israel. The research team had two internal medicine physicians and AI models provide diagnoses based on information from electronic medical records. Subsequently, two other physicians evaluated these diagnoses without knowing whether they were made by humans or AI.
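As an aside on method, a blinded comparison like this can be arranged with only a little machinery: strip the source labels from each diagnosis before handing them to the graders, and keep the key aside for later unblinding. The sketch below is a hypothetical illustration of that idea, not the study's actual pipeline; the record structure and field names are assumptions.

```python
import random

# Hypothetical records: each case pairs diagnoses from physicians and AI models.
cases = [
    {"case_id": 1, "diagnoses": [
        {"source": "physician_A", "text": "Pneumonia"},
        {"source": "physician_B", "text": "Pulmonary embolism"},
        {"source": "o1", "text": "Pulmonary embolism"},
    ]},
    # ... further cases ...
]

def blind(cases, seed=42):
    """Strip source labels and shuffle, so graders cannot tell human from AI."""
    rng = random.Random(seed)
    blinded, answer_key = [], {}
    for case in cases:
        entries = list(case["diagnoses"])
        rng.shuffle(entries)
        for i, entry in enumerate(entries):
            label = f"{case['case_id']}-{i}"
            answer_key[label] = entry["source"]  # kept aside for unblinding later
            blinded.append({"label": label, "text": entry["text"]})
    return blinded, answer_key

blinded, key = blind(cases)
for item in blinded:
    print(item["label"], item["text"])  # graders see diagnosis text only
```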

The results were shocking. During the initial triage stage, OpenAI's o1 provided "accurate or very close diagnoses" in 67% of cases, whereas the human doctors achieved 55% and 50%, respectively. This means that, at least under the conditions of this study, AI outperformed human doctors in narrowing down diagnoses from limited initial information.

Furthermore, when more information became available, o1's diagnostic accuracy rose to about 82%, according to media reports. The human doctors also improved, to 70-79%, but the AI performed at a level equal to or slightly above theirs.

Looking at these numbers alone, one might be tempted to declare, "AI has finally surpassed doctors." Indeed, such interpretations are not uncommon on social media. However, what this study truly reveals is a more complex and realistic picture of medical AI.


AI Excelled in "Clinical Reasoning on Paper"

In this study, the AI processed text recorded in electronic medical records: patient vitals, age, medical history, brief nurse notes, and test results. This is a subset of the data doctors use for diagnosis.

Importantly, the AI did not observe the patient's complexion, assess their breathing difficulty, or perform an abdominal examination. Nor did its input include the tremor in a patient's voice, their gait, their sweating, the family's reactions, or the atmosphere among the medical staff, the kind of information doctors gather on-site, often unconsciously.

In other words, this AI should be seen as "considering diagnoses from text information recorded in the emergency room," rather than "examining patients in the emergency room."

This point has been strongly emphasized by healthcare professionals and technologists on social media. On Hacker News, comments arguing that "doctors do not diagnose emergency patients by merely looking at notes" gained traction. The tasks on which the AI was evaluated cover only a slice of real emergency medicine, and concluding from these results alone that "AI will replace emergency doctors" is premature.

Similarly, on Reddit, reactions highlighted that the comparison was with internal medicine physicians, not emergency physicians, and that the primary goal of emergency care is not necessarily to pinpoint the final diagnosis but to avoid missing life-threatening conditions. This is a crucial point.

In emergency medicine, the questions are not only "What is the final diagnosis for this patient?" but also "Is there a life-threatening risk now?" "Does immediate treatment need to be initiated?" "Can the patient be sent home?" "Should they be referred to a specialist?" The ability to accurately name a diagnosis and the ability to safely triage without missing dangerous conditions overlap but are not the same.


The Significance of the "67%" That Cannot Be Ignored

However, it would be just as dangerous to wave the results away as mere overstatement.

The initial stage of an emergency room visit often involves limited information. Test results may not yet be available, and patients may struggle to articulate their symptoms. The difficulty increases further with elderly patients, patients with impaired consciousness, or when there is a language barrier.

In such scenarios, if AI can comprehensively read information from electronic medical records and broadly suggest differential diagnoses, it could serve as a useful "second pair of eyes" for doctors.

In a case highlighted by The Guardian, a patient with pulmonary embolism worsened after treatment, leading doctors to suspect ineffective anticoagulants. Meanwhile, AI focused on the patient's history of lupus and suggested the possibility of lung inflammation, which turned out to be correct.

Such cases illustrate the strengths of medical AI well. Human doctors are affected by limited time, fatigue, patient load, recent experiences, and the biases of their specialties. AI, by contrast, does not tire and can quickly read through the vast information in electronic records, picking up overlooked histories and combinations of test values.

Of course, AI also has biases and can make errors. However, if AI makes different types of oversights than humans, the combination of doctors and AI could potentially enhance safety more than a single doctor alone.


From "AI vs. Doctors" to "Doctors + AI"

The essence of this study is not about "AI replacing doctors." Rather, it is closer to the question of "how should doctors use AI?"

The researchers themselves do not claim that AI can independently make life-and-death decisions in emergency rooms. Instead, they emphasize the need for prospective clinical trials to verify safety and efficacy in actual patient care.

A common misunderstanding in discussions about medical AI is evaluating AI's capabilities as a "complete replacement for humans." However, in real medical settings, many AI systems are already used in a supportive capacity, such as in image diagnostics, drug interaction checks, medical record creation support, and risk prediction models.

This study indicates that generative AI and large language models are evolving from mere text generation tools to "tools that support clinical reasoning."

By integrating patient symptoms, medical history, test values, and past medical records, AI can suggest, "Shouldn't we consider this disease?" "Shouldn't we add this test?" "Isn't there a risk with this drug choice?" In such usage, AI can become an assistant that broadens the diagnostic net, rather than a replacement for doctors.
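Concretely, one way to wire up such an assistant is to feed de-identified record text to a model and ask for a ranked differential that a physician then reviews. The sketch below uses the OpenAI Python client, since the study evaluated OpenAI models, but the prompt, the model choice, and the triage note are illustrative assumptions, not the study's protocol.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical, de-identified triage note; real use would pull from the EHR
# under appropriate privacy and governance controls.
note = (
    "72-year-old female, history of lupus, presents with dyspnea and "
    "pleuritic chest pain. HR 112, BP 104/68, SpO2 91% on room air."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the study also evaluated o1
    messages=[
        {"role": "system", "content": (
            "You are a clinical decision-support aid. From the triage note, "
            "list a ranked differential diagnosis with one-line rationales, "
            "flagging any immediately life-threatening possibilities first. "
            "You assist a physician; you do not make final decisions."
        )},
        {"role": "user", "content": note},
    ],
)

print(response.choices[0].message.content)  # physician reviews, then decides
```

The important design choice here is in the framing: the model is instructed to surface dangerous possibilities and broaden the differential, while the final judgment stays with the human reading its output.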

The perspective of Dr. Adam Rodman, as introduced by The Guardian, aligns closely with this direction. The future of medicine may move towards a model involving "doctors, patients, and AI."


Expectations and Cautions Spread on Social Media

Reactions on social media have largely fallen into three categories.

The first is expectation for AI in medicine. On Hacker News, some argued that "in the long run, AI is likely to surpass humans in areas requiring knowledge, experience, and pattern recognition, such as medicine." Just as AI has rapidly demonstrated its power in software development, the view goes, specialized medical models could evolve to surpass many doctors.

People holding this view are also paying attention to the realities of the medical field. Doctors are busy, with limited time for each patient. Medical records are voluminous, and specialized knowledge is updated daily. If AI is properly introduced, it could reduce oversights and raise the standard of care in rural or physician-shortage areas.

The second is caution regarding the study design. On Reddit, reactions noted that "the headline is misleading." Critics pointed out that the comparison was with internal medicine physicians rather than emergency physicians, that the goal of emergency medicine is not necessarily to pinpoint the final diagnosis, and that the AI did not actually examine patients.

On Hacker News, there were also voices saying, "These benchmarks can easily be distorted" and "strong conclusions should not be drawn from a single study." In AI evaluations, there can be unintended hints in datasets or task settings that differ from actual practice. In high-risk areas like medicine, it is necessary to separate victories on benchmarks from safety in clinical settings.

The third is concerns about responsibility and ethics. On Reddit, there were multiple reactions questioning "who would be responsible if a patient died due to an AI diagnostic error." Doctors have medical malpractice, accountability, and professional obligations. However, in the case of AI models, it is unclear where responsibility lies—whether with the development company, the hospital that implemented it, the doctor who used it, or the regulatory authorities.

This issue is one of the biggest barriers to the widespread adoption of medical AI. It's not just about whether AI provides the correct diagnosis, but also about who verifies, explains, and faces the patient when mistakes are made. Without addressing these aspects, it is difficult to fully trust AI in medical settings.


The Boundary Between a "Convenient Second Opinion" and "Dangerous Automation"

Based on this study, the most realistic use of medical AI at present is likely as a "second opinion."

For example, when a doctor is considering a diagnosis, AI can suggest differential candidates. It can highlight diseases that are easy to overlook, severe but infrequent diseases, and risks that emerge from medication history or medical history. The doctor can then make the final decision based on their examination, tests, and dialogue with the patient.

In this form, AI's strengths can be utilized while retaining human responsibility and judgment. Especially in environments like emergency rooms, where time is limited and the pressure to make decisions is high, AI could potentially reduce "thinking omissions."

The danger, however, lies in automating the acceptance of AI's suggestions as they are. If AI confidently presents an incorrect diagnosis, there is a risk that humans may be swayed by it. Expert comments introduced by The Guardian also pointed out the risk of doctors unconsciously following AI's answers.

This is akin to the problem known as "automation bias." Humans tend to perceive answers provided by computers as objective and accurate. In busy settings, the reassurance of "AI says so too" can weaken independent thinking.

Therefore, introducing medical AI is not simply a matter of deploying a highly accurate model. It also requires designing how doctors will verify the AI's suggestions, detect its errors, explain them to patients, and document the process.
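As a toy illustration of that kind of guardrail, the sketch below wraps an AI suggestion in a step that forces an explicit physician decision and appends an audit record. The workflow, field names, and log format are assumptions made for illustration, not an established clinical system.

```python
import json
import time

def record_decision(case_id, ai_suggestion, physician, log_path="audit.jsonl"):
    """Require an explicit physician decision on an AI suggestion and log it."""
    print(f"AI suggestion for case {case_id}: {ai_suggestion}")
    choice = input("Accept or override? [a/o]: ").strip().lower()
    final = ai_suggestion if choice == "a" else input("Physician's diagnosis: ")
    entry = {
        "timestamp": time.time(),
        "case_id": case_id,
        "physician": physician,
        "ai_suggestion": ai_suggestion,
        "decision": "accepted" if choice == "a" else "overridden",
        "final_diagnosis": final,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")  # every AI-assisted decision leaves a trace
    return final
```

The point is not the code but the shape of the loop: the AI proposes, a named human disposes, and the override rate itself becomes data for monitoring automation bias.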


Benefits and Risks for Patients

From the patient's perspective, there is great expectation for AI diagnostic support.

There might be fewer oversights by doctors. The discovery of rare diseases might be expedited. Even in small rural hospitals, knowledge support comparable to large hospitals might be available. AI might suggest important candidates even in situations where specialists are not immediately available, such as at night or on holidays.

On the other hand, there is an increased risk of patients inputting their symptoms directly into AI and self-diagnosing without a doctor's examination. This study evaluated AI on professionally recorded medical data, with physicians grading the output. It does not imply that the same accuracy would be achieved if the general public freely typed their symptoms into chatbots.

In fact, AI's responses can vary significantly depending on how patients describe their symptoms. If they omit significant symptoms or leave out information they do not consider important, the AI can be led in the wrong direction. Doctors' interviews and physical examinations serve precisely to capture information that patients cannot verbalize.

Therefore, this study does not convey the message "if AI examines you, hospitals are unnecessary." More accurately, it suggests that "if doctors appropriately use AI, the quality of diagnosis could potentially be enhanced."


Will Doctors' Jobs Disappear?

As usual, the debate on whether "doctors will be replaced by AI" has arisen on social media.

However, a doctor's job is not just about naming diagnoses. They listen to patients' anxieties, explain the necessity of tests, present treatment options, consider side effects and lifestyle backgrounds, talk with families, support end-of-life decision-making, convey poor prognoses, and connect with medical systems, insurance, and community resources.

These tasks are not mere information processing. They involve value judgments, trust relationships, physical procedures, and ethical responsibilities.

Even if AI can suggest diagnostic candidates, patients will still have questions like "So, what should I do?" "Should I undergo this treatment?" "How should I explain this to my family?" In these situations, the role of human doctors becomes even more crucial.

However, the way doctors work will likely change. AI will read medical records, provide differential diagnoses, suggest test candidates, and draft treatment plans. Doctors will verify these and make judgments tailored to the patient's context. In other words, doctors may shift from being "experts who process everything in their heads" to "experts who integrate multiple information sources, including AI, to make responsible judgments."


What This Study Is Really Asking

This study is not a declaration of victory for medical AI.
At the same time, it is not a suggestion to underestimate AI.

Rather, it poses the following questions to us.

When AI can come up with more diagnostic candidates than humans, how should doctors use it?
When AI can be both right and wrong, how should responsibility be delineated?
Even if AI improves the quality of healthcare, how should we protect patient trust and accountability?
And how can we prevent new healthcare disparities from emerging between hospitals that can use AI and those that cannot?

The numbers from the Harvard study certainly have an impact. The difference of 67% versus 50-55% in initial triage suggests that medical AI is beginning to demonstrate its power in tasks close to clinical practice, rather than being just an experimental toy.

However, this is not a simple story of "AI defeating doctors."
More accurately, it indicates a shift from an era of diagnosing solely by doctors to an era of diagnosing with the inclusion of AI.

In the emergency room, the most important thing is not who wins, but whether the patient is saved.

If AI can reduce doctors' oversights, doctors can detect AI's errors, and patients can receive safer medical care, then that combination has significant value. However, achieving this requires not only technological accuracy but also systems, responsibility, education, ethics, and on-site design.

This study has brought the future of medical AI significantly closer to reality.
The next question is not how smart AI is.
It is how safely human society can use that intelligence.



Source URLs

TechCrunch: Article reporting that AI had higher accuracy than human doctors in emergency room diagnoses in a Harvard study
https://techcrunch.com/2026/05/03/in-harvard-study-ai-offered-more-accurate-diagnoses-than-emergency-room-doctors/

Science (published paper): Research paper evaluating the clinical reasoning performance of large language models
https://www.science.org/doi/10