May 22, 2024

ChatGPT-4 found to match or surpass physicians in clinical reasoning, but made more reasoning errors than residents

A recent study found that ChatGPT-4, an artificial intelligence program, outperformed internal medicine residents and attending physicians at two academic medical centers in processing medical data and demonstrating clinical reasoning. Physician-scientists at Beth Israel Deaconess Medical Center (BIDMC) conducted the study and published their findings in a research letter in JAMA Internal Medicine. The study directly compared the reasoning capabilities of a large language model (LLM) with human performance, using criteria developed to evaluate physicians.

LLMs can make diagnoses, but clinical medicine involves far more than diagnosis, according to Adam Rodman, MD, an internal medicine physician and researcher at BIDMC. The researchers set out to determine whether LLMs reason through clinical cases as well as physicians do, and were surprised to find that the model reasoned as well as or better than the human participants as each case unfolded.

To evaluate clinical reasoning, the investigators used a previously validated tool, the revised-IDEA (r-IDEA) score. They recruited 21 attending physicians and 18 residents to work through 20 clinical cases, each comprising four sequential stages of diagnostic reasoning.

The researchers instructed both the physicians and the chatbot GPT-4 to write out their differential diagnoses and justify them at each stage. The responses from the AI model and the human participants were then scored on clinical reasoning (r-IDEA score) and several other measures of reasoning.

Stephanie Cabral, MD, a third-year internal medicine resident at BIDMC, described the four sequential stages of diagnostic reasoning: triage data, system review, physical exam, and diagnostic testing and imaging. The AI chatbot earned the highest r-IDEA scores, with a median of 10 out of 10, compared with 9 for attending physicians and 8 for residents. However, the AI model's answers contained more instances of incorrect reasoning than the residents' answers did, which suggests that AI is better suited to complementing human reasoning than to replacing it.

The study underscores the potential of AI as a tool to support clinical reasoning, which could reduce inefficiencies in healthcare and improve patient-physician interactions. The authors call for further research into how AI can best be integrated into clinical practice. The findings offer a promising outlook for using AI to improve the quality and experience of patient care.

Co-authors of the study included Zahir Kanjee, MD, Philip Wilson, MD, and Byron Crowe, MD, of BIDMC; Daniel Restrepo, MD, of Massachusetts General Hospital; and Raja-Elie Abdulnour, MD, of Brigham and Women’s Hospital.

1. Source: Coherent Market Insights, Public sources, Desk research
2. We have leveraged AI tools to mine information and compile it