The Role of Human Judgment in AI-Assisted Diagnosis

Reviewed by Lexie Corner, Nov 20 2024

According to a study published in Radiology, radiologists and other medical professionals may rely too heavily on AI when making diagnostic decisions, particularly when it identifies a specific region of interest in an X-ray.

Chest radiograph (CXR) examples of (A, C) local (feature-based) AI explanations and (B, D) global (prototype-based) AI explanations from a simulated AI tool, ChestAId, presented to physicians in the study. In all examples, the correct diagnostic impression for the radiograph case in question is “right upper lobe pneumonia,” and the corresponding AI advice is correct. The patient clinical information associated with this chest radiograph was “a 63-year-old male presenting to the Emergency Department with cough.” To better simulate a realistic AI system, explanation specificity was changed according to high (ie, 80%–94%) or low (ie, 65%–79%) AI confidence level: bounding boxes in high-confidence local AI explanations (example in A) were more precise than those in low-confidence ones (example in C); high-confidence global AI explanations (example in B) had more classic exemplar images than low-confidence ones (example in D), for which the exemplar images were more subtle. Image Credit: RSNA

As of 2022, 190 radiology AI software programs were approved by the U.S. Food and Drug Administration. However, a gap between AI proof-of-concept and its real-world clinical use has emerged. To bridge this gap, fostering appropriate trust in AI advice is paramount.

Paul H. Yi, MD, Study Senior Author and Associate Member, Department of Radiology, St. Jude Children’s Research Hospital

In the multi-site, prospective trial, 220 radiologists and internal medicine/emergency medicine physicians (132 radiologists) reviewed chest X-rays with the assistance of AI guidance. Each physician was tasked with evaluating eight chest X-ray cases, using recommendations from a simulated AI assistant with diagnostic performance comparable to specialists in the field.

The clinical vignettes included frontal and, when available, corresponding lateral chest X-ray images taken from Beth Israel Deaconess Hospital in Boston, using the open-source MIMIC Chest X-ray Database. A panel of radiologists selected cases that represented real-world clinical practice.

For each case, participants were shown the patient's clinical history, AI suggestions, and X-ray images. AI provided either a correct or incorrect diagnosis, along with local or global explanations. In a local explanation, AI highlights the most relevant features of the image, while in a global explanation, AI uses similar visuals from previous cases to explain how it arrived at its diagnosis.

Dr. Yi added, “These local explanations directly guide the physician to the area of concern in real-time. In our study, the AI literally put a box around areas of pneumonia or other abnormalities.”

Reviewers could accept, modify, or reject the AI suggestions. They were also asked to rate their level of confidence in the findings and impressions, as well as the utility of the AI advice.

Trust in AI a Double-Edged Sword

Drew Prinster, MS, and Amama Mahmood, MS, computer science Ph.D. students at Johns Hopkins University in Baltimore, led the study’s analysis of the effects of experimental variables on diagnostic accuracy, efficiency, physician perception of AI usefulness, and "simple trust" (how quickly a user agreed or disagreed with AI advice). The researchers considered variables such as user demographics and professional experience.

The findings revealed that when AI provided local explanations, reviewers were more likely to align their diagnostic decisions with the AI recommendations. However, they also spent less time considering the diagnosis.

“Compared with global AI explanations, local explanations yielded better physician diagnostic accuracy when the AI advice was correct. They also increased diagnostic efficiency overall by reducing the time spent considering AI advice,” Dr. Yi stated.

When the AI advice was correct, the average diagnostic accuracy among reviewers was 85.3% with global explanations and 92.8% with local explanations. However, when the AI advice was incorrect, physician accuracy dropped to 26.1% with global explanations and 23.6% with local explanations.
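To make the size of these gaps concrete, the reported averages can be compared with simple subtraction. The short Python sketch below uses only the four percentages quoted above. The variable and function names are illustrative, and the result is a plain difference of means, not a reproduction of the study's statistical analysis.

```python
# Mean physician diagnostic accuracy (%) as reported in the article,
# keyed by (correctness of AI advice, explanation type).
accuracy = {
    ("correct", "global"): 85.3,
    ("correct", "local"): 92.8,
    ("incorrect", "global"): 26.1,
    ("incorrect", "local"): 23.6,
}


def local_minus_global(advice):
    """Percentage-point gap between local and global explanations."""
    return round(accuracy[(advice, "local")] - accuracy[(advice, "global")], 1)


def drop_when_incorrect(explanation):
    """Percentage-point fall in accuracy when the AI advice is wrong."""
    return round(
        accuracy[("correct", explanation)] - accuracy[("incorrect", explanation)], 1
    )


gap_correct = local_minus_global("correct")      # local ahead by 7.5 points
gap_incorrect = local_minus_global("incorrect")  # local behind by 2.5 points
drop_global = drop_when_incorrect("global")      # 59.2-point fall
drop_local = drop_when_incorrect("local")        # 69.2-point fall

print(gap_correct, gap_incorrect, drop_global, drop_local)
```

On these figures, local explanations are 7.5 percentage points ahead when the AI is right and 2.5 points behind when it is wrong, and accuracy falls by roughly 59 to 69 points whenever the AI advice is incorrect.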

Dr. Yi added, “When provided local explanations, both radiologists and non-radiologists in the study tended to trust the AI diagnosis more quickly, regardless of the accuracy of AI advice.”

Chien-Ming Huang, Ph.D., co-senior author of the study and John C. Malone Assistant Professor in the Department of Computer Science at Johns Hopkins University, noted that this confidence in AI could have drawbacks, including the potential for over-reliance or automation bias.

Dr. Yi stated, “When we rely too much on whatever the computer tells us, that is a problem because AI is not always right. I think as radiologists using AI, we need to be aware of these pitfalls and stay mindful of our diagnostic patterns and training.”

According to the study, AI system developers should think carefully about how different types of AI explanations can affect reliance on AI advice.

Dr. Yi concluded, “I really think collaboration between industry and healthcare researchers is key. I hope this paper starts a dialog and fruitful future research collaborations.”

Journal Reference:

Prinster, D. et al. (2024) Care to Explain? AI Explanation Types Differentially Impact Chest Radiograph Diagnostic Performance and Physician Trust in AI. Radiology. doi.org/10.1148/radiol.233261

Source:

Radiological Society of North America