Humans Outperform AI in Interpreting Chest X-Rays

AI tools may help boost radiologists’ confidence in their diagnoses, but they can’t be relied on to identify common lung diseases on chest X-rays, a new study says.

Researchers pitted 72 radiologists against four commercially available AI tools in an analysis of more than 2,000 X-rays. The human experts won, according to results published Sept. 25 in Radiology.

“Chest radiography is a common diagnostic tool, but significant training and experience is required to interpret exams correctly,” said lead researcher Dr. Louis Plesner, resident radiologist and PhD fellow in radiology at Herlev and Gentofte Hospital in Copenhagen, Denmark.

“While AI tools are increasingly being approved for use in radiological departments, there is an unmet need to further test them in real-life clinical scenarios,” Plesner said in a journal news release. “AI tools can assist radiologists in interpreting chest X-rays, but their real-life diagnostic accuracy remains unclear.”

FDA-approved AI tools are commercially available to assist radiologists, Plesner said.

In this study, the X-rays had been taken over two years at four Danish hospitals. About one-third had at least one target diagnosis.

The X-rays were reviewed for three common findings: airspace disease, a chest X-ray pattern caused by conditions such as pneumonia or lung edema; pneumothorax, or collapsed lung; and pleural effusion, a buildup of fluid around the lungs.

AI tools had sensitivity rates ranging from 72% to 91% for airspace disease, 63% to 90% for pneumothorax, and 62% to 95% for pleural effusion. A highly sensitive test means fewer cases of disease are missed.
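The sensitivity figures above come down to a simple ratio: of the patients who truly have a finding, the share that the tool flags. A minimal Python sketch of that arithmetic follows. The function name and the counts are illustrative assumptions, not data from the study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of truly diseased cases that the test flags."""
    diseased = true_positives + false_negatives
    if diseased == 0:
        raise ValueError("no diseased cases to evaluate")
    return true_positives / diseased

# Hypothetical counts for illustration only (not study data):
# 100 X-rays that truly show a finding, of which a tool flags 90 and misses 10.
print(f"{sensitivity(90, 10):.0%}")  # 90%
```

Sensitivity says nothing about healthy patients who are flagged by mistake, which is why a tool can score well here and still produce many false positives.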

The study found that radiologists outperformed AI in accurately identifying the presence and absence of the three common lung diseases.

“The AI tools showed moderate to a high sensitivity comparable to radiologists for detecting airspace disease, pneumothorax and pleural effusion on chest X-rays,” Plesner said. “However, they produced more false-positive results [predicting disease when none was present] than the radiologists, and their performance decreased when multiple findings were present and for smaller targets.”

For pneumothorax, the probability that patients with a positive screening result truly had the disease ranged from 56% to 86% for the AI systems, compared to 96% for the radiologists.

“AI performed worst at identifying airspace disease, with positive predictive values ranging between 40% and 50%,” Plesner said. “In this difficult and elderly patient sample, the AI predicted airspace disease where none was present five to six out of 10 times. You cannot have an AI system working on its own at that rate.”
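The "five to six out of 10" figure follows directly from the positive predictive value (PPV): if only 40% to 50% of positive calls are correct, the remaining 50% to 60% are false alarms. A short Python sketch of that conversion, using a hypothetical helper name chosen for illustration:

```python
def false_alarms_per_ten(ppv: float) -> float:
    """Of every 10 positive calls, how many are expected to be wrong."""
    if not 0.0 <= ppv <= 1.0:
        raise ValueError("ppv must be between 0 and 1")
    return 10 * (1.0 - ppv)

# PPV range quoted in the article for airspace disease: 40% to 50%.
for ppv in (0.40, 0.50):
    wrong = false_alarms_per_ten(ppv)
    print(f"PPV {ppv:.0%}: about {wrong:.0f} of 10 positive calls are false")
```

By the same arithmetic, the radiologists' 96% PPV for pneumothorax corresponds to fewer than one false alarm in every 10 positive calls.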

The goal of radiologists is to balance finding disease against excluding it, avoiding both significant missed diagnoses and overdiagnosis, Plesner said.

“AI systems seem very good at finding disease, but they aren’t as good as radiologists at identifying the absence of disease especially when the chest X-rays are complex,” he said. “Too many false-positive diagnoses would result in unnecessary imaging, radiation exposure and increased costs.”

In prior studies claiming AI superiority over radiologists, the radiologists reviewed only the image without access to the patient’s clinical history and previous imaging studies. “In everyday practice, a radiologist’s interpretation of an imaging exam is a synthesis of these three data points,” Plesner said.

More information

The American Hospital Association has more on using AI in diagnosis and care.

SOURCE: Radiology, news release, Sept. 26, 2023

Source: HealthDay