Artificial intelligence (AI) could feasibly interpret radiographic images, but only if it learns to read images it currently deems “noninterpretable,” such as those of the abdomen and the axial skeleton, suggests a UK study.
“When special dispensation for the AI candidate was provided, the AI candidate was able to pass two of 10 mock examinations,” the researchers said. “Further training and revision are strongly recommended, particularly for cases the AI considers ‘noninterpretable,’ such as abdominal radiographs and those of the axial skeleton.”
This prospective multi-reader diagnostic accuracy study included one AI candidate and 26 radiologists who had passed the Fellowship of the Royal College of Radiologists (FRCR) examination in the preceding 12 months. The research team compared the accuracy and pass rate of the AI candidate with those of the radiologists across 10 mock FRCR rapid reporting examinations, each comprising 30 radiographs and requiring 90 percent accuracy to pass.
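In practical terms, the 90 percent pass mark means a candidate must correctly report at least 27 of the 30 radiographs in each mock examination. As a minimal illustration, the following Python sketch applies that rule to a set of hypothetical per-exam scores (the study's actual per-exam scores are not reproduced here):

    PASS_MARK = 0.90   # 90 percent accuracy required per mock examination
    EXAM_SIZE = 30     # radiographs per mock examination

    def passes(correct, total=EXAM_SIZE):
        # A candidate passes on reaching the 90 percent mark,
        # i.e. at least 27 of 30 radiographs reported correctly
        return correct / total >= PASS_MARK

    mock_scores = [27, 25, 28, 24, 26, 23, 27, 22, 26, 25]  # hypothetical
    print(sum(passes(s) for s in mock_scores), "of", len(mock_scores), "mock exams passed")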
The AI candidate achieved an average overall accuracy of 79.5 percent (95 percent confidence interval [CI], 74.1‒84.3) and, with noninterpretable images excluded from the analysis, passed two of the 10 mock FRCR examinations. In comparison, the radiologists achieved an average accuracy of 84.8 percent (95 percent CI, 76.1‒91.9) and passed four of the 10 mock examinations. [BMJ 2022;379:e072826]
For the AI candidate, the sensitivity was 83.6 percent (95 percent CI, 76.2‒89.4) and the specificity was 75.2 percent (95 percent CI, 66.7‒82.5) compared with summary estimates across all radiologists of 84.1 percent (95 percent CI, 81.0‒87.0) and 87.3 percent (95 percent CI, 85.0‒89.3), respectively.
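To make these figures concrete: sensitivity is the proportion of abnormal radiographs correctly flagged, specificity is the proportion of normal radiographs correctly cleared, and the confidence intervals are binomial intervals around those proportions. The Python sketch below uses illustrative confusion-matrix counts (not the study's data) and a Wilson score interval, one common choice; the paper's exact interval method is not assumed here:

    import math

    def wilson_ci(successes, n, z=1.96):
        # 95 percent Wilson score interval for a binomial proportion
        p = successes / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return centre - half, centre + half

    # Illustrative counts only: abnormal cases (tp + fn), normal cases (tn + fp)
    tp, fn, tn, fp = 100, 20, 90, 30
    sens = tp / (tp + fn)   # abnormal radiographs correctly flagged
    spec = tn / (tn + fp)   # normal radiographs correctly cleared
    lo, hi = wilson_ci(tp, tp + fn)
    print(f"sensitivity {sens:.1%} (95% CI {lo:.1%} to {hi:.1%})")
    lo, hi = wilson_ci(tn, tn + fp)
    print(f"specificity {spec:.1%} (95% CI {lo:.1%} to {hi:.1%})")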
Of the 300 radiographs, 148 were correctly interpreted by >90 percent of radiologists; of these, the AI candidate was incorrect in 14 (9 percent). Conversely, of the 20 radiographs that >50 percent of radiologists interpreted incorrectly, the AI candidate was correct in 10 (50 percent).
Notably, most interpretive difficulties arose with musculoskeletal rather than chest radiographs.
AI potential
“The AI candidate’s performance is representative of similar AI models reported in the wider literature,” the researchers said.
For instance, a recent meta-analysis of AI algorithms for detecting fractures on imaging reported a sensitivity and specificity of 89 percent and 80 percent, respectively, in studies with adequate external validation cohorts and low risk of bias. [Radiology 2022;304:50-62]
In another meta-analysis, AI algorithms for distinguishing abnormal from normal chest radiographs had a sensitivity of 87 percent and a specificity of 89 percent. However, studies without external validation were included, which likely inflated the accuracy reported in the meta-analysis. [NPJ Digit Med 2021;4:65]
“The promise of AI as a diagnostic adjunct in clinical practice remains high. Although ranked low for overall diagnostic accuracy (rank 26), the AI came close to radiologist-level performance when we consider the cases it could interpret,” the researchers said.
“This could potentially bring near radiologist-level accuracy to physicians in the clinical environment (especially considering that the radiologists in this cohort, given their recent examination success, could be higher performing) in settings where routine immediate radiographic reporting by radiologists is not available and where levels of training in, and exposure to, radiographic interpretation can be highly heterogeneous,” they continued.