Many studies claiming that artificial intelligence is as good as (or better than) human experts at interpreting medical images are of poor quality and are arguably exaggerated, posing a safety risk to patients, researchers warn in a paper published March 25, 2020, in the BMJ.
Researchers reviewed studies published over the past 10 years that compared the performance of a deep learning algorithm in medical imaging with that of expert clinicians. They identified two eligible randomised clinical trials and 81 non-randomised studies. Of the non-randomised studies, only nine were prospective and just six were tested in a 'real world' clinical setting.
The average number of human experts in the comparator group was just four, while access to raw data and code (to allow independent scrutiny of results) was severely limited. More than two thirds (58 of 81) of the studies were judged to be at high risk of bias, and adherence to recognised reporting standards was often poor. Three quarters (61 studies) stated that the performance of AI was at least comparable to (or better than) that of clinicians, and only 31 (38 percent) stated that further prospective studies or trials were needed.
The findings raise concerns about the quality of evidence underpinning many of these studies, highlighting the need to improve their design and reporting standards. The researchers say that many of the studies made arguably exaggerated claims that AI performs better than clinicians, which could pose a risk to patient safety.