Researchers examine uncertainty in medical AI papers going back three decades

In the big data domain, researchers need to ensure that conclusions are consistently verifiable. But that can be particularly challenging in medicine because physicians themselves aren't always sure about disease diagnoses and treatment plans.

To investigate how machine learning research has historically handled medical uncertainties, scientists at the University of Texas at Dallas; the University of California, San Francisco; the National University of Singapore; and over half a dozen other institutions conducted a meta-survey of studies published over the past 30 years. They found that uncertainty arising from imprecise measurements, missing values, and other errors was common in both data and models, but that the problems could potentially be addressed with deep learning techniques.

The coauthors sought to quantify the prevalence of two types of uncertainty in the studies: structural uncertainty and uncertainty in model parameters. Structural uncertainty deals with how AI model structures (i.e., architectures) are used and how accurately they extrapolate information, while uncertainty in model parameters concerns the parameters (configuration variables internal to models) chosen to make predictions from a given corpus.
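The survey itself doesn't include code, but a small example can make "uncertainty in model parameters" concrete. The following is an illustrative sketch, not a method from the paper: it assumes a hypothetical diagnostic test with made-up counts, places a uniform Beta(1, 1) prior on the test's sensitivity, and draws Monte Carlo samples from the resulting Beta posterior (a combination of Bayesian inference and Monte Carlo simulation, two of the techniques the survey tallies). The function names are hypothetical.

```python
import random
import statistics

def sample_sensitivity(true_pos: int, false_neg: int,
                       n_samples: int = 10_000, seed: int = 0) -> list[float]:
    """Monte Carlo draws of a test's sensitivity from its Beta posterior.

    Assumes a uniform Beta(1, 1) prior, so after observing true_pos detections
    and false_neg misses the posterior is Beta(1 + true_pos, 1 + false_neg).
    """
    rng = random.Random(seed)  # fixed seed for reproducible draws
    return [rng.betavariate(1 + true_pos, 1 + false_neg) for _ in range(n_samples)]

def credible_interval(samples: list[float], level: float = 0.95) -> tuple[float, float]:
    """Equal-tailed interval read off the sorted Monte Carlo samples."""
    ordered = sorted(samples)
    lo_idx = int((1 - level) / 2 * len(ordered))
    hi_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]

# Hypothetical data: both studies observe a 90% hit rate, on 10 vs. 100 patients.
small = sample_sensitivity(true_pos=9, false_neg=1)
large = sample_sensitivity(true_pos=90, false_neg=10)

for name, draws in (("10 patients", small), ("100 patients", large)):
    lo, hi = credible_interval(draws)
    print(f"{name}: posterior mean {statistics.mean(draws):.3f}, "
          f"95% interval [{lo:.3f}, {hi:.3f}]")
```

Both hypothetical studies report the same 90% hit rate, yet the interval for 10 patients comes out much wider than the one for 100, which is exactly the information a single point estimate hides. Structural uncertainty would be probed differently, for example by comparing predictions across different model architectures rather than across draws of one model's parameters.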

The researchers looked at 165 papers published between 1991 and 2020 by the Institute of Electrical and Electronics Engineers (IEEE), Dutch publisher Elsevier, and German academic publisher Springer. The coauthors report a rise in the number of papers that address uncertainty, from between 1 and 6 papers per year (1991 to 2009) to between 7 and 21 papers per year (2010 to 2020), which they attribute to a growing consensus about how uncertainty can affect clinical outcomes.

According to the coauthors, the studies dealt with uncertainty using one of six classical machine learning techniques: Bayesian inference (27% of the studies), fuzzy systems (24%), Monte Carlo simulation (18%), rough classification (11%), Dempster-Shafer theory (14%), and imprecise probability (7%). Each of the techniques comes with inherent disadvantages, however:

Bayesian inference addresses structural uncertainty and uncertainty in parameters while integrating prior knowledge, but it's computationally demanding.
Fuzzy systems quickly learn from unfamiliar data sets, but they're limited with respect to the number of inputs they can take.
Monte Carlo simulation can answer questions that are analytically intractable and produces easy-to-interpret results, but its solutions are approximate rather than exact.
Rough classification doesn't need preliminary information about the data and automatically generates a set of decision rules, but it can't directly process real-valued (continuous) data.
Dempster-Shafer theory accounts for multiple sources of evidence, but it's computationally intensive.
Imprecise probability makes it easier to tackle conflicting evidence but also has a high computational burden.
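Dempster-Shafer theory's way of accounting for multiple, possibly conflicting sources of evidence comes down to Dempster's rule of combination. The sketch below implements that standard rule for two mass functions, along with the belief and plausibility bounds the theory derives from them; the diagnoses, the two evidence sources (a "lab test" and a "symptom checker"), and their mass values are hypothetical and are not taken from the survey.

```python
from itertools import product

Mass = dict[frozenset, float]

def combine(m1: Mass, m2: Mass) -> tuple[Mass, float]:
    """Combine two mass functions with Dempster's rule.

    Returns the normalized combined masses and the conflict K, i.e. the
    total mass the two sources assign to mutually exclusive hypotheses.
    """
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + wa * wb
        else:
            conflict += wa * wb  # the sources contradict each other here
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Renormalize by 1 - K so the surviving masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}, conflict

def belief(m: Mass, hypothesis: frozenset) -> float:
    """Lower bound: mass committed to subsets of the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)

def plausibility(m: Mass, hypothesis: frozenset) -> float:
    """Upper bound: mass that does not rule the hypothesis out."""
    return sum(w for s, w in m.items() if s & hypothesis)

# Hypothetical frame of discernment and evidence sources.
frame = frozenset({"flu", "covid", "cold"})
lab_test: Mass = {frozenset({"covid"}): 0.6, frozenset({"covid", "flu"}): 0.3, frame: 0.1}
symptom_checker: Mass = {frozenset({"flu"}): 0.5, frame: 0.5}

fused, conflict = combine(lab_test, symptom_checker)
covid = frozenset({"covid"})
print(f"conflict K = {conflict:.2f}")
print(f"belief in covid = {belief(fused, covid):.3f}, "
      f"plausibility = {plausibility(fused, covid):.3f}")
```

With these made-up numbers the two sources conflict on 30% of their joint mass (K = 0.3), and after renormalization the fused evidence gives "covid" a belief of 3/7 (about 0.43) and a plausibility of 5/7 (about 0.71); the gap between the two is the uncertainty the theory keeps explicit. The pairwise loop over focal sets, whose number can grow as 2^n for a frame of n hypotheses, hints at why the technique is described as computationally intensive.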

The researchers suggest deep learning as a remedy for the shortcomings of classical machine learning because of its robustness to uncertainty: in their view, deep learning algorithms generalize well even in the presence of noise. The team points out, for instance, that recent work has shown deep learning algorithms achieving strong performance on noisy electrocardiogram signals.

"In the future, deep learning models may be explored to mitigate the presence of noise in the medical data," the researchers wrote. "Proper quantification of uncertainty provides valuable information for optimal decision making."

The study's findings are yet another data point in the debate about the ways AI is applied to medicine. Google recently published a whitepaper that found an eye disease-predicting system was impractical in the real world, partially because of technological and clinical missteps. STAT reports that unproven AI algorithms are being used to predict the decline of COVID-19 patients. And companies like Babylon Health that claim their systems can diagnose diseases as well as human physicians have come under scrutiny from regulators and clinicians.
