In a study published on the preprint server arXiv.org, researchers at Donghua University and the University of California, Santa Barbara highlight the dangers posed by imprecise medical data when fed to AI and machine learning algorithms. They find that when input measurements carry uncertainty, learning algorithms can produce a range of results rather than a single stable one, which could lead to mislabeling and inappropriate treatments.
Clinical lab tests play an important role in health care. In fact, it’s estimated that from early detection to the diagnosis of diseases, test results guide more than 70% of medical decisions and prescriptions. The availability of medical data sets would seem to make health care a natural fit for AI and machine learning. But limitations in equipment, instruments, materials, and test methods often introduce inaccuracies into the data (for example, through expired reagents, controls, and calibrators or failures in sampling systems), potentially undermining the accuracy of AI systems. According to a 2006 study, the prevalence of laboratory errors can be as high as one in every 330 to 1,000 events, one in every 900 to 2,074 patients, or one in every 214 to 8,316 laboratory results.
In an attempt to quantify the effects of imprecision on an AI system’s results, the team designed a model that represents data imprecision, with a parameter that controls its degree. The model generates imprecise samples for comparison experiments, which can be evaluated using a group of measures to determine how inconsistent a prediction is for an individual patient. It also identifies data mislabeling attributable to imprecise predictions.
In an experiment, the researchers compared predictions made from data in a medical database with the corresponding predictions made from data generated by the imprecision model. They used a hyperthyroidism corpus from Ruijin Hospital in Shanghai, which contained records spanning 2 to 10 years for 2,460 patients, to train and test the imprecision model, running each experiment 10 times and averaging the results.
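To make the idea concrete, here is a minimal Python sketch of this kind of comparison. It is not the researchers' model: the uniform multiplicative noise, the parameter name `alpha`, the threshold classifier standing in for a trained predictor, the TSH reference range, and the sample readings are all assumptions chosen for illustration. It perturbs each measurement, checks how often the predicted label changes, and averages over 10 runs.

```python
import random

# Illustrative TSH reference range in mIU/L; an assumption for this sketch,
# not clinical guidance and not taken from the study.
TSH_LOW, TSH_HIGH = 0.4, 4.0


def label(tsh):
    """Toy stand-in for a trained predictor: classify a TSH value by threshold."""
    if tsh < TSH_LOW:
        return "low"
    if tsh > TSH_HIGH:
        return "high"
    return "normal"


def perturb(value, alpha, rng):
    """Scale a measurement by a random factor drawn uniformly from [1 - alpha, 1 + alpha]."""
    return value * (1.0 + rng.uniform(-alpha, alpha))


def flip_rate(values, alpha, rng, draws=200):
    """Fraction of perturbed measurements whose label differs from the unperturbed label."""
    flips = 0
    for v in values:
        original = label(v)
        flips += sum(label(perturb(v, alpha, rng)) != original for _ in range(draws))
    return flips / (len(values) * draws)


def averaged_flip_rate(values, alpha, runs=10, seed=0):
    """Repeat the experiment `runs` times with one seeded generator and average the results."""
    rng = random.Random(seed)
    return sum(flip_rate(values, alpha, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    readings = [0.3, 0.45, 2.0, 3.9, 6.5]  # made-up TSH values, not patient data
    for alpha in (0.0, 0.05, 0.2):
        print(f"alpha={alpha:.2f} flip rate={averaged_flip_rate(readings, alpha):.3f}")
```

In this toy setup, readings far from a threshold keep their label under small perturbations, while readings near a threshold are the ones that flip, which is the kind of instability the study is concerned with.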
The team reports that data generated by the imprecision model led to abnormally low or abnormally high predicted levels of thyrotropin receptor antibodies and thyroid-stimulating hormone, the pituitary hormone that drives the thyroid gland to produce metabolism-stimulating thyroxine and triiodothyronine. “The prediction label could easily change from correct to wrong or from wrong to correct for these ranges by introducing the imprecision to the data, leading to the unstable decline,” they wrote. “The study has direct guidance on practical healthcare applications … It motivates to build robust models that can take imprecisions into account with better generalization.”
While the study’s findings are perhaps somewhat obvious, they’re another data point in the debate about the deployment of AI in medicine. Google recently published a whitepaper that found an eye disease-predicting system was impractical in the real world, partially because of technological and clinical missteps. STAT reports that unproven AI algorithms are being used to predict the decline of COVID-19 patients. And companies like Babylon Health, which claim their systems can diagnose diseases as well as human physicians can, have come under scrutiny from regulators and clinicians.
“The potential of AI is well described, however in reality health systems are faced with a choice: to significantly downgrade the enthusiasm regarding the potential of AI in everyday clinical practice, or to resolve issues of data ownership and trust and invest in the data infrastructure to realize it,” MIT principal research scientist Leo Anthony Celi and coauthors wrote in a recent policy paper laying out what they call the “inconvenient truth” about AI in health care. “Without this however, opportunities for AI in healthcare will remain just that — opportunities.”