Facebook claims its AI can anticipate COVID-19 outcomes using X-rays

Researchers at Facebook and New York University (NYU) claim to have developed three machine learning models that could help doctors predict how a COVID-19 patient's condition might develop. The open-sourced models, all of which require no more than a sequence of X-rays, ostensibly predict patient deterioration up to four days in advance and the amount of supplemental oxygen (if any) a patient might need.

The novel coronavirus pandemic continues to reach alarming new heights in the U.S. and around the world. In the U.S. last week, daily deaths exceeded 4,000 for the first time since the start of the health crisis. Record numbers of infections in the hundreds of thousands per day have placed a strain on health systems nationwide, with states like California struggling to maintain space in overtaxed intensive care units.

Huiying Medical, Alibaba, RadLogics, Lunit, DarwinAI, Infervision, Qure.ai, and others have developed AI algorithms that ostensibly diagnose COVID-19 from X-rays with high accuracy. What differentiates the approach taken by Facebook and NYU, however, is that it attempts to predict long-term clinical trajectories. Stanford, Mount Sinai, and electronic health record vendors Epic and Cerner have developed models that turn out risk scores for a patient's chances of dying or needing a ventilator, but few (if any) make these predictions from a single scan or electronic medical record.

As part of an ongoing collaboration with NYU Langone Health's Predictive Analytics Unit and Department of Radiology, Facebook researchers pretrained an AI system on two large, public chest X-ray datasets, MIMIC-CXR-JPG and CheXpert, using a self-supervised learning technique called Momentum Contrast (MoCo). Self-supervised learning enabled the MoCo model to learn from X-ray scans within the datasets, even when labels explaining those scans weren't available.
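To make the pretraining step more concrete, the two ideas at the core of Momentum Contrast can be sketched in a few lines of NumPy: a "key" encoder whose weights trail the "query" encoder through a slow moving average, and a contrastive loss that asks each image's query embedding to match its own key rather than the keys stored in a queue of earlier images. This is a minimal illustration, not the Facebook and NYU code; the function names, the momentum of 0.999, and the temperature of 0.07 are assumptions chosen for the example.

```python
import numpy as np

def momentum_update(query_params, key_params, m=0.999):
    """Move each key-encoder weight slowly toward the matching query-encoder weight."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]

def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """Contrastive loss: each query should match its own key, not the queued keys."""
    q, k_pos, queue = l2_normalize(q), l2_normalize(k_pos), l2_normalize(queue)
    pos = np.sum(q * k_pos, axis=1, keepdims=True)    # (N, 1) similarity to own key
    neg = q @ queue.T                                 # (N, K) similarity to queued keys
    logits = np.concatenate([pos, neg], axis=1) / temperature
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[:, 0].mean())              # the positive sits in column 0

def update_queue(queue, new_keys):
    """First-in, first-out: append the newest keys and drop the oldest ones."""
    return np.concatenate([queue, new_keys], axis=0)[len(new_keys):]
```

No diagnostic labels appear anywhere in the loss. The only supervision is that two views of the same scan should embed close together, which is why unlabeled X-rays can be used.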

The next step was fine-tuning the MoCo model using an extended version of the publicly available NYU COVID-19 dataset. The researchers built classifiers with 26,838 X-ray images from 4,914 patients, annotated to indicate whether the patient's condition worsened within 24, 48, 72, or 96 hours of the scan in question. One classifier predicts patient deterioration based on a single X-ray, while another uses a sequence of aggregated X-rays.
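The annotation scheme described above can be pictured as one yes/no label per time horizon for every scan. The sketch below shows one plausible way to derive such labels from a scan timestamp and the time of a patient's first adverse event. The function name, the field names, and the rule that an event counts only if it occurs after the scan are all assumptions for illustration; the 96-hour horizon is included because the researchers report predictions up to 96 hours in advance.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Prediction horizons in hours (the 96-hour horizon is an assumption, see lead-in).
HORIZONS_HOURS = (24, 48, 72, 96)

def deterioration_labels(
    scan_time: datetime, event_time: Optional[datetime]
) -> Dict[int, int]:
    """Return {horizon_hours: 1 or 0} for a single scan.

    event_time is the hypothetical timestamp of the first adverse event,
    or None if the patient never deteriorated.
    """
    labels = {}
    for hours in HORIZONS_HOURS:
        worsened = (
            event_time is not None
            and scan_time < event_time <= scan_time + timedelta(hours=hours)
        )
        labels[hours] = int(worsened)
    return labels
```

Under this scheme, a scan taken 30 hours before an adverse event is labeled negative at the 24-hour horizon and positive at every longer one.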

The researchers claim that the classifiers relying on a series of X-ray images outperformed human experts at predicting ICU needs, mortality, and adverse events up to 96 hours in advance. While the results aren't necessarily applicable to other hospitals with unique datasets, the researchers believe new classifiers can be built from the MoCo model with relatively few resources, perhaps a single GPU.

"Being able to predict whether a patient will need oxygen resources would also be a first and could help hospitals as they decide how to allocate resources in the weeks and months to come. With COVID-19 cases rising again across the world, hospitals need tools to predict and prepare for upcoming surges as they plan their resource allocations," the Facebook team wrote in a blog post. "These predictions could help doctors avoid sending at-risk patients home too soon and help hospitals better predict demand for supplemental oxygen and other limited resources."

Recent research from the University of Toronto, the Vector Institute, and MIT revealed that the chest X-ray datasets used to train diagnostic models -- including MIMIC-CXR and CheXpert -- exhibit imbalance, biasing them against certain gender, socioeconomic, and racial groups. Female patients suffer from the highest disparity levels, even though the proportion of women in the dataset is only slightly less than men. White patients -- the majority, with 67.6% of all the X-ray images -- are the most-favored subgroup, while Hispanic patients are the least-favored.

The Facebook and NYU researchers say they addressed this bias by pretraining on non-COVID data and carefully selecting each test sample. But early last year, the U.S. Centers for Disease Control and Prevention recommended against the use of CT scans or X-rays for COVID-19 diagnosis, as did the American College of Radiology (ACR) and radiological organizations in Canada, New Zealand, and Australia. That's because even the best AI systems sometimes can't tell the difference between COVID-19 and common lung infections like bacterial or viral pneumonia.

Partly due to a reluctance to release code, datasets, and techniques, much of the data used today to train AI algorithms for diagnosing diseases may perpetuate inequalities. A team of U.K. scientists found that almost all eye disease datasets come from patients in North America, Europe, and China, meaning eye disease-diagnosing algorithms are less certain to work well for racial groups from underrepresented countries. In another study, Stanford University researchers claimed that most of the U.S. data for studies involving medical uses of AI come from California, New York, and Massachusetts. A study of a UnitedHealth Group algorithm determined that it could underestimate by half the number of Black patients in need of greater care. And a growing body of work suggests that skin cancer-detecting algorithms tend to be less precise when used on Black patients, in part because AI models are trained mostly on images of light-skinned patients.

Determining just how reliable Facebook's and NYU's algorithms are would likely require thorough testing at multiple, diverse health systems around the world -- with patients' consent. A study published in Nature Machine Intelligence revealed that a COVID-19 deterioration model successfully deployed in Wuhan, China yielded results that were no better than a roll of the dice when applied to a sample of patients in New York. While careful fine-tuning might help Facebook's and NYU's algorithms avoid the same fate, it's impossible to predict where the biases might arise, which speaks to the need for auditing prior to deployment at any scale.
