The surge of sensationalist COVID-19 AI research

There seems to be a tendency to hastily use imperfect and questionable data to train an AI solution for COVID-19, a dangerous trend that helps no patient or physician and damages the reputation of the AI community. Dealing with a pandemic -- as significant as it is -- does not suspend basic scientific principles. Data has to be curated by medical experts, full and rigorous validations have to be performed, and results have to be reviewed by peers before we release any solution, or even a proposal, into the world, particularly when society is dealing with many uncertainties.

It is safe to say we are all deeply concerned about the COVID-19 pandemic. This coronavirus has drastically changed our reality: We're experiencing stress, restrictions, quarantines; we're witnessing heroic sacrifices of caregivers including staff, nurses, and physicians; we're losing loved ones; and we're facing economic hardships and massive uncertainties about what is in store in the coming months. Under such circumstances, it is only natural that many of us are thinking of ways to help -- in the fastest ways possible. The AI community is no exception.

Machine learning methods live on data. They learn from labeled data to classify, predict, and estimate. The quality and reliability of any AI method directly depend on the quality and reliability of the labeled data. In computer science, we talk about “garbage in, garbage out” (GIGO), which summarizes the experience that low-quality input generates unreliable output, or “garbage.” This becomes even more critical when we are dealing with highly complex data modalities, such as medical images -- data that generally require highly specialized knowledge for correct interpretations.

Within the AI community, we are fully dependent on data. As long as the domain is not sensitive (finance, healthcare, surveillance, etc.), we usually assemble our datasets by using a variety of methods, from the manual gathering of samples up to highly sophisticated crawlers to parse through the Internet and other publicly available repositories. In medical imaging, we deal with a highly sensitive domain that generally requires a long process to curate and access a set of labeled images. Needless to say, the curation has to happen within the walls of a hospital, not just because the experts are there but also due to the required de-identification of images to comply with privacy regulations.

But sometimes we get impatient; we create toy datasets by manually collecting publicly accessible sources (e.g., online journals) -- and there is generally no concern about this approach. Most of the time we -- as AI researchers with no clinical or medical competency -- create our toy datasets to run initial investigations and get a feel for the challenges to come. This usually happens in anticipation of receiving a professionally curated dataset, a process that is often slowed down by ethics reviews and intellectual property negotiations.

To be clear, a “toy dataset” in the medical imaging domain is not a toy just because it is commonly very small, but more importantly because it has been created by engineers and computer scientists, not by physicians and medical/clinical experts. And nobody would complain if we play with our toys inside our AI labs to get prepared to deal with the actual data from the hospital.

Radiologists around the world are understandably very busy, to put it mildly; it is not the best time to forge collaborations with radiologists if you are an overambitious AI researcher who wants to help. So some of us have started to assemble our own datasets to get prepared for future tasks.

Collections of x-ray and CT images -- scraped from the Internet -- seem to emerge here and there and appear to be evolving as the creators continue to add images. Because of the availability of such datasets on one side and the ubiquity of basic AI knowledge and tools on the other side, many AI enthusiasts and startups have impulsively begun to develop solutions for COVID-19 in x-ray images.

One finds websites and blogs that advise on how to detect COVID-19 from x-ray scans with high accuracy. Others provide a sort of tutorial on detecting COVID-19 in x-ray images. We are even starting to see non-peer-reviewed papers that go a step further and baptize their solution with aggrandizing names like “COVID-Net.” This type of work commonly lacks many experimental details to explain how one has dealt with a few images from a very small number of patients to feed the deep network. Such papers report no validation, and no radiologist has guided the experiments. Many of these works were hurriedly made public before the creators of datasets could even provide sufficient explanations about their collection processes.
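One example of the kind of experimental detail at stake: when several images come from the same patient, a reader needs to know whether that patient's images ended up on both sides of the train/test split. The following is only a minimal sketch of a patient-level split, assuming each image record carries a hypothetical `patient_id` field; the field name, the fraction, and the sample records are illustrative and do not describe any of the works mentioned above.

```python
import random


def split_by_patient(image_records, test_fraction=0.3, seed=0):
    """Split image records so that no patient appears in both subsets.

    `image_records` is a list of dicts with a hypothetical "patient_id" key.
    """
    # Split on unique patients, not on individual images.
    patients = sorted({rec["patient_id"] for rec in image_records})
    rng = random.Random(seed)
    rng.shuffle(patients)

    # Hold out at least one patient for testing.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [r for r in image_records if r["patient_id"] not in test_patients]
    test = [r for r in image_records if r["patient_id"] in test_patients]
    return train, test


# Made-up records for illustration only.
records = [
    {"patient_id": "p1", "image": "p1_a.png"},
    {"patient_id": "p1", "image": "p1_b.png"},
    {"patient_id": "p2", "image": "p2_a.png"},
    {"patient_id": "p3", "image": "p3_a.png"},
    {"patient_id": "p3", "image": "p3_b.png"},
    {"patient_id": "p4", "image": "p4_a.png"},
]

train, test = split_by_patient(records)
train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
```

With only a handful of patients, the held-out set is necessarily tiny, which is itself a reason to be cautious about any accuracy figure computed from it.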

In an attempt to overcome the small data size, AI enthusiasts and startups mix the few COVID-19 images they have with other public datasets, e.g., pneumonia datasets. This is generally quite clever, but I looked more closely at one case and the trouble is that the pneumonia cases were pediatric images; so the COVID-Nets are comparing pediatric pneumonia (children one to five years old) with adult COVID-19 patients. Well, this is what happens when we exclude radiologists from research that needs expert oversight.
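A mismatch of this kind can often be caught before any training by looking at the metadata. The sketch below is one simple way to do that, assuming each record has hypothetical `label` and `age` fields; it flags pairs of classes whose age ranges do not overlap at all. The records are made up for illustration, and a check like this does not replace review by a radiologist.

```python
from collections import defaultdict


def age_ranges_by_label(records):
    """Return {label: (min_age, max_age)} for an iterable of metadata dicts."""
    ages = defaultdict(list)
    for rec in records:
        ages[rec["label"]].append(rec["age"])
    return {label: (min(v), max(v)) for label, v in ages.items()}


def disjoint_age_pairs(ranges):
    """List pairs of labels whose age ranges do not overlap at all."""
    labels = sorted(ranges)
    pairs = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            lo_a, hi_a = ranges[a]
            lo_b, hi_b = ranges[b]
            if hi_a < lo_b or hi_b < lo_a:
                pairs.append((a, b))
    return pairs


# Made-up metadata for illustration only; not taken from any real dataset.
records = [
    {"label": "pneumonia", "age": 1},
    {"label": "pneumonia", "age": 3},
    {"label": "pneumonia", "age": 5},
    {"label": "covid19", "age": 42},
    {"label": "covid19", "age": 67},
]

ranges = age_ranges_by_label(records)
flagged = disjoint_age_pairs(ranges)
for a, b in flagged:
    print(f"Age ranges of '{a}' and '{b}' do not overlap: "
          f"{ranges[a]} vs {ranges[b]}")
```

When two classes are separated entirely by age, a network can score well by telling children's chests from adults' rather than by recognizing the disease.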

Why are we rushing to publish faulty AI results on tiny datasets mixed with wrong anatomies, with no radiological backing, and with no validation? Do we want to help COVID-19 patients?

Perhaps the abundance of funding opportunity announcements in recent days and the possibility of getting exposure for our research are misleading us into faulty research conduct; we cannot abandon fundamental scientific principles due to lockdowns and quarantines. AI is neither a ventilator nor a vaccine nor a pill; it is extremely unlikely that the exhausted radiologists in Wuhan, Qom, or Bergamo will download the Python code of our poorly trained network (using insufficient and improper data and described in quickly written papers and blogs) just to obtain a flawed second opinion.

Yes, we all want to help. Let us wait for real data from hospitals, let us do the ethics clearance and de-identification, and let us work with radiologists to develop solutions for chest issues of the future. Otherwise, we may create the impression we are doing sensational research and are more concerned with self-promotion than with the well-being of patients. Radiologists are working day and night to understand the manifestation of this virus in medical images. Let us work with them and learn from them to unleash the true potential of AI for combating viral infections in the future.

Hamid Tizhoosh is a professor in the Faculty of Engineering at the University of Waterloo, where he leads the Kimia Lab (Laboratory for Knowledge Inference in Medical Image Analysis).
