In any time of trouble, the archetypal hero usually takes a specific form — soldiers in World War II, firefighters on 9/11, and now health care professionals in the time of COVID-19. But data scientists are playing an indispensable role in fighting the global pandemic. While medical professionals are on the front lines caring for the sick, data scientists have shouldered the responsibility of helping keep everyone else healthy by disseminating crucial information to the world.
A case in point is the “flatten the curve” mantra. Originating from the Centers for Disease Control and Prevention (CDC), the idea is that we have to slow the rate of infection to keep health care systems from collapsing. Those three words could save millions of lives. But they mean nothing without the simple graphs illustrating how slowed infections can improve outcomes, and the graphs don’t exist without data.
These visualizations tell a story of exponential growth that can be difficult for the average person to immediately comprehend. “COVID-19 is … a tricky thing to reason about — very tricky thing to reason about. Intuition breaks down,” said Jeremy Howard in an interview with VentureBeat. Howard is the cofounder of Fast.ai, which offers free courses on deep learning, and he is on the faculty at the University of San Francisco. He pointed out that there’s a long gap between an outbreak occurring and the visible results. The nature of an illness is that it takes a while for the disease to show itself in a person, and it takes longer to see it at scale.
“This is a perfect storm of what the human brain is bad at,” said Howard. “We respond to what we can see. And we respond to stories. A pandemic doesn’t give you those things.”
He continued, “But what we do have is data. Data scientists are people who know how to look at data and find out what story it’s telling us.” He joked that data scientists aren’t very good at telling the story — except through visualizations like those illustrating the need to “flatten the curve.”
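The intuition behind those "flatten the curve" graphs can be sketched with a textbook SIR (susceptible, infected, recovered) model. This is a minimal illustration, not a model used by the CDC or by anyone quoted here: the function names, the population size, and the contact and recovery rates are all made-up assumptions chosen only to show the shape of the curves.

```python
def simulate_sir(beta, gamma=0.1, population=1_000_000, initial_infected=10, days=365):
    """Discrete-time SIR model. Returns the number of currently infected people per day.

    beta  -- assumed daily transmission (contact) rate, hypothetical value
    gamma -- assumed daily recovery rate, hypothetical value
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    history = []
    for _ in range(days):
        history.append(infected)
        # New infections depend on how often infected and susceptible people meet.
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
    return history


def peak(history):
    """Return (day, infected) at the height of the outbreak."""
    day = max(range(len(history)), key=history.__getitem__)
    return day, history[day]


# Illustrative, made-up contact rates: the second is half the first.
unmitigated = simulate_sir(beta=0.30)
distanced = simulate_sir(beta=0.15)

for label, run in (("unmitigated", unmitigated), ("distanced", distanced)):
    day, infected = peak(run)
    print(f"{label}: peak of {infected:,.0f} simultaneous infections on day {day}")
```

Under these assumptions, halving the contact rate produces a peak that is both lower and later, which is the whole point of the mantra: fewer people are sick at the same time, so hospitals are less likely to be overwhelmed.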
And most if not all of this data and its accompanying visualizations are being made available for free. There’s a sort of digital flotilla coming to our collective rescue in the form of free, articulated, and visualized data.
This week, a collaboration between a number of businesses and organizations like Microsoft and the Allen Institute for AI, along with the White House Office of Science and Technology Policy (OSTP), yielded a trove of data related to COVID-19. The COVID-19 Open Research Dataset (CORD-19) is a machine-readable repository of some 29,000 articles about coronaviruses. And then the data scientists mobilized: Kaggle, a Google company that bills itself as the world’s largest community of data scientists, launched a forecasting challenge to uncover factors that impact coronavirus transmission rates.
“The primary goal is not to forecast accurately. But to find factors that impact transmission rate,” Anthony Goldbloom (@antgoldbloom) wrote on Twitter on March 19, 2020.
One of the best resources for non-experts trying to keep track of the spread of COVID-19 is an interactive dashboard maintained by Johns Hopkins University. It tracks confirmed cases, recoveries, and deaths by geographic location.
In the U.S., the COVID Tracking Project pulls information from all 50 states “to include positive and negative results, pending tests, and total people tested for each state or district currently reporting that data,” per its website. You can check info on your state and follow a link to the best data source for your state. The COVID Tracking Project is a volunteer effort, built by Jeff Hammerbacher of Related Sciences, along with two journalists from The Atlantic, Robinson Meyer and Alexis Madrigal. According to the site, a small army of volunteers from different fields is maintaining and updating the data.
On a more individual level, Howard and his Fast.ai cofounder Rachel Thomas spent a frenetic weekend earlier this month putting together an article on how to protect yourself and your community from COVID-19. It links to additional resources and has been translated into 17 languages so far. The effort was personal for Howard and Thomas, both of whom have preexisting medical conditions that make them more vulnerable to COVID-19.
Both Howard and Thomas have also been active on platforms like Twitter, disseminating information and debunking misinformation. They’ve already shared the first part of their most recent deep learning course, which includes facts about COVID-19 alongside some analysis.
All of the aforementioned resources are free and available to all. And they’re just the tip of the iceberg. So many organizations, companies, and individuals are doing what they can to get the data about COVID-19 and the story it tells to as many people as possible.
Parts of that story are still incomplete, and we need data scientists to understand and explain those gaps, too. As Howard pointed out, data scientists know how to work with censored data, or data where labels are missing or there are unknown values. For example, a stat may suggest a certain percentage of those infected with COVID-19 will die, but that might take into account only a small number of cases because most people with the virus have neither died nor yet recovered. And gauging infection rates requires understanding that the number may be significantly impacted by the absence of testing kits — no testing kits means no tests, and no tests means no infections are recorded. In the U.S., as in many countries, testing kit availability has been a serious problem.
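To make the censoring problem concrete, here is a small sketch of why a single fatality percentage can mislead while most cases are still unresolved. The function name and the case counts are hypothetical, invented purely for illustration; they are not real COVID-19 figures.

```python
def fatality_rate_bounds(confirmed, deaths, recovered):
    """Compare fatality estimates when many confirmed cases have no outcome yet."""
    unresolved = confirmed - deaths - recovered
    if confirmed <= 0 or unresolved < 0:
        raise ValueError("counts must be positive and consistent")

    # Naive estimate: silently assumes every unresolved patient survives.
    naive = deaths / confirmed

    # Resolved-only estimate: uses only cases with a known outcome,
    # which may be a small and unrepresentative slice early in an outbreak.
    resolved = deaths + recovered
    resolved_only = deaths / resolved if resolved else None

    # Extreme bound among confirmed cases: assumes no unresolved patient survives.
    worst_case = (deaths + unresolved) / confirmed

    return {
        "unresolved": unresolved,
        "naive": naive,
        "resolved_only": resolved_only,
        "worst_case": worst_case,
    }


# Hypothetical counts: 1,000 confirmed cases, of which only 100 have an outcome.
estimates = fatality_rate_bounds(confirmed=1000, deaths=20, recovered=80)
for name, value in estimates.items():
    print(f"{name}: {value}")
```

With these made-up counts, the naive figure is 2% and the resolved-only figure is 20%, a tenfold gap produced entirely by the 900 cases whose outcome is not yet known. Limited testing adds a second gap that this sketch does not model: untested infections never appear in the confirmed count at all.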
People want to help in times of crisis, out of both the kindness of their hearts and a need to regain some measure of control. It’s a particularly wonderful human trait. But there’s so little most of us can actively do in the face of this pandemic. Because we want to bring our skills and resources to the fight, it’s counterintuitive and uncomfortable to find that the best thing the vast majority of us can do is nothing — to literally stay home. To flatten the curve.
But data scientists possess a skillset that is crucial to tackling this global pandemic. Their work helps us all understand what’s happening, and it enables experts like epidemiologists to build knowledge, track progress, and provide guidance that ultimately saves lives.