Presented by AWS Machine Learning


In 2003, the SARS outbreak took the world by surprise. While it pales in comparison to the current pandemic, in short order the disease incapacitated countries around the world, infecting more than 8,000 people, and causing billions of dollars in damage.

“For me, the SARS outbreak was an eye-opening event,” says Dr. Kamran Khan, infectious disease physician, professor of medicine and public health at the University of Toronto, and founder and CEO of BlueDot. “I recognized that we’d never seen anything like it before, but there would be more outbreaks like this again in the future.”

Khan spent the next 10 years studying infectious disease spread, looking for a way to better detect and respond to threats like SARS and the ones that followed.

By 2013, machine learning technology had advanced to the point where he was able to put his vision of a digital global warning system into action — and BlueDot was born. Now, the company’s machine learning algorithms use billions of data points from across a broad spectrum of sources to detect potential outbreaks, track current ones, and predict how the disease will continue to spread.

Powered by AWS, their ML platform anticipates the spread and impact of over 150 different pathogens, toxins, and syndromes in near-real time. With this critical information, they’re able to advise governments, public health organizations, and other clients on how to disrupt the threat of pandemics — and help get ongoing disease spread under control.

“Time is everything during an outbreak, and a pandemic is a global emergency,” Khan says. “The ability to quickly generate insights and get those insights out to the rest of the world is essential, and machine learning is key to that ability.”

Fast-forward six years, and the world saw the arrival of the latest virus, the one that would change people’s lives on a global scale.

BlueDot first detected the coronavirus outbreak in Wuhan on December 31, 2019. It was just a few hours after the first cases were diagnosed by local authorities. With this early information, they were able to send out alerts almost a week before any official announcements were made by the Chinese government or international health organizations. This was just the beginning of their COVID-19 work.

The pandemic-fighting power of ML

Pandemics pose a complex challenge, and the urgency to solve the problem is growing. BlueDot’s outbreak detection solution is unique, and particularly powerful, because of the way it combines public health and medical expertise with advanced data analytics and machine learning on AWS. This enables them to track, contextualize, and anticipate infectious disease risks.

The company’s software consists of a machine learning platform that leverages billions of data points from a vast array of sources in over 65 languages. It’s constantly scanning foreign-language news reports, animal and plant disease networks, official government announcements, and more than 100 datasets with proprietary algorithms to identify new outbreaks.

AWS is key to processing all of this data. Custom machine learning algorithms rely on natural language processing to make sense of the data and give it structure. Using Amazon Elastic Compute Cloud (Amazon EC2), BlueDot can turn massive amounts of unstructured text into organized, structured, spatiotemporal pathogen data that identifies the place, the time, and the name of the pathogen. The difficulty is ambiguity: the word “plague” might refer to an outbreak, or it might refer to a component of a fantasy video game. This is where subject matter experts have worked with data scientists to train the platform, so that the algorithm can differentiate an article about the heavy metal band Anthrax from one about an actual outbreak of anthrax. The algorithm can also eliminate duplicates when multiple stories are written about the same event.
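To make the two steps above concrete, here is a minimal Python sketch of disambiguation and duplicate removal. It is not BlueDot's method: the article describes trained NLP models, while this sketch swaps in a simple cue-word count as a stand-in. The cue-word lists, the `Event` record, and both function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cue words. A trained model would learn these signals from
# labeled examples instead of relying on a hard-coded list.
OUTBREAK_CUES = {"cases", "infected", "hospital", "outbreak", "health", "symptoms"}
NOISE_CUES = {"band", "album", "tour", "game", "player", "concert"}


@dataclass(frozen=True)
class Event:
    """A structured record extracted from one article (assumed schema)."""
    pathogen: str
    place: str
    date: str  # ISO date string, e.g. "2019-12-31"


def looks_like_outbreak(text: str) -> bool:
    """Return True when outbreak cue words outnumber entertainment cue words."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return len(words & OUTBREAK_CUES) > len(words & NOISE_CUES)


def deduplicate(events: list[Event]) -> list[Event]:
    """Keep the first report of each (pathogen, place, date) triple, in order."""
    seen: set[Event] = set()
    unique = []
    for event in events:
        if event not in seen:
            seen.add(event)
            unique.append(event)
    return unique
```

A headline such as "Anthrax cases confirmed at local hospital" would pass the filter, while "Anthrax announce new album and world tour" would not. Exact-match deduplication is the simplest possible choice; real news stories about one event rarely agree this neatly, so a production system would need fuzzier matching.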

“We would need hundreds of people if we did this all manually,” Khan says. “This is where machine learning can allow us to process and make sense of this vast amount of unstructured data in all these various languages to find the metaphorical needles in the haystack.”

Once the algorithms extract the place and the time of a potential outbreak, the platform adds context. It cross-references this information with complementary data, including private sector data, to answer questions such as: How many people live in that area? Where are the neighboring airports? Are there direct flights out of the region, where do they go, and how many passengers do they carry? What’s the temperature like?
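The cross-referencing step can be pictured as a lookup that attaches regional context to each extracted event. The Python sketch below is illustrative only: the `Airport` and `RegionContext` records, the `enrich` function, and every field name are assumptions, not BlueDot's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Airport:
    code: str
    direct_destinations: dict[str, int]  # destination city -> passengers carried


@dataclass
class RegionContext:
    population: int
    mean_temperature_c: float
    airports: list[Airport] = field(default_factory=list)


def enrich(event: dict, contexts: dict[str, RegionContext]) -> dict:
    """Return a copy of the event with regional context attached, if known."""
    enriched = dict(event)  # copy so the caller's event is left untouched
    context = contexts.get(event["place"])
    if context is None:
        enriched["context_available"] = False
        return enriched
    enriched.update(
        context_available=True,
        population=context.population,
        mean_temperature_c=context.mean_temperature_c,
        airports=[a.code for a in context.airports],
        direct_destinations=sorted(
            {dest for a in context.airports for dest in a.direct_destinations}
        ),
    )
    return enriched
```

Flagging events with no known context, rather than dropping them, keeps an outbreak in an unmapped region visible to the human experts who review the output.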

BlueDot also incorporates anonymized air traffic data to follow the movement of passengers to anticipate where diseases might disperse around the planet, as well as anonymous location data from 400 million mobile devices worldwide.

With respect to the microbe, the algorithm can parse the data to identify what type it is, from flu or measles to dengue fever. Once the pathogen is identified, the platform can add BlueDot’s own internal knowledge of the disease, such as how it spreads, its clinical manifestations, whether there’s a vaccine, and what the mortality rate is.

Tracking COVID-19

BlueDot’s machine learning algorithm identified early news of pneumonia of unknown origin from Chinese news reports. The machine learning algorithms translated the text, analyzed the data, and alerted BlueDot scientists that a serious situation was beginning to brew in Wuhan.

The company’s experts in epidemiology, medicine, and public health confirmed that a potential outbreak, similar to the event that started in Guangdong province with the SARS outbreak, was occurring, and posed a legitimate threat. Then the location of the outbreak was cross-referenced using a variety of models that found where the neighboring airports were via spatial models and spatial analytics using GIS (geographic information systems).
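One common way to find airports near a point is a great-circle distance filter. The Python sketch below uses the haversine formula for that purpose; it is an assumption about how such a spatial query could look, not a description of BlueDot's GIS models, and the function names and the spherical-Earth radius are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, treating the Earth as a sphere


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def airports_within(
    origin: tuple[float, float],
    airports: dict[str, tuple[float, float]],
    radius_km: float,
) -> list[tuple[str, float]]:
    """Return (airport code, distance) pairs inside the radius, nearest first."""
    nearby = []
    for code, (lat, lon) in airports.items():
        distance = haversine_km(origin[0], origin[1], lat, lon)
        if distance <= radius_km:
            nearby.append((code, distance))
    return sorted(nearby, key=lambda pair: pair[1])
```

A straight-line radius ignores roads, borders, and travel time, so it is only a first approximation of which airports people near an outbreak would actually use.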

The platform found the locations of the airports, automatically connected all of the flight data and passenger-level data, and conducted an analysis to find all the potential destinations the disease could spread to. Machine learning lets them track an outbreak like this continuously, and at scale, Khan says.
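A simple way to picture the destination analysis is to total passenger volumes by final destination for every trip that begins at an airport near the outbreak. The Python sketch below assumes itinerary records shaped as (origin airport, final destination, passengers); that record shape and the `rank_destinations` name are hypothetical, and ranking by raw volume is a simplification of whatever models BlueDot actually uses.

```python
from collections import Counter


def rank_destinations(
    itineraries: list[tuple[str, str, int]],
    origin_airports: set[str],
    top_n: int = 20,
) -> list[tuple[str, int]]:
    """Total passengers by final destination for trips starting at the given airports.

    Returns up to top_n (destination, passengers) pairs, busiest first.
    """
    totals: Counter[str] = Counter()
    for origin, destination, passengers in itineraries:
        if origin in origin_airports:
            totals[destination] += passengers
    return totals.most_common(top_n)
```

Counting final destinations rather than first stops matters because a traveler who connects through a hub carries the risk to wherever the journey ends.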

Above: Tracking Wuhan final destinations

“For every single outbreak that appears in the world every day, we’re able to identify every other location on the planet that may be connected to it and should be aware of that particular event,” he explains. “That way we’re anticipating its potential arrival, not just responding or reacting to it when it shows up.”

Concerned about the parallels with the SARS outbreak, the company’s scientists made their insights available to the broader public by publishing a peer-reviewed scientific paper, which appeared on January 13. It identified the places the outbreak could travel to next. Of the 20 cities the paper listed, 12 were among the first cities impacted by COVID-19. The number-one city on the list was Bangkok, which was the first city in the world to report a case of COVID-19 as it spread outside of mainland China.

As cities started to go into lockdown, implementing stay-at-home orders to slow transmission of the virus, BlueDot was able to use mobile phone data to understand how well social distancing interventions were being adhered to. This allowed public health messages to be strategically targeted to the places that most needed them, and helped fight the disease on as many fronts as possible as countries began to develop their reopening strategies.
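The article does not say how adherence was measured. One plausible approach, sketched below in Python purely as an assumption, compares average daily movement per device during a lockdown with a pre-lockdown baseline and flags regions where movement fell the least. The metric, the threshold value, and both function names are hypothetical.

```python
from statistics import mean


def mobility_change(baseline_km: list[float], current_km: list[float]) -> float:
    """Fractional change in mean daily distance traveled versus the baseline."""
    base = mean(baseline_km)
    if base == 0:
        raise ValueError("baseline mobility must be non-zero")
    return (mean(current_km) - base) / base


def regions_needing_outreach(
    regions: dict[str, tuple[list[float], list[float]]],
    threshold: float = -0.25,
) -> list[str]:
    """Return regions whose mobility fell by less than the threshold.

    Each region maps to (baseline distances, current distances). Results are
    ordered from the smallest reduction in movement to the largest.
    """
    changes = {
        name: mobility_change(baseline, current)
        for name, (baseline, current) in regions.items()
    }
    flagged = [name for name, change in changes.items() if change > threshold]
    return sorted(flagged, key=lambda name: changes[name], reverse=True)
```

With the default threshold, a region whose movement dropped by 60 percent would not be flagged, while one whose movement dropped by only 10 percent would be.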

The future of disease detection

“We’re actively researching ways machine learning can better anticipate the spread, impact and consequence of global diseases,” Khan says. “Without a high-performance computing environment, it wouldn’t be possible to make sense of all this information.”

Meanwhile, they’re not losing sight of Ebola activity in the Democratic Republic of Congo, or an outbreak of Lassa fever, or other types of diseases that can’t be ignored. This machine learning platform is critical to monitoring threats on an ongoing basis. From early detection, to tracking leaps across continents, to mitigating the spread in airports and local communities, this technology is the most powerful ammunition scientists have.

“We’re deep in the fight against COVID-19 now, but we can’t stop looking at the next threat,” Khan says. “While we turn our attention to mitigating the current pandemic, a machine can keep its eye on everything else happening around the world.”




Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way. For more information, contact sales@venturebeat.com.