AI uses tweets to help researchers analyze floods

Social media gets a lot of negative press, but there's more to Twitter and Facebook than botnets, memes, and political trolls. In a research paper preprint on arXiv.org ("Integrating Social Media into a Pan-European Flood Awareness System: A Multilingual Approach"), scientists at the Joint Research Centre, the European Commission's science and knowledge service, detail a prototype -- Social Media for Flood Risk (SMFR) -- that "enriches" the European Flood Awareness System (EFAS) with real-time reports from Twitter users.

It builds on research published by Harvard and Google in August 2018, which described an AI model capable of predicting the location of aftershocks up to one year after a major earthquake, and by Facebook AI researchers in December, who developed a method to analyze satellite imagery and to quantify damage from fires and other disasters. More recently, scientists at Google published a retrospective on a machine learning system that accurately predicts riverine floods -- that is, floods from overrun riverbanks -- with 75% precision.

Separately, researchers in the U.K. have used tweet-ingesting machine learning algorithms to map out where violence is likely to occur during riots, to project when mass protests might be imminent, and to identify gang members.

"Over the past decade, social media has emerged as a relevant information source during disasters, prompting researchers from diverse areas to converge on this domain," the paper's coauthors wrote. "Social media analysis has demonstrated the potential to provide timely, precious information about the spatial and temporal development of a crisis, as well as supporting the identification of key disaster-related events."

First, a quick primer on EFAS: It's part of the Copernicus Emergency Management Service (Copernicus EMS) and is operated by the European Commission's Emergency Response Coordination Centre (ERCC), a division of the European Commission's Civil Protection and Humanitarian Aid Operations set up to support coordinated responses to disasters inside and outside Europe. Much like the U.S. Federal Emergency Management Agency, ERCC monitors hazards and risks, collects and analyzes data on disasters, and prepares plans for deploying teams and equipment. ERCC draws on EFAS for forecasts -- principally probabilistic medium-range flood forecasts (including short-range flash flood forecasts), but also seasonal forecasts, impact assessments, and early warnings.

The researchers' system tapped EFAS to determine when the risk of floods in a given geographic area exceeded a threshold. Crossing that threshold triggered data collection from Twitter, tracking up to 400 keywords at a time, the limit of Twitter's public streaming API.
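
As a rough illustration of that trigger, here is a minimal Python sketch. It is not the SMFR code: the function names, the per-area risk scores, and the keyword lists are hypothetical, and the only figure taken from the article is the 400-keyword cap.

```python
# Illustrative sketch only; names and data shapes are assumptions, not SMFR's.
MAX_TRACK_KEYWORDS = 400  # keyword cap cited in the article


def areas_over_threshold(risk_by_area, threshold):
    """Return the areas whose forecast flood risk exceeds the threshold."""
    return sorted(area for area, risk in risk_by_area.items() if risk > threshold)


def build_track_list(keywords_by_area, areas, limit=MAX_TRACK_KEYWORDS):
    """Merge the keyword lists of the flagged areas, dropping duplicates
    (case-insensitively) and stopping once the cap is reached."""
    seen = set()
    track = []
    for area in areas:
        for keyword in keywords_by_area.get(area, []):
            key = keyword.lower()
            if key in seen:
                continue
            seen.add(key)
            track.append(key)
            if len(track) == limit:
                return track
    return track
```

In a sketch like this, the capped list is what would be handed to the streaming client, so areas processed first win the keyword slots when the cap is hit.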

Extracting messages with relevant keywords (i.e., words indicating a flood is about to happen or recently happened) was no easy task, given that EFAS covers an area where the population speaks more than 27 languages. The team's solution was a multilingual classification system that used language-agnostic mathematical representations of words, or word embeddings, to infer similarities among keywords in four tongues: German, English, Spanish, and French.
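
To make the idea of a shared embedding space concrete, the toy Python sketch below compares words across languages by cosine similarity. The three-dimensional vectors are invented for illustration; real multilingual embeddings have hundreds of dimensions, and the paper's actual model and similarity measure may differ.

```python
import math

# Invented toy vectors in a shared space; real embeddings are learned from text.
TOY_EMBEDDINGS = {
    ("en", "flood"): [0.90, 0.10, 0.00],
    ("de", "hochwasser"): [0.85, 0.15, 0.05],
    ("es", "inundación"): [0.88, 0.05, 0.10],
    ("fr", "inondation"): [0.87, 0.12, 0.02],
    ("en", "football"): [0.05, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(query, embeddings, k=3):
    """Return the k entries most similar to the query, whatever their language."""
    query_vec = embeddings[query]
    scored = [
        (cosine(query_vec, vec), key)
        for key, vec in embeddings.items()
        if key != query
    ]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]
```

With these toy vectors, the three nearest neighbors of the English "flood" are its German, Spanish, and French counterparts, while "football" ranks last.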

To train it, they sourced a corpus containing over 7,000 annotated messages (between 1,200 and 2,300 messages per language). Meanwhile, they used a separate model to suss out "representative" messages (tweets having at least a 90% probability of being flood-related) for areas in which flood risk had been predicted.
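
The "representative" filter can be pictured as a simple cut on classifier output. The Python sketch below assumes each tweet already carries a flood-relatedness probability and an area label; the `ScoredTweet` type and its field names are hypothetical, and only the 90% cutoff comes from the article.

```python
from dataclasses import dataclass


@dataclass
class ScoredTweet:
    """Hypothetical container for a tweet and its classifier score."""
    text: str
    area: str
    flood_probability: float  # assumed output of a relevance classifier


def representative_tweets(tweets, at_risk_areas, min_probability=0.9):
    """Keep tweets from at-risk areas whose score meets the cutoff,
    ordered from most to least likely to be flood-related."""
    at_risk = set(at_risk_areas)
    kept = [
        t for t in tweets
        if t.area in at_risk and t.flood_probability >= min_probability
    ]
    return sorted(kept, key=lambda t: t.flood_probability, reverse=True)
```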

To test the robustness of their approach, the scientists integrated SMFR into EFAS and deployed it during floods that hit Calabria, Italy, in early October 2018. SMFR collected two days' worth of tweets -- 14,347 in all -- and analyzed them for "relatedness." The researchers report that the AI-filtered messages closely correlated with actual flooding, and they say it's a promising first step toward a system that could shorten response times in the early stages of disasters.

"[D]uring the development of an event, collected messages could be valuable to international rescue coordinators ... because they provide insights about the local response, about whether alerts that have been issued by authorities, and about some of the concerns that those affected by a flood or a flood alert may have," the team wrote. "As future research activities, one can envision a global system comprising dozens of languages [and] further steps in the direction of using social media as a data source that can feed into a predictive model."