This week, people in California and Gulf Coast states experienced the impact of historic natural disasters. Both have been called signs of climate change, and both are singular events: the California fires were sparked by hundreds of lightning strikes, producing some of the largest fires recorded in state history, and Hurricane Laura hit Louisiana harder than any hurricane in more than 150 years.

To assist humanitarian groups and first responders, AI researchers created the Incidents data set, which they call one of the largest ever assembled for detecting accidents and natural disasters people share on social media platforms like Flickr and Twitter. Creators of the Incidents data set said they hope it spurs the creation of AI that uses computer vision to recognize natural disasters and flag incidents for humanitarian organizations and emergency responders.

The Incidents data set contains 1.1 million images and spans 43 categories of accidents or natural disasters, ranging from car accidents to volcanic eruptions. Images also carry place labels, such as beach, bridge, forest, or house. A paper about the Incidents data set was published this week as part of the European Conference on Computer Vision (ECCV).

“Our dataset is significantly larger, more complete, and much more diverse than any other available dataset related to incident detection, enabling the training of robust models able to detect incidents in the wild,” the paper reads.

The Incidents data set contains nearly 447,000 images labeled as accidents or natural disasters and 697,000 labeled images without any accident or natural disaster. The data set was assembled by researchers from MIT, Qatar Computing Research Institute, and the Universitat Oberta de Catalunya in Spain. Photos were obtained from Google Images searches and labeled by Amazon Mechanical Turk workers, whose labels were accepted only once they reached 85% labeling accuracy.

Researchers pointed out that images labeled as negative were critical to making robust models. “We can observe that, without using the class negatives during training, the model is not able to distinguish the difference between a fireplace and a house on fire or detect when a bicycle is broken because of an accident,” the paper reads.

To test the effectiveness of Incidents, researchers used the data set to train a convolutional neural network and applied it to roughly 900,000 Twitter photos drawn from five earthquakes and two floods. The model recognized those disasters with an average precision of 77% overall (about 74% for earthquakes and 89% for floods).
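Average precision, the metric quoted above, summarizes how well a model ranks true disaster images above non-disaster ones across all confidence thresholds. A minimal sketch of how it is computed (the scores and labels below are hypothetical, not taken from the paper):

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at each true positive,
    with predictions ranked from most to least confident."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total, ap = 0, sum(labels), 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:  # a true disaster image retrieved at this rank
            hits += 1
            ap += hits / rank  # precision at this cutoff
    return ap / total if total else 0.0

# Hypothetical classifier scores for six images; 1 = image truly shows a flood
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.10]
labels = [1, 0, 1, 1, 0, 0]
print(average_precision(scores, labels))  # ≈ 0.806
```

A perfect ranking (all positives ahead of all negatives) yields 1.0, so the paper's 74% and 89% figures indicate how cleanly the model separates disaster photos from everything else.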

Researchers also conducted experiments with 40 million geotagged Flickr images to study the detection of emergency events such as earthquakes and volcanic eruptions. They found the AI capable of pinpointing the locations of those events.

A variety of AI models exist today to identify natural disasters and their impact. Beyond weather forecasting models, there’s AI for predicting when floods will happen along the Ganges River in India or how a wildfire may spread after ignition; for detecting when a wildfire starts using satellite imagery, though satellites can be obstructed by smoke or clouds; and for assessing flood and fire damage. AI systems can identify natural disasters from the words people use in social media, but few are made for detecting disasters from images shared on social media. In the coming months, a U.S. federal agency will introduce the full ASAPS data set to spur the creation of AI tools that automatically detect police, fire, or medical emergencies in real time from social media photos and videos. Some coauthors of the Incidents data set paper introduced an AI system in 2017 for analyzing natural disasters as shared on Twitter, but it could recognize only three kinds of disasters.