Facebook launches two datasets to improve AI video analysis

Facebook launched a pair of new open datasets today to help developers and data scientists train artificial intelligence systems to better understand videos.

The Scenes, Objects, and Actions dataset (SOA) will provide developers with a massive set of videos, each carrying multiple labels that describe its contents. Each video has been tagged by trained human annotators with labels that reflect where it takes place, what is in it, and what is going on in the scene. Those labels can then be used to train AI systems.

A Generic Motions dataset includes a set of GIFs that are focused on certain motion properties, like jumping and sliding. As the name implies, the subjects in the GIFs include more than humans, so it should be possible to use the data to train a machine to understand different motions, like a panda falling or a kitten sliding.

Both of these datasets should be useful for building more intelligent video understanding systems with machine learning. SOA is meant to address machine learning systems that don’t actually understand the underlying videos but instead pick up on some tangential marker.

One example Manohar Paluri, Facebook’s computer vision research lead, cited on stage at the GitHub Universe conference was a hypothetical neural network that only looks for the presence of a kayak inside a video when it labels footage as containing “kayaking.” While that would work for many pieces of footage, such a system might also label a piece of footage set in a garage full of kayaks as being about kayaking.
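
The kayak example can be made concrete with a small sketch. The Python below is purely illustrative: the `Clip` structure, its field names, and the label values are assumptions invented for this example and do not reflect the actual format of the SOA dataset or any Facebook model. It simply shows how a labeler that checks only for an object gives the same answer for a river clip and a garage clip, while separate action labels tell them apart.

```python
# Hypothetical annotations: the field names and label values are
# illustrative assumptions, not the real SOA dataset schema.
from dataclasses import dataclass, field


@dataclass
class Clip:
    scenes: set = field(default_factory=set)
    objects: set = field(default_factory=set)
    actions: set = field(default_factory=set)


def object_only_labeler(clip):
    """Shortcut labeler: reports kayaking whenever a kayak is visible."""
    return "kayak" in clip.objects


def is_kayaking(clip):
    """Reference answer in this sketch: the action label must say so."""
    return "kayaking" in clip.actions


river = Clip(scenes={"river"}, objects={"kayak", "paddle"}, actions={"kayaking"})
garage = Clip(scenes={"garage"}, objects={"kayak", "shelf"}, actions=set())

for name, clip in [("river", river), ("garage", garage)]:
    # Print the shortcut prediction next to the action-based answer.
    print(name, object_only_labeler(clip), is_kayaking(clip))
```

In this toy setup the shortcut labeler flags both clips, while the action labels mark only the river clip as kayaking, which is the kind of mismatch that multi-label annotations are meant to expose.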

Facebook will be challenging developers and data scientists around the world to come up with the best models for understanding the contents of videos using the SOA dataset.

Robust open datasets have played an important role in driving the field of machine learning forward. ImageNet, a set of labeled images, has become a key benchmark for computer vision systems, for example. Facebook’s newly released footage could help propel the field of computer vision for video to new heights.
