
Facebook launched a pair of new open datasets today to help developers and data scientists train artificial intelligence systems to better understand videos.

The Scenes, Objects, and Actions dataset (SOA) will provide developers with a massive set of videos that contain multiple labels indicating what’s going on inside them. Each video has been tagged by humans trained to attach multiple labels that reflect where a video is taking place, what is in it, and what is going on in the scene. Those labels can then be used to train AI systems.

The Generic Motions dataset is a set of GIFs focused on particular kinds of motion, like jumping and sliding. As the name implies, the subjects in the clips are not limited to humans, so it should be possible to use the data to train a machine to understand different motions, like a panda falling or a kitten sliding.

Both datasets should be useful for building more capable video understanding systems with machine learning. SOA is meant to address models that don’t actually understand the underlying videos and instead latch onto a tangential cue that merely correlates with the right label.

One example that Manohar Paluri, Facebook’s computer vision research lead, cited on stage at the GitHub Universe conference was a hypothetical neural network that only looks for the presence of a kayak when it labels footage as containing “kayaking.” While that would work for many pieces of footage, such a system might also label footage set in a garage full of kayaks as being about kayaking.
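The failure mode in the kayak example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VideoAnnotation` record, the label names, and the three sample clips are hypothetical and are not taken from the SOA dataset, whose actual format the article does not describe. The sketch shows how separate scene, object, and action labels make it possible to catch a model that relies on object presence alone.

```python
from dataclasses import dataclass, field


@dataclass
class VideoAnnotation:
    """Hypothetical record holding the three label types the article describes."""
    video_id: str
    scenes: set = field(default_factory=set)   # where the video takes place
    objects: set = field(default_factory=set)  # what is in it
    actions: set = field(default_factory=set)  # what is going on


def shortcut_predicts_kayaking(video):
    # The shortcut ignores scene and motion and fires whenever a kayak is visible.
    return "kayak" in video.objects


def shortcut_errors(videos):
    """Return ids of videos where the shortcut disagrees with the human action label."""
    return [
        v.video_id
        for v in videos
        if shortcut_predicts_kayaking(v) != ("kayaking" in v.actions)
    ]


# Made-up annotations for illustration only.
videos = [
    VideoAnnotation("river_trip", {"river"}, {"kayak", "paddle"}, {"kayaking"}),
    VideoAnnotation("garage_storage", {"garage"}, {"kayak", "car"}, set()),
    VideoAnnotation("kitchen_clip", {"kitchen"}, {"pan"}, {"cooking"}),
]

print(shortcut_errors(videos))  # ['garage_storage']
```

The garage clip is the only one flagged: it contains a kayak but carries no "kayaking" action label, which is the kind of mismatch that multi-label annotations are able to expose.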

Facebook will be challenging developers and data scientists around the world to come up with the best models for understanding the contents of videos using the SOA dataset.

Robust open datasets have played an important role in driving the field of machine learning forward. ImageNet, a set of labeled images, has become a key benchmark for computer vision systems, for example. Facebook’s newly released footage could help propel the field of computer vision for video to new heights.

