Computer vision is emerging as a major boon for tech companies looking to build machines that can perform tasks hitherto achievable only by humans.
In the past few months alone, eBay has revealed big plans for a new search feature that lets you use existing photos to find similar items, while online clothing retailer ASOS announced something similar in the fashion realm. Taking things a step further, Shutterstock last week unveiled an experimental feature that allows users to search for stock photos based on their spatial composition, and a few days ago Google's Photos app gained an image recognition feature for pets.
Put simply, things are getting pretty exciting in the field of computer vision, and we’re starting to see results from the growing investment across the AI sphere.
Video gaga
Many of the computer vision developments that have already made it into actual products involve static image-based applications, but we’re beginning to see the fruits of computer vision technology in video, too. Russian authorities deployed facial recognition smarts across the country’s CCTV network, for example. Pornhub is doing something similar to automatically categorize “adult entertainment” videos, including training the system to recognize specific sexual positions. Then there is the burgeoning autonomous vehicle industry that leans heavily on machines’ ability to understand real-world actions.
Against this backdrop, Google has launched a new video dataset it hopes will be used to "accelerate research" into computer vision applications that involve recognizing actions within videos. AVA, an acronym for "atomic visual actions," is a dataset of video sequences in which people are labeled with the individual actions they perform.
The challenge of identifying actions in videos is compounded in complex scenes where multiple actions are combined and carried out by different people.

Above: An example from the AVA dataset.
"Teaching machines to understand human actions in videos is a fundamental research problem in Computer Vision, essential to applications such as personal video search and discovery, sports analysis, and gesture interfaces," explained Google software engineers Chunhui Gu and David Ross in a blog post. "Despite exciting breakthroughs made over the past years in classifying and finding objects in images, recognizing human actions still remains a big challenge."
AVA is essentially a collection of YouTube URLs annotated with a vocabulary of 80 atomic actions, spanning nearly 58,000 video segments and covering everyday activities such as shaking hands, kicking, hugging, kissing, drinking, playing instruments, walking, and more.

Above: Google's AVA annotations.
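To make that concrete, here is a minimal sketch of what loading such per-person action labels could look like. The CSV layout below (a video ID, a timestamp, a normalized person bounding box, and an action label ID) is an illustrative assumption, not Google's published schema.

```python
import csv
from collections import namedtuple

# One annotation row: which video, when, where the person is, and what they are doing.
# This column layout is an assumption for illustration, not AVA's published schema.
Annotation = namedtuple(
    "Annotation", ["video_id", "timestamp", "x1", "y1", "x2", "y2", "action_id"]
)

def load_annotations(path):
    """Parse a CSV of per-person action labels into Annotation tuples."""
    rows = []
    with open(path, newline="") as f:
        for video_id, ts, x1, y1, x2, y2, action_id in csv.reader(f):
            rows.append(Annotation(
                video_id,
                float(ts),             # seconds into the source video
                float(x1), float(y1),  # normalized bounding box corners
                float(x2), float(y2),
                int(action_id),        # index into the 80 atomic actions
            ))
    return rows
```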
By allowing anyone to access the dataset, Google is hoping to improve machines’ “social visual intelligence” so they can understand what humans are doing and anticipate what they may do next.
“We hope that the release of AVA will help improve the development of human action recognition systems, and provide opportunities to model complex activities based on labels with fine spatio-temporal granularity at the level of individual person’s actions,” the company said.
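Labels at that granularity invite modeling each person's behavior as a sequence of actions over time. As a hypothetical continuation of the sketch above, one could group annotations into per-person timelines; a real system would rely on proper person tracking, so the naive box-center key here is purely illustrative.

```python
from collections import defaultdict

def action_sequences(annotations):
    """Group labels into time-ordered action lists per (video, approximate person).

    Real person identity requires tracking across frames; as a stand-in, we
    naively key on the rounded center of the bounding box, for illustration only.
    """
    tracks = defaultdict(list)
    for a in annotations:
        center = (round((a.x1 + a.x2) / 2, 1), round((a.y1 + a.y2) / 2, 1))
        tracks[(a.video_id, center)].append((a.timestamp, a.action_id))
    # Sorting by timestamp yields sequences like "stand, then walk, then hug".
    return {key: sorted(labels) for key, labels in tracks.items()}
```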