For Google, recognizing objects in photos is no longer a challenge. (See Google Photos for proof.) The next challenge is video. There’s more data to deal with, and videos are simply harder to summarize than images.
Video search matters to Google, though: YouTube has long been called the world’s second-largest search engine, second only to Google Search. While the text attached to a video can help Google return YouTube search results, the raw content of the video itself is generally not taken into consideration.
A few months ago, Google gave a big gift to the research community: the YouTube-8M data set. Perhaps not coincidentally, Google updated that data set today. The significance here is that image recognition research has been propelled by the availability of open data, specifically Stanford’s ImageNet and Microsoft’s COCO, and a large open video data set could do the same for video. Artificial intelligence (AI) systems require data in order to become smarter, and organizations like Stanford, Microsoft, and now Google have stepped up to provide that raw material.
Google doesn’t just want to advance the state of the art for the benefit of all, though. It also wants to improve its products — in the same way that it brought Smart Replies to Gmail and instant visual translations to Google Translate. Surely Google wants YouTube to be the best damn place to find a video that relates to your query.
“If it could [recognize] a video of a cow jumping over a moon, or a cat jumping over a fence, that would be really cool,” Google senior fellow Jeff Dean said today in a meeting with reporters at Google’s inaugural TensorFlow Dev Summit at company headquarters.
That would mean Google would no longer need to rely on metadata like descriptions and comments for searches, Dean said. The underlying technology could make for better video recommendations, as well.
It’s not clear when YouTube might release the enhanced search capability.
Generally speaking, though, “video is maybe a few years behind where we are with images,” Dean said.