Researchers at Google, Facebook, and other companies are working hard to use artificial intelligence to understand what’s going on in videos, as well as in pictures and speech. Today Google showed off its latest breakthroughs in research, involving a trendy type of AI called deep learning.
This approach often involves ingesting lots of data to train systems called neural networks, and then feeding new data to those systems and receiving predictions in response.
In Google’s case, researchers tested out several methods in order to correctly recognize objects and interpret motion in videos of sports: recurrent neural networks and feature-pooling networks, in combination with widely used convolutional neural networks.
“We conclude by observing that although very different in concept, the max-pooling and the recurrent neural network methods perform similarly when using both images and optical flow,” Google software engineers George Toderici and Sudheendra Vijayanarasimhan wrote in a blog post today on their work, which will be presented at the Computer Vision and Pattern Recognition conference in Boston in June.
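To make the contrast between the two approaches concrete, here is a minimal Python sketch of the two ways of combining per-frame features into a single video-level summary. It is a toy illustration and not Google's implementation: the feature values, the function names `max_pool_over_time` and `recurrent_aggregate`, and the fixed weights `w_in` and `w_rec` are all assumptions made up for this example, and a real system would learn its weights and take its frame features from a convolutional network.

```python
import math


def max_pool_over_time(frame_features):
    """Element-wise maximum across all frames; frame order is ignored."""
    return [max(values) for values in zip(*frame_features)]


def recurrent_aggregate(frame_features, w_in=0.5, w_rec=0.5):
    """Toy recurrent update, applied per dimension:
    h_t = tanh(w_in * x_t + w_rec * h_{t-1}).
    The weights are fixed, hypothetical values rather than learned ones.
    """
    hidden = [0.0] * len(frame_features[0])
    for frame in frame_features:
        hidden = [math.tanh(w_in * x + w_rec * h) for x, h in zip(frame, hidden)]
    return hidden


# Hypothetical feature vectors for three consecutive frames, standing in
# for the output of a convolutional network (values are made up).
frames = [
    [0.1, 0.9, 0.0],
    [0.4, 0.2, 0.3],
    [0.8, 0.1, 0.5],
]

pooled = max_pool_over_time(frames)
summary = recurrent_aggregate(frames)
reversed_summary = recurrent_aggregate(frames[::-1])

print("max-pooled:", pooled)
print("recurrent:", summary)
print("recurrent, frames reversed:", reversed_summary)
```

In this sketch, max-pooling gives the same result no matter how the frames are ordered, while the recurrent summary changes when the frames are reversed, which is the conceptual difference between the two methods.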
You can read the academic paper, or, to get a sense of Google’s latest video-processing capabilities, you can just watch this video: