Researchers at Google, Facebook, and other companies are working hard to use artificial intelligence to understand what’s going on in videos, as well as in pictures and speech. Today Google showed off its latest breakthroughs in research, involving a trendy type of AI called deep learning.
This approach often involves ingesting lots of data to train systems called neural networks, and then feeding new data to those systems and receiving predictions in response.
In Google’s case, researchers tested several methods for recognizing objects and interpreting motion in sports videos: recurrent neural networks and feature-pooling networks, each combined with widely used convolutional neural networks.
“We conclude by observing that although very different in concept, the max-pooling and the recurrent neural network methods perform similarly when using both images and optical flow,” Google software engineers George Toderici and Sudheendra Vijayanarasimhan wrote in a blog post today on their work, which will be presented at the Computer Vision and Pattern Recognition conference in Boston in June.
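To give a rough sense of what max-pooling over a video means, here is a minimal Python sketch. It is not Google’s code. It assumes that each frame has already been turned into a short list of numbers (a feature vector) by a convolutional network; the values below and the function name `max_pool_frames` are made up for illustration.

```python
from typing import List


def max_pool_frames(frame_features: List[List[float]]) -> List[float]:
    """Collapse per-frame feature vectors into one video-level vector
    by taking the element-wise maximum across frames."""
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(features) != dim for features in frame_features):
        raise ValueError("all frames must have the same number of features")
    # zip(*...) groups the i-th feature of every frame together.
    return [max(values) for values in zip(*frame_features)]


# Hypothetical 3-number features for four frames (illustrative values only).
frames = [
    [0.1, 0.9, 0.0],
    [0.4, 0.2, 0.3],
    [0.2, 0.5, 0.8],
    [0.0, 0.1, 0.6],
]

video_descriptor = max_pool_frames(frames)
print(video_descriptor)  # [0.4, 0.9, 0.8]
```

Pooling like this ignores the order of the frames: shuffling them gives the same result. A recurrent network, by contrast, reads the frames in sequence, which is why the two methods are described as very different in concept.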
You can read the academic paper, or, to get a sense of Google’s latest video-processing capabilities, you can just watch this video: