Microsoft today announced previews for video optical character recognition (OCR), face detection, and emotion detection services from Azure Media Analytics, which itself is a recently launched portfolio of smart services built with Azure Media Services.

Video OCR, which is currently only available in a private preview, is more advanced than mere OCR in documents and even images, because it has to go through each individual frame of video.

“When used in conjunction with a search engine, you can easily index your media by text, and enhance the discoverability of your content,” Microsoft Azure Media Services program manager Adarsh Solanki wrote in a blog post. “This is extremely useful in highly textual video, like a video recording or screen-capture of a slideshow presentation. The Azure OCR Media Processor is optimized for digital text.”

The SRI lab has developed video OCR technology, but it hasn’t previously come to public clouds that Azure competes with, like Amazon Web Services or the Google Cloud Platform.

Facebook has also been experimenting with extracting information from video, but it has focused on the people who appear in it and the words they say, in order to generate captions.

Microsoft is also releasing related but different technology: free public previews of face detection and emotion detection within video through Azure Media Analytics. These build on the face tracking and emotion detection made possible by the Microsoft Project Oxford application programming interfaces (APIs), which have since been rebranded as Microsoft Cognitive Services.

“Multiple faces can be detected and subsequently be tracked as they move around, with the time and location metadata returned in a JSON file,” Microsoft Azure Media Services program manager Richard Li wrote in a blog post. “During tracking, it will attempt to give a consistent ID to the same face while the person is moving around on screen, even if they are obstructed or briefly leave the frame.”
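
To give a sense of how a developer might consume that kind of metadata, here is a minimal Python sketch that groups detections by face ID into per-person tracks. The JSON layout, the field names (`faces`, `id`, `time`, `x`, `y`), and the `track_faces` helper are all assumptions made for illustration. They are not the documented Azure Media Analytics output format, which may be structured differently.

```python
import json
from collections import defaultdict

# Hypothetical sample data: the real service's schema may differ.
SAMPLE = """
{"faces": [
  {"id": 0, "time": 0.0, "x": 0.10, "y": 0.20},
  {"id": 1, "time": 0.0, "x": 0.60, "y": 0.25},
  {"id": 0, "time": 0.5, "x": 0.12, "y": 0.21}
]}
"""

def track_faces(raw_json):
    """Group detections by face ID, each track sorted by timestamp."""
    tracks = defaultdict(list)
    for detection in json.loads(raw_json)["faces"]:
        tracks[detection["id"]].append(
            (detection["time"], detection["x"], detection["y"])
        )
    for points in tracks.values():
        points.sort()  # tuples sort by time first
    return dict(tracks)

tracks = track_faces(SAMPLE)
```

Because the service tries to keep the same ID on the same face, each entry in `tracks` would correspond to one person's path across the screen over time.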

Happiness, surprise, sadness, anger, disgust, fear, and contempt can be detected, according to documentation.

The Face Detector Media Processor (MP) does have technical limitations. For instance, Azure Media Analytics can currently only detect a maximum of 64 faces per video, according to Li. And it can only work with .MP4, .MOV, and .WMV files right now, he wrote.
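
A developer could guard against those limits before submitting a job. The following Python sketch checks a file name against the three formats Li listed. The function and constant names are hypothetical, and the check looks only at the file extension, not at the actual contents of the file.

```python
from pathlib import Path

# Formats and face limit as reported for the preview; both may change.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".wmv"}
MAX_FACES_PER_VIDEO = 64

def is_supported_video(filename):
    """Return True if the file extension is one the preview accepts."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

def within_face_limit(face_count):
    """Return True if a video's face count fits the reported maximum."""
    return face_count <= MAX_FACES_PER_VIDEO
```

For example, `is_supported_video("keynote.MP4")` would pass while `is_supported_video("keynote.avi")` would not.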

A Hyperlapse service is also available from Azure Media Analytics, alongside an indexer, video summarization, and content moderation.

