AWS catches up to competition with AI services for video and text understanding

Amazon Web Services launched a slew of machine learning-based services today that are aimed at making it easier for customers to embed intelligent capabilities in their applications. One new service offers video analysis, while a trio of language understanding APIs offer automatic transcription, translation, and document processing.

These tools are designed to let customers reap the benefits of machine learning without the expertise needed to build such systems themselves. The services join Amazon's existing suite of pre-built AI capabilities, including its Lex language understanding service, Polly text-to-speech offering, and Rekognition image recognition service.

The move helps AWS catch up with its major competitors Microsoft and Google, as well as other cloud providers, which already offer similar services.

A new Rekognition Video service will let customers automatically analyze footage that they have in the cloud to detect important entities, sentiment, celebrities, and more. It can also output information that computer programs can use to track where people are inside a scene.

To help customers get all of the relevant video information into the cloud, AWS launched Kinesis Video Streams. The service, which is generally available today, is designed to help customers securely ingest and store video, audio, and other time-encoded data like radar.

The launch comes roughly a week after AWS announced updates to its Rekognition service that support recognizing faces in photos of crowds, along with real-time face-matching capabilities that make it possible to process large volumes of photos for matching against a central database of faces.

The new Transcribe service will, as its name implies, offer automatic transcription of long-form speech. It can process both high-quality recorded audio and recordings of phone conversations. AWS is starting the service with support for English and Spanish, and it plans to support additional languages soon.

Transcribe stands apart from other speech recognition services by generating transcripts with time stamps and by using machine learning to add punctuation automatically, which makes the resulting text more human-readable.

Customers will be able to translate text that they have in AWS, whether processed by Transcribe or brought in through other means, with a new Translate service. It offers automatic, machine learning-based translations for any text fed into it.

AWS also launched a service to give applications a deeper understanding of the content that they've been fed. Comprehend pulls out entities like people and places, plus key phrases and the sentiment expressed in a document.

While that may not sound like much, that information can be used to help classify an otherwise difficult-to-process pile of documents, which has been a tough problem for computers to solve.

All of this comes as part of the AWS re:Invent conference in Las Vegas. Earlier today, the company announced a new SageMaker service that's designed to make it easier for developers to build custom machine learning models without deep expertise.
