Amazon’s re:Invent 2019 conference in Las Vegas kicked off with a bang — or rather, with product announcements made during a midnight keynote at The Venetian. The Seattle company’s Amazon Web Services (AWS) division unveiled Amazon Transcribe Medical, a new edition of its Transcribe speech recognition service that lets developers add medical speech-to-text capabilities to their apps, and it debuted DeepComposer, which enables AWS customers to compose music using AI and a physical (or virtual) MIDI controller.
On the transcription side of the equation, Amazon Transcribe Medical offers an API that integrates with voice-enabled apps and works with most microphone-equipped devices. It’s designed to transcribe medical speech for primary care, Amazon says, and to be deployed “at scale” across “thousands” of health care facilities to provide secure note-taking for clinical staff. It supports both medical dictation and conversational transcription, and like the standard Amazon Transcribe, Transcribe Medical features conveniences like automatic and “intelligent” punctuation.
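For a rough sense of what calling the service from an app might look like, here is a minimal sketch in Python. It assumes the batch `start_medical_transcription_job` operation exposed by the boto3 SDK rather than the real-time streaming interface described in this article, and the job name, bucket, and file URI are hypothetical placeholders.

```python
def build_medical_job_request(job_name, media_uri, output_bucket, conversational=False):
    """Assemble the parameters for a Transcribe Medical batch job.

    All argument values are caller-supplied placeholders; nothing here
    contacts AWS.
    """
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "Specialty": "PRIMARYCARE",  # the primary care focus Amazon describes
        # Dictation (one speaker) vs. conversational (clinician and patient).
        "Type": "CONVERSATION" if conversational else "DICTATION",
    }


def submit_medical_job(request, region="us-east-1"):
    """Send the request to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # third-party SDK, imported lazily so the builder works without it

    client = boto3.client("transcribe", region_name=region)
    return client.start_medical_transcription_job(**request)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    req = build_medical_job_request(
        "visit-notes-001", "s3://example-bucket/visit.wav", "example-output-bucket"
    )
    print(req)
```

Separating the request builder from the network call keeps the parameters easy to inspect before anything is sent, which is one reasonable way to structure this kind of integration.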
Transcribe Medical is fully managed in that it doesn’t require provisioning or management of services; it simply sends back a stream of text in real time. Moreover, it’s covered under AWS’ HIPAA eligibility and business associate addendum (BAA), meaning that any customer that enters into a BAA with AWS can use Transcribe Medical to process and store protected health information (PHI).
Amazon says Amgen and SoundLines are already using Transcribe Medical to produce text transcripts from recorded notes and feed those transcripts into downstream analytics. “For the 3,500 health care partners relying on our care team optimization strategies for the past 15 years, we’ve significantly decreased the time and effort required to get to insightful data,” said SoundLines president of technology Vadim Khazan in a statement.
Transcribe Medical’s availability in the AWS regions US East (N. Virginia) and US West (Oregon) comes months after Amazon made three of its AI-powered, cloud-hosted products (Translate, Comprehend, and Transcribe) eligible under the Health Insurance Portability and Accountability Act of 1996, or HIPAA, the principal law providing data privacy and security provisions for medical information in the U.S.
It’s worth noting that Amazon isn’t the only tech giant offering speech recognition products targeting the health care segment. Microsoft this year said it would team up with Nuance to host the latter’s AI software that understands patient-clinician conversations, which it integrates with medical records. For its part, rival Philips has long offered tailor-made automatic transcription solutions for health care professionals in public hospitals and small practices.
In a somewhat related reveal this morning, AWS detailed DeepComposer, which it calls the “world’s first” machine learning-enabled musical keyboard. It’s a 32-key, two-octave keyboard designed for developers to try their hand at either pretrained or custom AI models.
Budding composers first record a short musical tune (or use a prerecorded one) before selecting a model for their favorite genre, as well as the model’s architecture parameters and the loss function (which is used during training to measure the difference between the algorithm’s output and expected value). Next, they choose hyperparameters (parameters whose values are set before the learning process begins) and a validation sample, after which DeepComposer produces a composition that can be played in the AWS console or exported or shared on SoundCloud.
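To make the terms above concrete, here is a small sketch of how a loss function and hyperparameters fit into training. It is not DeepComposer’s code: the one-weight model, the mean squared error loss, and the hyperparameter names are illustrative assumptions.

```python
def mse_loss(outputs, expected):
    """Mean squared error: the average squared gap between output and expected value."""
    return sum((o - e) ** 2 for o, e in zip(outputs, expected)) / len(expected)


def train(xs, ys, hyperparams):
    """Fit a one-weight model y = w * x by gradient descent.

    `hyperparams` is fixed before training starts and is never changed
    by the learning process itself.
    """
    w = 0.0
    history = []
    for _ in range(hyperparams["epochs"]):
        preds = [w * x for x in xs]
        history.append(mse_loss(preds, ys))
        # Gradient of the loss with respect to w.
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= hyperparams["learning_rate"] * grad
    return w, history


if __name__ == "__main__":
    # Hypothetical settings chosen up front, before any learning happens.
    settings = {"learning_rate": 0.05, "epochs": 50}
    weight, losses = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], settings)
    print(weight, losses[0], losses[-1])
```

The weight `w` is learned from the data, while `learning_rate` and `epochs` are chosen by the user in advance, which is the distinction the hyperparameter definition draws.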
As AWS AI and machine learning evangelist Julien Simon explains in a blog post, DeepComposer taps a generative adversarial network (GAN) to fill in compositional gaps in songs. A generator component draws on random data to create samples that it forwards to a discriminator component, which learns to distinguish genuine samples from fake ones. As the discriminator improves, so does the generator, such that the generator progressively learns how to create samples closer to those that are genuine.
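The generator-versus-discriminator loop can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model DeepComposer uses: the “music” is a single number drawn from a bell curve, the generator and discriminator are one-line formulas instead of neural networks, and every name and setting is hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Alternate discriminator and generator updates on one-dimensional data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: fake = a * z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)  # a "genuine" sample (toy data)
        z = rng.gauss(0.0, 1.0)     # random input to the generator
        fake = a * z + b

        # Discriminator step: raise log D(real) + log(1 - D(fake)),
        # i.e. get better at telling genuine from fake.
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: raise log D(fake), i.e. try to fool the
        # freshly updated discriminator.
        d_fake = sigmoid(w * fake + c)
        grad_fake = (1.0 - d_fake) * w
        a += lr * grad_fake * z
        b += lr * grad_fake
    return (a, b), (w, c)


if __name__ == "__main__":
    generator, discriminator = train_toy_gan()
    print("generator (a, b):", generator)
    print("discriminator (w, c):", discriminator)
```

Each pass updates the discriminator first and the generator second, mirroring the back-and-forth described above. Toy setups like this can oscillate rather than settle, so the output is best read as a demonstration of the training loop, not of a converged model.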
It’s akin to efforts by Google, OpenAI, and others to generate music with tonal and melodic consistency. Late last year, Project Magenta, a Google Brain effort “exploring the role of machine learning as a tool in the creative process,” presented Music Transformer, a model capable of generating songs with recognizable repetition. And in April, OpenAI debuted MuseNet, an AI system capable of creating novel 4-minute songs with 10 different instruments across styles from country to Mozart to the Beatles.
Developers can apply to receive a DeepComposer keyboard once it becomes available, or use the new virtual keyboard in the AWS console.
SageMaker Operators for Kubernetes
Lastly, AWS launched Amazon SageMaker Operators for Kubernetes, which lets data scientists using Kubernetes train, tune, and deploy AI models in Amazon’s SageMaker machine learning development platform. AWS customers can install SageMaker Operators on Kubernetes clusters to create Amazon SageMaker jobs natively using the Kubernetes API and command-line Kubernetes tools.
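In practice, creating a SageMaker job “natively” means submitting a Kubernetes custom resource. The sketch below builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The field names follow the operator’s sample `TrainingJob` manifests as best recalled and should be checked against the current documentation; the role ARN, container image, bucket, and hyperparameter are hypothetical placeholders.

```python
import json


def build_training_job_manifest(name, role_arn, image, s3_output, region="us-east-1"):
    """Return a TrainingJob custom resource as a plain dictionary.

    Field names are an assumption based on the operator's sample
    manifests; all argument values are caller-supplied placeholders.
    """
    return {
        "apiVersion": "sagemaker.aws.amazon.com/v1",
        "kind": "TrainingJob",
        "metadata": {"name": name},
        "spec": {
            "roleArn": role_arn,
            "region": region,
            "algorithmSpecification": {
                "trainingImage": image,
                "trainingInputMode": "File",
            },
            "outputDataConfig": {"s3OutputPath": s3_output},
            "resourceConfig": {
                "instanceCount": 1,
                "instanceType": "ml.m4.xlarge",
                "volumeSizeInGB": 5,
            },
            "stoppingCondition": {"maxRuntimeInSeconds": 3600},
            # Hyperparameters are passed as name/value string pairs.
            "hyperParameters": [{"name": "max_depth", "value": "5"}],
        },
    }


if __name__ == "__main__":
    manifest = build_training_job_manifest(
        name="example-training-job",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
        s3_output="s3://example-bucket/output/",
    )
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl` is the kind of workflow the operators are meant to enable, with the job then visible through ordinary commands such as `kubectl get`.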
Specifically, users can make calls to SageMaker that kick off services like Managed Spot Training, which cuts training costs by running jobs on spare Amazon EC2 capacity, and distributed training, which reduces training time by scaling to multiple nodes with graphics chips. Compute resources are preconfigured and optimized, only provisioned when requested, scaled as needed, and shut down automatically when jobs complete. Additionally, hyperparameters are optimized automatically, and fully trained models are deployed to fully managed autoscaling clusters spread across multiple datacenters.
“Now with Amazon SageMaker Operators for Kubernetes, customers can continue to enjoy the portability and standardization benefits of Kubernetes … along with integrating the many additional benefits that come out-of-the-box with Amazon SageMaker, no custom code required,” wrote AWS Deep Learning senior product manager Aditya Bindal in a press release.
Amazon SageMaker Operators for Kubernetes are generally available in AWS server regions including US East (Ohio), US East (N. Virginia), US West (Oregon), and EU (Ireland).