On Wednesday, just days ahead of Google’s Cloud Next conference, Amazon hosted its annual Amazon Web Services cloud computing conference, AWS Summit, at the Jacob K. Javits Convention Center in New York City. It didn’t hold back.
SageMaker, the Seattle company’s full-stack machine learning platform, got two major updates: SageMaker Streaming Algorithms and SageMaker Batch Transform. The former, which is available for neural network models created with Google’s TensorFlow, lets customers stream data from AWS’ Simple Storage Service (S3) directly into SageMaker GPU and CPU instances. The latter allows them to transfer large training datasets without having to break them up with an API call.
In terms of hardware, Amazon added Elastic Compute Cloud (EC2) to its Snowball Edge system, an on-premises Intel Xeon-based platform for data processing and collection. And it enhanced its local storage, compute, data caching, and machine learning inference capabilities via AWS Greengrass, AWS Lambda, and Amazon S3, enabling new categories of virtualized applications to run remotely in work environments with limited connectivity.
On the services front, Amazon Transcribe’s new Channel Synthesis tool merges call center audio from multiple channels into a single transcription, and Amazon Translate now supports Japanese, Russian, Italian, Traditional Chinese, Turkish, and Czech. Amazon Comprehend, Amazon’s natural language processing (NLP) service, now boasts improved text analysis thanks to syntax identification.
Finally, Amazon revealed a slew of new and extended partnerships with major clients. Fortnite developer Epic Games said it’s building “new games [and] experiences” on AWS; 21st Century Fox will use Amazon’s cloud service for the “vast majority” of on-demand content delivery; Major League Baseball and Formula 1 are planning to tap AWS’ AI tools for real-time data analytics; and Celgene will leverage Amazon’s machine learning platform to expedite drug analysis and validation.
It’s a lot to take in. For a bit of context around this week’s announcements, I spoke with Dr. Matt Wood, general manager of artificial intelligence at AWS, who shed light on Amazon’s momentum in cloud computing, overarching trends in AI, and the problem of bias in machine learning models and datasets.
Here’s a transcript of our interview, which has been edited for length and clarity.
VentureBeat: Today, you announced SageMaker Streaming Algorithms, which allows AWS customers to train machine learning models more quickly. What was the motivation? Was this something for which customers expressed a deep desire?
Matt Wood: There are certain things across AWS that we want to invest in, and they’re the things that we think aren’t going to change over time. We’re building a business not for one year, 10 years, or 50 years, but 100 years — far in excess of when I’m going to be around and in charge of it. When you take that long-term view, you tend to put money not into the things you think are going to change, but into the things you think are going to stay the same.
For infrastructure, and for AWS — and this is true for machine learning as well — cost is really a big driver of that … It’s impossible for us to imagine our customers saying that they want the service to be more expensive, so we go out of our way to drive down costs.
A really good example is something we announced a couple of years ago that we call Trusted Advisor. Trusted Advisor is a feature you can turn on inside your AWS account that automatically, without you having to do anything, makes recommendations about how to reduce your AWS bill. We delivered over $300 million in annual savings to customers that way.
These are some of the advantages that the cloud provides, and they’re advantages that we want to maintain.
VentureBeat: On the client side of things, you announced a lot of strategic partnerships with Epic, Major League Baseball, and others, almost all of which said they’ll be using AWS as their exclusive cloud platform of choice. So what’s the movement there? What’s the feedback been like so far?
Wood: We see a lot of usage in sports analytics. Formula 1 chose AWS as their machine learning platform, Major League Baseball chose AWS as their machine learning platform, and the National Football League chose AWS as their machine learning platform. The reason for that is they want to drive better experiences for their viewers, and they see machine learning as a key piece of the advanced next-generation statistics they want to bring into their production environment — everything from route prediction [to] stat prediction.
That’s just one big area. Other areas are pharmaceuticals and health care. We have HIPAA compliance, which allows [our] customers to work with health care workloads, so we see a lot of momentum in disease prediction. We do diabetic retinopathy prediction, readmission prediction — all those sorts of things.
To that end, we announced [this week that] Bristol Myers Squibb is using SageMaker to accelerate the development of the innovative medicine that they build. Celgene is another really good example: Celgene actually runs Gluon, which is our machine learning library, on top of SageMaker, and they take advantage of the P3 GPUs with the Nvidia Volta under the hood. So, you know, that’s a really good example of a customer that has materially accelerated its ability to bring drugs to market more quickly and more safely.
VentureBeat: Amazon offers a lot of machine learning services to developers, like Rekognition — your computer vision platform — and Amazon Translate. But you have a lot of competition in the space from Google, Microsoft, and others. So how are you differentiating your APIs and services from the rest out there?
Wood: Candidly, we don’t spend a [lot of] time thinking about what our competitors are up to; we tend to be way more customer-focused. We’ve launched 100 new services and features since re:Invent 2017, and no other provider has done more than half of that. I would say 90-95 percent of what we’ve launched has been directly driven by customer feedback, and the other 5-10 percent is driven by our attempts to read between the lines and try to figure out what customers don’t quite know to ask for yet.
SageMaker is really helpful in cases where customers have data which they believe has differentiating value. Then, there are application developers who may not have a lot of training data available or who just want to add some level of intelligence to their application quickly — that’s where Rekognition, Rekognition Video, Transcribe, Comprehend, Polly, Lex, and Translate come in.
We joke about this, but our broader mission is really to make machine learning boring and totally vanilla, just part of the course of doing business and another tool in the tool chest. Machine learning, we kind of forget, used to be a huge investment requirement in the hundreds of millions of dollars to get up and running. It was completely out of reach, and I think we’ve made huge progress in a very, very short amount of time.
We have a saying in Amazon: It’s still day one for the internet. And for machine learning, we haven’t even woken up and had our first cup of coffee yet. But there’s a ton of excitement and momentum. We have tens of thousands of active developers on the platform and 250 percent growth year over year. Eight out of 10 machine learning workloads run on AWS — twice as many as any other provider. And customers really value that focus on continuous platform improvement. I’m excited about where we’re headed.
VentureBeat: Voice recognition and natural language processing, in particular, are extremely competitive spaces right now. I know you said you don’t think too much about what your competitors are doing, but what kind of gains have you made relative to the market?
Wood: These services are off to a great start, and we see contact centers being a really big area.
A lot of customers use Amazon Lex as their first point of contact. The National Health Service (NHS) in the U.K. ran a pilot where they introduced a Lex chatbot, and it was able to handle 40 percent of their call volume. This is the centralized health provider in all of the U.K., so that’s really meaningful in terms of patients getting to talk to somebody more quickly, or NHS being able to operate its contact center more efficiently.
[This week] we announced Channel Splitting, where we were able to take call center recordings (two recordings, one of the agent and one of the customer) in the same file, split out the channels, transcribe them both independently, and merge the transcripts together. You get a single file out, and then you can take that and pass it off to Comprehend to find out what’s going on in the conversation and what people were talking about. You can also run compliance checks to see if contact center agents are saying scripts exactly as they’re designed to be said.
From an efficiency perspective, large contact centers are expensive and difficult for most organizations to run, and from Lex through to the management, compliance, analytics, and insight you can get from the data there, we think they’re a really compelling AWS use case.
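For readers who want to picture the merge step Wood describes, here is a minimal Python sketch of combining two independently transcribed channels into one transcript. It is an illustration only, not Amazon Transcribe's API or output format: the `Segment` type, its field names, the `merge_transcripts` and `render` helpers, and the sample timestamps are all hypothetical, and it assumes each channel has already been transcribed into time-stamped segments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed utterance from a single audio channel (hypothetical shape)."""
    speaker: str   # label for the channel, e.g. "agent" or "customer"
    start: float   # seconds from the start of the recording
    end: float
    text: str


def merge_transcripts(*channels: List[Segment]) -> List[Segment]:
    """Interleave per-channel segments into one list ordered by start time."""
    merged = [segment for channel in channels for segment in channel]
    # sorted() is stable, so segments that start at the same instant
    # keep the order in which their channels were passed in.
    return sorted(merged, key=lambda segment: segment.start)


def render(segments: List[Segment]) -> str:
    """Format the merged segments as a single readable transcript."""
    return "\n".join(f"[{s.start:.1f}s] {s.speaker}: {s.text}" for s in segments)


# Made-up sample data standing in for two separately transcribed channels.
agent = [
    Segment("agent", 0.0, 2.1, "Thanks for calling, how can I help?"),
    Segment("agent", 6.0, 8.4, "Let me look that up."),
]
customer = [
    Segment("customer", 2.5, 5.6, "I have a question about my bill."),
]

transcript = merge_transcripts(agent, customer)
print(render(transcript))
```

Ordering by start time is the simplest possible merge rule; a real pipeline would also have to decide how to handle overlapping speech, which this sketch ignores.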
VentureBeat: Shifting gears a bit. You mentioned inclusion a bit earlier, and as you probably know, with respect to computer vision, we’ve got a long way to go — facial recognition is an especially difficult thing for developers and infrastructure providers to get right. So how do you think it might be tackled? How can we improve these algorithms that, for example, appear to be biased against people of color and certain ethnicities and races?
Wood: It’s the classic example of garbage in, garbage out. If you’re not really careful about where you get your data from and if you accidentally, with good intentions, introduce some selection criteria on the data in the perfect representative set, you’re going to introduce inaccuracies. The good news is that with machine learning, you can identify, measure, and systematically reduce those inaccuracies.
One of the key benefits of our services like SageMaker is that the quicker you can train and retrain models, the quicker you can identify areas of accuracy and start to narrow down the inaccuracies. So in that respect, any investment that we make, such as SageMaker Streaming Algorithms, contributes to spinning that flywheel faster and allows developers to iterate and build more sophisticated models that overcome some of the noise inside the data.
Basically, investment in our frameworks allows developers to build more sophisticated models, train models more quickly, and operate more efficiently in a production environment. All of it helps.
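To make the "identify and measure" part of Wood's answer concrete, the sketch below computes a model's error rate separately for each group in a labeled evaluation set and reports the gap between the best- and worst-served groups. It is one simple way to surface uneven accuracy, not a method Wood or AWS describes: the function names, the `(group, true_label, predicted_label)` record layout, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (group, true_label, predicted_label)
Record = Tuple[str, int, int]


def error_rate_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Return the fraction of wrong predictions for each group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def largest_gap(rates: Dict[str, float]) -> float:
    """Difference between the highest and lowest per-group error rates."""
    return max(rates.values()) - min(rates.values())


# Made-up evaluation results for two groups of four examples each.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 1),
]

rates = error_rate_by_group(records)
gap = largest_gap(rates)
print(rates, gap)
```

Tracking a number like this across retraining runs is what lets a team tell whether a change to the training data actually narrowed the gap.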