SageMaker Serverless Inference illustrates Amazon's philosophy for ML workloads

Amazon just unveiled Serverless Inference, a new option for SageMaker, its fully managed machine learning (ML) service. The goal for Amazon SageMaker Serverless Inference is to serve use cases with intermittent or infrequent traffic patterns, lowering total cost of ownership (TCO) and making the service easier to use.

VentureBeat connected with Bratin Saha, AWS VP of Machine Learning, to discuss where Amazon SageMaker Serverless fits into the big picture of Amazon's machine learning offering and how it affects ease of use and TCO, as well as Amazon's philosophy and process in developing its machine learning portfolio.

Amazon SageMaker is on an ever-growing trajectory

Inference is the productive phase of ML-powered applications. After a machine learning model has been created and fine-tuned using historical data, it is deployed for use in production. Inference refers to taking new data as input and producing results based on that data. For production ML applications, Amazon notes, inference accounts for up to 90% of total compute costs.

According to Saha, Serverless Inference has been an oft-requested feature. In December 2021, SageMaker Serverless Inference was introduced in preview, and as of today, it is generally available.

Serverless Inference lets SageMaker users deploy machine learning models for inference without configuring or managing the underlying infrastructure. The service automatically provisions and scales compute capacity based on the volume of inference requests. During idle periods, it turns compute capacity off completely, so users are not charged while no requests are being served.
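Even without managed instances, a serverless endpoint is still described by a SageMaker endpoint configuration. Its production variant carries a serverless configuration instead of an instance type and count. The sketch below assumes the boto3 `create_endpoint_config` request shape, where `ServerlessConfig` has `MemorySizeInMB` and `MaxConcurrency` fields. It only builds and validates the request dict locally, so it runs without AWS credentials. The model and config names are hypothetical. The memory sizes (1,024 to 6,144 MB in 1 GB steps) and the concurrency ceiling of 200 reflect the GA limits as we understand them and should be checked against current AWS documentation.

```python
from typing import Any, Dict

# Assumed GA limits; verify against the current SageMaker documentation.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)
MAX_CONCURRENCY_LIMIT = 200


def build_serverless_endpoint_config(
    config_name: str,
    model_name: str,
    memory_mb: int = 2048,
    max_concurrency: int = 5,
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's sagemaker create_endpoint_config.

    No instance type or instance count appears anywhere: compute is sized
    from ServerlessConfig, and SageMaker provisions and scales it on demand.
    """
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if not 1 <= max_concurrency <= MAX_CONCURRENCY_LIMIT:
        raise ValueError(f"max_concurrency must be between 1 and {MAX_CONCURRENCY_LIMIT}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,  # a SageMaker model that already exists
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    request = build_serverless_endpoint_config("demo-serverless-config", "demo-model")
    print(request["ProductionVariants"][0]["ServerlessConfig"])
    # With credentials, the request could be sent with:
    #   boto3.client("sagemaker").create_endpoint_config(**request)
```

In the usual boto3 flow, `create_endpoint` is then called with the same `EndpointConfigName`. The only capacity-related choices left to the user are memory size and maximum concurrency.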

This is the latest addition to SageMaker's options for serving inference. SageMaker Real-Time Inference is for workloads with low latency requirements, on the order of milliseconds. SageMaker Asynchronous Inference is for inferences with large payload sizes or long processing times. SageMaker Batch Transform is for running predictions on batches of data, and SageMaker Serverless Inference is for workloads with intermittent or infrequent traffic patterns.
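The four options correspond to four workload shapes. As a rough illustration only, the hypothetical helper below turns the rules of thumb from that description into a function. The `Workload` fields and the order of the checks are invented for this example. They are not SageMaker API fields or AWS guidance.

```python
from dataclasses import dataclass
from enum import Enum


class InferenceOption(Enum):
    REAL_TIME = "SageMaker Real-Time Inference"
    ASYNC = "SageMaker Asynchronous Inference"
    BATCH = "SageMaker Batch Transform"
    SERVERLESS = "SageMaker Serverless Inference"


@dataclass
class Workload:
    # Hypothetical descriptors, not part of any SageMaker API.
    offline_dataset: bool = False            # predictions over stored data, no live callers
    large_payload_or_long_job: bool = False  # big inputs or long processing per request
    intermittent_traffic: bool = False       # long idle gaps between bursts of requests


def choose_inference_option(w: Workload) -> InferenceOption:
    """Map a workload to an inference option; check order is an illustrative choice."""
    if w.offline_dataset:
        return InferenceOption.BATCH
    if w.large_payload_or_long_job:
        return InferenceOption.ASYNC
    if w.intermittent_traffic:
        return InferenceOption.SERVERLESS
    # Otherwise: steady traffic with millisecond-level latency requirements.
    return InferenceOption.REAL_TIME


if __name__ == "__main__":
    print(choose_inference_option(Workload(intermittent_traffic=True)).value)
```

Real decisions also weigh cost and latency tolerance, such as cold starts on idle serverless endpoints. That is the kind of trade-off Inference Recommender is meant to help with.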

SageMaker Serverless Inference comes on the heels of the SageMaker Inference Recommender service, introduced among a slew of AI and machine learning announcements at AWS re:Invent 2021. Inference Recommender helps users with the daunting task of choosing the best of more than 70 available compute instance options and managing the configuration needed to deploy machine learning models for optimal inference performance and cost.

Overall, as Saha said, reducing TCO is a top priority for Amazon. In fact, Amazon has published an extensive analysis on the TCO of SageMaker. According to that analysis, Amazon SageMaker is the most cost-effective choice for end-to-end machine learning support and scalability, offering 54% lower TCO than other options over three years.

Of note here, however, is what those "other options" are. In its analysis, Amazon compares SageMaker to other self-managed cloud-based machine learning options on AWS, such as Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Kubernetes Service (EKS). According to Amazon's analysis, SageMaker results in lower TCO when factoring in the cost of developing the equivalent of the services it offers from scratch.

That may be the case, but arguably, users might find a comparison to services offered by competitors such as Azure Machine Learning and Google Vertex AI more useful. As Saha related, Amazon's TCO analysis reflects its philosophy of focusing on its users, rather than the competition.

According to Saha, another key part of Amazon's philosophy is striving to build an end-to-end offering and prioritizing user needs. Product development is customer-driven: customers are consulted regularly, and their input drives the prioritization and development of new features.

SageMaker seems to be on an ever-growing trajectory, which also includes expanding its target audience. With the recent introduction of SageMaker Canvas for no-code AI model development, Amazon wants to enable business users and analysts to create ML-powered applications as well.

SageMaker Serverless Inference and Amazon's double bottom line with SageMaker

But what about Amazon's double bottom line with SageMaker: better ease of use and lower TCO?

As Tianhui Michael Li and Hugo Bowne-Anderson note in their analysis of SageMaker’s new features on VentureBeat, user-centric design will be key in winning the cloud race, and while SageMaker has made significant strides in that direction, it still has a ways to go. In that light, Amazon's strategy of converting more EC2 and EKS users to SageMaker and expanding its scope to include business users and analysts makes sense.

According to a 2020 Kaggle survey, SageMaker usage among data scientists is at 16.5%, even though overall AWS usage is at 48.2% (mostly through direct access to EC2). At this point, it looks like only Google Cloud offers something comparable to Serverless Inference, via Vertex Pipelines.

At first glance, SageMaker seems more versatile in terms of supported frameworks, and more modular, compared to Google Vertex AI. Saha also highlighted modularity as an area of focus. Vertex Pipelines seems to correspond to SageMaker Model Building Pipelines, but is end-to-end serverless.

As Li and Bowne-Anderson note, while Google’s cloud service holds a third-place ranking overall (behind Microsoft Azure and AWS), it holds a strong second place among data scientists according to the Kaggle survey.

The introduction of Serverless Inference plays into the ease of use theme, as not having to configure instances is a big win. Saha told VentureBeat that switching between different inference options is possible, and it's done mostly via configuration.

As Saha noted, Serverless Inference can be used to deploy any machine learning model, regardless of whether it has been trained on SageMaker or not. SageMaker’s built-in algorithms and machine learning framework-serving containers can be used to deploy models to a serverless inference endpoint, but users can also choose to bring their own containers.

If traffic becomes predictable and stable, users can switch from a serverless inference endpoint to a SageMaker real-time endpoint without changing their container image. With Serverless Inference, users also benefit from SageMaker’s features, including built-in Amazon CloudWatch metrics such as invocation count, faults, latency, host metrics and errors.
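The container image belongs to the SageMaker model, not to the endpoint configuration. Because of that, moving from serverless to real-time hosting can be sketched as creating a new endpoint configuration for the same model and pointing the existing endpoint at it. The helper below assumes the boto3 request shapes for `create_endpoint_config` and `update_endpoint`. The instance type `ml.m5.large` and all names are placeholders. It builds the requests locally and does not call AWS.

```python
from typing import Any, Dict


def variant(model_name: str, serverless: bool,
            instance_type: str = "ml.m5.large",  # placeholder instance type
            instance_count: int = 1,
            memory_mb: int = 2048, max_concurrency: int = 5) -> Dict[str, Any]:
    """Build one ProductionVariant; only hosting settings differ, the model is reused."""
    base: Dict[str, Any] = {"VariantName": "AllTraffic", "ModelName": model_name}
    if serverless:
        base["ServerlessConfig"] = {"MemorySizeInMB": memory_mb,
                                    "MaxConcurrency": max_concurrency}
    else:
        base.update({"InstanceType": instance_type,
                     "InitialInstanceCount": instance_count,
                     "InitialVariantWeight": 1.0})
    return base


def switch_to_real_time_requests(endpoint_name: str, model_name: str,
                                 new_config_name: str) -> Dict[str, Dict[str, Any]]:
    """Return the two boto3 request payloads that move an endpoint onto instances."""
    return {
        "create_endpoint_config": {
            "EndpointConfigName": new_config_name,
            "ProductionVariants": [variant(model_name, serverless=False)],
        },
        "update_endpoint": {
            "EndpointName": endpoint_name,
            "EndpointConfigName": new_config_name,
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    reqs = switch_to_real_time_requests("demo-endpoint", "demo-model", "demo-rt-config")
    print(reqs["create_endpoint_config"]["ProductionVariants"][0])
    # With credentials:
    #   sm = boto3.client("sagemaker")
    #   sm.create_endpoint_config(**reqs["create_endpoint_config"])
    #   sm.update_endpoint(**reqs["update_endpoint"])
```

The same `variant` helper with `serverless=True` produces the serverless form. This mirrors the point that switching between options is mostly a matter of configuration.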

Since its preview launch, SageMaker Serverless Inference has added support for the SageMaker Python SDK and model registry. SageMaker Python SDK is an open-source library for building and deploying ML models on SageMaker. SageMaker model registry lets users catalog, version and deploy models to production.

Ease of use and TCO

Ease of use may be hard to quantify, but what about TCO? Serverless Inference should reduce TCO for the use cases where it makes sense, but Amazon has no specific metrics to release at this point. What it does have is testimonials from early adopters.

Jeff Boudier, director of product at Hugging Face, reports having tested Amazon SageMaker Serverless Inference and being able to significantly reduce costs for intermittent traffic workloads while abstracting the infrastructure.

Lou Kratz, principal research engineer at Bazaarvoice, says that Amazon SageMaker Serverless Inference provides the best of both worlds, as it scales quickly and seamlessly during bursts in content and reduces costs for infrequently used models.

For the GA launch, SageMaker Serverless Inference has raised the limit on maximum concurrent invocations per endpoint from 50 during preview to 200, enabling its use for high-traffic workloads. The service is now available in all AWS Regions where Amazon SageMaker is available, except for AWS GovCloud (U.S.) and the AWS China Regions.
