Amazon peeled back the curtain on its SageMaker AI service today, unveiling how its customers are able to train machine learning models at massive scale while keeping costs down. The company uses novel techniques to limit the amount of computing power needed while providing performance similar to that of more traditional training methods.
When SageMaker takes in data to train a model, it uses a streaming algorithm that makes only one pass over the data it is fed. While other algorithms can see exponential increases in the amount of time and processing power needed as datasets grow, Amazon’s algorithms don’t. (To be clear: the more data is fed through SageMaker’s streaming algorithms, the more training the system does, but the computational cost of doing so remains constant over time, rather than scaling exponentially.) As data is streamed into the system, the algorithm adjusts its state, a persistent representation of the statistical patterns present in the information fed into SageMaker for training a particular system.
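Amazon has not published how its streaming algorithms work, so the following is only a generic sketch of the single-pass idea, not SageMaker's implementation. It uses Welford's well-known running mean and variance method as a stand-in; the names `RunningStats` and `summarize` are hypothetical. The point is that the state stays the same size no matter how much data flows through it.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Fixed-size state summarizing every value seen so far (illustrative only)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford's update: constant work and constant memory per value.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; defined as 0.0 before any data arrives.
        return self.m2 / self.count if self.count else 0.0


def summarize(stream) -> RunningStats:
    """Consume the stream exactly once and return the final state."""
    state = RunningStats()
    for x in stream:
        state.update(x)
    return state
```

Calling `summarize(iter(values))` never stores the values themselves, only the three numbers in the state.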
That state isn’t a trained machine learning model, though: It’s an abstraction of the data fed to SageMaker that can then be used to train a model. That provides a number of useful advantages, like making it easier for Amazon to distribute training of a model. SageMaker can compare the states of the same algorithms working on different data across multiple machines over the course of the training process, to make sure that all the systems are correctly sharing a representation of the data they’re being fed.
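One way to picture why a compact state helps with distributed training: if each machine summarizes its own slice of the data, the summaries can be combined without revisiting the raw data. The sketch below is an assumption-laden illustration using the standard pairwise formula for merging means and variances; it is not a description of how SageMaker reconciles state, and `summarize` and `merge` are hypothetical names.

```python
def summarize(values):
    """One pass over one machine's data; returns (count, mean, m2)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return count, mean, m2


def merge(a, b):
    """Combine two states as if a single machine had seen both slices."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Cross term accounts for the gap between the two slice means.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2
```

Merging the states from two workers gives the same count, mean and spread as a single pass over the combined data, up to floating-point rounding.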
That same representation makes it easier to optimize the hyperparameters of a resulting machine learning model. Those parameters, which govern certain functions of the model, are key to creating the best machine learning system. Traditionally, data scientists would optimize those parameters by repeatedly training the same model with different parameters each time and picking the model that creates the most accurate final result.
However, that can be a time-consuming process, especially for models built using large amounts of data. With SageMaker, Amazon doesn’t have to do all the heavy lifting of retraining, since it can just use the streaming algorithm’s state.
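To make that idea concrete, here is a toy sketch, assuming a one-feature ridge regression with no intercept, which is not something Amazon has said SageMaker does. The data is read once into a small state of running sums, and every candidate value of the regularization hyperparameter is then tried against that state alone. All function names are hypothetical.

```python
def accumulate(pairs):
    """One pass over (x, y) pairs; returns the fixed-size state (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    for x, y in pairs:
        sxx += x * x
        sxy += x * y
        syy += y * y
    return sxx, sxy, syy


def ridge_weight(state, lam):
    """Closed-form ridge weight from the state; assumes sxx + lam > 0."""
    sxx, sxy, _ = state
    return sxy / (sxx + lam)


def squared_error(state, w):
    """Sum of (y - w*x)^2 over the summarized data, computed from sums alone."""
    sxx, sxy, syy = state
    return syy - 2.0 * w * sxy + w * w * sxx


def best_lambda(train_state, valid_state, candidates):
    """Pick the candidate with the lowest validation error, with no retraining pass."""
    return min(
        candidates,
        key=lambda lam: squared_error(valid_state, ridge_weight(train_state, lam)),
    )
```

Each additional candidate costs a handful of arithmetic operations rather than another pass over the training data.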
All of this is in the service of creating a system that can handle incredibly large datasets running at global scale, something that’s important both for Amazon’s work on its own AI projects and for its customers’ needs.
Amazon’s streaming algorithms are comparable to other, more traditional methods of training particular AI systems when it comes to accuracy, according to Swami Sivasubramanian, the company’s vice president of AI. However, the streaming approach only works for certain types of algorithms, like k-means clustering. SageMaker supports training other types of AI systems, including neural networks, but those require multiple passes over the data.
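For readers unfamiliar with how k-means can be done in one pass, the textbook sequential (online) variant gives a rough picture. This is a generic sketch on one-dimensional points, assuming the first k values seed the clusters; it is not Amazon's unpublished algorithm, and `streaming_kmeans` is a hypothetical name.

```python
def streaming_kmeans(stream, k):
    """Sequential k-means: each point is seen once and then discarded."""
    centers = []  # current cluster centers
    counts = []   # number of points assigned to each center so far
    for x in stream:
        if len(centers) < k:
            # Assumption for this sketch: seed centers with the first k points.
            centers.append(float(x))
            counts.append(1)
            continue
        # Find the nearest center and nudge it toward the new point.
        j = min(range(k), key=lambda i: abs(x - centers[i]))
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]
    return centers, counts
```

The state here is just the k centers and their counts, so memory use does not grow with the length of the stream. The result does depend on the order of the data, which is one reason single-pass variants are usually compared against multi-pass k-means on accuracy.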
It’s currently difficult to evaluate exactly what Amazon’s doing, since the company has yet to release a technical paper describing how SageMaker’s streaming algorithms work. Sivasubramanian said Amazon is holding off on publication for now, but pointed out that the company has a history of releasing papers describing its technical achievements (including one he coauthored with company CTO Werner Vogels).
Unsurprisingly, Sivasubramanian remained tight-lipped about Amazon’s exact plans for SageMaker in the future. But the company considers AI to be a key area of future product development, so it’s likely that we’ll see the service continue to evolve from here.
Update March 20: This story has been updated to include more details on what sorts of AI systems SageMaker’s streaming algorithms support, as well as clarify