AWS launches Redshift ML to let developers train models with SQL

Amazon today announced the general availability of Redshift ML, a capability of Amazon Redshift, the data warehouse service that lets customers use SQL to query and combine structured and semi-structured data across data warehouses, operational databases, and data lakes. The company says Redshift ML can be used to create, train, and deploy machine learning models directly from an Amazon Redshift instance.

In the past, Amazon Web Services (AWS) customers who wanted to use data from Amazon Redshift to train an AI model had to export the data to an Amazon Simple Storage Service (Amazon S3) bucket, then configure and start the training themselves. This required a range of skills and usually more than one person, raising the barrier to entry for enterprises looking to forecast revenue, predict customer churn, detect anomalies, and more.

With Redshift ML, customers can create a model using an SQL query to specify training data and the output value they want to predict. For example, to create a model that predicts the success rate of marketing activities, a customer might define their inputs by selecting database columns that include customer profiles and results from previous marketing campaigns. After running an SQL command, Redshift ML exports the data from Amazon Redshift to an S3 bucket and calls Amazon SageMaker Autopilot to prepare the data, select an algorithm, and apply the algorithm for model training. Customers can select the algorithm to use if they opt not to defer to SageMaker Autopilot.
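As a rough illustration of the SQL command described above, the sketch below assembles a Redshift ML CREATE MODEL statement as a string. The model, table, column, and function names, the IAM role ARN, and the bucket name are all hypothetical placeholders and do not come from Amazon's announcement. The helper only builds text; it does not connect to Redshift or escape identifiers.

```python
def build_create_model_sql(model_name, training_query, target,
                           function_name, iam_role_arn, s3_bucket):
    """Return a Redshift ML CREATE MODEL statement as a string.

    training_query selects the input columns plus the target column;
    function_name is the SQL function Redshift ML creates for predictions.
    """
    return (
        f"CREATE MODEL {model_name}\n"
        f"FROM ({training_query})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


# Hypothetical marketing-campaign example: every name below is made up.
CAMPAIGN_MODEL_SQL = build_create_model_sql(
    model_name="campaign_response_model",
    training_query=(
        "SELECT age, region, past_purchases, responded "
        "FROM past_campaign_results"
    ),
    target="responded",
    function_name="predict_campaign_response",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
    s3_bucket="example-redshift-ml-bucket",
)

print(CAMPAIGN_MODEL_SQL)
```

The printed statement would be run from a SQL client connected to the Redshift cluster. The S3 bucket in SETTINGS is where the exported training data is staged.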

Redshift ML handles all of the interactions between Amazon Redshift, S3, and SageMaker, including the steps involved in training. When the model has been trained, Redshift ML uses Amazon SageMaker Neo to optimize the model for deployment and makes it available as an SQL function. Customers can use the SQL function to apply the model to their data in queries, reports, and dashboards.
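Once the model is exposed as a SQL function, applying it looks like any other function call in a query. The sketch below builds such a query as a string. The function, table, and column names are hypothetical and only illustrate the shape of the call.

```python
def build_prediction_sql(function_name, feature_columns, source_table, id_column):
    """Return a SELECT statement that applies a model's SQL function per row."""
    # Arguments go in the same order as the input columns used for training.
    args = ", ".join(feature_columns)
    return (
        f"SELECT {id_column}, {function_name}({args}) AS prediction\n"
        f"FROM {source_table};"
    )


# Hypothetical names, chosen only for illustration.
PREDICTION_SQL = build_prediction_sql(
    function_name="predict_campaign_response",
    feature_columns=["age", "region", "past_purchases"],
    source_table="new_customers",
    id_column="customer_id",
)

print(PREDICTION_SQL)
```

Because the prediction is an ordinary SQL expression, the same call could sit inside a view or a report query that a dashboard reads from.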

Redshift ML is available today in the following AWS regions:

U.S. East (Ohio)
U.S. East (N. Virginia)
U.S. West (Oregon)
U.S. West (N. California)
Canada (Central)
Europe (Frankfurt)
Europe (Ireland)
Europe (Paris)
Europe (Stockholm)
Asia Pacific (Hong Kong)
Asia Pacific (Tokyo)
Asia Pacific (Singapore)
Asia Pacific (Sydney)
South America (São Paulo)

With Redshift ML, customers only pay for what they use. When training a new model, they pay for the Amazon SageMaker Autopilot and S3 resources used by Redshift ML. And when making predictions, there's no additional cost for models imported into their Amazon Redshift cluster. Redshift ML also allows customers to use existing Amazon SageMaker endpoints for inference. In that case, the usual SageMaker pricing for real-time inference applies.

Amazon Redshift, which launched in preview in 2012 and became generally available a year later, is based on PostgreSQL 8.0.2, an older version of the open source relational database management system. According to a Cloud Data Warehouse report published by Forrester in Q4 2018, Amazon Redshift has the largest number of cloud data warehouse deployments, with more than 6,500 to date.
