Businesses with large data sets on their hands can rest a little easier tonight, at least if they’re Amazon Web Services subscribers. Amazon this evening announced general availability of AWS Lake Formation, a fully managed service that simplifies building, securing, and managing data lakes. Customers with instances in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland) regions can take advantage starting today.
“Our customers tell us that Amazon S3 is the ideal place to house their data lakes, which is why AWS hosts more data lakes than anyone else — with tens of thousands and growing every day. They’ve also told us that they want it to be easier and faster to set up and manage their data lakes,” said AWS vice president of databases, analytics, and machine learning Raju Gulabani. “That’s why we built AWS Lake Formation, so customers can spend more time learning from their data and innovating, rather than wrestling that data into functioning data lakes. [W]e’re excited to see how customers use it as one of the building blocks for growing and transforming their businesses and customer experiences.”
For the uninitiated, Lake Formation, which was announced last November at the AWS re:Invent conference in Las Vegas, automates a number of the steps typically involved in creating a data lake, including collecting, cleaning, deduplicating, and cataloging data and making the data available for analytics in provisioned and configured storage. Using Lake Formation, customers bring data into a data lake from a range of sources using predefined templates, and then define policies to govern access by different groups within the organization while the data is automatically classified and prepared.
That’s not all Lake Formation can do. It provides a centralized dashboard where admins can manage the aforementioned data access policies, governance, and auditing across multiple analytics engines, in addition to a searchable catalog that describes available data sets and their use. It also enables engineers to analyze data within those data sets using their choice of AWS analytics and machine learning services, including (but not limited to) Amazon Redshift, Amazon Athena, and AWS Glue, with Amazon EMR, Amazon QuickSight, and Amazon SageMaker to follow in the coming months.
Panasonic Avionics Corporation, Accenture, online fashion and lifestyle platform Zalando, AI and big data software company Quantiphi, Life360, eye care products giant Alcon, and biotech company Amgen are among the early adopters of AWS Lake Formation. “We wanted to create a data platform with the ability to manage the security settings for all the different applications in our environment,” said Panasonic Avionics director of cloud and data services Anand Desikan. “[With] AWS Lake Formation, we can now define policies once and enforce them in the same way, everywhere, for multiple services we use, including AWS Glue and Amazon Athena. The enhanced level of control gives us secure access to data and metadata for columns and tables, not just for bulk objects, which is an important part of our data security and governance standard.”
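For readers curious what column-level control of the kind Desikan describes might look like in practice, here is a minimal Python sketch. It assumes the AWS SDK for Python (boto3) and its Lake Formation `grant_permissions` call. The role ARN, database, table, and column names are hypothetical placeholders, not anything Amazon or Panasonic Avionics has published, and the helper functions `build_column_grant` and `apply_grant` are illustrative names of our own.

```python
from typing import Any, Dict, List


def build_column_grant(principal_arn: str, database: str, table: str,
                       columns: List[str]) -> Dict[str, Any]:
    """Build a request granting SELECT on specific columns of one table."""
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(request: Dict[str, Any], region: str = "us-east-1") -> None:
    """Send the grant to Lake Formation (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("lakeformation", region_name=region)
    client.grant_permissions(**request)


# Hypothetical identifiers, for illustration only.
request = build_column_grant(
    principal_arn="arn:aws:iam::123456789012:role/AnalystRole",
    database="sales_db",
    table="orders",
    columns=["order_id", "order_date"],
)
```

Calling `apply_grant(request)` would submit the grant against a real account. The request is built separately here so the policy can be reviewed or logged before anything is sent.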
The global data lake market is anticipated to reach $12.01 billion by 2024, according to Advanced Market Analytics. Microsoft offers its own fully managed solution in Azure Data Lake, while Google boasts a suite of data lake processing and analytics tools in Cloud Datalab, Dataproc, and Dataflow.