Google Cloud federates warehouse and lake, BI and AI

Google Cloud is making a series of announcements today that cover a range of its data, analytics and AI services. The releases, a mix of previews and general availability (GA) launches, together shore up Google's data and AI story as it competes with Amazon Web Services (AWS) and Microsoft Azure.

In a blog post, Gerrit Kazmaier, Google Cloud's GM for databases, data analytics, and Looker, said: "With the dramatic growth in the amount and types of data, workloads, and users, we are at a tipping point where traditional data architectures — even when deployed in the cloud — are unable to unlock its full potential. As a result, the data-to-value gap is growing."

Perhaps in response, the overarching theme of Google's announcements today is bringing things together. Google Cloud's data warehouse and data lake will be more tightly integrated; Google's organically developed business intelligence (BI) components will work in a more coordinated way with the Looker BI technology that Google acquired in 2020; and Google's analytics and AI components will work together more seamlessly as well.

A warehouse near the lake

Perhaps the most important of today's announcements is the launch, in preview, of a new data lake offering called BigLake. As you might imagine from the name, this service integrates data lakes stored in Google Cloud Storage (GCS) far more tightly with BigQuery, Google Cloud's data warehouse service. Not only will Google Cloud customers be able to query data in the lake and the warehouse together, from services like Spark, Presto and even TensorFlow, but they will also be able to unify the security and governance of data across both.

This coordination of lake and warehouse will resonate with fans of the so-called lakehouse model, while still respecting that data lake and data warehouse technologies each have relative strengths. In other words, customers will have a choice of which data to store where, and can still have a unified query and governance experience. GA of this service will likely come by the end of the calendar year.

Google is also announcing something called Spanner change streams, a change data capture service that will replicate data in real time from Google Cloud Spanner into BigQuery, Pub/Sub or Google Cloud Storage. This offering seems quite comparable to Microsoft's Azure Cosmos DB change feed. This service isn't available yet, but Google says it's "coming soon."

A big (BI) deal

Six years ago, Google brought out its self-service BI product, Google Data Studio, making it easy for business users to create visualizations on data stored in a variety of repositories and platforms. Later, Google extended Google Sheets to make it more data-savvy, too. But then Google Cloud acquired indie BI player Looker as well, leaving customers and industry journalists (including this one) to wonder what the future held for Data Studio.

Google is clarifying that story today, explaining that Google Data Studio can now connect to data contained in Looker models, and that Google Connected Sheets can do likewise. Looker, you see, includes the Explore data query and visualization front-end, but it also has a back-end of sorts. That back-end lets customers create comprehensive models that blend data from different sources. Those models also define which elements of the blended data constitute the model's measures (metrics) and dimensions (categories, like product, time, and location, used to aggregate or drill down on the metrics).

Looker models are created in a special language called LookML (the "ML" stands for markup language, not machine learning), and those models will now be readable by Google Data Studio and Google Sheets. That allows a single model to serve developers, enterprise BI analysts, self-service BI business users and spreadsheet users alike.
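To make the measures-and-dimensions idea concrete, here is a toy Python sketch. It is not LookML and does not use any Looker or Google API; the sample rows, the MODEL dictionary and the query function are all hypothetical, chosen only to show how a measure defined once in a model can be aggregated along whichever dimension the model exposes.

```python
from collections import defaultdict

# Hypothetical blended rows; field names are illustrative only.
ROWS = [
    {"product": "widget", "region": "EMEA", "revenue": 120.0},
    {"product": "widget", "region": "AMER", "revenue": 80.0},
    {"product": "gadget", "region": "EMEA", "revenue": 50.0},
]

# A toy "model": dimensions are fields to group by,
# measures are named aggregations over a field.
MODEL = {
    "dimensions": {"product", "region"},
    "measures": {"total_revenue": ("revenue", sum)},
}


def query(rows, model, dimension, measure):
    """Aggregate a model-defined measure by a model-defined dimension."""
    if dimension not in model["dimensions"]:
        raise KeyError(f"unknown dimension: {dimension}")
    column, aggregate = model["measures"][measure]
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[column])
    return {key: aggregate(values) for key, values in groups.items()}


print(query(ROWS, MODEL, "product", "total_revenue"))
# prints {'widget': 200.0, 'gadget': 50.0}
```

Calling query(ROWS, MODEL, "region", "total_revenue") regroups the same measure by region instead. Keeping the measure definition in one shared model, rather than in each front-end, is the kind of reuse that lets Data Studio, Sheets and Looker's own Explore report consistent numbers.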

AI, meet BI

Google has, for quite some time, seen itself as the leading contender to build a first-class cloud for artificial intelligence (AI). And while the company's AI prowess is quite apparent, Google Cloud's AI was until recently more a collection of individual services. The assortment included a cloud TensorFlow service, an array of Web API-based cognitive services, and an in-database AI service called BigQuery ML (where, this time, the ML does stand for "machine learning"). Meanwhile, Microsoft's Azure Machine Learning and AWS' SageMaker were offering more integrated machine learning platforms, even if sometimes only by virtue of a common brand.

Google's answer to this was its Vertex AI service, released to general availability in May of last year. And here again, Google Cloud is focusing on cohesion and integration. An important part of the service, Vertex AI Workbench, being released to GA today, integrates natively with BigQuery, Serverless Spark, and Dataproc.

Today, Google is adding a new Model Registry to Vertex AI. Think of a model registry in the machine learning world as comparable to a data catalog in the database and analytics world, in that it's a searchable, central repository and governance tool for all of an organization's machine learning models. Google also points out, maintaining that overarching theme of unification, that the model registry will catalog models living both in Vertex AI and in BigQuery ML.

Analytics stack redux

What's interesting about all of Google's announcements today is how reminiscent they are of patterns that have already shown up in the analytics and BI worlds. For example, creating a side-by-side data warehouse/data lake environment is very much like what Microsoft's Azure Synapse Analytics has already done: bringing together the former Azure SQL Data Warehouse with Azure Data Lake Storage, Spark and a data lake query engine.

On the BI side, bringing together native and acquired technologies is very reminiscent of what Microsoft, IBM, SAP, and Oracle did back in the 2000s when they made their own BI acquisitions of ProClarity, Cognos, BusinessObjects and Hyperion, respectively. Even the notion of Google using Looker's semantic layer technology to glue it together with Data Studio and Connected Sheets is not unprecedented. To this day, BusinessObjects "Universes," also a semantic data model technology, are a centerpiece of SAP's BI story, both on-premises and in the company's Analytics Cloud service.

In many ways, the major cloud providers of today mirror the enterprise "mega vendors" of fifteen to twenty years ago. And, fittingly, Google Cloud's data and analytics announcements today show that the enterprise stack model is very much alive, even in the era of the cloud.