Where Teradata could go with its data lakehouse

Last week Teradata offered its long-awaited response to the emergence of the data lakehouse. As VentureBeat's George Lawton reported, Teradata has always differentiated itself by stretching the capabilities of analytics, first with massively parallel processing on its own specialized machines, and more recently, with software-defined appliances tuned for variations in workloads, from compute-intensive to I/O-intensive (measured in input/output operations per second, or IOPS). And since the acquisition of Aster Data Systems over a decade ago, Teradata has morphed from solving big analytics problems to solving any analytics problem, with a diverse portfolio of analytic libraries stretching SQL to new areas such as path or graph analytics.

With the cloud, we’ve been waiting for Teradata to fully exploit cloud object storage, which is the de facto data lake. So the dual announcements last week of VantageCloud Lake Edition and ClearScape Analytics were logical next steps on Teradata’s journey to the data lakehouse. Teradata is finally making cloud storage a first-class citizen and opening it up to its wide analytics portfolio.

But unlike Teradata’s previous moves to parallelized and polyglot analytics, where it led the field, this time with the lakehouse, it has company. The announcement might not have mentioned the lakehouse word, but that’s what it was all about. As we noted several months back, almost everyone in the data world including Oracle, Teradata, Cloudera, Talend, Google, HPE, Fivetran, AWS, Dremio and even Snowflake has felt compelled to respond to Databricks, which introduced the data lakehouse.

Teradata’s path to the data lakehouse

Nonetheless, Teradata approaches the data lakehouse with some unique twists, and its approach is all about optimization. Teradata’s secret sauce has always been highly optimized compute, interconnects, storage and query engines, along with workload management designed to run compute resources at up to 95% utilization. When commodity hardware got good enough, Teradata introduced IntelliFlex, where performance and optimizations could be configured through software. The capability to optimize for hardware that it did not build itself opened the door to Teradata optimizing for AWS, and down the road, the other hyperscalers.

Teradata introduced VantageCloud a year ago, and late last year ran a 1,000+ node benchmark that no other cloud analytics provider has so far matched. But this was for a more conventional data warehouse using customary block storage.

The complication in making the lakehouse happen was developing a table format for data sitting in cloud object storage. A table format brings to the data lake the niceties associated with data warehouses: ACID transactions, which are key to ensuring consistency of data; more granular security and access controls; and raw performance. Databricks fired the first shot with Delta Lake, and more recently, other providers including Snowflake and Cloudera have embraced Apache Iceberg, the common thread being that this is all based on open source technology. For Lake Edition, Teradata went its own way with its own data lake table format, which the company claims delivers superior performance compared to Delta and Iceberg.

The other side of the lakehouse coin is software. Aside from its SQL engine, which has been designed to handle large, complex queries that can join up to hundreds of tables, Teradata has a large portfolio of analytic libraries that run in-database. This has been one of Teradata’s best-kept secrets. Largely the legacy of the Aster Data acquisition over a decade ago, these analytics were specially tuned to exploit the underlying parallelism, and they went well beyond SQL, encompassing functions such as n-Path, graph, time series analysis, and machine learning, all accessed through SQL extensions.

By formally branding the portfolio as ClearScape Analytics, Teradata is finally drawing attention to the fact that it is a holistic analytics platform and not simply a data warehouse, data lake or lakehouse. As part of the announcement, Teradata beefed up the time series and MLOps content. But when it comes to the data lake, data scientists are very opinionated about choosing their own languages and tools. And so, VantageCloud will also support a bring-your-own-analytics option for those preferring to write Python and work from Jupyter notebooks or their own workbenches, and currently has integrations with Dataiku, KNIME and Alteryx. ClearScape Analytics will be available for both VantageCloud Lake Edition and the standard Enterprise Edition.

Lake Edition and ClearScape Analytics are promising starts for Teradata as a data lakehouse provider. There’s little question that Teradata’s scale and support of polyglot analytics made the lakehouse a question of when, not if. And branding the analytics portfolio is more than just a marketing exercise, as it finally shines the spotlight on what had been a well-kept secret: Teradata’s differentiation goes beyond the optimized SQL engine and infrastructure to include analytics optimized for that engine. VantageCloud takes the analytics portfolio full circle by unleashing it on cloud object storage, and, with usage-based pricing, potentially opens it up for more discretionary workloads compared to the days when customers were running on-premises with firm ceilings on capacity.

A wish list for Teradata

That leaves our wish list for what Teradata should do next. In summary, we want to see Teradata venture further out of its comfort zone to draw new audiences of users. Admittedly, with the lakehouse, the challenge is not unique to Teradata, as Databricks, for example, looks to draw in business analysts while Snowflake courts data scientists.

To draw that new audience, Teradata should lower entry barriers and put open source on a more level footing with its proprietary environment. With Lake Edition, Teradata has dramatically lowered its entry pricing to $5,000/month. That is a marked drop from the six- and seven-figure annual contracts that Teradata customers typically pay, but we’d like to see Teradata go further with a freemium offering that allows new users to kick the tires. Heck, even incumbents like Oracle that are not known for discount pricing have embraced free tiers.

As for open source, there are a couple of pathways that we’d like to see Teradata further develop. The first is drawing non-Teradata users to ClearScape Analytics through optimized APIs to open source Delta and/or Iceberg data lakes. While performance might not be on par with Teradata’s own data lake table format, it could be made “good enough.”

Conversely, we’d like to see parallel efforts with so-called BYO analytics, drawing the Python crowd through optimized APIs with Teradata’s own data lake table format. For instance, we would like to see Teradata team up with Anaconda to juice the performance of the Conda Python library portfolio, much as Anaconda is already doing with Snowflake. At the end of the day, it’s all about the analytics.
