Driving data transformations with column-aware metadata

Product architecture is everything. If you don’t believe me, take it from Snowflake CEO Frank Slootman’s new book, Amp It Up. His most telling quote? “All of our successes at the three companies where I’ve been CEO trace back to superior architecture.”

Thoughtful, premeditated product architecture is the best friend you’ll have in this industry. It’s the answer to every fear-based question: “What about your competitor who’s raised [x] dollars?” “Why would customers choose your product over something else?” The answer is that your product has been built from the ground up with something that’s almost impossible to replicate: a column-aware architecture.

Take the classic example of a house. Think of a smart house that is designed from the start to use geothermal energy. You bury the coils underground, build the house and connect it to your IoT devices, so you can see savings in real time and make decisions about how to use your energy. Now think of a house from the ’50s whose owners want to retrofit geothermal energy to its brick masonry foundation. It’s extremely expensive, the property may not even be suitable for digging, and the house still can’t report its energy usage seamlessly.

This is the unique value of a built-from-scratch, column-aware product in the data transformations market. Much as companies reacted when Snowflake pioneered the “compute vs. storage” concept, players in the data transformations space will soon try to add column-aware metadata to their messaging. However, retrofitting column-aware metadata onto an existing platform simply won’t scale. It will result in lower accuracy, higher costs, UI pitfalls and convoluted inner workings across the application layers, ultimately leading to brittle integrations and a poor user experience.

So what exactly is column-aware metadata? It’s the ability to use column names and mappings to apply transformations across a data set. For example, when creating a Type 2 slowly changing dimension, you can easily identify and track changes in specific columns such as address, name, phone number or any other column in your table. Column-level lineage is a serious problem for organizations trying to be data-driven, and it compounds as a project grows. Being column-aware also allows users to generate SQL from a graphical interface rather than typing it by hand in a code-driven IDE.
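To make the idea concrete, here is a minimal Python sketch of how column-level metadata could drive SQL generation for a Type 2 dimension. It is an illustration under stated assumptions, not how any particular product works: the Column class, the changed_rows_sql function, the table names and the is_current flag are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column-level metadata record."""
    name: str
    is_business_key: bool = False  # identifies the entity (e.g. customer_id)
    track_changes: bool = False    # a change here should create a new version


def changed_rows_sql(source: str, dimension: str, columns: list[Column]) -> str:
    """Build a query that finds source rows whose tracked columns differ
    from the current version of the row in the dimension table."""
    keys = [c.name for c in columns if c.is_business_key]
    tracked = [c.name for c in columns if c.track_changes]
    if not keys or not tracked:
        raise ValueError("need at least one business key and one tracked column")

    join = " AND ".join(f"src.{k} = dim.{k}" for k in keys)
    # IS DISTINCT FROM treats NULLs as comparable values.
    diff = " OR ".join(f"src.{c} IS DISTINCT FROM dim.{c}" for c in tracked)
    select = ", ".join(f"src.{c.name}" for c in columns)
    return (
        f"SELECT {select}\n"
        f"FROM {source} AS src\n"
        f"JOIN {dimension} AS dim ON {join}\n"
        # is_current is an assumed flag marking the latest version of a row.
        f"WHERE dim.is_current AND ({diff})"
    )


# Example metadata: only address and name are tracked for changes.
customer_columns = [
    Column("customer_id", is_business_key=True),
    Column("address", track_changes=True),
    Column("name", track_changes=True),
    Column("signup_date"),
]
sql = changed_rows_sql("stg_customer", "dim_customer", customer_columns)
print(sql)
```

Because the SQL is derived from the metadata, tracking an additional column such as phone number means flipping one flag rather than editing a handwritten query.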

Benefits of a column-aware architecture include:

One other area that is often overlooked in the data transformation category is tracking the state of your data warehouse, specifically for change management. Data warehouses and data projects are constantly changing. Because of that, visibility into historical changes (in other words, time lineage) is key for data explainability and governance. State management is also critical for persistent data environments that require incremental changes or change tracking to accommodate streams. When managing a project at scale, with thousands of tables and tens of thousands of columns across dozens of business units, being able to preview changes before applying them is a must.
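The preview step can be pictured as a diff between two recorded states of the warehouse. The Python sketch below assumes each state is stored as a simple mapping of table name to column name to data type; the preview_changes function and the example tables are hypothetical and only illustrate the idea.

```python
def preview_changes(
    old: dict[str, dict[str, str]],
    new: dict[str, dict[str, str]],
) -> list[str]:
    """Compare two schema states and list the changes needed to move
    from `old` to `new`, without applying anything."""
    changes: list[str] = []
    for table in sorted(old.keys() | new.keys()):
        before, after = old.get(table), new.get(table)
        if before is None:
            changes.append(f"ADD TABLE {table}")
            continue
        if after is None:
            changes.append(f"DROP TABLE {table}")
            continue
        # Table exists in both states: compare column by column.
        for col in sorted(before.keys() | after.keys()):
            if col not in before:
                changes.append(f"ADD COLUMN {table}.{col} {after[col]}")
            elif col not in after:
                changes.append(f"DROP COLUMN {table}.{col}")
            elif before[col] != after[col]:
                changes.append(
                    f"ALTER COLUMN {table}.{col} {before[col]} -> {after[col]}"
                )
    return changes


# Assumed snapshots: what is deployed today vs. what the project now defines.
deployed = {
    "dim_customer": {"customer_id": "INT", "address": "VARCHAR"},
    "stg_old": {"id": "INT"},
}
desired = {
    "dim_customer": {"customer_id": "INT", "address": "TEXT", "phone": "VARCHAR"},
    "fct_orders": {"order_id": "INT"},
}
plan = preview_changes(deployed, desired)
for step in plan:
    print(step)
```

Keeping every snapshot, rather than only the latest one, is what would make the historical view possible: any two points in time can be compared the same way.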

Without state management and column-awareness, data teams are destined to end up with data warehouses that are disorganized, inefficient and poorly governed, and that are likely to fail as the project grows. Simply put, a column-aware architecture and state management built from the ground up are good for the environment, just like geothermal energy.

Satish Jayanthi is the cofounder and CTO of Coalesce.
