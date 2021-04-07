

Trifacta today announced it has integrated its data preparation tools with the data warehouse platform based on the open source Apache Spark framework provided by Databricks. This is in addition to repositories based on the open source data build tool (DBT) software for transforming data, which is maintained by Fishtown Analytics.

In both cases, Trifacta is extending the reach of the tools it provides for managing data pipelines to platforms that are now widely employed in the cloud to process and analyze data, Trifacta CEO Adam Wilson said.

Trifacta traces its lineage back to a research project that involved professors from Stanford University and the University of California at Berkeley and created a visual tool to enable data analysts to load data without having programming skills. In effect, Trifacta automated extract, transform, and load (ETL) processes that had previously required an IT specialist with programming skills to perform.

There is no shortage of visual tools that let end users without programming skills migrate data. Trifacta, in the meantime, has extended its offerings to a platform that enables organizations to manage the data pipeline process on an end-to-end basis as part of an effort to meld data operations (DataOps) with machine learning operations (MLOps). The goal is to enable data analysts to self-service their own data requirements without requiring any intervention on the part of an IT team, Wilson noted.

Google and IBM already resell the Trifacta data preparation platform, and the company has established alliances with both Amazon Web Services (AWS) and Microsoft. Those relationships enable organizations to employ Trifacta as a central hub for moving data in and out of cloud platforms. The alliance with Databricks and the support for DBT further extend those capabilities at a time when organizations have begun to more routinely employ multiple cloud frameworks to process and analyze data, Wilson said.

In general, data engineering has evolved into a distinct IT discipline because of the massive amount of data that needs to be moved and transformed. While visual tools make it possible for data analysts to self-service their own data requirements, organizations are now also looking to programmatically move data onto clouds as part of a larger workflow. Many of the individuals who have ETL programming expertise, often referred to as data engineers, are now in even higher demand than data analysts, Wilson said.

Once considered the IT equivalent of a janitorial task that revolved mainly around backup and recovery, data engineering is now the discipline around which all large-scale data science projects revolve, Wilson noted. In fact, IT professionals who previously had ETL skills have reinvented themselves as data engineers, Wilson added.

“In the last 12 months, data engineering has become the hottest job in all of IT,” Wilson said.

It remains to be seen just how automated data engineering processes can become in the months and years ahead. Not only is there more data to be processed and analyzed than ever; the types of data that need to be processed have never been more varied. Going forward, a larger percentage of data will be processed and analyzed on edge computing platforms, where it is created and consumed. But the aggregated results of all that data processing will still need to be shared with multiple data warehouse platforms residing in the cloud and in on-premises IT environments.

Regardless of where data is processed, the sheer volume of data moving across the extended enterprise will continue to exponentially increase. The issue now is figuring out how to automate the movement of that data in a way that scales much more easily.