Trifacta today announced it has integrated its data preparation tools with a data warehouse platform from Databricks that is based on the open source Apache Spark framework. The company has also added support for repositories based on DBT (data build tool), an open source tool maintained by Fishtown Analytics.
In both cases, Trifacta is extending the reach of tools it provides for managing data pipelines to platforms that are widely employed in the cloud to process and analyze data, Trifacta CEO Adam Wilson said.
Trifacta traces its lineage back to a research project involving professors from Stanford University and the University of California, Berkeley. That project produced a visual tool that enables data analysts without programming skills to load data. In effect, Trifacta automated extract, transform, and load (ETL) processes that previously required an IT specialist to perform.
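To make the three ETL stages concrete, here is a minimal hand-coded sketch in Python of the kind of work a visual tool is meant to spare analysts. It is illustrative only and is not how Trifacta implements ETL. The CSV sample, the column names, the cleaning rules and the SQLite `orders` table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, API, or database.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.99
2,Bob,
3, carol,5.00
"""

def extract(text):
    """Extract: read raw rows (as dicts keyed by header) from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows missing an amount, trim and title-case names, cast types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records (a hypothetical cleaning rule)
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

# An in-memory SQLite database stands in for the target data warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Each stage is a separate function so that sources, cleaning rules or targets can be swapped independently, which is roughly the flexibility that data pipeline tools package behind a visual interface.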
There is no shortage of visual tools that let end users without programming skills migrate data. But Trifacta has extended its offerings to a platform that enables organizations to manage the data pipeline process on an end-to-end basis as part of its effort to meld data operations (DataOps) with machine learning operations (MLOps). The goal is to enable data analysts to self-service their own data requirements without requiring any intervention on the part of an IT team, Wilson noted.
Google and IBM already resell the Trifacta data preparation platform, and the company has established alliances with both Amazon Web Services (AWS) and Microsoft. Those relationships enable organizations to employ Trifacta as a central hub for moving data in and out of cloud platforms. The alliance with Databricks and the support for DBT further extend those capabilities at a time when organizations have begun to more routinely employ multiple cloud frameworks to process and analyze data, Wilson said.
In general, data engineering has evolved into a distinct IT discipline because of the massive amount of data that needs to be moved and transformed. While visual tools make it possible for data analysts to self-service their own data requirements, organizations are now also looking to move data to clouds programmatically as part of a larger workflow. As a result, individuals who have ETL programming expertise, often referred to as data engineers, are now in even higher demand than data analysts, Wilson said.
Once considered the IT equivalent of a janitorial chore that revolved mainly around backup and recovery, data engineering is now the discipline around which all large-scale data science projects revolve, Wilson noted. In fact, many IT professionals with ETL skills have reinvented themselves as data engineers, Wilson added.
“In the last 12 months, data engineering has become the hottest job in all of IT,” Wilson said.
It remains to be seen just how automated data engineering processes can become in the months and years ahead. Not only is there more data to process and analyze than ever, but the types of data that need to be processed have never been more varied. Going forward, a larger percentage of data will be processed and analyzed on edge computing platforms, where it is created and consumed. But the aggregated results of all that data processing will still need to be shared with multiple data warehouse platforms residing in the cloud and in on-premises IT environments.
Regardless of where data is processed, the sheer volume of data moving across the extended enterprise will continue to increase exponentially. The issue now is figuring out how to automate the movement of that data in a way that scales much more easily.
VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact. Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations.