Prophecy, a company providing a low-code platform for data engineering, has launched a dedicated integration for Databricks, enabling anyone to quickly and easily build data pipelines on the Apache Spark-based data platform.
Developing data pipelines, which deliver vital data for business intelligence and machine learning, is a complex task. Dozens of data engineers have to program the pipelines by hand and then run scripts to test, deploy and manage the entire workflow in production. The process is slow and widely considered impractical at scale, especially as the volume of internal and external data grows across enterprises.
Prophecy for Databricks
With this integration, anyone using Databricks, be it a seasoned data engineer or a non-programmer data citizen, can leverage a visual, drag-and-drop canvas to develop, deploy and monitor data pipelines. It turns the visual data pipeline into 100% open-source Spark code (PySpark or Scala), with interactive development and execution to verify that the pipeline works correctly every step of the way.
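To make the idea of a visual pipeline rendered as Spark code more concrete, here is a minimal Python sketch. It is not Prophecy's generator, and nothing in it comes from the product: the `Step` class, the `render_pyspark` function, the four step kinds, and the paths and column names are all hypothetical, chosen only for illustration. The text it emits uses standard PySpark calls (`spark.read`, `DataFrame.filter`, `groupBy().agg()`, `DataFrame.write`).

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One box on a hypothetical visual canvas."""
    kind: str     # "source", "filter", "aggregate" or "target"
    params: dict  # settings a user would fill in for that box


def render_pyspark(steps):
    """Render a linear list of steps as PySpark source text (illustrative only)."""
    lines = [
        "from pyspark.sql import SparkSession, functions as F",
        "",
        "spark = SparkSession.builder.getOrCreate()",
    ]
    for step in steps:
        p = step.params
        if step.kind == "source":
            lines.append(
                f'df = spark.read.format("{p["format"]}").load("{p["path"]}")'
            )
        elif step.kind == "filter":
            lines.append(f'df = df.filter("{p["condition"]}")')
        elif step.kind == "aggregate":
            lines.append(
                f'df = df.groupBy("{p["by"]}")'
                f'.agg(F.sum("{p["column"]}").alias("{p["alias"]}"))'
            )
        elif step.kind == "target":
            lines.append(
                f'df.write.format("{p["format"]}")'
                f'.mode("overwrite").save("{p["path"]}")'
            )
        else:
            raise ValueError(f"unknown step kind: {step.kind}")
    return "\n".join(lines)


# Hypothetical pipeline: read orders, keep positive amounts, total per region.
pipeline = [
    Step("source", {"format": "delta", "path": "/data/orders"}),
    Step("filter", {"condition": "amount > 0"}),
    Step("aggregate", {"by": "region", "column": "amount", "alias": "total"}),
    Step("target", {"format": "delta", "path": "/data/orders_by_region"}),
]

generated = render_pyspark(pipeline)
print(generated)
```

The point of the sketch is only that the output is ordinary Spark code that an engineer could read, version and run without the visual tool. A real product would also have to handle branching pipelines, schemas and validation, which this toy ignores.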
“The main benefit (of this integration) is productivity. Instead of data engineers having to manually code in notebooks, they can use Prophecy to quickly and easily drag-and-drop components to interactively create and test data pipelines, increasing their productivity,” Raj Bains, the CEO and cofounder of Prophecy, told VentureBeat.
“The next benefit is that it makes working with Apache Spark / Databricks accessible to non-programmers, dramatically increasing the pool of people that can do data engineering. Overall, these capabilities will enable companies to scale data engineering to keep up with the flood of incoming data,” he added.
How to connect?
Databricks users can integrate Prophecy with their existing data stack through the Partner Connect feature of the lakehouse platform. Once connected, Prophecy can be launched from within the Databricks user interface (UI) to simplify the orchestration and management of pipelines on any cloud. The solution will also support additional tools such as Delta Lake.
“From a technical standpoint, Databricks’ Partner Connect provides an easy on-ramp to Prophecy from the Databricks’ UI. With a few clicks, Databricks’ customers have access to Prophecy,” Bains said.
Other data engineering companies, such as Matillion, also integrate with Databricks through Partner Connect, but they are limited to transformations in SQL. Prophecy, Bains emphasized, offers two things that no other such product does: it turns visual data pipelines into 100% open-source Spark code in Scala or PySpark, and it is extensible.
“In addition, Prophecy’s integration with Databricks is very deep and includes the support for Spark Streaming, Delta Lake, and Databricks Jobs for scheduling — no other product has such close and extensive integration,” he added.
According to IDC, global data creation is growing at an annual rate of 23% and is expected to reach 181 zettabytes by 2025. Against that backdrop, solutions like Prophecy could help enterprises keep up. The company, which raised $25 million earlier this year, is also looking to build integrations with other data platforms, including Snowflake.