Databricks announced today that it is acquiring the privately held data governance platform vendor Okera. The plan is to integrate Okera’s technology into Databricks’ existing data governance solution, Unity Catalog, to provide more AI-powered functionality.
“By bringing on the talented Okera team and leveraging their domain expertise, we’ll accelerate the Unity Catalog roadmap and provide best-in-class governance for the lakehouse,” Reynold Xin, Databricks cofounder and chief architect, told VentureBeat.
Financial terms of the deal have not been publicly disclosed.
Based in San Francisco, Okera was founded in 2016 and raised $29.6 million in funding prior to being acquired. Okera’s focus in recent years has been on using artificial intelligence for data governance and data security.
Databricks, on the other hand, has raised a staggering $3.5 billion in venture capital to build out its data lakehouse and AI technologies. Databricks has recently been making headlines for its entry into the generative AI space with the launch of Dolly, its ChatGPT clone.
Databricks and Okera were hardly strangers prior to the acquisition announcement. Xin noted that Nong Li, Okera’s co-founder and CEO, is widely known for creating Apache Parquet, an open-source standard storage format that Databricks and the rest of the industry build on. Li also previously worked at Databricks, where he led the vectorized Parquet and codegen efforts that resulted in Apache Spark 2.0’s 10x performance improvement.
What Okera brings to Databricks
Whether it’s for analytics or machine learning (ML), data is foundational. Being able to properly govern that data is critical for accuracy as well as for security and compliance.
Xin said that with Okera, customers will be able to use AI to discover, classify and govern all their data, analytics and AI assets with attribute-based and intent-based access policies. Governance is also about observability — which is another area where Okera’s technology will help. Xin noted that Okera will help to support Databricks’ data observability on the lakehouse, enabling organizations to centrally audit and report sensitive data usage across analytics and AI applications.
Going a step further, the combination of Okera and Databricks will enable users to automatically trace data lineage down to the column level.
“The idea is that customers will get a holistic view of their data estate across clouds,” Xin said.
New security controls are on the way
Part of governance is also being able to provide the necessary controls to allow only authorized access. That’s an area where Okera’s technology will also be helpful to the Databricks platform in the future.
“Okera has also been developing a new isolation technology that can support arbitrary workloads while enforcing governance control without sacrificing performance,” Xin said. “It will help enterprises cover the whole spectrum of applications in the new world efficiently.”
The isolation technology is currently in private preview and has been tested by a number of joint Databricks and Okera customers on their AI workloads already.
Guardrails or governance? What’s needed for AI?
As AI becomes more powerful and versatile, the question of how to ensure its safety and ethical use has gained urgency. One of the leading companies in the field, Nvidia, unveiled a new initiative last month called NeMo Guardrails, which aims to help developers monitor and regulate the output of generative AI models that can create realistic text, images and speech.
Xin and Databricks also see the need for guardrails, as well as governance for AI.
“In this new world of AI, managing guardrails on the underlying data that AI models, like LLMs, are trained on is critical to mitigating biases and maintaining compliance if they’re trained on private data,” Xin said. “For transparency, it’s also critical to be able to trace data lineage so you can be sure these models are relevant, up-to-date and trustworthy.”
Xin commented that Okera’s AI-driven tagging and classification for all data and AI assets provides a holistic view of sensitive data, like personally identifiable information (PII). He added that it will help customers enforce those guardrails, not only on the underlying data but also on ML models and features.
“AI can provide extreme value to organizations looking to harness their data, but as plenty of AI pioneers have pointed out, it can also be misused, which is why thoughtful guidelines are necessary,” Xin said. “The way we both see it, the principles of governance — accountability, standardization, compliance, quality and transparency — apply as much to AI as to data.”