HPE acquires Pachyderm to boost AI dev

Hewlett Packard Enterprise (HPE) today announced that it has acquired privately held open-source vendor Pachyderm to boost its artificial intelligence (AI) development capabilities and enable reproducible AI at scale.

The San Francisco-based Pachyderm was founded in 2014 and had raised $28 million in funding to date. Financial terms of the acquisition are not being publicly disclosed.

Pachyderm develops open-source-based technology for the data pipelines that support machine learning (ML) operations workflows. With Pachyderm, users can also define data transformations that specify how source data should be manipulated and configured so that it is optimized for AI. The entire pipeline is designed to be easily reproducible, which makes it easier for data scientists to understand how the data that flows into a model is collected and used.

Pachyderm will integrate with HPE's ML Development System

The Pachyderm technology is set to be integrated into the HPE Machine Learning Development System, an application suite that helps enterprises build AI applications. The technology behind the HPE Machine Learning Development System was gained through the acquisition of Determined AI in 2021.

"Pachyderm has been a partner of ours for some time and we were regularly seeing them as a complementary technology in customer engagements," Evan Sparks, chief product officer for AI at HPE (and former cofounder of Determined AI), told VentureBeat. "We have been focused on training AI models and Pachyderm is focused on the data piece, the part that comes in before model training with getting data ready and doing it in a way that's reproducible."

The challenge of AI reproducibility

Explainable AI has been a hot-button topic in recent years.

The basic idea behind explainable AI is to not have a "black box" that just computes results without anyone being able to understand, or explain, how the results were achieved. Ensuring there isn't bias is a key goal of explainable AI, as is fairness.

Reproducible AI is an underlying component of explainable AI. The concept is about having a set of steps for data collection, model creation and inference that can be repeated in a consistent manner.

"Our customers are folks that are trying to deploy AI at scale for real production use cases, for everything from insurance underwriting, to cars that drive themselves, to discovering new drugs that are going to be used in it to save lives," Sparks said. "Those sorts of use cases either have really strong financial consequences, or in some cases are life and death."

With those consequences in mind, Sparks said that enterprises want a high degree of confidence in the models that they are deploying. A cornerstone of that confidence is knowing that if an organization takes the same data, with the same model, it will be able to generate the same output.

With Pachyderm, Sparks said that the goal is to make sure that the data pipeline, meaning the path data takes from a source into a model, is consistent and reproducible. He noted that Pachyderm's technology alone is not enough for a complete explainable AI approach, which also requires capabilities for model testing. Sparks said that HPE works with a number of different partner technologies to help support explainable AI capabilities for the model itself.

How Pachyderm works to enable reproducible AI

The Pachyderm technology has a number of different capabilities that help support reproducible AI efforts.

Sparks said that Pachyderm provides data lineage tracking, which is the ability to trace where data comes from. The technology also provides data versioning capabilities that enable data scientists to understand and manage different versions of data.
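To make the ideas of data versioning and lineage tracking concrete, here is a minimal Python sketch. It is not Pachyderm's API or implementation; the `VersionedStore` class, its methods and the sample records are all hypothetical, and the sketch simply assumes that a version can be identified by a hash of its contents plus the versions it was derived from.

```python
import hashlib
import json


def content_hash(payload):
    """Return a stable SHA-256 digest for any JSON-serializable object."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VersionedStore:
    """Hypothetical in-memory store; each commit is addressed by a content hash."""

    def __init__(self):
        # version id -> {"data": ..., "parents": [...], "step": ...}
        self.commits = {}

    def commit(self, records, parents=(), step="source"):
        """Store records and return a version id derived from data and provenance."""
        parent_ids = sorted(parents)
        version = content_hash(
            {"data": records, "parents": parent_ids, "step": step}
        )
        self.commits[version] = {
            "data": records,
            "parents": parent_ids,
            "step": step,
        }
        return version

    def lineage(self, version):
        """Return every upstream version id, nearest ancestors first."""
        seen = []
        queue = list(self.commits[version]["parents"])
        while queue:
            parent = queue.pop(0)
            if parent not in seen:
                seen.append(parent)
                queue.extend(self.commits[parent]["parents"])
        return seen


store = VersionedStore()
raw_v1 = store.commit([{"id": 1, "value": 10}, {"id": 2, "value": 20}])
clean_v1 = store.commit(
    [{"id": 1, "value": 10}], parents=[raw_v1], step="keep value < 15"
)
```

Because the version id is computed from the data itself, committing identical records through the identical step always yields the same id, and `store.lineage(clean_v1)` traces the cleaned data back to the raw version it came from.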

What stood out for Sparks in particular about the Pachyderm technology is its ability to transform data so it's useful for AI. He explained that some use cases require an AI model to combine data coming from multiple sources.

As an example, an autonomous vehicle company will have computer vision data coming in from cameras in the car as well as LIDAR (light detection and ranging) data. That data probably lives in two different places and comes in several formats. For the machine learning models to do their job, that data needs to be combined before the model is trained. That type of complex transformation is one that Pachyderm could help to enable in a reproducible way.
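The kind of combining step described here can be sketched in a few lines of Python. This is an illustration only, not how Pachyderm or any autonomous vehicle company does it: the `join_by_timestamp` function, the field names (`frame_id`, `sweep_id`, `ts_ms`) and the 50-millisecond tolerance are all assumptions made for the example. The point is that sorting the inputs and breaking ties by a fixed rule makes the output independent of the order in which records arrive.

```python
def join_by_timestamp(camera_frames, lidar_sweeps, tolerance_ms=50):
    """Pair each camera frame with the nearest LIDAR sweep within a tolerance.

    Inputs are sorted first, so the result does not depend on arrival order.
    """
    sweeps = sorted(lidar_sweeps, key=lambda s: (s["ts_ms"], s["sweep_id"]))
    frames = sorted(camera_frames, key=lambda f: (f["ts_ms"], f["frame_id"]))

    pairs = []
    for frame in frames:
        best = None  # (gap in ms, sweep) for the closest sweep seen so far
        for sweep in sweeps:
            gap = abs(sweep["ts_ms"] - frame["ts_ms"])
            # Strict "<" keeps the earliest sweep when two gaps are equal.
            if gap <= tolerance_ms and (best is None or gap < best[0]):
                best = (gap, sweep)
        if best is not None:
            pairs.append(
                {
                    "frame_id": frame["frame_id"],
                    "sweep_id": best[1]["sweep_id"],
                    "gap_ms": best[0],
                }
            )
    return pairs


# Hypothetical sample records from two separate sources.
camera = [
    {"frame_id": "c2", "ts_ms": 1100},
    {"frame_id": "c1", "ts_ms": 1000},
]
lidar = [
    {"sweep_id": "l3", "ts_ms": 2000},
    {"sweep_id": "l1", "ts_ms": 990},
    {"sweep_id": "l2", "ts_ms": 1120},
]

training_pairs = join_by_timestamp(camera, lidar)
```

Running the join on the same records in a different order produces the same list of pairs, which is the property a reproducible pipeline needs from each transformation step.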

Looking forward, Sparks said that the overall goal for the HPE AI product portfolio is to enable an end-to-end platform for model development and deployment at scale.

"We're looking at how we develop an end-to-end offering around AI at scale, and what it needs to look like," Sparks said. "Pachyderm is a very complementary piece to this overall portfolio view of the world."
