Iterative, provider of a framework for managing machine learning operations (MLOps), today announced updates to its Data Version Control (DVC) and Continuous Machine Learning (CML) open source projects.

CML is an open source library for automating tasks such as model training and evaluation, comparing ML experiments across a project's history, and monitoring changing datasets. Deployed as a set of Docker containers, it enables IT teams to apply many of the same DevOps automation principles used to build applications to the development of AI models by using a continuous integration and continuous delivery (CI/CD) platform, CEO and founder Dmitry Petrov said.

The latest version of CML adds a command, cml-runner, that streamlines the configuration and provisioning of cloud instances from within a Git repository in a way that reduces bash scripting clutter. It also supports an Iterative Terraform Provider for configuring cloud services, which removes the need to install Docker Machine.

DVC provides a Git-like interface for managing version control of data and models. It is built on top of Git, allowing users to create lightweight metafiles through which MLOps teams can more easily manage the large files that are typically required to train an AI model. Those files can be stored in the cloud or on on-premises network storage platforms, rather than requiring organizations to store every file in a Git repository hosted on a service such as GitHub, GitLab, or Bitbucket.
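The metafile idea can be illustrated with a short Python sketch. This is a conceptual illustration only, not DVC's actual implementation or file format: the function names `track` and `restore`, the `.pointer.json` suffix, and the choice of SHA-256 are all assumptions made for the example. The large file is copied into a separate store keyed by its content hash, and only a small pointer file would need to be committed to Git.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def track(data_file: Path, store: Path) -> Path:
    """Copy data_file into a content-addressed store and write a small
    pointer file beside it (hypothetical format, not DVC's)."""
    digest = file_digest(data_file)
    store.mkdir(parents=True, exist_ok=True)
    stored = store / digest
    if not stored.exists():
        shutil.copy2(data_file, stored)
    meta = {
        "path": data_file.name,
        "sha256": digest,
        "size": data_file.stat().st_size,
    }
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def restore(pointer: Path, store: Path) -> Path:
    """Recreate the data file described by a pointer file from the store."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copy2(store / meta["sha256"], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.bin"
        data.write_bytes(b"\0" * 1_000_000)  # stand-in for a large dataset
        store = root / "remote-store"        # stand-in for cloud or on-premises storage
        pointer = track(data, store)
        print(pointer.name, pointer.stat().st_size, "bytes")
        data.unlink()                        # the working copy can be dropped...
        restore(pointer, store)              # ...and rebuilt from the pointer
        print(data.name, data.stat().st_size, "bytes")
```

In this sketch the pointer file is a few hundred bytes regardless of how large the dataset is, which is why it can be versioned in Git while the data itself sits in separate storage.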

The latest version of DVC adds templates for creating ML pipelines and iterative Foreach stages, access to lightweight ML experiments, ML model checkpoints, and an open source library for metrics logging. Iterative contends rival platforms are too prescriptive and is making a case for a more modular alternative to proprietary AI platforms, such as Amazon SageMaker, Microsoft Azure Machine Learning, and Domino Data Lab. That approach also gives data science teams the ability to swap best-of-breed tools in and out, instead of being forced to employ only the tools made available by a single vendor, Petrov said.

“I don’t believe in monolithic approaches,” Petrov said. “AI teams should be able to replace one tool with another.”

Regardless of their approach to MLOps, organizations of all sizes are now trying to share components to accelerate development of AI models. It can currently take months for a data science team to create an AI model. But that process can be significantly shortened if it’s possible to reuse files, pipelines, experiments, and even entire models stored in a Git repository. In effect, Iterative and other platform providers are enabling organizations to manage the AI development lifecycle using the same processes developers employ to accelerate software development. That’s especially critical as organizations realize that AI models will need to be both continuously updated and ripped and replaced as new data sources become available.

Those processes can also span multiple organizations that are increasingly collaborating on the development of AI models, Petrov noted. In many of those cases, ML artifacts will need to be shared across multiple cloud and on-premises platforms. It’s unlikely two or more organizations will have standardized on the same proprietary platform to construct AI models.

It’s too early to say to what degree organizations will standardize on open source tools and platforms for MLOps. Most of the organizations building AI models are using open source tools such as TensorFlow, which Petrov noted suggests many of those organizations are already predisposed to use open source software.

The one thing that is apparent is that AI model building is entering a new phase of industrialization. In place of something painstakingly managed by individual data science teams, organizations are looking to transform the building of AI models into a bona fide manufacturing process.

