Algorithmia founder on MLOps' promise and pitfalls

MLOps, a compound of machine learning and information technology operations, sits at the intersection of developer operations (DevOps), data engineering, and machine learning. The goal of MLOps is to get machine learning algorithms into production.

Although MLOps resembles DevOps, it relies on different roles and skill sets: data scientists who specialize in algorithms, mathematics, simulations, and developer tools, and operations administrators who focus on upgrades, production deployments, resource and data management, and security. MLOps can deliver significant business value, but implementation can be difficult without a robust data strategy. Kenny Daniel, founder and CTO of Algorithmia, which makes an enterprise MLOps platform, spoke with VentureBeat about the buzz around MLOps, its benefits, and its challenges.

This interview has been edited for clarity and brevity.

VentureBeat: How does MLOps work?

Kenny Daniel: MLOps is applying the lessons of DevOps and software engineering best practices to the world of machine learning. It includes all the capabilities that data science teams, product teams, and IT operations need to deploy, manage, govern, and secure machine learning and other probabilistic models in production. MLOps combines the practice of AI/ML with the principles of DevOps to define an ML lifecycle that runs alongside the software development lifecycle (SDLC), for a more efficient workflow and more effective results. Its purpose is to support the continuous integration, development, and delivery of AI/ML models into production at scale.

We break MLOps down into 10 core capabilities across the Deployment and Operations phases of the three-phase ML lifecycle (Development, Deployment, Operations). The Deployment phase has five:

Training integration -- broad language and framework support for any data science tooling.
Data services -- native data connectors for popular platforms, as well as permissions and access controls.
Model registration -- integration with your docs, IDEs, and source code management (SCM) systems, with searchability and tagging so you know the provenance of every model in production.
Algorithm serving and pipelining -- support for the complex assemblies of models an application requires, with hands-off maintenance.
Model management -- how you handle access control, version management, A/B testing, source and licensing control, and build history management.

The Operations phase also has five core capabilities:

Model operations -- how you control usage and performance in production, including approval processes and permission control.
Infrastructure management -- fully automated infrastructure, redundancy, autoscaling, and support for on-premises, cloud, and multi-region deployments.
Monitoring and reporting -- visibility into the "who, what, where, why, and when" of MLOps.
Governance -- logging, reporting, and customer metrics for internal and external compliance.
Security -- across all stages, including data encryption, network security, SSO and proxy compliance, and permissions and controls.

VentureBeat: The nature of an AI deployment depends on the organization's maturity. Given that, what needs to be in place for an organization to be ready for MLOps?

Daniel: MLOps becomes relevant when you are trying to get machine learning models into production. This typically happens only after a data science program is established and projects are well underway. But waiting until the model is built is too late: if the MLOps story hasn't been solved by then, getting to production will be delayed.

VentureBeat: What are common mistakes with MLOps?

Daniel: Leaving it to individual data scientists to navigate the IT, DevOps, and security departments on their own. This is a recipe for failure, because success then depends on a specialized team navigating a completely different software engineering domain. We've seen a lot of companies hire teams of data scientists and machine learning engineers and set them loose building models. Once they've built a model and need to deploy it and make it ready to handle production traffic, a number of things need to be in place. These are considered mandatory in any modern IT environment, not just for machine learning: source code management, testing, continuous integration and delivery, monitoring, alerting, and management of the software development lifecycle. Being able to effectively manage many services, and many versions of those services, is especially critical in machine learning, where models may be retrained and updated constantly. That's why it's critical for companies to answer the questions "What is our MLOps story?" and "What is our process for going from data to modeling to production?"

VentureBeat: What is the most common use case with MLOps?

Daniel: Large enterprises use us for mission-critical applications. The most common use cases we see are those that are critical to scaling complex applications to gain agility, accuracy, or speed to market: anywhere a faster transaction has a material impact on value. Merck, for example, speeds up the analysis of complex compounds for drug discovery and vaccine development. EY accelerates fraud detection by updating models more frequently, and those better-performing models have reduced false positives by over 30%. Raytheon will support development of the U.S. Army's Tactical Intelligence Targeting Access Node program.

VentureBeat: How has the advent of low-code/no-code helped or hindered MLOps?

Daniel: I am generally skeptical of low-code/no-code solutions. The good thing is that because they are typically opinionated about the applications they produce, they often come with a solid MLOps story out of the box. The downside is that while they might be quick to get working for a simple demo, most real-world applications have complexity that goes beyond what no-code tools can support. Customization becomes critical for applications in production.

VentureBeat: DevOps quickly evolved into DevSecOps as developers realized that we should be integrating security operations into development as well. Is there a security element for MLOps?

Daniel: In our research, security, along with governance, is the top challenge organizations face when deploying ML models to production. There absolutely is a security element to MLOps, and it is converging with more traditional data and network security. ML engineers must treat enterprise-grade security as a first-order capability of any MLOps domain. I'm talking about data encryption at rest and in flight, unique model containment, API pairings, private and public certificate authorities, proxy support, SSO integration, key management, and potentially air-gapped deployment support for high-security usage.
