MLops: Making sense of a hot mess

The MLops market may still be hot when it comes to investors. But for enterprise end users, it may seem like a hot mess.

The MLops ecosystem is highly fragmented, with hundreds of vendors competing in a global market that was estimated at $612 million in 2021 and is projected to reach over $6 billion by 2028. But according to Chirag Dekate, a VP and analyst at Gartner Research, that crowded landscape is leading to confusion among enterprises about how to get started and which MLops vendors to use.

“We are seeing end users getting more mature in the kind of operational AI ecosystems they're building – leveraging Dataops and MLops,” said Dekate. That is, enterprises start from their data source requirements and their infrastructure center of gravity, whether on-premises, in the cloud or hybrid, and then integrate the right set of tools.

But it can be hard to pin down the right set of tools.

“In most cases, we are tracking close to 300-plus MLops companies – each claims to offer MLops, but they offer piecemeal capabilities,” said Dekate.

Some might offer a feature store, for example, while others might offer a model training environment or model deployment capabilities.

“Some of the most common questions we get asked are, ‘Where do we start?’ ‘How do we scale?’ ‘How do we navigate the vendor mix?’” he said. “Should they start with a platform approach, essentially leveraging Amazon SageMaker, Microsoft Azure, or Google Vertex? Or should they piece together a custom tool chain where they partner with different solution providers or a startup ecosystem?”

Different MLops approaches can work

MLops emerged as a set of best practices less than a decade ago, to address one of the primary roadblocks preventing the enterprise from putting artificial intelligence (AI) into action — the transition from development and training to production environments.

This matters because nearly one out of two AI pilots never makes it into production. And for those that do, it takes over seven months, on average, to go from pilot to production, said Dekate, who added that this is actually an improvement over 2021, when it took over 8.5 months.

Dekate, who provides strategic advice to CIOs and IT leaders on MLops and operational AI systems, points out that organizations are still increasing investments in AI this year in order to address a triple squeeze of inflation and recessionary risk, talent challenges, and global supply chain challenges. What they are struggling with is the best MLops approach to take in a crowded vendor landscape.

Dekate said he has seen both extreme approaches – completely cloud-native and completely best-of-breed – work, depending on the organization.

Enterprises that are cloud-native tend to use Amazon-, Google- or Microsoft-native stacks because those stacks build on their existing enterprise investments and offer easier integration.

“From an integration perspective, cloud-native approaches work better for entities that are more cloud-mature,” he said. “But for startups that want capabilities that cloud service providers are not able to deliver, integrating a right set of patchwork and events actually is a preferred approach.”

But most enterprises create a hybrid strategy. That is, they use Amazon, Azure or Google as a backplane, depending on their enterprise center of gravity, and plug in components where capabilities might be missing or where they need hyper-customization or specialization.

“The components and engine parts might vary, but at the baseline, what they're trying to do is they're trying to industrialize AI at scale,” he said.

No MLops tech stack rules them all

Still, in the world of MLops there is currently no single technology stack that stands above the rest as a complete offering.

“I think cloud-native stacks like Amazon, Azure and Google Vertex are close to offering a complete solution, but enterprise end users tell me that even in cloud ecosystems they have to piece things together,” Dekate said. “There might be a feature store somewhere, or there might be a model engineering ecosystem somewhere.”

One of the cloud ecosystem’s biggest weaknesses is the struggle to address the on-premises opportunity, he added. Most of today’s enterprises are hybrid in nature, so an Amazon SageMaker-like experience might not necessarily translate to an on-premises ecosystem stack.

“That is where you see entities like DataRobot and MLflow and Domino Data Lab start to offer differentiation,” he said. “What they try to offer is an infrastructure or deployment context-agnostic stack – essentially decoupling your data and analytics pipeline from your deployment context.”

Each of those, he explained, offers unique capabilities. “Some might tout Lego-like integration capabilities that enable seamless integration, while others like DataRobot might claim that they have advanced auto ML capabilities,” he said. “Many of these bespoke entities are trying to offer differentiation by addressing some of the weaknesses that do exist in cloud-native stacks.”

Some of these ecosystems are now setting out to offer a complete data-to-deployment experience, thanks to partnerships or acquisitions, he added.

“If you look at DataRobot strategies and how they have evolved, they initially were really strong in auto ML,” he said. “Their main MO was that they would accelerate modern development and monitor training and validation. What they have done since is through acquisition. They're now trying to offer data pipelines and offering model deployment. So now DataRobot can offer these comprehensive experience streams, even if they lack components.”

Risks of best-of-breed MLops

But most enterprises find it challenging to partner with MLops companies because many of those vendors are relatively new and small, which exposes their customers to significant risk.

“More than likely, enterprises are going to start out with their existing cloud-native stack first, because it fundamentally simplifies the integration challenges that they may run into,” Dekate said. “It also lowers the risk profile that they eventually might get into by stitching pieces together.”

The best-of-breed stack does have its advantages, however.

“It enables you to customize a lot and deliver the best-in-class solution for your ecosystem,” he said. “The risk is that a lot of these vendors will face extreme market pressures, resulting in either companies going bankrupt or companies getting acquired, folded or integrated. That exposes both the vendor and end user community in unique ways.”

The companies most likely to succeed will be those proactively building a holistic solution, either through partnerships or acquisitions, he added. But the pure-play, niche, specialized companies will be “more of an acquisition target than a differentiation target.”

Overcoming MLops market chaos

The bottom line is that all organizations need a version of MLops, he explained, adding that they should focus on the capabilities MLops promises to deliver, rather than responding to vendor hype.

“It’s about standardizing how you go from feature engineering to model development, to model validation, to model deployment,” he said. “What you're trying to reduce is the repetitive activities that you constantly engage in through standardization – I think MLops is absolutely essential towards creating sustainable AI pipelines.”
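To make the idea of a standardized pipeline concrete, here is a minimal Python sketch of the four steps Dekate lists, run in one fixed order. It is an illustration only, not any vendor's API: the Pipeline class, the stage names and the toy stage functions are all hypothetical, and the "model" is just an average of the input numbers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical stage names, taken from the four steps named in the quote above.
STAGE_ORDER = [
    "feature_engineering",
    "model_development",
    "model_validation",
    "model_deployment",
]

Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Runs registered stages in one standard order (illustrative only)."""

    stages: Dict[str, Stage] = field(default_factory=dict)

    def register(self, name: str, fn: Stage) -> None:
        if name not in STAGE_ORDER:
            raise ValueError(f"unknown stage: {name}")
        self.stages[name] = fn

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        # Refuse to run a partial pipeline: every standard stage must exist.
        missing = [s for s in STAGE_ORDER if s not in self.stages]
        if missing:
            raise RuntimeError(f"missing stages: {missing}")
        completed = []
        for name in STAGE_ORDER:
            # Each stage receives a copy so the caller's dict is not mutated.
            context = self.stages[name](dict(context))
            completed.append(name)
        return {**context, "completed": completed}


# Toy stages, standing in for real feature, training and serving tools.
def engineer_features(ctx: Dict[str, Any]) -> Dict[str, Any]:
    ctx["features"] = [x * 2 for x in ctx["raw"]]
    return ctx


def develop_model(ctx: Dict[str, Any]) -> Dict[str, Any]:
    ctx["model"] = sum(ctx["features"]) / len(ctx["features"])
    return ctx


def validate_model(ctx: Dict[str, Any]) -> Dict[str, Any]:
    ctx["valid"] = ctx["model"] > 0
    return ctx


def deploy_model(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Only a validated model is marked as deployed.
    ctx["deployed"] = bool(ctx["valid"])
    return ctx


def build_pipeline() -> Pipeline:
    pipeline = Pipeline()
    pipeline.register("feature_engineering", engineer_features)
    pipeline.register("model_development", develop_model)
    pipeline.register("model_validation", validate_model)
    pipeline.register("model_deployment", deploy_model)
    return pipeline


if __name__ == "__main__":
    print(build_pipeline().run({"raw": [1, 2, 3]}))
```

The point of the sketch is the fixed stage order and the refusal to run with a stage missing. Under this assumption, any one stage could be swapped for a cloud-native or best-of-breed tool without changing how the rest of the pipeline is invoked.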

What is concerning, he explained, is the overuse of the term MLops, where companies that only focus on parts of the ecosystem — such as feature stores — are marketing themselves as MLops companies.

“It is essentially creating incredible chaos and confusion in end users' minds,” he said.

Even Gartner has to go through complex engagements to understand what a company is actually offering.

“And even then, it's not clear,” he said. “We actually have to put them side by side in large Excel sheets before we can actually identify true areas of differentiation because it's really, really complicated.”

Dekate recommends enterprise end users focus on what they actually need to standardize their data and model pipelines.

“At the end of the day, what you're trying to achieve is standardizing the practices so that you can operationalize your AI ecosystems at scale,” he said.
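The side-by-side comparison Dekate describes can be approximated in a few lines of Python. The sketch below is a hedged illustration rather than Gartner's method: the vendor names and their capability sets are invented placeholders, and the capability list simply reuses pipeline steps mentioned in this article.

```python
from typing import Dict, List, Set

# Capabilities drawn from the steps named in this article.
CAPABILITIES = [
    "feature_store",
    "model_training",
    "model_validation",
    "model_deployment",
]

# Hypothetical vendors with piecemeal offerings (placeholders, not real data).
VENDORS: Dict[str, Set[str]] = {
    "vendor_a": {"feature_store"},
    "vendor_b": {"model_training", "model_validation"},
    "vendor_c": {"model_training", "model_deployment"},
}


def coverage_matrix(
    vendors: Dict[str, Set[str]], capabilities: List[str]
) -> Dict[str, List[bool]]:
    """One row per vendor, one True/False column per capability."""
    return {
        name: [cap in caps for cap in capabilities]
        for name, caps in vendors.items()
    }


def unique_capabilities(vendors: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Capabilities that only one vendor offers: a rough proxy for differentiation."""
    unique = {}
    for name, caps in vendors.items():
        others = set().union(*(c for n, c in vendors.items() if n != name))
        unique[name] = caps - others
    return unique


def uncovered(vendors: Dict[str, Set[str]], capabilities: List[str]) -> List[str]:
    """Capabilities that no vendor in the mix provides."""
    offered = set().union(*vendors.values())
    return [cap for cap in capabilities if cap not in offered]


if __name__ == "__main__":
    for vendor, row in coverage_matrix(VENDORS, CAPABILITIES).items():
        cells = ["x" if has else "-" for has in row]
        print(f"{vendor:10s} " + " ".join(cells))
    print("unique:", unique_capabilities(VENDORS))
    print("gaps:", uncovered(VENDORS, CAPABILITIES))
```

Starting from the capabilities an organization needs, rather than from vendor labels, is the design choice here: a vendor that calls itself an MLops company but fills only one column shows up as exactly that.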

MLops maturation over the next year

Over the next 12 months, Dekate expects a more mature MLops end-user and vendor market ecosystem to evolve.

“I suspect you're going to see some bundling of capabilities because right now, it is hyper-fragmented and we are reaching a point where these specialized niches are, frankly speaking, unsustainable,” he said. “Very rarely are end users going to chase after niche capabilities to engineer an AI production pipeline.”

The result, he said, will likely be market churn.

“It’s not necessarily an AI winter as much as a maturation and an evolution of a more complete, more reliable, more comprehensive AI stack,” he said. “If I were to bet, I think a lot of [MLops] will be increasingly cloud-native and a lot more cloud-oriented.”