The MLops company making it easier to run AI workloads across hybrid clouds

Organizations have no shortage of places, in the cloud or on-premises, to deploy and run machine learning and artificial intelligence (AI) workloads. A key challenge for many, though, is figuring out how to orchestrate those workloads across multicloud and hybrid cloud environments.

Today, AI compute orchestration vendor Run AI is announcing an update to its Atlas Platform designed to make it easier for data scientists to deploy, run and manage machine learning workloads across different deployment targets, including cloud providers and on-premises environments.

In March, Run AI raised $75 million to help the company advance its technology and go-to-market efforts. At the foundation of the company's platform is a technology that helps organizations manage and schedule the resources that machine learning workloads run on. That technology is now being enhanced to help with the challenge of hybrid cloud machine learning.

"It's a given that IT organizations are going to have infrastructure in the cloud and some infrastructure on-premises," Ronen Dar, cofounder and CTO of Run AI, told VentureBeat. "Companies are now strategizing around hybrid cloud and they are thinking about their workloads and about where is the right place for the workload to run."

The increasingly competitive landscape for hybrid MLops

The market for MLops services is increasingly competitive as vendors continue to ramp up their efforts.

A Forrester Research report, sponsored by Nvidia, found that hybrid support for AI workload development is something that two-thirds of IT decision-makers have already invested in. It's a trend that is not lost on vendors.

Domino Data Lab announced a hybrid approach in June that also aims to help organizations run workloads in the cloud and on-premises. Anyscale, the leading commercial sponsor behind the open-source Ray AI scaling platform, has also been building out its technologies to help data scientists run workloads across distributed hardware infrastructure.

Run AI is positioning itself as a platform that can integrate with other MLops platforms, such as Anyscale, Domino and Weights & Biases. Lior Balan, director of sales and cloud at Run AI, said that his company operates at a lower level in the stack than many other MLops platforms, since Run AI plugs directly into Kubernetes.

As such, what Run AI provides is an abstraction layer for optimizing Kubernetes resources. Run AI also provides capabilities to share and optimize GPU resources for machine learning that can then be used to benefit other MLops technologies.

The complexity of multicloud and hybrid cloud deployments

A common approach for organizations managing multicloud and hybrid cloud deployments today is to use the Kubernetes container orchestration system.

In theory, if an organization is running Kubernetes in the public cloud or on-premises, then a workload can run anywhere that Kubernetes is running. The reality is a bit more complex, as different cloud providers configure Kubernetes differently and on-premises deployments have their own nuances. Run AI has created a layer that abstracts the underlying complexity and differences across public cloud and on-premises Kubernetes services to provide a unified operations layer.

Dar explained that Run AI has built its own proprietary scheduler and control plane for Kubernetes, which manages how workloads and resources are handled across the various types of Kubernetes deployments. The company has added a new approach to its Atlas Platform that allows data scientists and machine learning engineers to run workloads from a single user interface, across the different types of deployments. Prior to the update, data scientists had to use different interfaces to log into each type of deployment in order to manage a workload.
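
To make the idea of a single entry point over several deployments concrete, here is a minimal sketch in Python. It is purely illustrative and does not reflect Run AI's actual scheduler, control plane or API, none of which are described in detail here. The `Cluster`, `Workload` and `place` names, the environment labels and the GPU counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """A hypothetical Kubernetes deployment known to the unified layer."""
    name: str
    environment: str  # assumed labels: "cloud" or "on-prem"
    free_gpus: int


@dataclass
class Workload:
    """A hypothetical machine learning job and the GPUs it requests."""
    name: str
    gpus: int


def place(workload: Workload, clusters: list[Cluster], environment: str) -> Optional[Cluster]:
    """Pick the first cluster in the requested environment with enough free GPUs.

    Returns None when no cluster in that environment can fit the workload.
    """
    for cluster in clusters:
        if cluster.environment == environment and cluster.free_gpus >= workload.gpus:
            cluster.free_gpus -= workload.gpus  # reserve the GPUs for this job
            return cluster
    return None


# One call site serves every deployment type; only the target label changes.
clusters = [
    Cluster("cloud-a", "cloud", free_gpus=8),
    Cluster("dc-1", "on-prem", free_gpus=4),
]
job = Workload("train-model", gpus=4)
trained_on = place(job, clusters, "cloud")
served_on = place(job, clusters, "on-prem")
```

The point of the sketch is only that the caller uses the same function regardless of where the workload lands. A real scheduler would also have to handle queuing, preemption and the differences between Kubernetes distributions.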

In addition to now being able to manage workloads from a single interface, it's also easier to move workloads across different environments.

"So they can run and train workloads in the cloud, and then switch and deploy them on premises with just a single button," Dar said.
