Run:AI lands $75M to dynamically allocate hardware resources for AI training

While interest in AI remains high among enterprise organizations, particularly for its potential to improve decision-making and automate repetitive tasks, many of these businesses are struggling to deploy AI into production. In a February survey from IDC, only a third of companies claimed that their entire organization was benefitting from an enterprise-wide AI strategy. The same poll found that 69% of companies hadn't yet reached production with AI, and instead remained in the experimentation, evaluation, or prototyping phases.

The challenges vary from organization to organization, but some common themes include infrastructure and data. The high upfront costs of hardware drive many companies to the cloud, which is often expensive and difficult to monitor. (A 2021 Anodot study found that fewer than 20% of companies were able to immediately detect spikes in cloud costs.) Meanwhile, data quality issues like a lack of data curation, data governance, and data literacy are introducing compliance risks such as biased algorithms.

Seeking a solution, Omri Geller, Ronen Dar, and Meir Feder founded Run:AI several years ago. The platform creates an abstraction layer to optimize AI workloads. Run:AI attempts to allocate workloads so that utilization of the available hardware is maximized, considering factors like network bandwidth, compute resources, cost, and data pipeline and size.

Run:AI today announced that it raised $75 million in a series C led by Tiger Global Management and Insight Partners with participation from TLV Partners and S Capital VC, bringing its total capital raised to $118 million. The company plans to use the investment to grow its team and consider future, "strategic" acquisitions.

Optimizing AI

Dar and Geller founded Run:AI after studying together at Tel Aviv University under Feder, who specializes in information theory. Dar was a postdoctoral researcher at Bell Labs and an R&D and algorithms engineer at Apple, Anobit, and Intel. Geller was a member of the Israeli military, where he led large-scale projects and deployments.

"AI is the new technology that's going to provide a competitive edge for companies. We believe that enterprises will not be able to lead their domain without AI capabilities," Geller told VentureBeat via email. "AI is so fundamental that it 'opens the books' and will create a new world order with new leaders. Companies that create capabilities to let computers learn faster and have more innovative capabilities will dominate their domains. That's why companies are investing in AI."

Run:AI essentially "breaks up" AI models into fragments that run in parallel, according to Geller -- an approach that has the added benefit of cutting down on hardware memory usage. This in turn enables models that would otherwise be constrained by hardware, chiefly GPU memory, to run ostensibly unimpeded on-premises, on public clouds, or at the edge.

Exactly how Run:AI allocates workloads depends on the policies defined by an organization. Policies in Run:AI create quotas for different projects. Enterprise IT and data science teams can also create logical fractions of GPUs or execute jobs across multiple GPUs or nodes.
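To make the idea of quota-based policies concrete, here is a minimal sketch of how a scheduler might honor per-project quotas while still lending out idle capacity. It is an illustration only, not Run:AI's actual algorithm or API: the `allocate` function, the project names, and the two-pass strategy are all assumptions made for this example. Fractional values stand in for logical fractions of a GPU.

```python
def allocate(quotas, total_gpus, requests):
    """Hypothetical two-pass GPU allocation (not Run:AI's real scheduler).

    quotas:     dict of project name -> guaranteed GPUs (fractions allowed)
    total_gpus: number of GPUs in the cluster
    requests:   list of (project name, GPUs requested), in arrival order
    Returns a list with the GPUs granted to each request.
    """
    free = float(total_gpus)
    used = {name: 0.0 for name in quotas}
    granted = [0.0] * len(requests)

    # Pass 1: serve every request up to what remains of its project's quota.
    for i, (project, want) in enumerate(requests):
        within_quota = max(quotas[project] - used[project], 0.0)
        give = min(want, within_quota, free)
        granted[i] = give
        used[project] += give
        free -= give

    # Pass 2: lend any still-idle GPUs to requests that wanted more.
    for i, (project, want) in enumerate(requests):
        give = min(want - granted[i], free)
        granted[i] += give
        used[project] += give
        free -= give

    return granted


# Assumed example: two projects with a quota of 2 GPUs each on a 4-GPU cluster.
quotas = {"vision": 2.0, "nlp": 2.0}
print(allocate(quotas, 4, [("vision", 3.0), ("nlp", 0.5)]))  # [3.0, 0.5]
print(allocate(quotas, 4, [("vision", 4.0), ("nlp", 2.0)]))  # [2.0, 2.0]
```

In the first call, the "vision" project borrows a GPU that "nlp" is not using. In the second, both projects are busy, so each is held to its quota.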

Toward the end of 2021, Run:AI added support for both MLflow, a tool for managing the AI lifecycle, and Kubeflow, an open source framework for machine learning operations. The company also added integrations with Apache Airflow, software that can be used to create, schedule, and monitor data workflows.

"When Run:AI starts work with a new customer, we typically see a GPU utilization rate of between 25% and 30% … GPUs tend to be idle during nonwork hours (e.g., nights, weekends). They can also be idle during work breaks (e.g., coffee breaks, lunch). [And] they can be idle when a researcher is building [an AI] model," Raz Rotenberg, software team lead at Run:AI, explains in a blog post. "Increasing GPU utilization and minimizing idle times can drastically reduce costs and help achieve model accuracy faster. To do this, one needs to improve the sharing of GPU resources."
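The arithmetic behind Rotenberg's point can be sketched in a few lines. The figures below are hypothetical and chosen for round numbers: a GPU that is busy 42 hours out of a 168-hour week, billed at an assumed $1 per hour. Neither number comes from Run:AI, and the function names are invented for this illustration.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def utilization(busy_hours_per_week):
    """Fraction of the week the GPU is doing useful work."""
    return busy_hours_per_week / HOURS_PER_WEEK


def cost_per_useful_hour(hourly_cost, util):
    """Effective price of one busy GPU hour when idle hours are still paid for."""
    return hourly_cost / util


# Hypothetical GPU used 42 hours a week, roughly one work week.
util = utilization(42)
print(util)                             # 0.25
print(cost_per_useful_hour(1.0, util))  # 4.0
print(cost_per_useful_hour(1.0, 0.5))   # 2.0
```

Under these assumptions, a GPU used only during working hours sits at 25% utilization, the low end of the range Rotenberg cites, and each useful hour effectively costs four times the list price. Doubling utilization through sharing halves that effective cost.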

Competition

While Run:AI has relatively few direct competitors, other startups are applying the concept of dynamic hardware allocation to AI workloads. For example, Grid.ai offers software that allows data scientists to train AI models across GPUs, processors, and more in parallel. Nvidia, for its part, sells AI Enterprise, a software suite of tools and frameworks that enable companies to virtualize AI workloads on Nvidia-certified servers.

Some customers might be skeptical, too, of how well Run:AI can adjust allocations depending on the architecture of different AI systems. And while it does work with custom chips like Google's tensor processing unit (TPU), which can accelerate certain AI workloads, Run:AI remains focused on GPU usage, which might not suit every data science organization's needs.

But Run:AI -- which works closely with Amazon Web Services and VMware -- claims to be going strong with a customer base spanning "dozens" of Fortune 500 and startup finance, automotive, healthcare, gaming, and academic organizations with "thousands" of users. Annual recurring revenue grew nine times over the last year while Run:AI's workforce more than tripled.

And if surveys are anything to go by, Run:AI won't have a shortage of potential customers. A Statista poll in January found that only around 19% of companies have established a data culture in their organization. And with cloud services spending hitting an estimated $304.9 billion last year, according to Gartner, companies will likely continue to look for on-premises alternatives to bolster their AI infrastructure.

"IT needs to serve the business goals, and if the business goal is to bring AI to market sooner, making it the responsibility of IT to deliver faster and Run:AI is what allows them to do that," Geller continued. "The C-suite are gung ho on Run:AI because they can innovate faster and produce AI solutions faster to create a competitive advantage."
