HPE releases ML development system to help companies deploy AI at scale

It is a conundrum throughout the enterprise sector: artificial intelligence (AI) and machine learning (ML) modeling delivers great business value across a variety of use cases. But achieving this requires significant time and monetary investments in AI infrastructure.

And many organizations aren’t there yet — meaning that engineers often spend the majority of their time performing manual tasks and infrastructure management rather than building, training and deploying models.

“Enterprises seek to incorporate AI and ML to differentiate their products and services, but are often confronted with complexity in setting up the infrastructure required to build and train accurate AI models at scale,” said Justin Hotard, executive vice president and general manager for HPC and AI at Hewlett Packard Enterprise (HPE).

The problem with AI and ML deployment across the enterprise

There’s no doubt that investment in AI/ML continues to rise, and at a significant pace: According to Tortoise Intelligence, worldwide investment has increased by 115% since 2020, marking the largest year-over-year growth in two decades. Similarly, Fortune Business Insights projects that the ML market will grow from about $21.2 billion in 2022 to $209.91 billion in 2029, a compound annual growth rate of nearly 40%.
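As a quick sanity check on the Fortune Business Insights projection, the growth rate implied by the two endpoints can be recomputed. The sketch below assumes the standard compound-annual-growth-rate formula and a seven-year span (2022 to 2029); the function name `cagr` is illustrative and does not come from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market-size figures quoted in the article, in billions of U.S. dollars.
# Assumption: the projection spans seven annual periods, 2022 through 2029.
implied_rate = cagr(21.2, 209.91, 2029 - 2022)
print(f"Implied CAGR: {implied_rate:.1%}")
```

The result lands a little under 40%, which is consistent with the "nearly 40%" figure cited.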

But while organizations prioritize AI/ML over other IT initiatives, they continue to run into post-deployment operational issues, lagging deployments and disparate, often complex infrastructure.

In a recent survey conducted by Comet, 68% of respondents reported scrapping anywhere from 40% to 80% of their AI/ML experiments. This was largely due to “woefully inadequate” budgets, as well as breakdowns and mismanagement in data science lifecycles that went beyond the normal iterative process of experimentation.

HPE to the rescue

To help simplify and speed up this process, HPE today released a new Machine Learning Development System. The ready-to-use system allows users to immediately build and train AI models at scale and realize value faster. It builds on HPE’s acquisition of Determined AI in summer 2021. The San Francisco startup built an open-source AI training platform that has since become the HPE Machine Learning Development Environment.

“Users can speed up the typical time-to-value to start realizing results from building and training machine models, from weeks and months, to days,” Hotard said.

Traditionally, he pointed out, adopting infrastructure to support model development and training at scale has required a complex, multistep process. This involves the purchase, setup and management of a highly parallel software ecosystem and infrastructure.

By contrast, he said, the HPE Machine Learning Development System is fully integrated and ready to use, combining software and specialized computing, including accelerators, networking and services. It can scale AI model training with minimal code rewrites or infrastructure changes and helps to improve model accuracy with distributed training, automated hyperparameter optimization and neural architecture search – all of which are key to ML algorithms, Hotard explained.
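For readers unfamiliar with automated hyperparameter optimization, the simplest form is a random search over candidate settings. The sketch below is a generic illustration only, not HPE’s or Determined AI’s algorithm: the objective `validation_loss`, the search ranges and the function names are all hypothetical stand-ins for a real training run.

```python
import random


def validation_loss(learning_rate: float, batch_size: int) -> float:
    """Toy stand-in for training a model and returning its validation loss.

    Hypothetical objective: lowest near learning_rate=0.01, batch_size=64.
    """
    return (learning_rate - 0.01) ** 2 * 1e4 + ((batch_size - 64) / 64) ** 2


def random_search(num_trials: int, seed: int = 0) -> tuple[dict, float]:
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = {
            # Sample the learning rate log-uniformly between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
        }
        loss = validation_loss(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(num_trials=50)
print(best_config, round(best_loss, 4))
```

In practice each trial is a full training job, which is why running many trials in parallel across accelerators matters for this kind of search.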

The system delivers optimized compute, accelerated compute and interconnect, which support modeling at scale for a mix of workloads. Its small configuration starts at 32 GPUs and has been shown to deliver roughly 90% scaling efficiency for workloads including computer vision and natural language processing (NLP), Hotard said.
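To put the 90% figure in context, here is a minimal sketch of what it would mean for a 32-GPU run. It assumes the common definition of scaling efficiency (measured speedup over a single GPU divided by the number of GPUs); the article does not say how HPE measured it, and the function names are illustrative.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear speedup actually achieved on num_gpus."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


def effective_speedup(num_gpus: int, efficiency: float) -> float:
    """Speedup over one GPU implied by a given scaling efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return num_gpus * efficiency


# Assumed reading of the article: 32 GPUs at roughly 90% efficiency.
speedup_32 = effective_speedup(32, 0.90)
print(f"32 GPUs at 90% efficiency is about {speedup_32:.1f}x one GPU")
```

Under that assumption, the 32-GPU configuration would train about 28.8 times as fast as a single GPU, compared with an ideal of 32 times.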

For example, German AI startup Aleph Alpha applied the new HPE system to train multimodal AI, including large NLP and computer vision models. The company was able to set up a new system combining and monitoring hundreds of GPUs in just a couple of days and began training on it within two days.

The company also established customized hyperparameter optimization and performed experiment tracking for collaboration, Hotard explained. Its AI assistants have been able to create complex texts, produce summaries that show higher-level understanding and search for highly specific information across hundreds of documents. They have also been able to leverage specialized knowledge in conversational contexts.

“By combining image and text processing in five languages with almost humanlike context understanding, the models push the boundaries of modern AI for all kinds of language and image-based transformative use cases,” Hotard said.

All told, the Machine Learning Development System can improve ML team collaboration by providing a faster path to more accurate models, Hotard said, while also enabling flexibility that can help future-proof AI infrastructure. It “combines our proven end-to-end HPC solutions for deep learning with our innovative machine learning software platform into one system to provide a performant out-of-the-box solution to accelerate time to value and outcomes with AI,” he said.
