Puzzled about how to run your artificial intelligence (AI), machine learning (ML), and deep learning (DL) applications at scale, with maximum performance, and minimum cost? There are lots of cloud-based options available, but what about for workloads deployed with on-premises infrastructure or in a hybrid architecture? You’re not alone: this puzzle has confounded many data scientists and IT professionals in large enterprises as they embark on their AI / ML / DL initiatives.

It has been a long while coming, but now we can solve this puzzle by running ML / DL workloads with containers and graphics processing units (GPUs). Why, you may ask, is now the right time? We have had containers and GPUs for a long time. ML and DL applications have been deployed in containers for almost as long. And GPUs have been exposed inside containers for at least the past few years.

All true, but until recently it was still challenging to run large-scale, GPU-enabled ML and DL applications in a distributed environment with containers. Recent developments have changed all that. The pieces of the puzzle are starting to fit together.

Let’s take a look at the pieces of the puzzle. Every data scientist knows that training ML and DL predictive models, and running inference with them, is compute-intensive. Hardware accelerators such as GPUs are key to providing the compute power needed to train these models and generate predictions in a reasonable amount of time.

But using GPUs in a large-scale enterprise environment can be challenging, especially for on-premises deployments:

  • They require a complex stack of software spanning the operating system, middleware code, and application libraries that can be difficult to install and maintain.
  • GPUs are not easily shared. And when they are shared, there is often little visibility into their utilization. This makes it hard to accurately predict demand and plan for future GPU infrastructure needs.
  • An ML / DL application’s GPU utilization can change dramatically over the course of even a single workload. This means that even if GPUs are shared by using containers, they will not be fully utilized unless they can be switched between containers while the application is running.
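To make the visibility problem concrete, here is a minimal sketch of taking a per-host utilization snapshot. It assumes NVIDIA GPUs with the `nvidia-smi` tool installed and uses its documented `--query-gpu` / `--format=csv,noheader,nounits` options; the `GpuSample` type, the helper functions, and the 5% idle threshold are hypothetical choices made for illustration, not part of any product described here.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class GpuSample:
    index: int
    utilization_pct: int   # utilization.gpu, in percent
    memory_used_mib: int   # memory.used, in MiB (nounits strips the "MiB" suffix)
    memory_total_mib: int  # memory.total, in MiB


# Fields requested from nvidia-smi; the order must match GpuSample's fields.
QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_nvidia_smi_csv(text: str) -> list[GpuSample]:
    """Parse `--format=csv,noheader,nounits` output, one GPU per line.

    Assumes every field is numeric; some GPUs report "[N/A]" for certain
    fields, which this sketch does not handle.
    """
    samples = []
    for line in text.strip().splitlines():
        idx, util, used, total = (field.strip() for field in line.split(","))
        samples.append(GpuSample(int(idx), int(util), int(used), int(total)))
    return samples


def sample_gpus() -> list[GpuSample]:
    """Take one utilization snapshot of every GPU on this host."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_nvidia_smi_csv(out)


def idle_gpus(samples: list[GpuSample], threshold_pct: int = 5) -> list[int]:
    """Indices of GPUs whose utilization is at or below threshold_pct (hypothetical cutoff)."""
    return [s.index for s in samples if s.utilization_pct <= threshold_pct]
```

Calling `idle_gpus(sample_gpus())` periodically on each host, and collecting the results centrally, gives a rough picture of how much allocated GPU capacity sits unused. A single snapshot says little on its own, because utilization swings within one workload, so in practice the samples would be aggregated over time.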

Containers can help with the first of these puzzle pieces. They make it easier to bundle and deploy consistent versions of middleware and application software; containers also provide the portability to run your applications on any infrastructure, whether on-premises or in a public cloud.

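To address the other challenges, we need container orchestration. Most container orchestrators, such as Kubernetes, support some form of GPU resource sharing. However, this sharing does not fully address the issues above: in Kubernetes, for example, a GPU is by default allocated to a single container for as long as that container runs, whether or not the GPU is busy.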
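As an illustration of the GPU sharing Kubernetes provides out of the box, the sketch below builds a Pod manifest that requests whole GPUs through the `nvidia.com/gpu` extended resource. It assumes the cluster runs the NVIDIA device plugin, which advertises that resource; the pod name, container name, and image are hypothetical placeholders. Standard-library Python emits JSON, which `kubectl apply -f` accepts just as it accepts YAML.

```python
import json


def gpu_training_pod(name: str, image: str, gpus: int) -> dict:
    """Build a Pod manifest that requests `gpus` whole NVIDIA GPUs."""
    # Extended resources such as GPUs are requested in whole units; fractions
    # are not allowed, so reject anything that is not a positive integer.
    if type(gpus) is not int or gpus < 1:
        raise ValueError("gpus must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",  # hypothetical container name
                "image": image,
                # GPUs are specified under limits; the request defaults to the same value.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


# Hypothetical pod name and image, for illustration only.
manifest = gpu_training_pod("resnet-train", "registry.example.com/ml/train:latest", 2)
print(json.dumps(manifest, indent=2))
```

Once scheduled, this pod holds its two GPUs until it exits, even during phases of the job when it barely uses them. That is the gap the capabilities listed next are meant to close.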

But now there are solutions, such as HPE’s BlueData container platform, that can solve this puzzle. Addressing its final pieces requires new capabilities, including:

  • On-demand, elastic provisioning of GPU resources: Containerized ML / DL applications are quickly and easily deployed with access to one or more GPUs. New containerized environments can be provisioned on demand and then deprovisioned (releasing the GPUs) when no longer needed.
  • Pause and restart GPU-enabled containers: The ability to pause a container and release its attached GPUs while preserving the current state of the application running within it. This allows IT admins to monitor usage, reassign idle GPUs to other workloads, and attach GPUs again when the application’s GPU-specific code needs to run.
  • A unified console for GPU resource management: Monitor and manage a shared pool of GPU resources, with application visibility and usage reporting for GPU utilization across multiple hosts.
  • Support for multiple GPU models and versions: Ensure the correct deployment of specific container images on hosts with compatible versions of GPU hardware and operating system drivers.
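The last capability amounts to a placement constraint: an image should only land on hosts whose GPU model and driver it supports. The sketch below shows that filtering step under stated assumptions. The `Host` and `ImageRequirements` types and the idea that each image declares a supported-model set and a minimum driver version are hypothetical illustrations, not how HPE BlueData or any particular scheduler records this information.

```python
from dataclasses import dataclass


def version_tuple(v: str) -> tuple[int, ...]:
    """Turn a dotted driver version such as '525.60.13' into (525, 60, 13)."""
    return tuple(int(part) for part in v.split("."))


@dataclass
class Host:
    name: str
    gpu_model: str       # hypothetical model identifier reported for the host
    driver_version: str  # installed GPU driver version, dotted string


@dataclass
class ImageRequirements:
    image: str
    gpu_models: set[str]     # GPU models the image was built and tested for
    min_driver_version: str  # oldest host driver the image's GPU libraries support


def compatible_hosts(req: ImageRequirements, hosts: list[Host]) -> list[str]:
    """Names of hosts whose GPU model and driver satisfy the image's requirements."""
    min_driver = version_tuple(req.min_driver_version)
    return [
        h.name
        for h in hosts
        if h.gpu_model in req.gpu_models
        and version_tuple(h.driver_version) >= min_driver
    ]
```

Versions are compared as integer tuples rather than strings because string comparison ranks "96.x" above "450.x". A real platform would fill these fields from host inventory and image metadata rather than from hand-written values.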

Enterprises can now configure and tune their platforms for their specific needs, running distributed ML / DL applications with GPUs at scale while minimizing costs and maximizing performance. They can deploy these environments on infrastructure from their preferred public cloud provider or in their on-premises data center. They can dynamically move CPU, memory, and GPU resources between containerized compute nodes to minimize the cost and runtime of their training and inference jobs. All the pieces of the puzzle finally fit together.

Thomas Phelan is an HPE Fellow and the Co-Founder of BlueData (acquired by HPE).

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way. For more information, contact sales@venturebeat.com.