Liqid has integrated its software for dynamically composing compute and storage resources in high-performance computing (HPC) environments with the open source Slurm Workload Manager software used to orchestrate jobs on these platforms.
The integration of Liqid Matrix Software with the open source orchestration engine will make it easier for IT organizations to dynamically scale HPC workloads up and down as needed, Liqid CEO Sumit Puri said. That capability has become more critical as IT teams increasingly run AI workloads on HPC platforms configured with graphics processing units (GPUs), Puri added.
Liqid Matrix Software makes it possible to dynamically aggregate bare-metal resources, such as GPUs, x86 and Arm processors, NVMe storage, network interface cards (NICs), host bus adapters, field-programmable gate arrays, and memory, and then assign them to a specific workload. It also provides peer-to-peer connectivity that enables those resources to be aggregated across multiple HPC systems.
Slurm, meanwhile, is an orchestration engine widely employed in HPC environments to dynamically scale resources in much the same way Kubernetes does in IT environments running containers. The one prerequisite is that systems running Liqid Matrix Software need to support the Peripheral Component Interconnect (PCI) Express 3.0 expansion bus standard, which provides I/O virtualization capabilities. Most recently, Liqid revealed it is collaborating with Broadcom to create reference kits for version 4.0 of PCI Express, which doubles the throughput available relative to version 3.0.
“For the first time in history, every device in the datacenter speaks a common language,” Puri said.
Liqid is also working with VMware to make its software available via the console VMware provides to manage virtual infrastructure. VMware most recently expanded its alliance with Nvidia to make GPUs more accessible to the average IT administrator.
Organizations are looking to maximize utilization rates on HPC platforms to increase the value of investments they have made in existing platforms, Puri noted. Most recently, Liqid won a $32 million contract from the U.S. Department of Defense to maximize utilization of a pair of supercomputers located at the Supercomputing Resource Center at Aberdeen Proving Ground in Maryland, which provide access to 15 petaflops of performance. Those systems are based on Intel Xeon Platinum 9200 CPUs featuring Intel DL Boost technology and Nvidia A100 Tensor Core GPUs.
Rather than having to rely on HPC platforms built using proprietary processors found in, for example, a Cray supercomputer, Liqid is betting that more HPC workloads will wind up being deployed on lower-cost commercial processors from Intel, Arm, and Nvidia. The software Liqid provides makes it possible to manage systems based on those processors as if they were one logical entity.
It’s not clear to what degree AI workloads will run on-premises versus in the cloud, where orchestration is generally managed by the cloud service providers. However, given the prevalence of HPC platforms that have already been paid for and deployed, it’s highly probable that many organizations will prefer to leverage what amounts to a sunk cost. In other cases, security and compliance concerns require IT organizations to continue to invest in on-premises systems.
Regardless of approach, HPC platforms are about to become a mainstay of many IT environments as the number of AI workloads continues to increase. Longer-term, those workloads are going to migrate to the network edge, Puri said. As that trend continues to evolve, Puri said, it will become crucial for IT teams to manage bare-metal infrastructure at higher levels of abstraction.
But given the cost of GPUs, most IT organizations will likely remain anxious to optimize any platform that makes use of them for the foreseeable future.