Liqid integrates HPC management tool with Slurm orchestration engine


Liqid has integrated its software for dynamically composing compute and storage resources in high-performance computing (HPC) environments with the open source Slurm Workload Manager, which is used to orchestrate jobs on these platforms.

The integration of Liqid Matrix Software with the open source orchestration engine will make it easier for IT organizations to dynamically scale HPC workloads up and down as needed, Liqid CEO Sumit Puri said. That capability has become more critical as IT teams increasingly run AI workloads on HPC platforms configured with graphics processing units (GPUs), Puri added.

Liqid Matrix Software makes it possible to dynamically aggregate bare-metal resources, such as GPUs, x86 and Arm processors, NVMe storage, network interface cards (NICs), host bus adapters, field-programmable gate arrays, and memory, and then assign them to a specific workload. It also provides peer-to-peer connectivity that enables those resources to be aggregated across multiple HPC systems.
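To make the idea of composing resources concrete, here is a minimal Python sketch of a shared device pool that hands devices to a workload and takes them back when the job ends. It is a toy model only: the `ResourcePool` class, its `compose` and `release` methods, and the device counts are all hypothetical and do not represent Liqid's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ResourcePool:
    """Toy model of a shared pool of bare-metal devices (hypothetical, not Liqid's API)."""

    free: Dict[str, int]
    assigned: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def compose(self, workload: str, request: Dict[str, int]) -> bool:
        """Assign devices to a workload only if the whole request can be met."""
        if workload in self.assigned:
            return False
        if any(self.free.get(kind, 0) < count for kind, count in request.items()):
            return False
        for kind, count in request.items():
            self.free[kind] -= count
        self.assigned[workload] = dict(request)
        return True

    def release(self, workload: str) -> None:
        """Return a workload's devices to the pool."""
        for kind, count in self.assigned.pop(workload, {}).items():
            self.free[kind] = self.free.get(kind, 0) + count


# Illustrative device counts, chosen for this example only.
pool = ResourcePool(free={"gpu": 8, "nvme": 4, "nic": 2})
ok = pool.compose("training-job", {"gpu": 6, "nvme": 2})
too_big = pool.compose("second-job", {"gpu": 4})  # only 2 GPUs are free at this point
pool.release("training-job")
retry = pool.compose("second-job", {"gpu": 4})  # fits once the first job is released
```

The all-or-nothing check in `compose` is a design choice for the sketch: a job that cannot get every device it asked for leaves the pool untouched, so devices are never stranded on a partially composed system.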

Slurm, meanwhile, is an orchestration engine widely employed in HPC environments to dynamically scale resources in much the same way Kubernetes does in IT environments running containers. The one prerequisite is that systems running Liqid Matrix Software need to support the Peripheral Component Interconnect (PCI) Express 3.0 expansion bus standard, which provides I/O virtualization capabilities. Most recently, Liqid revealed it is collaborating with Broadcom to create reference kits for version 4.0 of PCI Express, which doubles the overall throughput available.
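The doubling from PCI Express 3.0 to 4.0 can be checked with a short calculation. The sketch below uses the published per-lane signalling rates (8 GT/s and 16 GT/s) and the 128b/130b line encoding both generations share. The x16 link width is an assumption made for illustration, and the figures ignore packet and protocol overhead.

```python
# Per-lane signalling rates in gigatransfers per second for each generation.
RATE_GT_PER_S = {"3.0": 8.0, "4.0": 16.0}

# Both generations use 128b/130b encoding: 128 payload bits per 130 bits sent.
ENCODING_EFFICIENCY = 128 / 130


def bandwidth_gb_per_s(version: str, lanes: int = 16) -> float:
    """Approximate one-direction bandwidth in GB/s, ignoring protocol overhead."""
    gigabits_per_s = RATE_GT_PER_S[version] * ENCODING_EFFICIENCY * lanes
    return gigabits_per_s / 8  # 8 bits per byte


gen3 = bandwidth_gb_per_s("3.0")
gen4 = bandwidth_gb_per_s("4.0")
print(f"PCIe 3.0 x16: {gen3:.2f} GB/s")
print(f"PCIe 4.0 x16: {gen4:.2f} GB/s")
print(f"Ratio: {gen4 / gen3:.1f}x")
```

Because the encoding is unchanged between the two generations, the gain comes entirely from the doubled signalling rate, so the ratio holds at any link width.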

“For the first time in history, every device in the datacenter speaks a common language,” Puri said.

Liqid is also working with VMware to make its software available via the console VMware provides to manage virtual infrastructure. VMware most recently expanded its alliance with Nvidia to make GPUs more accessible to the average IT administrator.

Organizations are looking to maximize utilization rates on HPC platforms to increase the value of investments they have made in existing platforms, Puri noted. Most recently, Liqid won a $32 million contract from the U.S. Department of Defense to maximize utilization of a pair of supercomputers located at a Supercomputing Resource Center housed at Aberdeen Proving Ground in Maryland that provides access to 15 petaflops of performance. Those systems are based on Intel Xeon Platinum 9200 CPUs featuring Intel DL Boost technology and Nvidia A100 Tensor Core GPUs.

Rather than having to rely on HPC platforms built using proprietary processors found in, for example, a Cray supercomputer, Liqid is betting that more HPC workloads will wind up being deployed on lower-cost commercial processors from Intel, Arm, and Nvidia. The software Liqid provides makes it possible to manage systems based on those processors as if they were one logical entity.

It’s not clear to what degree AI workloads will be running on-premises versus on the cloud, where orchestration is generally managed by the cloud service providers. However, given the prevalence of HPC platforms that have already been paid for and deployed, it’s highly probable that many organizations will prefer to leverage what amounts to an already sunk cost. In other cases, security and compliance concerns require IT organizations to continue to invest in on-premises systems.

Regardless of approach, HPC platforms are about to become a mainstay of many IT environments as the number of AI workloads continues to increase. Longer term, those workloads are going to migrate to the network edge, Puri noted. As that trend continues to evolve, Puri said it will become crucial for IT teams to manage bare-metal infrastructure at higher levels of abstraction.

But given the cost of GPUs, most IT organizations will likely remain anxious to optimize any platform that makes use of them for the foreseeable future.

VentureBeat
