Astera Labs announces memory acceleration to clear datacenter AI/ML bottlenecks

Astera Labs today announced key advancements to clear up performance bottlenecks in enterprise datacenters caused by the massive data needs of AI and ML applications.

Timed to coincide with Supercomputing21, a high-performance computing conference taking place this week, the company is launching what it claims is the industry's first memory accelerator platform based on the Compute Express Link (CXL) standard for interconnecting general-purpose CPUs and various other datacenter devices.

The news is significant because clearing bottlenecks in datacenters has become a holy grail for the major vendors of processors. Their customers are struggling with performance, bandwidth, and latency issues as they piece together different types of processors like CPUs, GPUs, and AI accelerators that are required to drive powerful applications like AI.

By combining its existing Aries product (for PCIe retimers) with the newly announced Taurus (for smart cables) and Leo SoC (for CXL memory accelerators), Astera Labs says it can become the leading cloud connectivity provider. The company expects to more than double its revenue annually as it addresses a $1 billion pipeline opportunity, within an estimated total addressable market of $8 billion by 2025 that is being fueled by the growth of AI.

The goal is to create a faster connectivity backbone that provides low-latency interconnects, shares resources, and stays efficient when handling tricky technologies such as cache coherency. Astera Labs also says its fully cloud-based approach provides significant advantages in design productivity and quality assurance.

Feeding data to memory accelerators

One of the persistent challenges in computing is to ensure that CPUs and accelerators can be fed data fast enough. This has become a major issue given the explosive growth of AI, where model sizes have been doubling as quickly as every three and a half months. In recent years, DRAM scaling has not kept up with Moore's law, which means memory is becoming a more limiting and costlier factor than compute. The CXL protocol, based on standard PCIe infrastructure, is an alternative to the standard DIMM slot for DRAM. It can also be used to attach accelerators to the CPU.

Intel proposed the CXL standard in 2019, and its industry adoption is targeted to coincide with PCIe 5.0 in 2022. Compared to PCIe 5.0, CXL adds multiple features such as cache coherency across CPU and accelerators and also has a much lower latency. In the future, CXL 2.0 will add rack-level memory pooling, which will make disaggregated datacenters possible.

Astera Labs already has some products that are used by cloud service providers, such as PCIe and CXL retimers, but is aiming to expand this portfolio with these new announcements.

Memory accelerator for CXL 2.0

Leo, which Astera calls the industry's first memory accelerator platform for CXL 2.0, is designed to make it possible for CXL 2.0 to pool and share resources (memory and storage) across multiple chips in a system -- including the CPU, GPU, FPGA, and SmartNIC -- and make disaggregated servers possible. Leo further offers built-in fleet management and diagnostic capabilities for large-scale server deployments, such as in the cloud or enterprises.

"CXL is a game-changer for hyperscale datacenters, enabling memory expansion and pooling capabilities to support a new era of data-centric and composable compute infrastructure," Astera Labs CEO Jitendra Mohan said. "We have developed the Leo SoC [system on a chip] platform in lockstep with leading processor vendors, system OEMs, and strategic cloud customers to unleash the next generation of memory interconnect solutions."

CXL consists of three protocols: CXL.io, CXL.cache, and CXL.mem. However, only the implementation of CXL.io is mandatory. For the artificial intelligence use case of a cache-coherent interconnect between memory, the CPU, and accelerators such as GPUs and NPUs (neural processing units), the CXL.mem protocol is relevant. Although the latency of CXL is higher than that of a standard DIMM slot, it is similar to current (proprietary) inter-CPU protocols such as Intel's Ultra Path Interconnect (UPI). Because one of the goals of CXL 2.0 is to enable resource pooling at rack scale, the latency will be similar to today's solutions for internode interconnects. CXL.mem further supports both conventional DRAM and persistent memory, in particular Intel's Optane.
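As a rough illustration of how the three protocols combine, the sketch below models them as flags and encodes the rule that CXL.io is always required. The device type groupings (Type 1, 2, and 3) come from the CXL specification rather than from Astera Labs' announcement, and the names `Protocol`, `DEVICE_TYPES`, and `is_valid` are hypothetical, chosen only for this example.

```python
from enum import Flag, auto


class Protocol(Flag):
    """The three CXL sub-protocols (CXL.mem is the memory protocol)."""
    IO = auto()     # CXL.io: discovery, configuration, DMA (mandatory)
    CACHE = auto()  # CXL.cache: device coherently caches host memory
    MEM = auto()    # CXL.mem: host accesses device-attached memory


# Device classes as defined in the CXL specification (not in the article).
DEVICE_TYPES = {
    "Type 1": Protocol.IO | Protocol.CACHE,                 # caching accelerator
    "Type 2": Protocol.IO | Protocol.CACHE | Protocol.MEM,  # accelerator with its own memory
    "Type 3": Protocol.IO | Protocol.MEM,                   # memory expander
}


def is_valid(protocols: Protocol) -> bool:
    """A combination is only valid if it includes the mandatory CXL.io."""
    return Protocol.IO in protocols


for name, protocols in DEVICE_TYPES.items():
    print(name, "valid:", is_valid(protocols))
```

Under this model, a memory expansion device would fall into the Type 3 row, since it needs CXL.io plus CXL.mem but does not cache host memory itself.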

The Leo SoC memory accelerator platform positions Astera to play a critical role in supporting the industry's adoption of CXL-based solutions for AI and ML. Because CXL is based on PCIe 5.0, Leo supports a transfer rate of 32 GT/s per lane. The maximum memory capacity is 2TB.
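To put the per-lane figure in context, here is a back-of-the-envelope conversion from 32 GT/s per lane to approximate link bandwidth. It assumes the 128b/130b line encoding used by the PCIe 5.0 physical layer and ignores packet and protocol overhead. The link widths shown are illustrative assumptions, not Leo specifications from the announcement.

```python
# Per-lane signaling rate quoted in the article (PCIe 5.0).
GT_PER_S_PER_LANE = 32.0
# PCIe 5.0 uses 128b/130b encoding: 128 payload bits per 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def raw_bandwidth_gb_per_s(lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    gbits_per_s = GT_PER_S_PER_LANE * lanes * ENCODING_EFFICIENCY
    return gbits_per_s / 8  # bits to bytes


# Hypothetical link widths, chosen only for illustration.
for lanes in (4, 8, 16):
    print(f"x{lanes}: {raw_bandwidth_gb_per_s(lanes):.1f} GB/s per direction")
```

For a 16-lane link this works out to roughly 63 GB/s in each direction, which is the upper bound before any CXL protocol overhead is taken into account.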

"Astera Labs' Leo CXL Memory Accelerator Platform is an important enabler for the Intel ecosystem to implement a shared memory space between hosts and attached devices," Jim Pappas, director of technology initiatives at Intel, said.

"Solutions like Astera Labs' Leo Memory Accelerator Platform are key to enable tighter coupling and coherency between processors and accelerators, specifically for memory expansion and pooling capabilities," Michael Hall, director of customer compatibility at AMD, agreed.

Inside CXL

Digging a bit deeper into CXL, the Intel-proposed standard was the last cache-coherent interconnect standard to be announced. For example, Arm was already promoting its CCIX standard, and various other vendors were working on a similar solution in the Gen-Z Consortium. However, in the absence of Intel -- still the dominant vendor in the datacenter -- these initiatives gained little traction. So once Intel proposed CXL as an open interconnect standard based on the PCIe 5.0 infrastructure, the industry quickly moved to back the CXL initiative, as Intel promised support in its upcoming Sapphire Rapids Xeon Scalable processors.

Within six months of the CXL announcement, Arm announced that it, too, would move away from its own CCIX in favor of CXL. Earlier this month, the Gen-Z Consortium announced that it had signed a letter of intent (following a previous memorandum of understanding) to transfer the Gen-Z specifications and assets to the CXL Consortium, making CXL the "sole industry-standard" going forward.

Other vendors have already announced support. In 2021, Samsung and Micron each announced that they would bring DRAM based on the CXL interconnect to the market. In November, AMD announced that it would start to support CXL 1.1 in 2022 with its Epyc Genoa processors.

Outside of CXL

Astera also announced Taurus SCM, a line of smart cable modules (SCM) for Ethernet. These "smart cables" serve to maintain signal integrity over copper cables of 3m or longer as bandwidth doubles in 200G, 400G, and 800G Ethernet (which is starting to replace 100GbE), and Astera says they deliver latency up to 6x lower than the spec requires. Other smart features include security, cable degradation monitoring, and self-test. The cables support up to 100G-per-lane serializer-deserializer (SerDes).

Astera Labs is an Intel Capital portfolio company. The startup is partnering with chip providers such as AMD, Arm, Nvidia, and Intel's Habana Labs, which have also supported the CXL standard. In September, the company announced a $50 million series C investment at a $950 million valuation.
