Armv9 is Arm's first major architectural update in a decade

Arm, the leader in chips used in everything from mobile devices to supercomputers, has unveiled Armv9, the company's first major architectural change in a decade. Arm expects the new designs to deliver CPU performance gains of more than 30% over the next two chip generations.

Arm is a chip architecture company that licenses its designs to others, and its customers have shipped more than 100 billion chips in the past five years. Nvidia is in the midst of acquiring Cambridge, United Kingdom-based Arm for $40 billion, but the deal is waiting on regulatory approvals.

In a press briefing, Arm CEO Simon Segars said Armv9 will be the base for the next 300 billion Arm-based chips. Arm's customers have shipped more than 180 billion chips to date, and those chips touch more than 70% of the world's population, Segars said.

"We're extremely excited to be sharing Arm's vision of the next decade of computing with you," Segars said.

The new architecture aims to balance the economics, design freedom, and accessibility advantages of general-purpose computing with specialized processing for tasks like digital signal processing and machine learning. The company says Armv9 also takes security and artificial intelligence features to new levels.

Arm previously launched its Armv8 architecture in 2011, and that became its most successful platform in history as the foundation for smartphone chips, internet of things (IoT) devices, and a wide range of industrial devices. Arm has more than 6,500 employees, about 80% of whom are engineers.

At the current rate, 100% of the world's shared data will soon be processed on Arm, whether at the endpoint, in the data networks, or in the cloud, Segars said. Such pervasiveness places a responsibility on Arm to deliver more security and performance, along with other new features in Armv9, he added.

The new capabilities in Armv9 will accelerate the move from general-purpose to more specialized compute across every application as AI, IoT, and 5G gain momentum globally.

Back in 2011, Arm launched its 64-bit processing architecture, enabling Arm devices to make the leap from low-power mobile devices to high-end supercomputers.

"The Arm architecture is not a static thing. We keep on innovating and evolving to meet the ever changing needs of the computing world," said Richard Grisenthwaite, chief architect, in a press briefing. "In our increasingly connected world, we're seeing Arm processors being used at all stages. The collection of data often starts with ultra-low-power IoT devices based on the Arm profile processes, or from the Arm-based smartphones that virtually all of us carry all of the time. ... It continues to be the processor of choice."

Security is computing's greatest challenge

Arm's Confidential Compute architecture is aimed at making computing more secure on the inside.

To address the greatest technology challenge today -- securing the world's data -- the Armv9 roadmap introduces the Arm Confidential Compute Architecture (CCA). Confidential computing shields portions of code and data from access or modification while in use, even from privileged software, by performing computation in a hardware-based secure environment.

The Arm CCA will introduce the concept of dynamically created Realms, usable by all applications, in a region that is separate from both the secure and non-secure worlds. Segars said that Realms are much like software containers, which isolate code in certain ways, but with hardware support.

For example, in business applications, Realms can protect commercially sensitive data and code from the rest of the system while they are in use, at rest, and in transit. In a recent Pulse survey of enterprise executives, more than 90% of respondents said they believe that if confidential computing were available, the cost of security could come down, enabling them to dramatically increase their investment in engineering innovation.

"The Arm Confidential Compute architecture will introduce the concept of dynamically created Realms, usable by ordinary programs in a separate computation world from either the non-secure or secure world that we have today," Grisenthwaite said. "Realms use a small amount of trusted and attestable management software that is inherently separated from the operating system."

AI everywhere

The internet of things will rely on Armv9.

The ubiquity and range of AI workloads demand more diverse and specialized solutions. For example, it is estimated there will be more than eight billion AI-enabled voice-assisted devices in use by the mid-2020s, and 90% or more of on-device applications will contain AI elements along with AI-based interfaces like vision or voice.

To address this need, Arm partnered with Fujitsu to create the Scalable Vector Extension (SVE) technology, which is at the heart of Fugaku, the world's fastest supercomputer. Building on that work, Arm has developed SVE2 for Armv9 to enable enhanced machine learning (ML) and digital signal processing (DSP) capabilities across a wider range of applications.

"I am excited about the new generation of Arm instruction sets and technological capabilities," said Patrick Moorhead, an analyst at Moor Insights & Strategy. "Performance-wise, Arm is making it easier to integrate ML capabilities into the end product. It's important to recognize that for most performance cases, especially CPU, it's more about the architecture of the design versus the instruction set. So in other words, chip designers still need to architect something performant. Security is dramatically improving too, and if we had these technologies fully enabled today, it could ward off most all of the known attacks. I also think Arm thought about the future with 'Realms' even though it won't be out day one."

SVE2 enhances the processing ability of 5G systems, virtual and augmented reality, and ML workloads running locally on CPUs, such as image processing and smart home applications. Over the next few years, Arm will further extend the AI capabilities of its technology with substantial enhancements in matrix multiplication within the CPU, in addition to ongoing AI innovations in its Mali graphics processing units (GPUs) and Ethos neural processing units (NPUs).

Segars noted that one customer, Johnson Controls, has been working on automation and control equipment in buildings for more than a century. "They're a major user of Arm-based chips, and now we're talking to them about the enhanced AI and security features coming with the new Armv9 architecture being launched today," Segars said. "One upgrade JC is considering is the use of AI-powered digital twins monitoring key equipment in real time within the company, as well as aggregating data in the cloud."

Johnson Controls has already used Arm chips to manage chiller systems and cut energy use by more than 50%.

Segars also said that Arm-based devices could prove that someone has been vaccinated against COVID-19. A smartphone could be used for that, and it could store medical information, but to be comfortable with that, Segars said he would want advanced encryption running on the device beyond what is possible today. He would also want features like memory tagging to help eliminate memory-related security vulnerabilities. "Our first smartphone product with an Armv9 CPU will be commercially available by the end of this year," Segars said.

Besides security, Armv9 supports specialized AI, DSP, and XR workloads. Segars said he also expects Arm's combination with Nvidia will advance areas such as graphics computing.

Maximizing performance

Arm's specialized processors.

Over the past five years, Arm designs have increased CPU performance annually at a rate that outpaces the industry, Segars said. He added that Arm will continue this momentum into the Armv9 generation with expected CPU performance increases of more than 30% over the next two generations of mobile and infrastructure CPUs.
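As a rough sanity check on that figure, the short Python sketch below works out what a 30% gain over two generations would imply per generation. It assumes the gain compounds evenly across both generations, which Arm did not state; the function name `per_generation_gain` is purely illustrative.

```python
def per_generation_gain(total_gain: float, generations: int) -> float:
    """Return the equal compounded gain per generation that yields total_gain overall.

    Assumption (not from Arm): each generation improves by the same factor.
    """
    return (1.0 + total_gain) ** (1.0 / generations) - 1.0


# Arm's stated expectation: more than 30% across two CPU generations.
gain = per_generation_gain(0.30, 2)
print(f"Implied gain per generation: {gain:.1%}")
```

Under that even-split assumption, the implied gain works out to roughly 14% per generation, which falls in the double-digit range.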

However, as the industry moves from general-purpose computing toward ubiquitous specialized processing, annual double-digit CPU performance gains are not enough. Along with enhancing specialized processing, Arm's Total Compute design methodology will accelerate overall compute performance through focused system-level hardware and software optimizations and increases in use-case performance.

By applying Total Compute design principles across its entire IP portfolio of automotive, client, infrastructure, and IoT solutions, Arm says Armv9 system-level technologies will span the entire IP solution while also improving individual IP. Additionally, Arm is developing several technologies to increase frequency, bandwidth, and cache size, and to reduce memory latency, in order to maximize the performance of Armv9-based CPUs.

"There was very little detail in the disclosures. Realms should improve security, particularly for multiuser cloud systems (such as Amazon Web Services)," said Linley Gwennap, principal analyst at the Linley Group, in an email. "The memory protection stuff sounded interesting, but it seems like it might be years away. In summary, v9 offers a much smaller improvement than v8."

The next decade of computing

Arm's campus in Cambridge, United Kingdom.

Grisenthwaite said that addressing the demand for more complex AI-based workloads is driving the need for more secure and specialized processing, which will be the key to unlocking new markets and opportunities.

"It's been an amazing, tragic, and enlightening year, no matter where we've been living or working," Segars said. "Now, it's time to rebuild a world that's inherently more resilient. In computers, one of the most urgent needs is expanding the data processing capacity in the cloud. We can't just do that at any cost. Transforming the cloud isn't just about the more. It's about different, especially when it comes to the performance per watt of traditionally power-hungry datacenter chips."

Arm collected supporting comments from customers including Ampere Computing, Cadence, Crytek, Foxconn, Fujitsu, Google, Marvell, MediaTek, Nvidia, NXP, Oppo, Red Hat, Renesas Electronics, Samsung, Siemens, Synopsys, Unity Technologies, Vivo, VMware, and Xiaomi Group.

"We're not just focusing on the CPU and GPU either, but looking at all of compute, as well as maximizing performance by deploying new system technologies that provide additional gains," Segars said. "And we are broadening the architecture to execute even more compute, such as DSP and AI on the CPU."
