Intel has tens of thousands of chip designers, and they’ve been busy creating new chips that the company hopes can bring it back into a leadership position.
Top chip architects at Santa Clara, California-based Intel, along with CEO Pat Gelsinger, touted the designs at Intel Architecture Day 2021 in the hopes of instilling confidence in the company’s leadership of the basic PC computing platform, the x86 architecture. Intel has had a hard time in the past few years with both manufacturing delays and weaker designs compared to rival Advanced Micro Devices. Intel has lost market share to AMD, and that’s why Gelsinger was brought back to run the company in February.
Now we’re able to glance at the different central processing units (CPUs), graphics processing units (GPUs), and other chips that the company has been designing in recent years. One of the chip solutions, Ponte Vecchio, will have more than 100 billion transistors, the on-off switches of digital technology. All told, Intel had 14 different architects and designers speak, starting with Raja Koduri, a former AMD graphics executive and Intel’s senior vice president and general manager of the Accelerated Computing Systems and Graphics Group.
Koduri showed off Intel’s first performance hybrid architecture, code-named Alder Lake, with features that include the intelligent Intel Thread Director for scheduling processing tasks. Intel also described Sapphire Rapids, the next-generation Intel Xeon Scalable processor for the datacenter. And it showed new infrastructure processing units and upcoming graphics architectures, including the Xe HPG and Xe HPC microarchitectures, and Alchemist and Ponte Vecchio system-on-chip computing solutions.
“Architecture is the alchemy of hardware and software,” said Koduri in a press briefing. “It blends the best transistors for a given engine, connects them through advanced packaging, integrates high-bandwidth, low-power caches, and equips them with high-capacity, high-bandwidth memories and low-latency scalable interconnects for hybrid computing clusters in a package, while also ensuring that all software accelerates seamlessly. … The breakthroughs we disclosed today demonstrate how architecture will satisfy the crushing demand for more compute performance as workloads from the desktop to the data center become larger, more complex and more diverse than ever.”
Some analysts were impressed with the event.
“The big takeaway for me was how comprehensive and far-reaching the architecture story was,” said Bob O’Donnell, the president of Technalysis Research, in an email to VentureBeat. “Another big point was how completely they have pivoted to multi-chip, multi-core designs. Whether it’s tiles or chiplets, it seems clear that the future of high-powered semis is clever packaging and combinations of different technologies into a bigger, more comprehensive design.”
One of the big projects, previously code-named Gracemont, now goes by the simple name of the Efficient Core. This chip family will have cores, or computing sub-brains, that are designed for power efficiency. Four Efficient Cores on one chip will deliver 80% better performance than two predecessor Skylake cores running four threads, or match their throughput while using 80% less power, Koduri said.
Compared with Skylake, Intel’s most prolific CPU microarchitecture, the Efficient Core delivers 40% more single-threaded performance at the same power, or the same performance while consuming less than 40% of the power. A Skylake chip would have to consume five times more power to deliver the same performance as an Efficient Core-based chip, said Stephen Robinson, an Intel Fellow, in the press briefing.
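The power figures above can be sanity-checked with simple ratios. The sketch below is a back-of-the-envelope reading, not Intel's methodology: it assumes that "80% less power" and "less than 40% of the power" both describe power draw at equal performance, and the function name is made up for illustration.

```python
def power_ratio_at_equal_performance(fraction_of_power_used: float) -> float:
    """How many times more power the older chip needs for the same work.

    fraction_of_power_used: the new chip's power as a fraction of the old
    chip's power at equal performance (an assumed reading of the figures).
    """
    if not 0 < fraction_of_power_used <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction_of_power_used


# "80% less power" -> the new cores draw 20% of the old power.
multi_thread = power_ratio_at_equal_performance(1 - 0.80)
# "less than 40% of the power" -> 40% is the upper bound on the new power.
single_thread = power_ratio_at_equal_performance(0.40)

print(f"multi-threaded: about {multi_thread:.1f}x")
print(f"single-threaded: at least {single_thread:.1f}x")
```

On this reading, the "five times more power" remark lines up with the 80% figure for four threads, while the single-threaded claim on its own implies a ratio of at least 2.5.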
Different chips in this family will vary based on the number of cores that are jammed into the larger chip, which will be kept as small as possible for power consumption and cost reasons. This chip family can run at a low voltage to reduce overall power consumption while creating the power headroom to operate at higher frequencies. And it has better ways to predict how to process each computing thread.
Intel’s new Performance Core microarchitecture, previously code-named Golden Cove, is designed for speed and pushes the limits of low latency and single-threaded application performance.
Workloads are growing in their code footprint and demand more execution capabilities. Datasets are also growing massively, along with data bandwidth requirements. Intel’s new Performance Core provides a significant boost in general-purpose performance and better support for large code footprint applications. It can do more things in parallel and more tasks in a given time. It will be roughly 19% faster than the Cypress Cove cores in the 11th Gen Intel Core processors currently on the market.
It will also have Intel Advanced Matrix Extensions, the next-generation, built-in AI acceleration advancement, for deep learning inference and training performance. It includes dedicated hardware and new instruction set architecture to perform matrix multiplication operations significantly faster than in the past.
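For readers curious what "matrix multiplication operations" in hardware look like, here is a minimal sketch of the general idea of multiplying matrices one small tile at a time. It is plain Python, not AMX instructions, and the tile size of 2 and the function name are arbitrary choices for illustration; real hardware tiles are larger and work on compact number formats.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (m x k) and b (k x n) one tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # tile rows of the output
        for j0 in range(0, n, tile):      # tile columns of the output
            for k0 in range(0, k, tile):  # walk the shared dimension
                # One "tile multiply-accumulate": C_tile += A_tile x B_tile
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b))  # [[58, 64], [139, 154]]
```

The innermost three loops are the step a matrix engine would perform as a single operation on whole tiles, which is where the speedup over element-by-element arithmetic comes from.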
Alder Lake client chip
One of the new chips is Alder Lake, a client PC processor that will combine both core types, Performance Core and Efficient Core, on a single chip. It will be built on the Intel 7 manufacturing process, which Intel positions as comparable to rivals’ 7-nanometer chip production.
“I think that Intel will be very competitive with AMD on mobile form factors, especially on battery life with Alder Lake designs, but it remains to be seen what the total performance will be when you combine the 8 high-performance cores with the 8 efficiency cores and whether that will be competitive with AMD’s 12 Core or 10 Core offerings,” analyst Anshel Sag said. “Intel is claiming an average uplift of performance on the high-performance cores of 19%, which is considerable and could give them back the single-thread performance crown, but the final product when it gets into reviewers’ hands will ultimately determine that.”
To feed data into this processor, Intel has designed three independent fabrics, each with real-time, demand-based heuristics. The compute fabric can support up to 1,000 gigabytes per second (GBps), which is 100 GBps per core or per cluster and connects the cores and graphics through the last level cache to the memory.
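The 1,000 GBps figure appears to follow from the core counts. The arithmetic below is a sketch under one assumption not stated in the article: that the eight Efficient Cores are grouped into clusters of four, so each cluster counts as one attachment point on the fabric.

```python
P_CORES = 8                  # from the analyst quote above
E_CORES = 8                  # from the analyst quote above
E_CORES_PER_CLUSTER = 4      # assumption: Efficient Cores grouped in fours
GBPS_PER_AGENT = 100         # "100 GBps per core or per cluster"

e_clusters = E_CORES // E_CORES_PER_CLUSTER
agents = P_CORES + e_clusters          # each P-core and each E-cluster
total_gbps = agents * GBPS_PER_AGENT

print(f"{agents} agents x {GBPS_PER_AGENT} GBps = {total_gbps} GBps")
```

With those inputs, eight Performance Cores plus two Efficient Core clusters give ten agents, which matches the quoted peak of 1,000 GBps.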
Intel Thread Director
In order for Performance Cores and Efficient Cores to work seamlessly with the operating system, Intel has developed an improved scheduling technology called Intel Thread Director. Built directly into the hardware, Thread Director provides low-level telemetry on the state of the core and the instruction mix of the thread. It empowers the operating system to place the right thread on the right core at the right time. Thread Director is dynamic and adaptive – adjusting scheduling decisions to real-time compute needs – rather than a simple, static rules-based approach. Intel is optimizing Thread Director for the best performance on Microsoft’s upcoming Windows 11 operating system.
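To make the idea of telemetry-driven placement concrete, here is a toy model. It is not Intel's or Microsoft's algorithm: the `ThreadTelemetry` class, the `p_core_speedup` hint, and the greedy rule are all hypothetical stand-ins for whatever the hardware and operating system actually exchange.

```python
from dataclasses import dataclass


@dataclass
class ThreadTelemetry:
    name: str
    p_core_speedup: float  # hypothetical hint: expected gain from a P-core


def place_threads(threads, p_cores, e_cores):
    """Give the P-cores to the threads that benefit most; the rest go to E-cores."""
    if len(threads) > p_cores + e_cores:
        raise ValueError("more runnable threads than cores in this toy model")
    ranked = sorted(threads, key=lambda t: t.p_core_speedup, reverse=True)
    placement = {}
    for index, thread in enumerate(ranked):
        placement[thread.name] = "P" if index < p_cores else "E"
    return placement


threads = [
    ThreadTelemetry("game_render", 1.9),
    ThreadTelemetry("file_indexer", 1.1),
    ThreadTelemetry("video_encode", 1.6),
]
first = place_threads(threads, p_cores=2, e_cores=2)
print(first)

# Telemetry changes as the instruction mix changes, so placement is redone
# with fresh hints rather than fixed by a static rule.
threads[1].p_core_speedup = 2.2
second = place_threads(threads, p_cores=2, e_cores=2)
print(second)
```

In the second call the indexer's hint has risen, so it displaces the video encoder from a Performance Core. That kind of shift is the difference between a dynamic approach and a static, rules-based one.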
Xe HPG Microarchitecture and Alchemist SoCs
Yes, Intel is getting back into the standalone graphics chip business in direct competition with AMD and Nvidia for the first time in many years.
Xe HPG is a new discrete graphics microarchitecture designed to scale to enthusiast-class performance for gaming and creation workloads. The Xe HPG microarchitecture powers the Alchemist family of system-on-chips (SoCs), and the first related products are coming to market in the first quarter of 2022 under the Intel Arc graphics chip brand. The Xe HPG microarchitecture features a new Xe-core, a compute-focused, programmable and scalable element. Three other graphics chips — Battlemage, Celestial, and Druid — will join Alchemist in coming years.
Taiwan’s TSMC will manufacture Xe HPG chips on its N6 process node, which is very unusual for Intel. Intel also has a graphics enhancement dubbed XeSS, which uses deep learning to synthesize images in games that are close to the quality of native high-resolution rendering. Nvidia has a similar technology called DLSS. Such upscaling lets games that would otherwise be playable only at lower quality settings or lower resolutions run smoothly at higher quality settings and resolutions.
“Xe HPG [Arc] is confirmed that it will be on a competitive TSMC N6 process node and appears to have all the features you would expect from a leadership graphics part. This could possibly help to alleviate some of the pressure on the GPU market,” said Anshel Sag, a senior analyst at Moor Insights & Strategy, in an email to VentureBeat.
Sapphire Rapids
The next-generation Intel Xeon Scalable Processor (code-named Sapphire Rapids) is Intel’s big play in the datacenter chip market.
The processor delivers substantial compute performance across dynamic and increasingly demanding data center usages and is workload-optimized to deliver high performance on elastic compute models like cloud, microservices, and AI. It has a tiled, modular SoC architecture that relies on Intel’s packaging technology to connect multiple tiles in a single package.
Sapphire Rapids is built on Intel 7 process technology and features Intel’s new Performance Core microarchitecture, which is designed for speed and pushes the limits of low-latency and single-threaded application performance. Some AI applications will be able to run seven times faster on Sapphire Rapids using its new Intel AMX extensions.
Infrastructure Processing Unit
The IPU is a programmable networking device designed to enable cloud and communication service providers to reduce overhead and free up performance for CPUs. IPUs can take over CPU work such as managing storage traffic, which reduces latency while efficiently using storage capacity via a diskless server architecture.
Mount Evans is Intel’s first custom IPU. It has been architected and developed hand-in-hand with a top cloud service provider and integrates learnings from multiple generations of field-programmable gate array (FPGA) SmartNICs.
“I would say the Mount Evans news is a big deal since we don’t yet know who the CSP partner is on the design, but the fact that Intel is going to ship an ARM Server part is going to make big news within the industry,” Sag said.
Ponte Vecchio
One of the most ambitious chips is Ponte Vecchio, which uses the graphics-focused Xe HPC microarchitecture. It brings together parallel processors and multiple chips in a single system to accelerate AI, high-performance computing, and analytics workloads. It aims to take Intel’s GPUs deep into the datacenter, Koduri said.
Intel’s first prototypes are running at more than 45 teraflops of FP32 throughput, with memory fabric bandwidth greater than 5 terabytes per second (TBps) and connectivity bandwidth greater than 2 TBps. Intel demonstrated ResNet inference performance of over 43,000 images per second and ResNet training performance of more than 3,400 images per second. Its compute tiles will be built on TSMC’s advanced process technology, dubbed N5.
“At a product-specific level, I think Ponte Vecchio will get the most attention, but I was impressed with what they’re doing with Thread Director on Alder Lake,” O’Donnell said. “I think that’s a practical, important step forward that a lot of PC users will be able to appreciate.”
Sag added, “The Ponte Vecchio part is also very interesting because of how complex it is [100 billion transistors] and how many challenges Intel had to overcome to make it possible. It appears to be going after NVIDIA’s A100 with not only a significant amount of Xe cores but also a significant amount of matrix cores which means that this is designed to be an AI powerhouse as well. The fact that it has a throughput of 45 TFLOPS FP32 is quite promising for the part, but will still boil down to software and developer outreach to enable that amount of computing.”
Intel’s foundry partnerships
While Intel is tapping one-time rival TSMC for its chip manufacturing, Intel is also planning to make chips for other companies in Intel factories.
Stuart Pann, a senior vice president in the corporate planning group at Intel, said that the company currently runs as much as 20% of its overall product volume at external foundries, or contract chip manufacturers, and the company is among the top customers of TSMC. Some of that is due to acquisitions.
“We are evolving this integrated device manufacturer model to deepen and expand our partnerships with leading foundries,” Pann said. “The reason is simple: Just as our designers use the right architecture for the right workload, we also choose the node that best fits that architecture. At this point in time, these foundry nodes are the right choice for our discrete graphics products.”
He noted that Meteor Lake, an upcoming Intel client chip, will be made on the forthcoming Intel 4 process technology, with some supporting tiles manufactured at TSMC.
“Intel’s hybrid external/internal fab manufacturing approach is just what the company needs right now to regain its inside track, as it builds its IDM 2.0 future,” analyst Dave Altavilla said.
Summing it up
As mentioned, the analysts were positive.
“I think the biggest news was the Alder Lake unveil, with a close second of Intel Alchemist Xe-HPG Graphics detail and XeSS,” said Dave Altavilla, a principal analyst at HotTech Vision And Analysis, in an email to VentureBeat. “I think they can be competitive versus AMD, as Alder Lake’s E-Core and P-Core architecture show good promise to scale across a myriad of workloads efficiently, and with much lower latency and input-output throughput versus the company’s previous generation.”
He added, “On the graphics front early signs of Alchemist are very strong, with Intel also checking all the right boxes supporting the full suite of DirectX Ultimate features set, with ray tracing and variable-rate shading, as well as an open-source and machine learning-accelerated super sampling alternative with XeSS. I think the market is going to wholeheartedly embrace a major third competitor and it’s going to be very interesting to watch versus AMD and Nvidia.”
Intel hopes these architectural breakthroughs will demonstrate how it can be a leader in the next generation of products, Koduri said. Intel didn’t say when many of the chips will arrive, with the exception of the graphics launch in early 2022.
“The breakthroughs we disclosed today also demonstrate how architecture will satisfy the crushing demand for more compute performance as workloads from the desktop to the data center become larger, more complex, and more diverse than ever,” Koduri said.
He added, “Looking back at just the past year, technology was at the heart of how we all communicated, worked, played, and coped through the pandemic. Enormous computing power proved crucial. Looking ahead, we face a massive demand for compute – potentially a 1,000x need by 2025. That 1,000-times boost in four years is Moore’s Law to the power of five.”
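The "Moore's Law to the power of five" line can be unpacked with a little arithmetic. The sketch below assumes the common reading of Moore's Law as a doubling every two years, which the quote itself does not spell out.

```python
DOUBLING_PERIOD_YEARS = 2   # assumption: the "doubling every two years" reading
YEARS = 4                   # "four years" in the quote

moores_law_gain = 2 ** (YEARS / DOUBLING_PERIOD_YEARS)   # two doublings
to_the_power_of_five = moores_law_gain ** 5

print(f"Moore's Law over {YEARS} years: {moores_law_gain:.0f}x")
print(f"Raised to the fifth power: {to_the_power_of_five:.0f}x")
```

Two doublings give a 4x gain, and 4 to the fifth power is 1,024, which is close to the 1,000x figure in the quote.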
Gelsinger, who is also a chip architect, closed the day saying, “We face daunting compute challenges that can only be solved through revolutionary architectures and platforms. … Our talented architects and engineers made possible all this technology magic.”
He added, “Intel is back, and our story is just beginning.”