VentureBeat: One thing I’m noticing is that Nvidia’s latest chips seem monstrously large, like 15 billion transistors. That’s what they call some of their AI-designed chips. What’s happening there? Is AI a lot less efficient in terms of processing than traditional PC or data center tasks? Is something about AI processing causing larger problems for chip designers? Hot Chips was very crowded this year. It was almost going through a revival, because everyone wants to learn something about designing AI chips. I don’t know how difficult some of this is becoming.
Singer: The reason why chip sizes grow is primarily because of the data set and the opportunity for high concurrency within the data set. It’s not as much because of the complexity of the problem. It’s primarily because, when you have those large tensors–the more we get into real-life cases, the imaging is becoming more complex. The data sets on speech and language models are becoming larger. You just have large tensors, and you have a lot of opportunity for parallelizing it, because there’s less dependency when you do those waves of compute.
Regardless of efficiency–some companies will do it more efficiently than others. But there’s an opportunity for concurrency. The data is large and there’s a lot of inherent concurrency in the computation. We believe some things can be done more efficiently, so I’m not commenting on others. But inherently, it’s larger data sizes which can be done concurrently.
To your point about complexity, the original networks were actually simple. The original AlexNet and GoogLeNet were basically lots of matrix multiplication and convolution, and some basic functions. The more modern topologies are becoming more complex. This drives not necessarily size, but it drives a need for more sophistication in integration between different types of compute.
I’ll give you an example. When you have things like neural machine translation, NMT, within NMT you have portions that are pure neural net, and there are portions that are sequential. When you look at what’s called, in this case, attention algorithms, you look at your history and try to pick up relevant things, like you do in your brain. When there’s something you need to understand in context, you search for things that you know from the past that might give context to the new information coming in. Sophisticated architectures benefit from the ability to effectively combine the various types of compute. They do neural networks very well, but they also have a system view that integrates it in an optimal manner with other types of compute. Complexity comes from the system view of the solution, integrated very effectively.
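As a rough illustration of the attention idea described above (scoring each item in the history against the new input and blending the most relevant ones into a context), here is a minimal sketch in plain Python. It assumes scaled dot-product scoring, which is only one common variant and not necessarily the one Singer has in mind; the names `attend`, `history_keys`, and `history_values` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Score every history entry against the query, then blend the values.

    Assumption: scaled dot-product scoring. Other attention variants use
    different scoring functions.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy history: three remembered items, each with a key and a value.
history_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
history_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

weights, context = attend([4.0, 0.0], history_keys, history_values)
print(weights)  # entries aligned with the query get the larger weights
print(context)  # weighted blend of the history values
```

The dot products and weighted sums are dense, parallel work, while choosing what to look up and when tends to be sequential, which is the mix of compute types the passage points at.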
VentureBeat: We’re getting into different kinds of AI processing. Earlier problems were not so hard to solve with the way deep learning started here. This is a flower, that’s also a flower, that’s not a flower. But what’s necessary now for harder problems is strong AI. When you’re in a dynamic environment and you’re driving a car and the environment around you is changing, all that data keeps on changing, the deep learning approach isn’t so good at spotting the one hazard coming at you that you need to quickly identify and avoid. Deep learning seems to be too dumb a way to arrive at the conclusion that there’s a threat coming. I don’t know if strong AI means something to everybody or not? But in this case, it’s a simple way of saying that we need to do a different kind of processing.
Singer: This is about machine learning as a whole. It’s deep learning plus other elements. I do believe that as we’re going into real-world problems, it’s going to be a complement of deep learning with other machine learning. There are other machine learning techniques out there that Intel has been working on for multiple years. Deep learning is the one that has had the biggest breakthroughs in the last four years, but to go to more complete solutions, absolutely, it has to have a set of capabilities that includes deep learning and other types of machine learning.
Deep learning is exceptionally good at some things, like identifying patterns and anomalies. But as we look at emergent machine learning, there need to be complementary types of machine learning together with deep learning in order to solve problems. Deep learning still has a lot of growth, even if it’s now coming of age. It’s not anywhere close to tapering off. For full solutions to problems, we need to keep an eye out. We’re investing in other kinds of machine learning, not only in deep learning.
VentureBeat: I spoke to [Intel chip architect] Jim Keller (formerly of Tesla and Apple) recently. It seems like one of the tasks assigned to him and people like him is to recognize all these different architectures within the company and across the whole industry, to realize that there are different problems with different ways of solving them, and then to figure out the best way to bring all of those things together.
Singer: The question we have is, how do we have a portfolio that’s rich enough to provide optimal solutions for the very different problem spaces, from one watt to 300 or 400 watts, from latency-sensitive to throughput-sensitive. How do you create a portfolio that’s broad enough that it doesn’t have overlaps? Reusing technologies for things that are similar. That’s a problem that Jim Keller drives, and a lot of us in the architecture leadership are participating in it. We have a portfolio, and we want to have a diverse portfolio that creates great coverage, but with minimal overlap.
VentureBeat: The chip designers, do they have to then come up to speed? If you grew up with x86, and now you have this whole new world of AI processing, are they having to adapt to a lot of things?
Singer: On the hardware side, yes. We have a combination. We have a lot of talent that came from the outside, both through company acquisitions like Movidius, Nervana, and others, and through individual acquisitions of talent. And then we have engineers doing CPU and network processing. A lot of network processing is relevant. They learn new skills. It’s a combination.
To the point about x86, we actually put AI acceleration within x86. x86 has always been something that grows. Floating point was added to x86. AVX and vector processing were added under x86. We have instructions like VNNI that are added under x86. We don’t see x86 as something that’s not AI. It’s a foundation that has AI, but also other things. Then we see dedicated solutions that are primarily AI.
You were asking if I see an integration trend, where technology starts to accelerate. We definitely look at it across the company. Things that can go in are going in, and some things are better fit to the outside, because the way they interact, without caching hierarchy and so on, is more appropriate, at least for the time being. But this trend of having technologies that are on the side and then get integrated in has been in semiconductors for a long time. Whatever fits under the x86 framework has a trend.
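To make the VNNI reference above more concrete: the core idea of those instructions is a fused low-precision dot product, in which groups of four 8-bit values are multiplied pairwise, summed, and added to a 32-bit accumulator in a single step. The sketch below models one such lane in plain Python. It is a simplified model, not Intel's implementation; the function names are hypothetical, and it ignores 32-bit overflow behavior and the vector width of real hardware.

```python
def dot4_u8_s8_accumulate(acc, a_u8, b_s8):
    """Model one 32-bit lane: acc + sum of four (unsigned 8-bit * signed 8-bit) products.

    Assumption: Python integers stand in for the 32-bit accumulator,
    so wraparound or saturation on overflow is not modeled here.
    """
    assert len(a_u8) == len(b_s8) == 4
    assert all(0 <= a <= 255 for a in a_u8)      # unsigned 8-bit operand
    assert all(-128 <= b <= 127 for b in b_s8)   # signed 8-bit operand
    return acc + sum(a * b for a, b in zip(a_u8, b_s8))


def int8_dot(activations_u8, weights_s8):
    """Dot product of two int8 vectors, consumed four elements at a time."""
    assert len(activations_u8) == len(weights_s8)
    assert len(activations_u8) % 4 == 0
    acc = 0
    for i in range(0, len(activations_u8), 4):
        acc = dot4_u8_s8_accumulate(acc, activations_u8[i:i + 4], weights_s8[i:i + 4])
    return acc


print(int8_dot([1, 2, 3, 4, 5, 6, 7, 8], [1, -1, 1, -1, 2, 2, 2, 2]))  # 50
```

Folding the multiply, the sum, and the accumulate into one operation on narrow integers is what makes this kind of instruction a natural fit for quantized neural network inference on a general-purpose CPU.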
VentureBeat: There was that Darwinian adaptation of the CPU. It absorbs different things over time.
Singer: Some things are done when you have them outside. The way you get the data in, the way you get everything close to the compute, there might be some advantages for certain technologies, at least for a while, for there to be this acceleration.
VentureBeat: You’ve been talking more about connecting to other chips and co-processors in ways that–different ways of splitting up the architecture across chips.
Singer: Yes. I mentioned the three parts of the hardware strategy are to make Xeon continuously better, add accelerators, and then the third one, which we didn’t talk much about, is the system optimization. We talked about the fact that problems tend to be large. The data size is large. Having a multi-chip solution, either multiple hosts or a host and multiple accelerators, how do you partition the problem so it’s worked on in parallel by multiple engines? How do you effectively move data?
Within the Nervana NNP, for example, we have a very fast interconnect that can connect from one substrate, one fabric, directly to another fabric without going through external memory. We can move data very effectively over a large and well-partitioned problem. We look at it as a system problem.
Now, how much do you put in a single package? There’s always the question of what you put on the same die. Now, with all the multi-die technologies, how do you put these packages on multiple dies, and how do you connect the packages together for a system? It’s a question of partitioning that changes all the time with the type of technologies and the silicon budget we have.
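The partitioning question raised above can be illustrated with a small sketch: a matrix multiplication is split by rows, each chunk is handed to a separate "engine," and the partial results are stitched back together. This is only a toy model under stated assumptions. Threads stand in for accelerators (in CPython they give no real speedup for this workload), and `split_rows` and `partitioned_matmul` are hypothetical names, not part of any Intel or Nervana software.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Plain matrix multiply on lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_rows(matrix, parts):
    """Split a matrix into up to `parts` contiguous row blocks."""
    size = max(1, -(-len(matrix) // parts))  # ceiling division, at least 1
    return [matrix[i:i + size] for i in range(0, len(matrix), size)]


def partitioned_matmul(a, b, engines=2):
    """Give each engine a block of rows of `a`; every engine needs all of `b`.

    Assumption: threads model independent engines. Row blocks do not
    depend on each other, so they can be computed concurrently.
    """
    chunks = split_rows(a, engines)
    with ThreadPoolExecutor(max_workers=engines) as pool:
        partials = list(pool.map(lambda chunk: matmul(chunk, b), chunks))
    # Reassemble in order; pool.map preserves the order of the inputs.
    return [row for part in partials for row in part]


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
print(partitioned_matmul(a, b, engines=2) == matmul(a, b))  # True
```

Even in this toy version the data-movement cost is visible: every engine needs its own copy of `b`, and the partial results have to be gathered afterward, which is why interconnect speed matters as much as how the work is split.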