The battle to create the best artificial intelligence chips is underway. Intel is approaching this challenge from its position as a maker of central processing units (CPUs), including the Xeon microprocessors that dominate the datacenter market. Rival Nvidia is attacking from its position as a maker of graphics processing units (GPUs), and both companies are working on solutions that will handle ground-up AI processing.
Nvidia’s GPUs have already grabbed a good chunk of the market for deep learning neural network solutions, such as image recognition — one of the biggest breakthroughs for AI in the past five years. But Intel has tried to position itself through acquisitions of companies such as Nervana, Mobileye, and Movidius. And when Intel bought Nervana for $350 million in 2016, it also picked up Nervana CEO Naveen Rao.
Rao has a background as both a computer architect and a neuroscientist, and he is now vice president and general manager of the Artificial Intelligence Products Group at Intel. He spoke this week at an event where Intel announced that its Xeon CPUs have generated $1 billion in revenue in 2017 for use in AI applications. Rao believes that the overall market for AI chips will reach $8 billion to $10 billion in revenue by 2022.
To get there, Intel will probably have to architect AI chips from the ground up, and beat Nvidia and startups to the punch. I talked to Rao about the competition at Intel’s Data Centric Innovation Summit this week in Santa Clara, California.
Here’s an edited transcript of our interview.
VentureBeat: There are some interesting numbers. The billion-dollar number for Xeon—it’s interesting in contrast to $130 billion total. It’s a start.
Naveen Rao: In the startup world that’s huge. You’re a $20 billion company all of a sudden. It’s the start of the market. AI is really just beginning. It’s the top of the second inning right now. We have a long way to go.
VentureBeat: It seems your strategy is covering the notion that AI chips need to be architected as AI chips, as opposed to CPUs or GPUs.
Rao: To a certain degree, yes.
VentureBeat: To what degree is that true, that you have to do something more from the ground up for AI?
Rao: I’ll give you a great example from our competitor. They did almost exactly this. They took their GPU and they tacked on a tensor core, which is completely an AI thing. You could think of it like, their ecosystem is the GPU, so they stuck this other thing on the die. As opposed to what we do—our ecosystem is the whole computer. It’s the CPU. We build those capabilities into this kind of accelerator. That was good for their strategy, but it wasn’t leveraging the GPU. It was building something almost from the ground up for this problem.
I’d argue that there are already proof points to this end. But that being said, we’re evolving the CPU. There’s an arc of evolution on that, because it supports many different applications. It supports the scale we’ve come to love in the data center on CPU. We want to be careful with how we do it. We have to make sure we can maintain leadership on all the workloads that are important today, and then add new capabilities that are important for the workloads of tomorrow. The way we understand those workloads of tomorrow is building accelerators — which is really the tip of the spear — figuring out what capabilities make sense to move back into the host.
VentureBeat: Would Nvidia just come back and say, “You guys are just tacking on something to do better image recognition”? Do you still feel like we’re in this stage of adding on extras to existing things? What comes after that?
Rao: What comes next is understanding how to move the data effectively for AI workloads near the compute. That’s what the Crest line has always been about. How do we effectively do that and accomplish maximum performance? You see it in a GPU today. The utilization is extremely low, because they haven’t done that holistic approach.
Again, it’s a great strategy for them, because they’re leveraging their platform to the hilt. That’s the right thing to do. Likewise, we’ve seen that inference is on a very fast path to expansion in the number of cycles in the data center. Given the position of Xeon in the data center, it’s natural that we add those capabilities. The workload mix shifts over time. This didn’t exist five years ago. Now it’s a billion dollars, as we said. In the context of the larger data center market, which is about $200 billion, that’s tiny. But it’s on a very rapid trajectory for expansion.
VentureBeat: With Nervana, are you getting more into the ground-up AI designs?
Rao: That’s right.
VentureBeat: When you’re doing this, what is totally different, compared to adding things to a CPU?
Rao: The way you manage your data, you don’t generally have automatically managed caches. That’s one aspect. The commitment to a particular data type—you don’t need to support 100 different workloads. You support the ones that matter for AI. You can be more precise about your data type. That has impacts on how many wires you connect everything with. You can optimize the interconnect on the chip, even, based around that, which gives you a performance bump.
The capabilities of distributing workloads—you don’t need to be general. You’re not going to do many different kinds of parallel distributed computing. You’re going to do a particular set of them. You can build the constructs for those sets. It allows you to be more targeted in your technology and get something to work. If you try to boil the ocean from the start you’ll never do it.
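To make the data-type point concrete, here is a minimal sketch of how the choice of numeric format changes the number of bits that must be stored and moved for the same tensor. It is an illustration only, not a description of any Intel or Nervana design. The matrix size and the helper name `tensor_bits` are hypothetical; the bit widths for FP32, FP16, and INT8 are the standard ones.

```python
# Bits per element for three common numeric formats.
BITS_PER_ELEMENT = {"fp32": 32, "fp16": 16, "int8": 8}


def tensor_bits(num_elements: int, dtype: str) -> int:
    """Return the number of bits needed to hold a tensor of the given type."""
    return num_elements * BITS_PER_ELEMENT[dtype]


# Hypothetical weight matrix: 1024 x 1024 elements (illustrative size only).
num_elements = 1024 * 1024

for dtype in BITS_PER_ELEMENT:
    mib = tensor_bits(num_elements, dtype) / 8 / 2**20
    print(f"{dtype}: {mib:.0f} MiB")
```

Narrower formats mean fewer bits per value, which is one reason a chip committed to a small set of data types can get by with narrower datapaths than a general-purpose design.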
VentureBeat: Are there going to be different solutions for inference and training, more specialization?
Rao: I think we’re going to see what you can call specialized solutions, but really they’re dials tuned in different directions. What we’re already seeing emerging is that TCO, performance per watt, is extremely important for inference. You do massive scale. It’s tied in to your application. Tying into the stack and very good TCO are important.
For training, TCO is less important. People are okay with things not being fully occupied all the time, because when the engineer calls and wants a training job done, they want it done as fast as possible. Max performance is greater. The way you can look at that from a technology perspective, one down—okay, where does the power go in my training solution? Memory interfaces are a big part of it, and compute.
On my inference solution I don’t need the memory. It’s not as memory-intensive. I can turn that down. I use different memory technologies. If you’re building a scale-out inference technology, you’re not going to use HBM. It’s too power-hungry. It has better performance, but you don’t need that, so let’s use something lower-power, with better performance per watt. Your caches can be different. The residency of what’s actually used on the chip is different.
Again, you tweak all these parameters. They work like a family of products, but they’re actually different knobs twisted different ways to optimize for a particular task.
VentureBeat: It seemed interesting that for self-driving cars, you have multiple kinds of solutions for different things, whether it’s in the vehicle or in the cloud communicating back to the vehicle. I talked with Vinod Dham about his new startup, AlphaICs. He’s saying that strong AI and agent-based AI may be more necessary to solve the self-driving problem. How seriously is that becoming the direction everyone has to go?
Rao: Let me give you more of a contextual answer for that. If we look at a brain, and we look at parts of it, there’s the ability to process the visual environment, to segment it. Those are plants. That’s a sidewalk. That’s a light. That’s one aspect. That’s what we’re doing today. What we call AI is really processing complex data and simplifying it into things that are potentially actionable. When I build a self-driving car I segment where roads are, where pedestrians are, and feed that into some more declarative coding. “If child in front of me, apply brakes.”
The next level of AI is continually learning and being attached to the environment. As a neuroscientist, this was something I studied. If I want to accomplish a movement, take my hand and move it to this point in space, I have to figure out what motor command needs to be sent, and actually predict what the sensory consequences will be. What will it look like to my eyes and feel like to the sensors in my arm when it achieves that position? If there’s a mismatch, something wrong happened. How do I fix that mismatch? I learn that from my mistakes.
It’s a continually evolving loop of action, environment, consequence, and learning. That’s where we want to get to. I do believe that to get to full autonomy for a robot in the world – which is what an autonomous car is – we probably do need to solve some of those problems. We’re not quite there yet.
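The loop Rao describes (send a command, predict the sensory outcome, compare with what actually happened, learn from the mismatch) can be sketched in a few lines. This is a toy illustration under strong assumptions, not a model of any real robot or of Intel's work: the "arm" is a hypothetical one-dimensional system whose position is simply 0.8 times the command, and the learner corrects a single estimated gain using the prediction error. All names and numbers here are invented for the example.

```python
def true_arm(command: float) -> float:
    """Hypothetical environment: the arm moves 0.8 units per unit of command."""
    return 0.8 * command


def learn_forward_model(steps: int = 200, learning_rate: float = 0.1) -> float:
    """Learn the arm's gain by repeatedly correcting prediction errors."""
    estimated_gain = 0.0  # the agent's initial belief about the arm
    commands = [0.5, 1.0, 1.5, 2.0]  # motor commands tried in turn
    for step in range(steps):
        command = commands[step % len(commands)]
        predicted = estimated_gain * command  # predict the sensory consequence
        observed = true_arm(command)          # act and sense the real outcome
        mismatch = observed - predicted       # prediction error
        # Learn from the mistake: nudge the estimate to shrink the mismatch.
        estimated_gain += learning_rate * mismatch * command
    return estimated_gain


gain = learn_forward_model()
print(f"learned gain: {gain:.3f}")
```

After enough trials the estimated gain settles near the true value, so predictions and observations agree and the mismatch signal goes quiet until the environment changes.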
VentureBeat: Back in the data center, what kind of solution makes sense as far as supporting the processing that happens in the car?
Rao: The data center you think of as the aggregation point. The car needs to act in a short amount of time. It can’t send things all the way back. It’s going to have inference running there very efficiently and with very low latency constraints. That data is all collected (maybe even collapsed down, doing some first-level processing) and you send it back to the cloud. Multiple cars driving on a road might see an accident or a pothole or something. That can all be transmitted back to the cloud. “At this point in the road there’s a pothole,” and all cars will know how to avoid it.
That’s a great example of how we can do collective learning. The cloud is an essential part because it’s the aggregation point.
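As a rough sketch of the aggregation step, the code below shows a cloud service collecting hazard reports from several cars and keeping only those confirmed by more than one vehicle. Everything in it is an assumption made for illustration: the report format, the `aggregate_hazards` function, the segment names, and the two-car confirmation threshold are hypothetical and do not come from the interview.

```python
from collections import Counter


def aggregate_hazards(reports, min_reports=2):
    """Return hazards confirmed by at least `min_reports` distinct cars.

    Each report is a (car_id, road_segment, hazard_type) tuple.
    """
    # Deduplicate so one car reporting the same hazard twice counts once.
    unique_reports = set(reports)
    counts = Counter((segment, hazard) for _car, segment, hazard in unique_reports)
    return sorted(key for key, n in counts.items() if n >= min_reports)


# Hypothetical reports uploaded by three cars.
reports = [
    ("car_a", "segment_17", "pothole"),
    ("car_b", "segment_17", "pothole"),
    ("car_a", "segment_17", "pothole"),  # duplicate from the same car
    ("car_c", "segment_42", "accident"),
]

confirmed = aggregate_hazards(reports)
print(confirmed)
```

The confirmed list is what would be pushed back out to the fleet, while the time-critical decisions stay on the car itself.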
VentureBeat: I guess the easy way to explain it is the car sees a limited area around it, whereas the data center has the entire map of the world?
Rao: We do the same thing as humanity, right? Each human can only experience so much of the world, but we read books. We go to university. We learn from the collective experience of many others. We do that through books and learning, but that’s quite a slow process. If we could do that much faster, just download all those experiences directly, we’d have something new.
VentureBeat: And then the challenge is the real-time nature of it. Sometimes a car has to know something right now.
Rao: It has to be able to take an appropriate action, right out of the box.
VentureBeat: Does this mean you have a bunch of parallel teams at Intel, working on different directions?
Rao: Working on different products? We do, of course. You have to do that. But we actually have a very organized effort from an architecture standpoint. We look at all of the products holistically and say, “What IP is making what capability happen?” We want to make sure we’re optimally sharing IP across different things that have the same capabilities.
For instance, if we build some of these capabilities we’re seeing emerge, like the Matrix stuff, the neural network accelerator, and we want to put that into a CPU, we’re going to take that IP and put it in a CPU, or into an FPGA. Or we might take some technology from the Movidius side of things, DSPs and low-power compute, and we might add that to a client part for laptops. These are different applications, but the idea is that if we have IP developed for a particular part of the product portfolio, we can share that across others.
VentureBeat: When it comes to AI, how much fits within the x86 umbrella, and how much might be outside of it?
Rao: A lot of it is outside. AVX-512 is outside of x86 already. VNNI and DL Boost, all the things we talk about, are actually outside of x86. They’re part of our extensions that we’ve added.
VentureBeat: I guess you can create whatever you want for IA, but what I wonder is, is there a definition of what would still be used in the PC versus what gets used for AI that’s completely foreign to the PC?
Rao: It’s not so simple. There really isn’t such a clear dichotomy. x86 is a particular contract with software. I’m going to provide these instructions. Software spells those things out. When software is aware of what else is in the system, it can take advantage of that through another library. It’s not necessarily clear that it has to be part of the core instruction set to be something that’s usable by clients.
We see this already. GPUs 20 years ago were an example of that. They provided extensions for graphics capabilities. In fact, even today, we have integrated graphics on our CPUs, but they’re a different set of instructions. There’s a driver that sits between the host and the integrated GPU and provides those capabilities. We’ve already seen that paradigm get a little more fuzzy. It’s not just x86 core anymore.
VentureBeat: Do you put people like [legendary chip architect] Jim Keller on these questions, as to what the architecture should be?
Rao: Right. He’s trying to look at this portfolio-wide. How are we using IP? How are we creating programming models that have the best impact across large developer segments? When something goes into Xeon, we need to be really clear about how we enable that developer community, because arguably the largest developer community in the world runs on Xeon. We have to be very clear about how the programming model works and how it interacts with other instructions.
VentureBeat: Back to tacking things on to Xeon designs, how far does that get you before you start seeing diminishing returns?
Rao: Generation to generation was 2X. We optimized software from the launch of Skylake, got to 5.4X, and then we got another 2X on top of that. That’s where we get to 11X. In terms of running out, you really have to look at specific workloads that are emerging at the time. If you took the neural networks of today and projected them out to 2023, I can make some strong predictions about how things are going to look, but I think the world is going to change quite a lot between now and then, to be honest.
Always, with that caveat—memory bandwidth matters. How much we have to move the data around matters. How big are the wires connecting everything? You run out of juice when you run out of memory bandwidth and/or you run out of ways to effectively shuttle the memory around in the architecture. That’s why a specific architecture can yield large gains, because you’re not trying to hit many workloads that have different patterns of data. You have one or two patterns of data movement that you’re optimizing for.
It’s hard to say “run out of juice.” It’s more that these are the constraints you’re under. You’re unlikely, in a general-purpose platform, to build something as optimized as something that’s not general purpose, for that reason.
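One common way to reason about the memory-bandwidth constraint described here is the roofline model, a standard back-of-the-envelope tool that Rao does not name in the interview. In that model, attainable performance is the smaller of peak compute and memory bandwidth multiplied by the workload's arithmetic intensity (operations per byte moved). The sketch below uses a hypothetical chip with made-up numbers purely to show the shape of the trade-off.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical chip (illustrative numbers, not a real product):
# 1000 GFLOP/s peak compute and 100 GB/s memory bandwidth.
PEAK = 1000.0
BANDWIDTH = 100.0

for intensity in (1.0, 5.0, 10.0, 50.0):
    perf = attainable_gflops(PEAK, BANDWIDTH, intensity)
    limit = "memory-bound" if perf < PEAK else "compute-bound"
    print(f"{intensity:>5.1f} FLOP/byte -> {perf:6.1f} GFLOP/s ({limit})")
```

Workloads that do little arithmetic per byte never reach peak compute on this hypothetical chip, which is why an architecture tuned to one or two data-movement patterns can outperform a general-purpose one on those patterns.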
VentureBeat: For Cooper Lake versus Ice Lake, did you create a dilemma for customers there as to how soon they want their next chip? I assume they want it now, but they have to decide whether to go for 2019 or 2020.
Rao: It comes down to a question of, what’s the use case? When do you want those features? Cooper Lake is going to have the new DL Boost stuff and it’s going to be socket-compatible. It’s platform-compatible with Ice Lake. It’ll be pretty easy for people to switch if they want it. These customers that buy these at the data center level are very rational. They’ll do a strict TCO calculation. Does it make sense right now? We’re getting incredible performance on our 14nm node. Does that next bump help you or not? That’s how the question is asked.
VentureBeat: How confident are you that we can solve the problems of self-driving cars soon? Getting to level five, how soon can we expect some of these things to lead to the breakthroughs we need to get these cars on the road?
Rao: I don’t work on that as much. I look at it in more of a general context. There are hard problems in computing and robotics. This is a subset of both of those. Level five driving is a hard problem, because you need to understand intent. It’s going to be more time than people tend to realize. I’ve always predicted that it’ll be the late 2020s, a 2028 time frame. That’s my guess, personally. That’s not Intel speaking.
I think we’ll get to level four in constrained spaces much earlier, in the next few years. We’re already seeing it. There are still some problems to solve. But getting to a point where you have something truly autonomous in the real world is going to take some time.