VentureBeat: Back in the data center, what kind of solution makes sense as far as supporting the processing that happens in the car?
Rao: The data center you think of as the aggregation point. The car needs to act in a short amount of time. It can’t send things all the way back. It’s going to have inference running there very efficiently and with very low latency constraints. That data is all collected, maybe even collapsed down with some first-level processing, and you send it back to the cloud. Multiple cars driving on a road might see an accident or a pothole or something. That can all be transmitted back to the cloud. “At this point in the road there’s a pothole,” and all cars will know how to avoid it.
That’s a great example of how we can do collective learning. The cloud is an essential part because it’s the aggregation point.
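Editor's sketch: the aggregation idea Rao describes can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions, not how Intel or any carmaker implements it. The function name `aggregate_hazards`, the report tuple layout, the grid-cell size, and the two-car confirmation threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def aggregate_hazards(reports, min_reports=2, cell_size=0.001):
    """Group per-car hazard reports into grid cells.

    A hazard is treated as confirmed once at least `min_reports`
    distinct cars have reported the same kind of hazard in the same cell.
    All names and thresholds here are illustrative assumptions.
    """
    cars_by_cell = defaultdict(set)
    for car_id, lat, lon, kind in reports:
        # Snap coordinates to a coarse grid so nearby sightings collapse together.
        cell = (round(lat / cell_size), round(lon / cell_size), kind)
        cars_by_cell[cell].add(car_id)
    return {cell for cell, cars in cars_by_cell.items() if len(cars) >= min_reports}


# Hypothetical reports already reduced on the car to (car, lat, lon, kind).
reports = [
    ("car-a", 37.7750, -122.4190, "pothole"),
    ("car-b", 37.7750, -122.4190, "pothole"),
    ("car-c", 37.8000, -122.4000, "accident"),
]
confirmed = aggregate_hazards(reports)
confirmed_kinds = sorted(kind for _, _, kind in confirmed)
print(confirmed_kinds)
```

The car-side inference stays local and fast; only the compact report travels to the cloud, where sightings from many cars are combined before anything is pushed back out to the fleet.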
VentureBeat: I guess the easy way to explain it is the car sees a limited area around it, whereas the data center has the entire map of the world?
Rao: We do the same thing as humanity, right? Each human can only experience so much of the world, but we read books. We go to university. We learn from the collective experience of many others. We do that through books and learning, but that’s quite a slow process. If we could do that much faster, just download all those experiences directly, we’d have something new.
VentureBeat: And then the challenge is the real-time nature of it. Sometimes a car has to know something right now.
Rao: It has to be able to take an appropriate action, right out of the box.
VentureBeat: Does this mean you have a bunch of parallel teams at Intel, working on different directions?
Rao: Working on different products? We do, of course. You have to do that. But we actually have a very organized effort from an architecture standpoint. We look at all of the products holistically and say, “What IP is making what capability happen?” We want to make sure we’re optimally sharing IP across different things that have the same capabilities.
For instance, if we build some of these capabilities we’re seeing emerge, like the Matrix stuff, the neural network accelerator, and we want to put that into a CPU, we’re going to take that IP and put it in a CPU, or into an FPGA. Or we might take some technology from the Movidius side of things, DSPs and low-power compute, and we might add that to a client part for laptops. These are different applications, but the idea is that if we have IP developed for a particular part of the product portfolio, we can share that across others.
VentureBeat: When it comes to AI, how much fits within the x86 umbrella, and how much might be outside of it?
Rao: A lot of it is outside. AVX-512 is outside of x86 already. VNNI and DL Boost, all the things we talk about, are actually outside of x86. They’re part of our extensions that we’ve added.
VentureBeat: I guess you can create whatever you want for IA, but what I wonder is, is there a definition of what would still be used in the PC versus what gets used for AI that’s completely foreign to the PC?
Rao: It’s not so simple. There really isn’t such a clear dichotomy. x86 is a particular contract with software. I’m going to provide these instructions. Software spells those things out. When software is aware of what else is in the system, it can take advantage of that through another library. It’s not necessarily clear that it has to be part of the core instruction set to be something that’s usable by clients.
We see this already. GPUs 20 years ago were an example of that. They provided extensions for graphics capabilities. In fact, even today, we have integrated graphics on our CPUs, but they’re a different set of instructions. There’s a driver that sits between the host and the integrated GPU and provides those capabilities. We’ve already seen that paradigm get a little more fuzzy. It’s not just x86 core anymore.
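Editor's sketch: Rao's point that software can use capabilities beyond the core instruction set once it "is aware of what else is in the system" is commonly realized as runtime feature detection followed by dispatch to a library code path. The Python below is a minimal sketch of that pattern, assuming a Linux-style `/proc/cpuinfo` with a `flags` line. The backend names `"vnni"`, `"avx512"`, and `"generic"` are hypothetical labels, not names from any Intel library.

```python
def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags, or an empty set if unavailable.

    Assumes the Linux /proc/cpuinfo layout, where a line starting with
    "flags" lists space-separated feature names after a colon.
    """
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def choose_int8_backend(flags):
    """Pick the most specialized code path the hardware advertises.

    Backend names are illustrative placeholders for library code paths.
    """
    if "avx512_vnni" in flags:
        return "vnni"      # DL Boost style int8 instructions are present
    if "avx512f" in flags:
        return "avx512"    # wide vectors, but no VNNI
    return "generic"       # portable fallback


backend = choose_int8_backend(read_cpu_flags())
print(backend)
```

The application calls one entry point; the library decides at runtime which instructions to use, which is why a capability does not have to be part of the baseline instruction set to be usable.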
VentureBeat: Do you put people like [legendary chip architect] Jim Keller on these questions, as to what the architecture should be?
Rao: Right. He’s trying to look at this portfolio-wide. How are we using IP? How are we creating programming models that have the best impact across large developer segments? When something goes into Xeon, we need to be really clear about how we enable that developer community, because arguably the largest developer community in the world runs on Xeon. We have to be very clear about how the programming model works and how it interacts with other instructions.
VentureBeat: Back to tacking things on to Xeon designs, how far does that get you before you start seeing diminishing returns?
Rao: Generation to generation was 2X. We optimized software from the launch of Skylake, got to 5.4X, and then we got another 2X on top of that. That’s where we get to 11X. In terms of running out, you really have to look at specific workloads that are emerging at the time. If you took the neural networks of today and projected them out to 2023, I can make some strong predictions about how things are going to look, but I think the world is going to change quite a lot between now and then, to be honest.
Always, with that caveat: memory bandwidth matters. How much we have to move the data around matters. How big are the wires connecting everything? You run out of juice when you run out of memory bandwidth and/or you run out of ways to effectively shuttle the memory around in the architecture. That’s why a specific architecture can yield large gains, because you’re not trying to hit many workloads that have different patterns of data. You have one or two patterns of data movement that you’re optimizing for.
It’s hard to say “run out of juice.” It’s more that these are the constraints you’re under. You’re unlikely, in a general-purpose platform, to build something as optimized as something that’s not general purpose, for that reason.
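Editor's sketch: two pieces of arithmetic sit behind this answer. First, the speedups Rao quotes compound multiplicatively: 5.4X from software times 2X from the new generation is 10.8X, which rounds to the 11X he cites. Second, his memory-bandwidth caveat matches the widely used roofline model, in which attainable throughput is the lesser of peak compute and bandwidth times operations per byte moved. The Python below is a minimal sketch of both. The roofline is our framing rather than something Rao names, and the hardware numbers are hypothetical, not figures for any Intel part.

```python
def compounded_speedup(*factors):
    """Multiply independent speedup factors together."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


def attainable_throughput(peak_ops_per_s, bandwidth_bytes_per_s, ops_per_byte):
    """Roofline estimate: compute-bound or memory-bound, whichever is lower."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * ops_per_byte)


# Figures quoted in the interview: 5.4X software gain, then 2X from hardware.
total = compounded_speedup(5.4, 2.0)
print(round(total, 1), round(total))

# Hypothetical chip: 100 tera-ops/s peak compute, 1 TB/s memory bandwidth.
PEAK = 100 * 10**12
BANDWIDTH = 1 * 10**12

# A workload doing few operations per byte is limited by memory bandwidth.
memory_bound = attainable_throughput(PEAK, BANDWIDTH, ops_per_byte=10)
# A workload that reuses data heavily can reach the compute ceiling.
compute_bound = attainable_throughput(PEAK, BANDWIDTH, ops_per_byte=200)
print(memory_bound, compute_bound)
```

In this toy model, adding more compute does nothing for the first workload; only more bandwidth or better data reuse helps, which is the constraint a specialized architecture can design around when it targets one or two data-movement patterns.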
VentureBeat: For Cooper Lake versus Ice Lake, did you create a dilemma for customers there as to how soon they want their next chip? I assume they want it now, but they have to decide whether to go for 2019 or 2020.
Rao: It comes down to a question of, what’s the use case? When do you want those features? Cooper Lake is going to have the new DL Boost stuff and it’s going to be socket-compatible. It’s platform-compatible with Ice Lake. It’ll be pretty easy for people to switch if they want it. These customers that buy these at the data center level are very rational. They’ll do a strict TCO calculation. Does it make sense right now? We’re getting incredible performance on our 14nm node. Does that next bump help you or not? That’s how the question is asked.
VentureBeat: How confident are you that we can solve the problems of self-driving cars soon? Getting to level five, how soon can we expect some of these things to lead to the breakthroughs we need to get these cars on the road?
Rao: I don’t work on that as much. I look at it in more of a general context. There are hard problems in computing and robotics. This is a subset of both of those. Level five driving is a hard problem, because you need to understand intent. It’s going to be more time than people tend to realize. I’ve always predicted that it’ll be the late 2020s, a 2028 time frame. That’s my guess, personally. That’s not Intel speaking.
I think we’ll get to level four in constrained spaces much earlier, in the next few years. We’re already seeing it. There are still some problems to solve. But getting to a point where you have something truly autonomous in the real world is going to take some time.