Bill Dally recently made the jump from head of Stanford University's computer science department to chief scientist at graphics chip maker Nvidia. Now he has to be the chief visionary for the products Nvidia will make in the future. His expertise is in parallel computing, which he believes will come to dominate the industry. If he's right, we'll see billions of dollars of sales shift from one part of the chip industry to another. I had the chance to interview him recently about his new role and the future of chip technology.
Bill Dally: I think it is to cast a vision for the future of what a graphic processing unit (GPU) can be. How it can be more than just graphics. And what the opportunities and challenges are in getting from here to there.
VB: What do you mean by that? Will we have just one chip in the PC in the future?
BD: However you package it, the PC of the future is going to be a heterogeneous machine. It could have a small number of cores (processing units) optimized for delivering performance on a single thread (or one operating program). You can think of these as latency processors. They are optimized for latency (the time it takes to go back and forth in an interaction). Then there will be a lot of cores optimized to deliver throughput (how many tasks can be done in a given time). Today, these throughput processors are the GPU. Over time, the GPU is evolving to be a more general-purpose throughput computing engine that is used in places beyond where it is used today.
BD: Both of those terms will blur. Both of the cores are processing cores. They both execute instructions. A CPU today is optimized for single thread performance. That’s becoming less important all of the time. It runs the legacy code (i.e., Windows). People are writing more code that takes advantage of parallel processing in a throughput processor.
The other reason is that single-thread performance isn't scaling. Moore's Law says we double the number of transistors on a chip every couple of years. Chip architects take those transistors and deliver more performance. And then application writers take more performance and deliver more value to the user. That food chain is broken on the latency-processor side. We get more transistors over time, but they don't make a single-thread processor go any faster. CPU makers respond by putting a lot of cores on a chip to try to take advantage of throughput. If you're going to take advantage of throughput, it's far more efficient to do that with a throughput-optimized processor than a latency-optimized processor. They can put eight latency-optimized processors on a chip. We can put 240 throughput-optimized cores on a chip. Each is a lot more efficient than the latency-optimized hardware that reorders instructions and predicts this and predicts that to get maximum performance out of a thread.
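The latency-versus-throughput split Dally describes can be sketched in plain Python. This is a hypothetical illustration, not Nvidia code: the same trivial per-element task is run once as an ordered serial loop (the latency style) and once spread across a pool of identical workers (the throughput style), and the two results agree.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workload: "brighten" 100,000 pixels.
# Names and numbers here are invented for illustration.
pixels = list(range(100_000))

def brighten(p):
    return p * 2 + 1

# Latency-oriented style: one core walks the data in order.
serial = [brighten(p) for p in pixels]

# Throughput-oriented style: many identical workers each take a
# strided slice of the data and run the same small task independently.
def chunk(i, n=8):
    return [brighten(p) for p in pixels[i::n]]

with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(chunk, range(8)))

# Re-interleave the strided chunks back into one result.
parallel = [0] * len(pixels)
for i, part in enumerate(parts):
    parallel[i::8] = part

assert parallel == serial
```

The point of the sketch is the shape of the program, not the speed: the throughput version has no ordering between workers, which is exactly what lets a GPU run hundreds of them at once.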
VB: On the PC, we have new software like Microsoft’s upcoming Windows 7 operating system. Lots of multi-touch apps are coming. Is this software showing you what your product roadmap has to be in the future?
BD: In some sense, with Windows 7 and future versions of Mac OS, people are putting in more and more code that delivers a better user experience by tapping the GPU. The GPU is becoming a larger factor in delivering value to the end user.
VB: Software still seems to lag far behind hardware. Do you worry that the lag is too far?
BD: I’m not too worried about that because I see the things people are doing with GPUs now. There is always a small worry that if you build it, they won’t come. But they’re already there. So it’s not too much of a worry.
VB: Why did you decide to join Nvidia? You replaced David Kirk as chief scientist. Do you see the world differently than he does?
BD: David and I see the world similarly. That was a reason why it was natural for me to pick up where he had left off. He is more of a graphics person and I am more of a parallel computing person. In terms of Nvidia research, he built it with a strong graphics component. I’m trying to complement that by building strength in other areas.
VB: You go way back to Cray supercomputers.
BD: My first job out of college was at Bell Labs, where I was a microprocessor designer. I got a PhD at Caltech. Then I went to MIT, where I built parallel computers. While at MIT, I worked with Cray on their first massively parallel supercomputers. Then I moved back to California and was a professor at Stanford from 1997 until this January. For the last four years, I was chairman of the computer science department.
VB: You were also founder of video processing startup Stream Processors.
BD: I took a leave from Stanford in 2004 to found SPI. I was CEO for the first year and was chairman until recently. I’m still a board member at SPI.
VB: As an executive, what does a chief scientist do?
BD: A small number of people report to me, including Nvidia Research Labs. My job has three parts. One is running research; I have plans to grow it when the downturn is over. Another is influencing the product groups: we set a vision, ask where things will be in a decade, and try to nudge the products in the right direction. The third is reaching out to the outside community: customers, suppliers, and people at universities.
VB: Have your Intel friends said anything to you about joining Nvidia?
BD: I have gotten a couple of emails. They say good luck and ask why I’m at the competition. Various comments of that nature. (Laughs).
VB: It’s interesting how a lot of the rivals are in legal battles now.
BD: I think that’s really unfortunate. We should work on having the best product. That’s where the battle should be fought. When people turn it over to the lawyers, a lot of capital gets destroyed without much value being created.
BD: I’m a pretty happy guy. I see the industry as a big food chain. I’m happy that devices continue to shrink and we get more devices every year. I’m unhappy that energy is not scaling as fast as it once did. Devices are not getting faster the way they used to, so we are not getting the clock-rate improvements that we once did. The next level of the ecosystem has to do with software. The industry is embracing software improvements. I’m pleased the world is embracing parallel software, including Nvidia’s CUDA programming environment. It’s taking off like wildfire because there are 100 million CUDA-enabled GPUs out there. People realize they can write software that runs on a lot of machines. It’s easy to use. The programmers are starting to think and write in parallel. I get frustrated at how long it will take for legacy code to flush itself out of the system.
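The CUDA model Dally credits here — one small function (a "kernel") executed by thousands of threads, each picking its data element by its own index — can be mimicked in a few lines of Python. This is only a sketch of the programming model, not real CUDA: actual kernels are C/C++ compiled by nvcc, and the `launch` helper below is a made-up stand-in for CUDA's `kernel<<<blocks, threads>>>(...)` launch syntax.

```python
# Toy stand-in for the CUDA model: a "kernel" is one function,
# and every "thread" runs it with its own global index.
# (Pure-Python sketch; real CUDA kernels are compiled C/C++.)

def saxpy_kernel(i, a, x, y, out):
    # Each thread handles exactly one element: no loops, no ordering.
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    # On a GPU this would be kernel<<<blocks, threads>>>(...);
    # here we simply run every thread index in turn.
    for i in range(n):
        kernel(i, *args)

n = 5
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * n
out = [0.0] * n
launch(saxpy_kernel, n, 2.0, x, y, out)
print(out)  # [12.0, 14.0, 16.0, 18.0, 20.0]
```

Because the kernel never assumes any order between threads, the same function can be run by one thread or by 240 cores' worth of them, which is the portability Dally is describing.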
VB: How big is GPU computing going to get?
BD: If you think of it as throughput computing, it’s going to be all of computing, except in certain niches where single-thread performance is critically important. It’s an interesting transition where throughput computing is mostly graphics now. It’s starting to make progress in scientific computing, video processing, security and a whole bunch of areas. It will become the default for the way we do computing. It’s that food chain. You deliver more value to the user by writing a parallel program.
VB: I heard Mike Abrash give a talk at the Game Developers Conference. He helped get the Xbox and Xbox 360 underway. He said Intel’s Larrabee graphics chip was the most fascinating graphics architecture he had seen in 15 years. He should be in Nvidia’s camp.
BD: I don’t know if I would call it exciting. It’s a pretty straightforward SIMD extension of the x86. My understanding of Larrabee is that it’s a certain number of cores, maybe 16 of them, that run x86 instructions. The cores, to be more energy efficient, go way back to the original Pentium. It’s not that exciting as a graphics architecture. Larrabee has hardware for texture filtering, but they do rasterization and compositing in software. That gives them flexibility, but it costs them a lot. They made a number of choices, like making it x86 compatible, that are expensive and don’t buy them much. You don’t necessarily get much from making the software x86 compatible, and you carry the burden of taking the x86 instructions and translating them into some other form that can be processed. That’s a pretty heavy price to pay. It’s a big chunk of hardware, and carrying it on a chip is not energy efficient. With CUDA on Nvidia, it’s a more natural way to do things.
BD: We certainly do want the game developers. We are the preferred platform for graphics. We are in a constant dialogue with them. They want better performance on graphics. Better anti-aliasing (which helps smooth jagged edges in computer animations). They want things like physics so they can run not just the graphics but do physical modeling. So a wall gets pushed over and it breaks realistically, and that creates a better experience in the games. They want accurate acoustics. If you knock over a wall, the sound won’t echo off of it and it will sound different. We have our PhysX package that the game developers use so they can deliver physics in games. It’s supported by the features we put into our GPUs. It’s a balance. We get them in our tent with sheer performance: having the best performance per watt and performance per area. You can measure it by fill rate or gigaflops. We provide raw performance. We also provide features like a shared memory in our streaming multiprocessors, which lets them do much more efficient computing on general-purpose tasks. The games people have now don’t use it much. But down the road, our ability to do general-purpose computing is going to become ever more important.
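The shared-memory feature Dally mentions can be sketched as a toy block reduction: threads in a block cooperate through a small, fast scratch buffer and emit a single result per block, instead of each thread writing to slow global memory. This is a pure-Python illustration of the pattern only; `BLOCK` and `block_sum` are hypothetical names, and a real GPU would run the tree steps with actual parallel threads.

```python
# Toy sketch of why per-block "shared memory" helps: a block of
# threads stages its data into a small fast buffer, reduces it
# cooperatively, and makes just one write back per block.

BLOCK = 4  # hypothetical threads per block

def block_sum(data):
    total = 0.0
    for start in range(0, len(data), BLOCK):
        shared = data[start:start + BLOCK]   # staged into fast memory
        # Tree reduction inside the block, halving the active
        # elements each step, as a CUDA kernel would do:
        stride = len(shared)
        while stride > 1:
            half = (stride + 1) // 2
            for i in range(stride - half):
                shared[i] += shared[i + half]
            stride = half
        total += shared[0]                   # one write per block
    return total

print(block_sum([float(i) for i in range(1, 9)]))  # 36.0
```

Each block touches global data once on the way in and once on the way out; all the intermediate traffic stays in the fast buffer, which is the efficiency win for general-purpose tasks.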
BD: I think what Nintendo did with the Wii is very exciting. It opened up new modalities for humans and computers to interact. But I don’t think those are two mutually exclusive paths. I think they are very complementary. I think what people want to have is both. They want to have rich modalities for users to interact, whether it’s accelerometers with a wand you carry around or cameras that track the user. More robust input of different kinds. They will want photorealistic graphics, sound that reflects the space where the game is being played, and game physics. The interactions should be physically realistic. Cloth draping over water. Wind blowing. Smoke. Those things require a lot more computing power. We always start with the movies. They can devote huge amounts of computing power to the special effects. We try to deliver, some number of years later, in PC graphics, and some years after that, in console graphics, the same experience in real time. That trend will continue. Wii has added another dimension along which people can innovate.
BD: I haven’t seen that movie. Some of the people I work with have worked on movies. The movie people use our Quadro workstation graphics chips. We’re very aware of what they do and compare techniques. We often look at what they do and try to see what we need to do to get movie-quality images. It’s a moving target because movie people are constantly doing better.
VB: There was a new startup that emerged, Caustic Graphics. It was interesting to see that happen, since we started with 50 3-D graphics companies a decade ago and saw it drop down to just Nvidia, AMD and Intel. Now we have another new startup. What do you think of that? Caustic is raising the debate about ray tracing versus rasterization. (We’ve covered this debate, which comes down to how you paint images on a screen).
BD: We raised that debate with our demo last summer of an interactive ray tracer working at 30 frames per second on a complex city scene with a car driving through it. We have shown it on programmable hardware and have released it as a product. I know little about Caustic beyond what I’ve read in the press. I see they are doing special-purpose hardware. My intellectual curiosity makes me want to find out what they are doing. But the more pragmatic part of me says we can do in software what they’re proposing to do with special hardware. Why would we want to put in special-purpose hardware that won’t be used when you’re not doing ray tracing?
When you look at graphics, ray tracing is a niche. It’s great if you have highly reflective surfaces or if you have transparency with reflection and refraction. Or when you have silhouette edges or soft shadows. But for a lot of other things, you want rasterization. You will see hybrid systems with rasterization and ray tracing only when you need it. You don’t want to burden the system with a lot of special purpose hardware that costs a lot.
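The hybrid Dally predicts — rasterize everything cheaply, then trace rays only where the material demands it — can be sketched as a per-pixel decision. Every function and field name below is hypothetical, invented purely to illustrate the control flow.

```python
# Sketch of a hybrid renderer: the rasterizer supplies every pixel's
# base color, and rays are spawned only for the few pixels whose
# material (mirror, glass) actually needs reflection or refraction.

def rasterize(p):
    # Fast path: take the color the rasterizer already produced.
    return p["base_color"]

def trace_ray(p):
    # Expensive path: pretend reflection blends in what the
    # mirror "sees" (stored here as a precomputed color).
    return tuple((c + r) / 2 for c, r in zip(p["base_color"], p["reflected"]))

def shade_scene(pixels):
    image = {}
    for p in pixels:
        color = rasterize(p)                       # everyone gets this
        if p["material"] in ("mirror", "glass"):
            color = trace_ray(p)                   # only the niche cases
        image[p["id"]] = color
    return image

pixels = [
    {"id": 0, "material": "diffuse", "base_color": (0.8, 0.2, 0.2)},
    {"id": 1, "material": "mirror", "base_color": (0.9, 0.9, 0.9),
     "reflected": (0.1, 0.3, 0.5)},
]
img = shade_scene(pixels)
print(img[0], img[1])
```

The design point matches the argument in the interview: the expensive machinery runs only for the pixels that need it, so nothing idles when the scene has no reflective surfaces.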
VB: How will that look in a chip?
BD: It’s all a hybrid of hardware and software. Our current chips have hardware rasterizers. Then we have shaders in software. We decide how to cast the scene and then decide what nuances to apply after that. In some cases, ray tracing should be used.
VB: What do you think about the semiconductor industry being too mature for startups?
BD: I have done several chip startups myself. It’s getting hard. The ante is very high. If you do a chip startup, you need patient investors with very deep pockets. It’s many tens of millions of dollars to get to a first product and $50 million to get to profits. That’s very difficult to do because investors want an exit at some multiple of that investment. I am hoping we return to the days of frequent IPOs and get beyond the fire-sale acquisitions, but that’s not what you see right now. If it’s a programmable chip, the cost is even higher. There are huge opportunities in graphics and parallel computing, but companies are much better off focusing on software. The best business plan is to have a killer idea on how to do the last part of the value chain: take the performance of throughput processors and deliver it as a compelling end-user experience.
BD: Exactly. Those are great examples of taking throughput computing and doing tasks like stabilizing images or moving video from one device to another. There will be more exciting examples of that over time. We have our GPU Ventures program to invest in companies that can deliver value from GPU computing.
VB: That’s big enough turf to justify a lot of startups?
BD: I think it’s huge. My vision is throughput computing will gradually become all of computing.
VB: You mentioned how hard it is to start things up. What do you think when you hear rumors that Apple may be doing its own chips?
BD: I can understand why they would want to do it.
VB: If we’re talking theoretically, what are the obstacles and attractions of doing that? I suppose it gets harder to clone an iPhone.
BD: Clearly the attraction is differentiation. If you’re buying a merchant part (not exclusive to any one vendor), then somebody else could buy that part. Then your differentiation is all in software. For handheld parts, where battery life is critically important, I can understand someone wanting to differentiate by trying to develop their own secret sauce on things that give them lower power. But the ante is high to get beyond where the merchant chip makers are.
VB: Apple has $29 billion in cash. But starting microprocessors from scratch sounds intimidating.
BD: ARM would be much easier to do than an x86 microprocessor. But it’s not a small undertaking.
VB: Does Nvidia have to think about that too? You have your own ARM-based processors with Tegra. Isn’t that intimidating to do?
BD: Those are ARM cores licensed from ARM. Our Tegra part is perhaps the best mobile processor around. It integrates a lot of things. It has an ARM core we license. It has special features we have developed. And it has a slice of the same GPU architecture used across our lines. Shortly, you’ll be able to run CUDA from Tegra on up through our high-end Tesla parts. Taking an ARM core and putting it on a chip is relatively easy. Doing all of the special-purpose features to put around that core is hard. Doing your own core instead of using ARM’s, with perhaps lower power, is very hard too.
VB: What is your opportunity in mobile?
BD: We bring a number of things to that space. The compute power and battery life of Tegra is really best in class in mobile. As a throughput computer, it uses the same means of delivering parallelism as our other chips. That uniform programming model is compelling for the third party software developers. They want an easy programming model that works across a lot of product lines.
VB: What are the hot job opportunities now? Can you go to college and then get a job as a microprocessor designer like you did?
BD: I think so. Now is a tough economic time across the board. But there are still areas where there is a net job shortage. At Stanford, I was amazed at how few kids were going into computer science given the demand. Microprocessor design is a much smaller world than programming. But people who are good still find jobs. To find the best jobs in the future, you have to think of the future and where the gaps are. Over the next five to ten years, it’s in filling the gap as we move to throughput computing. Moving from serial code to parallel code. It’s not traditional computing, but parallel computing.
VB: Are the schools producing these people?
BD: They’re starting to. CUDA is being taught in more than 100 universities, and we’re helping to develop curriculum for it.
VB: You sound pretty happy. Some people are glum about future jobs in computing.
BD: Well, the world is flat. But the overall demand for computing jobs has been going up more rapidly than jobs have been offshored. There’s a lot of job creation offshore, but also a lot here.