Asked why Nvidia’s latest 40 Series graphics cards cost as much as $1,600, Nvidia CEO Jensen Huang said that Moore’s Law is dead. He explained that the days of constantly falling costs are over, as technology advances in manufacturing have slowed and the pandemic shortage messed things up further.
But don’t worry too much. The advances in both AI and gaming are going to work together to propel the ambitious dreams of humanity, like the metaverse.
Huang spoke at a press Q&A at Nvidia’s online GTC22 conference last week.
Moore’s Law, posited by Intel chairman emeritus Gordon Moore in 1965, stated that the number of components on a chip would double every couple of years. It was a metronome that signaled that every couple of years chip performance would either double or costs would halve.
And it held true for decades, based mostly on manufacturing advances. But with the laws of physics reaching their limit in terms of miniaturization, those advances are no longer taken for granted. Intel is investing heavily to make the law hold up. But Huang said that smart chip design has to take over, which is why the company shifted to a new architecture for its latest generation of graphics chips. The result for the 40 Series graphics chips is some outstanding performance coming out for PC games just as we head into a global downturn.
Huang believes it’s more important than ever to keep the advances in performance and power efficiency going, as we’re on the cusp of building the metaverse, the 3D universe of virtual worlds that are all interconnected, like in novels such as Snow Crash and Ready Player One. Nvidia has built the Omniverse suite of standardized development and simulation tools to enable that metaverse to happen.
But it won’t be a real metaverse unless it’s real-time and can accommodate lots more people than can access 3D spaces today. Nvidia plans to use the Omniverse to create a digital twin of the Earth, in a supercomputing simulation dubbed Earth 2, so it can predict climate change for decades to come.
With that, we should get the metaverse for free, and we’ll need all the chip processing power available. And he noted that AI, made possible by the graphics chips driven forward by gaming, will enable developers to auto-populate their metaverse worlds with interesting 3D content. In other words, gaming and AI will be helping each other, driving both chips and the metaverse forward. To me, that sounds like a new law is in the making there.
Here’s an edited transcript of the press Q&A, which I attended along with a number of other members of the press.
Q: How big can the SaaS business be?
Huang: Well, it’s hard to say. That’s really the answer. It depends on what software we offer as a service. Maybe another way to take it is just a couple at a time. This GTC, we announced new chips, new SDKs, and new cloud services. I highlighted two of them. One of them is large language models. If you haven’t had a chance to look into the effectiveness of large language models and the implications on AI, please do so. It’s important stuff.
Large language models are hard to train. The applications are quite diverse. It’s been trained on a large amount of human knowledge, and so it has the ability to recognize patterns, but it also has within it a large amount of encoded human knowledge. It has human memory, if you will. In a way it’s encoded a lot of our knowledge and skills. If you wanted to adapt it to something that it was never trained to do — for example, it was never trained to answer questions or to summarize a story or to release a breaking news paraphrase. It was never trained to do these things. With a few additional shots of learning, it can learn these skills.
This basic idea of fine tuning, adapting for new skills, or what’s called zero-shot or few-shot learning, it has great implications in a large number of fields. Which is the reason why you see such a large amount of funding in digital biology. Large language models have learned the language of the structure of proteins, the language of chemistry. And so we put that model up. How large can that opportunity be? My sense is that every single company in every single country speaking every single language has probably tens of different skills that their company could adapt, that our large language models could go perform. I’m not exactly sure how big that opportunity is, but it’s potentially one of the largest software opportunities ever. The reason for that is because the automation of intelligence is one of the largest opportunities ever.
The other opportunity we spoke about was OmniVerse cloud. Remember what OmniVerse is. OmniVerse has several characteristics. The first characteristic is that it ingests. It can store. It can composite physical information, 3D information, across multiple layers or what’s called schemas. It can describe geometry, textures and materials. Properties like mass and weight and such. Connectivity. Who is the supplier? What’s the cost? What is it related to? What is the supply chain? I’d be surprised if behaviors, kinematic behaviors — it could be AI behaviors.
The first thing OmniVerse does is it stores data. The second thing it does is it connects multiple agents. The agents can be people. They can be robots. They can be autonomous systems. The third thing it does is it gives you a viewport into this other world, another way of saying a simulation engine. OmniVerse is basically three things. It’s a new type of storage platform. It’s a new type of connecting platform. And it’s a new type of computing platform. You can write an application on top of OmniVerse. You can connect other applications through OmniVerse. For example, we showed many examples with Adobe being connected to AutoDesk applications being connected to various other applications. We’re connecting things. You could be connecting people. You could be connecting worlds. You could be connecting robots. You could be connecting agents.
The best way to think about what we’ve done with OmniVerse — think of it almost like — the easiest way to monetize that is probably like a database. It’s a modern database in the cloud. Except this database is in 3D. This database connects multiple people. Those are two SaaS applications we put up. One is the large language model, and the other is OmniVerse, basically a database engine that will be in the cloud. I think these two announcements — I’m happy that you asked. I’ll get plenty of opportunities to talk about it over and over again. But these two SaaS platforms are going to be very long-term platforms for our company. We’ll make them run in multiple clouds and so forth.
Q: Nvidia has said that it would reduce GPU sell-through into Q4. Do you mean fiscal Q4 or calendar Q4? Can you confirm that the reduced selling will last several more quarters?
Huang: Actually, it depends on — our fiscal Q4 ends in January. It’s off by a month. I can tell you that — because we only guide one quarter at a time, we are very specifically selling into the market a lot lower than what’s selling out of the market. A significant amount lower than what’s selling out of the market. I hope that by that Q4 time frame, some time in Q4, the channel will normalize and make room for a great launch for Ada. We’ll start shipping Ada starting this quarter in some amount, but the vast majority of Ada will be launched next quarter. I can’t predict the future very far these days, but our expectation and our current thinking is that what we see in the marketplace, what we know to be in the channel and the marketing actions we’ve taken, we should have a pretty terrific Q4 for Ada.
Q: What do you think about the progress of the metaverse, especially a real-time metaverse that would be more responsive than the internet we have right now? If it’s coming along maybe slower than some people would like, what are some things that could make it happen faster, and would Nvidia itself consider investing to make that come faster?
Huang: There are several things we have to do to make the metaverse, the real-time metaverse, be realized. First of all, as you know, the metaverse is created by users. It’s either created by us by hand, or it’s created by us with the help of AI. And in the future it’s very likely that we’ll describe some characteristics of a house or of a city or something like that — it’s like this city, like Toronto or New York City, and it creates a new city for us. If we don’t like it we can give it additional prompts, or we can just keep hitting enter until it automatically generates one we’d like to start from. And then from that world we’ll modify it.
The AI for creating virtual worlds is being realized as we speak. You know that at the core of that is precisely the technology I was talking about just a second ago called large language models. To be able to learn from all of the creations of humanity, and to be able to imagine a 3D world. And so from words through a large language model will come out, someday, triangles, geometry, textures and materials. From that we would modify it. Because none of it is pre-baked or pre-rendered, all of this simulation of physics and simulation of light has to be done in real time. That’s the reason why the latest technologies that we’re creating with respect to RTX neural rendering are so important. We can’t do it [by] brute force. We’ll need the help of AI to do that. We just demonstrated Ada with DLSS 3, and the results are pretty insanely amazing.
The first part is generating worlds. The second is simulating the worlds. And then the third part is to be able to put that, the thing you were mentioning earlier about interactivity — we have to deal with the speed of light. We have to put a new type of data center around the world. I spoke about it at GTC and called it a GDN. Whereas Akamai came up with CDN, I think there’s a new world for this thing called GDN, a graphics distribution network. We demonstrated the effectiveness of it through augmenting our GeForce Now network. We have that in 100 regions around the world. By doing that we can have computer graphics, that interactivity that is essentially instantaneous. We’ve demonstrated that on a planetary scale, we can have interactive graphics down to tens of milliseconds, which is basically interactive.
And then the last part of it is how to do raytracing in an augmented way, an AR or VR way. Recently we’ve demonstrated that as well. The pieces are coming together. The engine itself, the database engine called OmniVerse Nucleus, the worlds that are either built by humans or augmented by AI, all the way to the simulation and rendering using AI, and then graphics, GDNs around the world, all the pieces we’re putting together are coming together. At GTC this time, you saw us work with a really cool company called Rimac. Their CEO has put together with us, from their design studio, publishing an auto-configurator all the way out to the world, literally with the press of a button. We published an interactive raytraced simulation of cars in every corner of the world instantly. I think the pieces are coming together. Now that Ada is in production, we just have to get Ada stood up in the public clouds of the world, stood up in companies around the world, and continue to build out our distributed GDNs. The software is going to be there. The computing infrastructure is going to be there. We’re pretty close.
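To put Huang’s speed-of-light point in rough numbers, here is a minimal sketch of best-case network delay as a function of distance. It assumes signals in optical fiber travel at about two-thirds of the vacuum speed of light, and it ignores routing, queuing, rendering and encoding time. The function name and the sample distances are illustrative and do not come from Nvidia.

```python
# Best-case round-trip propagation delay over optical fiber.
# Assumption: light in fiber moves at roughly 2/3 of its vacuum speed.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumed rule-of-thumb value, not a measured figure


def round_trip_ms(distance_km: float) -> float:
    """Return the round-trip propagation delay in milliseconds."""
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S * FIBER_FACTOR
    return 2 * distance_km / speed_km_per_s * 1000


if __name__ == "__main__":
    # Hypothetical distances between a player and the nearest data center.
    for label, km in [("nearby region", 500),
                      ("cross-continent", 4_000),
                      ("half the planet", 20_000)]:
        print(f"{label:>16}: {round_trip_ms(km):6.1f} ms")
```

Under these assumptions, a data center 500 km away costs about 5 ms of round trip before any processing, while one on the far side of the planet costs about 200 ms. That gap is the argument for placing graphics servers in many regions.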
Q: Given the inventory issues and physical supply chain issues — we’ve seen that with OmniVerse cloud you’re moving into SaaS. You already have GeForce Now. Do you foresee a point where you’re supplying the card as a service, rather than distributing the physical card anymore?
Huang: I don’t think so. There are customers who like to own. There are customers who like to rent. There are some things that I rent or subscribe to and some things I prefer to own. Businesses are that way. It depends on whether you like things capex or opex. Startups would rather have things in opex. Large established companies would rather have capex. It just depends on — if you use things sporadically you’d rather rent. If you’re fully loaded and using it all the time you’d rather just own it and operate it. Some people would rather outsource the factory.
Remember, AI is going to be a factory. It’s going to be the most important factory of the future. You know that because a factory has raw materials come in and something comes out. In the future the factories will have data come in, and what will come out is intelligence, models. The transformation of it is going to be energy. Just like factories today, some people would rather outsource their factory, and some people would rather own the factory. It depends on what business model you’re in.
It’s likely that we continue to build computers with HP and Dell and the OEMs around the world. We’ll continue to provide cloud infrastructure through the CSPs. But remember, Nvidia is a full stack accelerated computing company. Another way of saying it, I kind of said the same thing twice, but an accelerated computing company needs to be full stack. The reason for that is because there isn’t a magical thing you put into a computer and it doesn’t matter what application it is, it just runs 100 times faster. Accelerated computing is about understanding the application, the domain of the application, and re-factoring the entire stack so that it runs a lot faster.
And so accelerated computing, over the course of the last 25 years — we started with computer graphics, went into scientific computing and AI, and then into data analytics. Recently you’ve seen us in graph analytics. Over the years we’ve taken it across so many domains that it seems like the Nvidia architecture accelerates everything, but that’s not true. We accelerate. We just happen to accelerate 3,000 things. These 3,000 things are all accelerated under one architecture, so it seems like, if you put the Nvidia chip into your system, things get faster. But it’s because we did them one at a time, one domain at a time. It took us 25 years.
We had the discipline to stay with one architecture so that the entire software stack we’ve accelerated over time is accelerated by the new chips we build, for example Hopper. If you develop new software on top of our architecture, it runs on our entire installed base of 300, 400 million chips. It’s because of this discipline that’s lasted more than a couple of decades that what it appears to be is this magical chip that accelerates computing. What we’ll continue to do is put this platform out in every possible way into the world, so that people can develop applications for it. Maybe there’s some new quantum algorithms that we can develop for it so it’s prepared for cryptography in 10 or 20 years. Discovering new optimizations for search. New cybersecurity, digital fingerprinting algorithms. We want the platform to be out there so people can use it.
However there are three different domains where you’ll see us do more. The reason why we’ll do more is because it’s so hard to do that if I did it once myself, not only would I understand how to do it, but we can open up the pieces so other people can understand how to do it. Let me give you an example. Obviously you’ve seen us now take computer graphics all the way to the OmniVerse. We’ve built our own engine, our own systems. We took it all the way to the end. The reason for that is because we wanted to discover how best to do real-time raytracing on a very large data scale, fusing AI and brute force path tracing. Without OmniVerse we would have never developed that skill. No game developer would want to do it. We pushed in that frontier for that reason, and now we can open up RTX, and RTX DI and RTX GI and DLSS and we can put that into everyone else’s applications.
The second area you saw us do this was Drive. We built an end-to-end autonomous car system so I can understand how to build robotics from end to end, and what it means for us to be a data-driven company, an ML ops company in how you build robotics systems. Now we’ve built Drive. We’ve opened up all the pieces. People can use our synthetic data generation. They can use our simulators and so on. They can use our computing stack.
The third area is large language models. We built one of the world’s largest models, earliest, almost before anyone else did. It’s called Megatron 530B. It’s still one of the most sophisticated language models in the world, and we’ll put that up as a service, so we can understand ourselves what it means.
And then of course in order to really understand how to build a planetary-scale platform for metaverse applications — in particular we’ll focus on industrial metaverse applications. You have to build a database engine. We built OmniVerse Nucleus and we’ll put that in the cloud. There are a few applications where we think we can make a unique contribution, where it’s really hard. You have to think across the planet at data center scale, full stack scale. But otherwise we’ll keep the platforms completely open.
Q: I wanted to ask you a bit more about the China export control restrictions. Based on what you know about the criteria for the licenses at this point, do you anticipate all your future products beyond Hopper being affected by those, based on the performance and interconnect standards? And if so, do you have plans for China market specific products that will still comply with the rules, but that would incorporate new features as you develop them?
Huang: First of all, Hopper is not a product. Hopper is an architecture. Ampere isn’t a product. Ampere is an architecture. Notice that Ampere has A10, A10G, A100, A40, A30, and so on. Within Ampere there are, gosh, how many versions of products? Probably 15 or 20. Hopper is the same way. There will be many versions of Hopper products. The restrictions specify a particular combination of computing capability and chip to chip interconnection. It specifies that very clearly. Within that specification, under the envelope of that specification is a large space for us, for customers. In fact the vast majority of our customers are not affected by the specification.
Our expectation is that for the US and for China, we’ll have a large number of products that are architecturally compatible, that are within the limits, that require no licensing at all. However, if a customer would specifically like to have the limits that are specified by the restrictions or beyond, we have to go get a license for that. You could surmise that the goal is not to reduce or hamper our business. The goal is to know who it is that would need the capabilities at this limit, and give the US the opportunity to make a decision about whether that level of technology should be available to others.
Q: I had a recent talk with someone from a big British software developer diving into AI and the metaverse in general. We talked a bit about how AI can help with developing games and virtual worlds. Obviously there’s asset creation, but also pathfinding for NPCs and stuff like that. Regarding automotive, these technologies might be somewhat related to one another. You have situational awareness, something like that. Can you give us insight into how you think this might develop in the future?
Huang: When you saw the keynote, you’ll notice there were several different areas where we demonstrated pathfinding very specifically. When you watch our self-driving car, basically three things are happening. There are the sensors, and the sensors come into the computer. Using deep learning we can perceive the environment. We can perceive and then reconstruct the environment. The reconstruction doesn’t have to be exactly to the fidelity that we see, but it has to know its surroundings, the important features, where obstacles are, and where those obstacles will likely be in the near future. There’s the perception part of it, and then the second part, which is the world model creation. Within the world model creation you have to know where everything else is around it, what the map tells you, where you are within the world, and reconstructing that relative to the map and relative to everyone else. Some people call it localization and mapping for robotics.
The third part is path planning, planning and control. Planning and control has route planning, which has some AI, and then path planning, which is about wayfinding. The wayfinding has to do with where you want to go and where the obstacles are around you and how you want to navigate around it. You saw in the demo something called PathNet. You saw a whole bunch of lines that came out of the front of the cars. Those lines are essentially options that we are grading to see which one of those paths is the best path, the most safe and then the most comfortable, that takes you to your final destination. You’re doing wayfinding all the time. But second is Isaac for robots. The wayfinding system there is a little bit more, if you will, unstructured in the sense that you don’t have lanes to follow. The factories are unstructured. There are a lot of people everywhere. Things are often not marked. You just have to go from waypoint to waypoint. Between the waypoints, again, you have to avoid obstacles, find the most efficient path, not block yourself in. You can navigate yourself into a dead end, and you don’t want that. There are all kinds of different algorithms to do path planning there.
The Isaac path planning system, you could see that inside a game. There you could say, soldier, go from point A to point B, and those points are very far apart. In between point A and point B the character has to navigate across rocks and boulders and bushes, step around a river, those kinds of things. And so we would articulate, in a very human way. You saw Isaac do that, and there’s another piece of AI technology you might have seen in the demo that’s called ASE. Basically it’s Adversarial Skill Embeddings. It’s an AI that learned, by watching a whole bunch of humans, how to articulate in a human way from the prompts of words. You could say, walk forward to that stone, or walk forward to waypoint B. Climb the tree. Swing the sword. Kick the ball. From the phrases you can describe a human animation. I’ve just given you basically the pieces of AI models that allow us to take multiplayer games and have AI characters that are very realistic and easy to control. And so the future metaverse will have some people that are real, some people that are AI agents, and some that are avatars that you’ve entered into using VR or other methods. These pieces of technology are already here.
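The idea of grading candidate paths, safest first and then most comfortable, can be illustrated with a toy sketch. This is not PathNet or Isaac. It is a hypothetical scoring scheme that assumes safety means keeping a minimum distance from obstacles at each waypoint and comfort means turning as little as possible. All names, thresholds and coordinates are made up for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Path = List[Point]


def min_clearance(path: Path, obstacles: List[Point]) -> float:
    """Smallest waypoint-to-obstacle distance (larger is safer).

    Simplification: only waypoints are checked, not the segments between them.
    """
    if not obstacles:
        return math.inf
    return min(math.dist(p, o) for p in path for o in obstacles)


def total_turning(path: Path) -> float:
    """Sum of absolute heading changes in radians (smaller is more comfortable)."""
    headings = [math.atan2(b[1] - a[1], b[0] - a[0])
                for a, b in zip(path, path[1:])]
    total = 0.0
    for h1, h2 in zip(headings, headings[1:]):
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        diff = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi
        total += abs(diff)
    return total


def best_path(candidates: List[Path], obstacles: List[Point],
              safe_distance: float = 1.0) -> Path:
    """Pick the most comfortable path among those that are safe enough.

    If no candidate meets the (assumed) safety threshold, fall back to the
    candidate with the largest clearance.
    """
    safe = [p for p in candidates
            if min_clearance(p, obstacles) >= safe_distance]
    if safe:
        return min(safe, key=total_turning)
    return max(candidates, key=lambda p: min_clearance(p, obstacles))


if __name__ == "__main__":
    obstacles = [(2.0, 0.0)]
    straight = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]  # runs into the obstacle
    gentle = [(0.0, 0.0), (2.0, 1.5), (4.0, 0.0)]    # mild swerve
    sharp = [(0.0, 0.0), (2.0, 3.0), (4.0, 0.0)]     # wide, sharper swerve
    print(best_path([straight, gentle, sharp], obstacles))
```

In this example the straight path is rejected as unsafe, and the gentle swerve wins over the sharp one because it requires less total turning.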
Q: How do you see the future of the autonomous driving business, since you’ve introduced your new chip for autonomous cars? Do you think it’s still in the early stage for this kind of business, or do you see some kind of wave coming up and sweeping the industry? Can you tell us about your strategic thinking in this area?
Huang: First of all, the autonomous car has two computers. There’s the computer in the data center for developing the data processing that’s captured in cars, turning that data into trained models, developing the application, simulating the application, regressing or replaying against all of your history, building the map, generating the map, reconstructing the map if you will, and then doing CI/CD and then OTA. That first computer is essentially a self-driving car, except it’s in the data center. It does everything that the self-driving car does, except it’s very large, because it collects data from the entire fleet. That data center is the first part of the self-driving car system. It has data processing, AI learning, AI training, simulation and mapping.
And then the second part is you take that whole thing and put it into the car, a small version of it. That small version is what we call in our company — Orin is the name of the chip. The next version is called Thor. That chip has to do data processing, which is called perception or inference. It has to build a world model. It has to do mapping. It has to do path planning and control.
And both of these systems are running continuously, two computers. Nvidia’s business is on both sides. In fact, you could probably say that our data center business for autonomous driving is even larger, definitely larger, and frankly, long-term, the larger of the two parts. The reason for that is because the software development for autonomous vehicles, no matter how many, will never be finished. Every company will be running their own stack. That part of the business is quite significant.
We created OmniVerse, and the first customer for OmniVerse is DRIVE Sim, a digital twin of the fleet, of the car. DRIVE Sim is going to be a very significant part of our autonomous driving business. We use it internally. We’ll make it available for other people to use. And then in the car, there are several things philosophically that we believe. If you look at the way that people were building ADAS systems in the past, and you look at the way Nvidia built it, we invented a chip called Xavier, which is really the world’s first software programmable robotics chip. It was designed for high-speed sensors. It has lots of deep learning processors. It has CUDA in it for localization, mapping, and path planning and control. A lot of people, when I first introduced Xavier, said why would anybody need such a large SoC? It turns out that Xavier wasn’t enough. We needed more.
Orin is a home run. If you look at our robotics business right now, which includes self-driving cars and shuttles and trucks and autonomous systems of all kinds, our entire robotics business is running already larger than $1 billion a year. Orin is on its way — the pipeline is $11 billion now. My sense is that our robotics business is on its way to doubling in a year, and it’s going to be a very big part of our business. Our philosophy, which is very different from people in this area in the past, is that there are several different technologies that come together to make robotics possible. One of them, of course, is deep learning. We were the first to bring deep learning to autonomous driving. Before us it was really based on lidars. It was based on hand-tuned computer vision algorithms that were developed by engineers. We used deep learning because we felt that was the most scalable way of doing it.
Second, everything that we did was software-defined. You could update the software very easily, because there are two computers. There’s the computer in the data center developing the software, and then we deploy the software into the car. If you want to do that on a large fleet and move fast and improve software on the basis of software engineering, then you need a really programmable chip. Our philosophy around using deep learning and a fully software-defined platform was really a good decision. It took a little longer because it cost more. People had to learn how to develop the software for it. But I think at this point, it’s a foregone conclusion that everybody will use this approach. It’s the right way going forward. Our robotics business is on track to be a very large business. It already is a very large business, and it’s going to be much bigger.
Q: On the AI generation you mentioned for Ada, which is not just generating new pixels, but now whole new frames, with the different sources that we have for AI-generated images, we see DALL-E and all these different algorithms blowing up on the internet. For video games, it may not be the best use case for that. But how can any other side of creation — you have technologies like broadcast and things focused on creators. How can other users besides game developers make use of that AI technology to generate new images, to export new frames, to stream at new framerates? Have you been studying that approach to making more use of that AI technology?
Huang: First of all, the ability to synthesize computer graphics at very high framerates using path tracing — not offline lighting, not pre-baked lighting, but everything synthesized in real time — is very important. The reason for that is it enables user-generated content. Remember, I mentioned in the keynote that nine of the world’s top 10 video games today were mods at one time. It was because somebody took the original game and modified it into an even more fun game, into a MOBA, into a five-on-five, into a PUBG. That required fans and enthusiasts to modify a particular game. That took a lot of effort.
I think that in the future, we’re going to have a lot more user-generated content. When you have user-generated content, they simply don’t have the large army of artists to put up another wall or tear down this other wall or modify the castle or modify the forest or do whatever they want to do. Whenever you modify those things, these structures, the world, then the lighting system is no longer accurate. Using Nvidia’s path tracing system and doing everything in real time, we made it possible for every lighting environment to be right, because we’re simulating light. No pre-baking is necessary. That’s a very big deal. In fact, if you combine RTX and DLSS 3 with OmniVerse — we’ve made a version of OmniVerse called RTX Remix for mods. If you combine these ideas, I believe user-generated content is going to flourish.
When you say user-generated worlds, what is that? People will say that’s the metaverse, and it is. The metaverse is about user-generated, user-created worlds. And so I think that everybody is going to be a creator someday. You’ll take OmniVerse and RTX and this neural rendering technology and generate new worlds. Once you can do that, once you can simulate the real world, the question is, can you use your own hands to create the whole world? The answer is no. The reason for that is because we have the benefit in our world of mother nature to help us. In virtual worlds we don’t have that. But we have AI. We’ll simply say, give me an ocean. Give me a river. Give me a pond. Give me a forest. Give me a grove of palm trees. You describe whatever you want to describe and AI will synthesize, right in front of you, the 3D world. Which you can then modify.
This world that I’m describing requires a new way of doing computer graphics. We call it neural rendering. The computing platform behind it we call RTX. It’s really about, number one, making video games, today’s video games, a lot better. Making the framerate higher. Many of the games today, because the worlds are so big, they’ve become CPU limited. Using frame generation in DLSS 3 we can improve the framerates still, which is pretty amazing. On the other hand this whole world of user-generated content is the second. And then the third is the environment that we’re in today.
This video conference that we’re in today is rather archaic. In the 1960s video conferencing was really created. In the future, video conferencing will not be encode and decode. In the future it will be perception and generation. Perception and generation. Your camera will be on your side to perceive you, and then on my side it will be generating. You can control how that generation is done. As a result everybody’s framerate will be better. Everybody’s visual quality will be better. The amount of bandwidth used will be tiny, just a little tiny bit of bandwidth, maybe in kilobits per second, not megabits. The ability for us to use neural rendering for video conferencing is going to be a very exciting future. It’s another way of saying telepresence. There are a whole lot of different applications for it.
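A back-of-the-envelope calculation shows why a perceive-and-generate approach could plausibly fit in kilobits per second. The sketch below assumes the sender transmits only a small set of facial keypoints per frame instead of encoded video. The keypoint count, coordinate precision and frame rate are illustrative guesses, not figures from Nvidia.

```python
def keypoint_stream_kbps(num_keypoints: int, coords_per_point: int,
                         bits_per_coord: int, fps: int) -> float:
    """Raw bitrate, in kilobits per second, of an uncompressed keypoint stream."""
    bits_per_second = num_keypoints * coords_per_point * bits_per_coord * fps
    return bits_per_second / 1000


if __name__ == "__main__":
    # Assumed values: 68 2D keypoints, 16 bits per coordinate, 30 frames/s.
    kbps = keypoint_stream_kbps(num_keypoints=68, coords_per_point=2,
                                bits_per_coord=16, fps=30)
    print(f"keypoint stream: {kbps:.2f} kbps")
```

With these assumed values the stream comes to about 65 kbps before any compression, compared with the megabits per second that Huang contrasts it with for conventional encoded video.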
Q: I noticed in the presentation that there was no NVLink connector on the cards. Is that completely gone for Ada?
Huang: There is no NVLink on Ada. The reason why we took it out is because we needed the I/Os for something else. We used the I/Os and the area to cram in as much AI processing as we could. And also, because Ada is based on PCIe Gen 5, we now have the ability to do peer-to-peer across Gen 5 that’s sufficiently fast that it was a better tradeoff. That’s the reason.
Q: Back to the trade issue, do you have a big-picture philosophy about trade restrictions and their potential for disrupting innovation?
Huang: Well, first of all, there needs to be fair trade. That’s questionable. There needs to be national security. That’s always a concern. There are a lot of things that maybe somebody knows that we don’t know. However, nothing could be absolute. There just have to be degrees. You can’t have open, completely open unfair trade. You can’t have completely unfettered access to technology without concern for national security. But you can’t have no trade. And you can’t have no business. It’s just a matter of degrees. The limitations and the licensing restrictions that we’re affected by give us plenty of room to continue to conduct business in China with our partners. It gives us plenty of room to innovate and continue to serve our customers there. In the event that the most extreme examples and use of our technology is needed, we can go seek a license.
From my perspective, the restriction is no different than any other technology restriction that’s been placed on export control. Many other technology restrictions exist on CPUs. CPUs have had restrictions for a very long time, and yet CPUs are widely used around the world, freely used around the world. The reason why we had to disclose this is because it came in the middle of the quarter, and it came suddenly. Because we’re in the middle of the quarter we thought it was material to investors. It’s a significant part of our business. To others that were affected, it wasn’t a significant part of their business, because accelerated computing is still rather small outside of Nvidia. But to us it was a very significant part of our business, and so we had to disclose. But the restrictions themselves, with respect to serving customers based on the Ampere and Hopper architectures, we have a very large envelope to innovate and to serve our customers. From that perspective, I’m not at all concerned.
Q: 4000 is finally here, which for you I’m sure feels like a huge launch. The reaction universally I am seeing out there is, oh my God, it costs so much money. Is there anything you would like to say to the community regarding pricing on the new generation of parts? Can they expect to see better pricing at some point? Basically, can you address the loud screams I’m seeing everywhere?
Huang: First of all, a 12” wafer is a lot more expensive today than it was yesterday. It’s not a little bit more expensive. It is a ton more expensive. Moore’s Law is dead. The ability for Moore’s Law to deliver twice the performance at the same cost, or the same performance [for] half the cost in every year and a half, it’s over. It’s completely over. The idea that the chip is going to go down in cost over time, unfortunately, is a story of the past. The future is about accelerated full stack. You have to come up with new architectures, come up with as good a chip design as you can, and then of course computing is not a chip problem. Computing is a software and a chip problem. We call it a full stack challenge. We innovate across the full stack.
For all of our gamers out there, here’s what I’d like you to remember and to hopefully notice. At the same price point, based on what I just said earlier, even though our costs, our materials costs are greater than they used to be, the performance of Nvidia’s $899 GPU or $1599 GPU a year ago, two years ago — our performance with Ada Lovelace is monumentally better. Off the charts better. That’s really the basis to look at it. Of course, the numbering system is just a numbering system. If you go back, 3080 compared to 1080 compared to 980 compared to 680 compared to 280, all the way back to the 280 — a 280, obviously, was a lot lower price in the past.
Over time, we have to create in order to pursue advances in computer graphics on the one hand, deliver more value at the same price point on the other hand, expand deeper into the market as well with lower and lower priced solutions — if you look at our track record, we’re doing all three all the time. We’re pushing the new frontiers of computer graphics further into new applications. Look at all the great things that have happened as a result of advancing GeForce. But at the same price point, our value delivered generationally is off the charts, and it remains off the charts this time. If they could just remember the price point, compare price point to price point, they’ll find that they’ll love Ada.
Q: You talked about everything you’re planning, the big expectations you have from the robotics business. Are there any things that keep you up at night business-wise, that could endanger your business and how it is going at the moment? Are there things you see as challenges you have to cope with?
Huang: This year, I would say that the number of external environmental challenges to the world’s industries is extraordinary. It started with COVID. Then there were supply chain challenges. Then there are entire supply chain shutdowns in China. Entire cities being locked down week to week. More supply chain challenges. All of a sudden, a war in Europe. Energy costs going up. Inflation going sky high. I don’t know. Anything else that can go wrong? However, those things don’t keep me up at night, because they’re out of our control. We try to be as agile as we can, make good decisions.
Three or four months ago we made some very good decisions as we saw the PC market start to slow down overall. When we saw the sell-through, because of inflation, starting to cause the consumer market to slow down, we realized that we were going to have too much inventory coming to us. Our inventory and our supply chain started at the later part of last year. Those wafers and those products are coming at us. When I realized that the sell-through was going to be limited, instead of continuing to ship, we shut ourselves down. We took two quarters of hard medicine. We sold into our customers, into the world, a lot lower than what was selling out of the channel. The channel, just the desktop gaming channel, call it $2.5 billion a quarter. We sold in a lot less than that in Q2 and Q3. We got ourselves prepared, got our channel prepared and our partners prepared, for the Ada launch.
I would say the things we can do something about, we try to make good decisions. The rest of it is continuing to innovate. During this incredible time we built Hopper. We invented DLSS 3. We invented neural rendering. We built Omniverse. Grace is being built. Orin is being ramped. In the midst of all this we’re working on helping the world’s companies reduce their computing costs by accelerating them. If you can accelerate Hopper, Hopper can accelerate computing by a factor of five times for large language models. Even though you have to add Hopper to the system, the TCO is still improved by a factor of three. How do you improve TCO by a factor of three at the end of Moore’s Law? It’s pretty amazing, incredible results, helping customers save money while we invent new ideas and new opportunities for our customers to reinvent themselves. We’re focused on the right things. I’m certain that all of these challenges, environmental challenges, will pass, and then we’ll go back to doing amazing things. None of that keeps me up at night.
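[Editor's illustration: the TCO figures Huang cites can be related with a little arithmetic. The sketch below is not Nvidia's cost model. It assumes, purely for illustration, that total cost of ownership per unit of work equals system cost divided by throughput. Under that assumption, a fivefold speedup that improves TCO threefold implies an accelerated system costing about 5/3 of the baseline. The function name `implied_cost_ratio` is hypothetical.]

```python
def implied_cost_ratio(speedup: float, tco_improvement: float) -> float:
    """Cost of the accelerated system relative to the baseline system.

    Assumption (not from the interview): TCO per unit of work is
    system cost / throughput, so tco_improvement = speedup / cost_ratio.
    """
    if speedup <= 0 or tco_improvement <= 0:
        raise ValueError("speedup and tco_improvement must be positive")
    return speedup / tco_improvement


# Figures quoted in the interview: 5x faster, 3x better TCO.
ratio = implied_cost_ratio(speedup=5.0, tco_improvement=3.0)
print(f"Implied system cost vs. baseline: {ratio:.2f}x")
```

Under this simplified model, the system can cost roughly two-thirds more than the baseline and still come out three times cheaper per unit of work.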
Q: You have started shipping H100. That’s great news for you. The big ramp from the spring. But with Lovelace now out, I’m curious. Are we going to see an L100? Can you provide any guidance on how you’re going to divvy up those two architectures this time around?
Huang: If you look at our graphics business, let’s go all the way back to Turing. During the Turing time — this is only two generations ago, or about four or five years ago — our core graphics business was basically two segments. One of them is desktop PCs, desktop gaming, and the other was workstations. Those were really the two. Desktop workstations and desktop gaming systems. The Ampere generation, because of its incredible energy efficiency, opened up a whole bunch of notebook business. Thin and light gaming systems, thin and light workstations became a real major driving force. In fact, our notebook business is quite large, almost proportionally very similar to our desktop business, or close to it. During the Ampere generation, we were also quite successful at taking it into the cloud, into the data center. It’s used in the data center because it’s ideal for inference. The Ampere generation saw great success for inference GPUs.
This generation you’re going to see several things. There are some new dynamics happening, long-term trends that are very clear. One of them has to do with cloud graphics. Cloud gaming is, of course, a very real thing now around the world. In China cloud gaming is going to be very large. There are a billion phones that game developers don’t know how to serve anymore. They make perfectly good connections, but the graphics are so poor that they don’t know how to take a game built for a modern iPhone 14 and have it run on a phone that’s five years old, because the technology has moved forward so fast. There’s a billion phones installed in just China. In the rest of the world I would think there’s a similar number of phones. Game developers don’t know how to serve those anymore with modern games. The best way to solve that is cloud gaming. You can reach integrated graphics. You can reach mobile devices and so on.
If you could do that for cloud gaming, then you can obviously do that for streaming applications that are graphics-intensive. For example, what used to be workstation applications that would run on PCs, in the future they’ll just be SaaS that streams from the cloud. The GPU will be one of the... currently it’s A4s, A40s, A10s. Those Ampere GPUs will be streaming graphics-intensive applications. And then there’s the new one that’s quite important, and that’s augmented reality streaming to your phone. Short-form videos, image enhancement of videos, maybe re-posing, so that your eyes are making eye contact with everybody. Maybe it’s just a perfectly beautiful photograph and you’re animating the face. Those kinds of augmented reality applications are going to use GPUs in the cloud. In the Ada generation, we’re going to see probably the largest installation using graphics-intensive GPUs in the cloud for AI, graphics, computer vision, streaming. It’s going to be the universal accelerator. That’s definitely going to come. In fact, I didn’t call it L100, I called it L40. L40 is going to be our high-end Ada GPU. It’s going to be used for Omniverse, for augmented reality, for cloud graphics, for inference, for training, for all of it. L40 is going to be a phenomenal cloud graphics GPU.
Q: It seems like a big part of the stuff you’re releasing, the car side, the medical side — it feels like very few people are in AI safety. It seems like it’s more hardware accelerated. Can you talk about the importance of AI safety?
Huang: It’s a large question. Let me break it down into a few parts, just as a starting point. There’s trustworthy AI questions in general. But even if you developed an AI model that you believe you trust, that you trained with properly curated data, that you don’t believe is overly biased or unnecessarily biased or undesirably biased — even if you came up with that model, in the context of safety, you want to have several things. The first thing is you want to have diversity and redundancy. One example would be in the context of a self-driving car. You want to observe where there are obstacles, but you also want to observe where there is the absence of obstacles, what we call a free space. Obstacles to avoid, free space that you can drive through. These two models, if overlaid on top of each other, give you diversity and redundancy.
We do that in companies. We do that in the medical field. It’s called multimodality and so forth. We have diversity in algorithms. We have diversity in compute, so that we do processing in two different ways. We do diversity using sensors. Some of it comes from cameras. Some of it comes from radar. Some of it comes from structure from motion. Some of it comes from lidar. You have different sensors and different algorithms, and then different compute. These are layers of safety.
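[Editor's illustration: the obstacle and free-space overlay Huang describes can be sketched in a few lines. This is a toy example, not Nvidia's driving stack. It assumes each model outputs a small boolean grid, treats a cell as drivable only when the obstacle model reports nothing there and the free-space model reports it clear, and flags cells where the two models contradict each other or neither claims the cell. The names `drivable_cells` and `disagreements` are hypothetical.]

```python
from typing import List

Grid = List[List[bool]]


def drivable_cells(obstacle: Grid, free_space: Grid) -> Grid:
    """A cell is drivable only when both models agree it is clear."""
    return [
        [(not obs) and free for obs, free in zip(obs_row, free_row)]
        for obs_row, free_row in zip(obstacle, free_space)
    ]


def disagreements(obstacle: Grid, free_space: Grid) -> Grid:
    """Cells needing caution: both models claim the cell, or neither does."""
    return [
        [obs == free for obs, free in zip(obs_row, free_row)]
        for obs_row, free_row in zip(obstacle, free_space)
    ]


# Made-up 2x3 grids: True means "obstacle here" or "free space here".
obstacle = [[False, True, False], [False, False, True]]
free_space = [[True, False, False], [True, True, True]]

print(drivable_cells(obstacle, free_space))
print(disagreements(obstacle, free_space))
```

The point of the overlay is that a single model's mistake does not by itself mark a cell as safe: the second, independent model has to agree.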
And then the next part is, let’s suppose you design a system that you know to be active safety capable. You believe it’s resilient in that way. How do you know that it’s not tampered with? You designed it properly, but somebody came in and tampered with it and caused it to not be safe. We have to make sure that we have a technology called confidential computing. Everything from booting up the system, so that you measure at boot that nobody tampered with it, to encrypting the model and making sure it wasn’t tampered with, to processing the software in a way that you can’t probe it and change it. Even that is affected. And then all the way back to the methodology of developing software.
Once you certify and validate a full stack to be safe, you want to make sure that all the engineers in the company and everybody contributing to it are contributing to the software and improving the software in a way that retains its ability to remain certified and remain safe. There’s the culture. There’s the tools used. There are methodologies. There are standards for documentation and coding. Everything from — I just mentioned tamper-proof in the car. The data center is tamper-proof. Otherwise somebody could tamper with the model in the data center just before we OTA the model to the car. Anyway, active safety, safety design into software, and safety design into AI is a very large topic. We dedicate ourselves to doing this right.
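[Editor's illustration: one small piece of the tamper-checking idea is verifying that a model's bytes still match a digest recorded when it was certified, before it is sent over the air. The sketch below uses only Python's standard `hashlib` and `hmac` modules. It is not Nvidia's confidential computing implementation, and real systems typically rely on signed measurements and hardware roots of trust rather than a bare hash. The function names and the sample bytes are hypothetical.]

```python
import hashlib
import hmac


def model_digest(model_bytes: bytes) -> str:
    """SHA-256 digest of the serialized model, as a hex string."""
    return hashlib.sha256(model_bytes).hexdigest()


def is_untampered(model_bytes: bytes, expected_digest: str) -> bool:
    """Compare against the digest recorded at certification time.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(model_digest(model_bytes), expected_digest)


# Hypothetical stand-in for a certified model file.
certified_model = b"weights-v1"
recorded_digest = model_digest(certified_model)

print(is_untampered(certified_model, recorded_digest))         # True
print(is_untampered(certified_model + b"x", recorded_digest))  # False
```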
Q: Nvidia had pre-ordered production capacity from TSMC further in advance than normal due to the shortages we were experiencing. Do AIBs also have to pre-order GPU supply that far in advance? With the reduction you’ve seen in prices, like the 3080 Ti, 3090 Ti, are there rebates, incentives with any of those prices that AIBs can take advantage of?
Huang: Last year the supply chain was so challenged. Two things happened. One thing is the lead times extended. Lead times used to be about four months from placing a PO on the wafer starts to the time you would ship the products. Maybe slightly longer. Sixteen weeks? It extended all the way to a year and a half. It’s not just the wafer starts. You have substrates to deal with, voltage regulators, all kinds of things in order for us to ship a product. It includes a whole bunch of system components. Our cycle time extended tremendously, number one. Number two, because everything was so scarce, you had to secure your allocation in advance, which then causes you to further secure allocation by probably about a year. Somewhere between normal operating conditions of four months to all of a sudden about two years or so of having to arrange for this. And we were growing so fast. Our data center business was growing nearly 100 percent each year. That’s a multi-billion-dollar business. You can just imagine, between our growth rate and the additional cycle time, how much commitment we had to place. That’s the reason why we had to make the hard decision as demand slowed down, particularly among consumers, to really dramatically slow down shipments and let the channel inventory take care of itself.
With respect to AIBs, the AIBs don’t have to place lead time orders. We ordered the components no matter what. Our AIBs are agile. We carried the vast majority of the inventory. When the market was really hot, the channel, our selling price was all exactly the same. It never moved a dollar. Our component costs kept going up, as people knew last year, but we absorbed all the increases in cost. We passed zero dollars forward to the market. We kept all of our product prices exactly at the MSRP we launched at. Our AIBs had the benefit of creating different SKUs that allowed them to capture more value. The channel, of course, the distributors and retailers, benefited during the time when the product was hot.
When the demand slowed, we took the action to create marketing, what we call marketing programs. But basically discount programs, rebate programs, that allowed the pricing in the market to come back to a price point that we felt, or the market felt, would ultimately sell through. The combination of the commitments that we made, which led to you — you guys saw that we wrote down about a billion dollars worth of inventory. Secondarily, we put a few hundred million dollars into marketing programs to help the channel reset its price. Between these two actions that we took a few months ago, we should be in a good spot in Q4 as Ada ramps hard. I’m looking forward to that. Those decisions were painful, but they were necessary. It’s six months of hardship, and hopefully after that we can move on.
Q: I was wondering if you could address why there wasn’t an RTX 4070, and if a 4070 will arrive. Are you telling consumers to buy a 3000 series card instead?
Huang: We don’t have everything ready to roll everything out at one time. What we have ready is 4090 and 4080. Over time we’ll get other products in the lower end of the stack out to the market. But it’s not any more complicated than — we usually start at the high end, because that’s where the enthusiasts want to refresh first. We’ve found that 4080 and 4090 is a good place to start. As soon as we can we’ll move further down the stack. But this is a great place to start.
Q: What are your thoughts on EVGA halting its production of graphics cards from the RTX 40 series onward? Was Nvidia in close discussion with EVGA as they came to this decision?
Huang: Andrew wanted to wind down the business. He’s wanted to do that for a couple of years. Andrew and EVGA are great partners and I’m sad to see them leave the market. But he has other plans and he’s been thinking about it for several years. I guess that’s about it. The market has a lot of great players. It will be served well after EVGA. But I’ll always miss them. They’re an important part of our history. Andrew is a great friend. It was just time for him to go do something else.
Q: What would you say to the Jensen of 30 years ago?
Huang: I would say to follow your dreams, your vision, your heart, just as we did. It was very scary in the beginning, because as you probably know from our history, we invented the GPU. At the time that we invented the GPU, there was no application for GPUs. Nobody cared about GPUs. At the time we came into the world to build a platform for video games, the video game market was tiny. It barely existed. We spoke about video games completely in 3D, and there weren’t even 3D design tools. You had to create 3D games practically by hand. We talked about a new computing model, accelerated computing, which was the foundation of our company in 1993. That new method of computing was so much work, nobody believed in it. Now, of course, I had no choice but to believe in it. It was our company and we wanted to make it successful. We pursued it with all of our might.
Along the way, slowly but surely, one customer after another, one partner after another, and one developer after another, the GPU became a very important platform. Nvidia invented programmable shading, which now defines modern computer graphics. It led us to invent RTX, to invent CUDA, to develop modern accelerated computing. It led us to AI. It led us to all the things we’re talking about today. All of it, every step of the way, without exception, nobody believed in it. GPU, programmable shading, CUDA, even deep learning. When I brought deep learning to the automotive industry everyone thought it was silly. In fact, one of the CEOs said, “You can’t even detect a German dog. How can you detect pedestrians?” They wrote us off. Deep learning at the time was not perfect, but today it’s of course reached superhuman capabilities.
The advice I would give a young Jensen is to stick with it. You’re doing the right thing. You have to pursue what you believe. You’re going to have a lot of people who don’t believe in it in the beginning, but not because they don’t believe you. It’s just because it’s hard to believe sometimes. How would anybody believe that the same processor that was used for playing Quake would be the processor that modernized computer science and brought AI to the world? The same processor we’re using for Portal turned out to be the same one that led to self-driving cars. Nobody would have believed it. First, you have to believe it, and then you have to help other people believe it. It could be a very long journey, but that’s okay.