Nvidia wants the artificial intelligence and deep learning markets badly. It launched its 15-billion-transistor Tesla P100 chip in April for deep-learning applications. Today, it is announcing two more deep-learning Tesla chips at an event in China.
Jen-Hsun Huang, CEO of Nvidia, announced the Tesla P4 and Tesla P40 graphics processing units (GPUs) at the GPU Technology Conference in Beijing. He also announced the company’s TensorRT and DeepStream software, which are meant to speed up A.I. inferencing, including on video. The announcements show that A.I. and deep learning are driving the high end of chip development like never before. They will enable A.I.-based services such as voice-activated assistance, email spam filters, and movie and product recommendation engines.
The new chips are based on the Nvidia Pascal architecture, which debuted in consumer and business markets this spring. Ian Buck, general manager of accelerated computing at Nvidia, said in an interview with VentureBeat that the new chips give customers several options for how they attack deep learning processing challenges. He said these chips deliver “massive leaps in efficiency and speed.”
Deep learning tasks are divided into training, where a deep learning neural network is trained to recognize patterns, and inferencing, where the trained neural network actually identifies images.
The Tesla P100 focuses on training tasks. But the 7.2-billion-transistor Tesla P4 (with 2,560 CUDA cores) and the 12-billion-transistor Tesla P40 (3,840 CUDA cores) are designed for inferencing: recognizing speech, images, or text in response to queries from users and devices.
The market is changing fast. Buck said that neural networks require 10 times more computing power than they did just a year ago. And he said that current central processing unit (CPU) technology isn’t capable of keeping up with the real-time responsiveness that A.I. demands. Intel, of course, disagrees, and it notes that GPU computing is only 3 percent of the current server market. Intel has been beefing up its own A.I. computing technology with a couple of acquisitions, and it is working on a new version of its A.I.-focused Xeon Phi CPUs. Nvidia said its response time is 45 times faster than CPU solutions such as Intel’s latest Broadwell chip, and its new Tesla chips are four times more powerful than graphics chips that came out last year.
“A.I. is everywhere these days,” Buck said. “Google said that one in five searches are now initiated by voice. A.I. is in everything from Skype [having] automatic translation in real time to predicting what treatment babies need in hospitals. It can enable the blind to use sensors that detect the emotions on the faces of people around them, or it can help find a lost child at the mall.”
Buck said the Tesla P4 delivers the highest energy efficiency for hyperscale data centers. Its small form factor fits in any server, and its low-power design, which starts at 50 watts, helps to make it 40 times more energy efficient than CPUs for inferencing workloads. A server with a single Tesla P4 replaces 13 CPU servers for video inferencing workloads, Nvidia said, and it can deliver an eightfold savings in total cost of ownership. The P40, by comparison, uses 250 watts.
Meanwhile, the Tesla P40 delivers maximum throughput for deep learning workloads. With 47 tera-operations per second (TOPS) of inference performance with INT8 instructions, a server with eight Tesla P40 accelerators can replace the performance of more than 140 CPU servers, Nvidia said. At approximately $5,000 per CPU server, this results in savings of more than $650,000 in server acquisition cost, Nvidia said. And it can slash training time from days to hours.
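For readers who want to check the math, here is a rough back-of-the-envelope sketch using only the figures quoted above. The variable names are illustrative, and the aggregate throughput assumes the eight cards scale perfectly, which is a simplification. The article does not say how Nvidia arrived at its savings figure, so the sketch computes only the list cost of the CPU servers being replaced.

```python
# Figures as quoted by Nvidia in the article.
CPU_SERVERS_REPLACED = 140        # "more than 140", so this is a lower bound
COST_PER_CPU_SERVER_USD = 5_000   # "approximately $5,000 per server"
P40_INT8_TOPS = 47                # peak INT8 inference throughput per P40
P40_PER_SERVER = 8

# Acquisition cost of the CPU servers that one eight-GPU server would replace.
cpu_fleet_cost_usd = CPU_SERVERS_REPLACED * COST_PER_CPU_SERVER_USD

# Assumption: peak throughput adds up linearly across the eight cards.
aggregate_peak_tops = P40_INT8_TOPS * P40_PER_SERVER

print(f"Cost of {CPU_SERVERS_REPLACED} CPU servers: ${cpu_fleet_cost_usd:,}")
print(f"Peak INT8 throughput of one eight-P40 server: {aggregate_peak_tops} TOPS")
```

The CPU fleet alone comes to at least $700,000 at the quoted price, which is in the same range as the savings of more than $650,000 that Nvidia cites.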
“The P4 fits in any servers and delivers higher efficiency and performance on 50 or 75 watts of power,” Buck said. “The P40 has the highest-performance throughput for scale-up servers.”
Complementing the Tesla P4 and P40 are two software innovations to accelerate A.I. inferencing: the Nvidia TensorRT library and the Nvidia DeepStream software development kit (SDK).
TensorRT is a library created for optimizing deep learning models for production deployment that delivers instant responsiveness for the most complex networks. It maximizes throughput and efficiency of deep learning applications by taking trained neural nets — usually in 32-bit or 16-bit data — and optimizing them for reduced-precision INT8 operations.
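To give a sense of what reduced-precision conversion means in practice, here is a minimal sketch in plain Python of one common approach, symmetric linear quantization. It is not TensorRT's API or its actual calibration method, which the article does not describe. The function names and sample weights are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The scale is chosen so the largest magnitude lands on +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical 32-bit weights from a trained network.
weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding error per value is bounded by half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now fits in one byte instead of four, at the cost of a small rounding error. That trade-off is what lets INT8-capable hardware push more inference operations through per second.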
The DeepStream SDK taps into the power of a Pascal server to simultaneously decode and analyze up to 93 HD video streams in real time, compared to seven streams with dual CPUs. That can help tackle one of the grand challenges of A.I.: understanding huge amounts of video content for applications such as self-driving cars, interactive robots, filtering, and ad placement.
“Delivering simple and responsive experiences to each of our users is very important to us,” said Greg Diamos, senior researcher at Baidu, in a statement. “We have deployed Nvidia GPUs in production to provide AI-powered services such as our Deep Speech 2 system and the use of GPUs enables a level of responsiveness that would not be possible on un-accelerated servers. Pascal with its INT8 capabilities will provide an even bigger leap forward and we look forward to delivering even better experiences to our users.”
The Tesla P4 will come out in November, and the Tesla P40 will be out in October. Nvidia’s partners include Dell, Hewlett Packard, Inspur, Inventec, and Lenovo.