Google details new AI accelerator chips

At Google I/O 2021, Google today formally announced its fourth-generation tensor processing units (TPUs), which the company claims can complete AI and machine learning training workloads in close-to-record wall clock time. Google says that clusters of TPUv4s can surpass the capabilities of previous-generation TPUs on workloads including object detection, image classification, natural language processing, machine translation, and recommendation benchmarks.

TPUv4 chips offer more than double the matrix multiplication TFLOPS of a third-generation TPU (TPUv3), where 1 TFLOPS is equivalent to 1 trillion floating-point operations per second. (Matrices are often used to represent the data that feeds into AI models.) They also offer a "significant" boost in memory bandwidth while benefiting from unspecified advances in interconnect technology. Google says that overall, at an identical scale of 64 chips and not accounting for improvement attributable to software, the TPUv4 demonstrates an average improvement of 2.7 times over TPUv3 performance.

Google's TPUs are application-specific integrated circuits (ASICs) developed specifically to accelerate AI. They're liquid-cooled and designed to slot into server racks; deliver up to 100 petaflops of compute; and power Google products like Google Search, Google Photos, Google Translate, Google Assistant, Gmail, and Google Cloud AI APIs. Google announced the third generation in 2018 at its annual I/O developer conference and this morning took the wraps off the successor, which is in the research stages.

Cutting-edge performance

TPUv4 clusters -- or "pods" -- total 4,096 chips interconnected with 10 times the bandwidth of most other networking technologies, according to Google. This enables a TPUv4 pod to deliver more than an exaflop of compute, which is equivalent to about 10 million average laptop processors at peak performance.
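As a rough sanity check on those figures, the pod-level number can be divided back down to a single chip and a single laptop. The sketch below is back-of-the-envelope arithmetic only: it assumes exactly one exaflop (Google says "more than"), spread evenly across 4,096 chips, and the helper name `per_unit_flops` is made up for illustration.

```python
# Back-of-the-envelope arithmetic on the pod figures quoted above.
# Assumption: exactly 1 exaflop per pod (Google says "more than" one),
# divided evenly across chips. Not an official per-chip specification.

EXAFLOP = 1e18          # floating-point operations per second
POD_CHIPS = 4096        # chips per TPUv4 pod, per Google
LAPTOPS = 10_000_000    # "about 10 million average laptop processors"


def per_unit_flops(total_flops: float, units: int) -> float:
    """Hypothetical helper: evenly split a FLOPS total across units."""
    return total_flops / units


per_chip_tflops = per_unit_flops(EXAFLOP, POD_CHIPS) / 1e12
per_laptop_gflops = per_unit_flops(EXAFLOP, LAPTOPS) / 1e9

print(f"Implied per-chip compute:   {per_chip_tflops:.1f} TFLOPS")
print(f"Implied per-laptop compute: {per_laptop_gflops:.1f} GFLOPS")
```

Under those assumptions, the comparison implies roughly 244 TFLOPS per chip and about 100 GFLOPS per "average laptop processor." Both are lower bounds on what Google's statement implies, not measured values.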

"This is a historic milestone for us -- previously to get an exaflop, you needed to build a custom supercomputer," Google CEO Sundar Pichai said during a keynote address. "But we already have many of these deployed today and will soon have dozens of TPUv4 pods in our datacenters, many of which will be operating at or near 90% carbon-free energy."

This year's MLPerf results suggest Google's fourth-generation TPUs are nothing to scoff at. Training a BERT model on a large Wikipedia corpus took 1.82 minutes with 256 fourth-gen TPUs -- about 1.4 minutes slower than the 0.39 minutes it took with 4,096 third-gen TPUs, despite using one-sixteenth as many chips. Meanwhile, achieving a 0.81-minute training time with Nvidia hardware required 2,048 A100 cards and 512 AMD Epyc 7742 CPU cores.
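One crude way to put those BERT results on a common footing is to multiply chip count by training time, giving "chip-minutes." The sketch below does only that arithmetic on the numbers quoted above. Chip-minutes is an illustrative metric chosen here, not one MLPerf reports, and it ignores the fact that training rarely scales linearly with chip count, so the ratios should be read as a rough indication only.

```python
# Chip-minutes for the BERT training results quoted above.
# Assumption: chips x minutes is used as a crude cost proxy; it is not
# an MLPerf metric and ignores non-linear scaling across chip counts.

results = {
    "TPUv4 (256 chips)": (256, 1.82),
    "TPUv3 (4,096 chips)": (4096, 0.39),
    "Nvidia A100 (2,048 cards)": (2048, 0.81),
}


def chip_minutes(chips: int, minutes: float) -> float:
    """Hypothetical helper: accelerator count times wall-clock minutes."""
    return chips * minutes


totals = {name: chip_minutes(c, m) for name, (c, m) in results.items()}
baseline = totals["TPUv4 (256 chips)"]

for name, total in totals.items():
    print(f"{name}: {total:.2f} chip-minutes "
          f"({total / baseline:.2f}x the TPUv4 run)")
```

By this measure, the 256-chip TPUv4 run used about 466 chip-minutes, compared with roughly 1,597 for the TPUv3 run and 1,659 for the A100 run, or about 3.4 to 3.6 times less accelerator time.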

Google says that TPUv4 pods will be available to cloud customers starting later this year.
