Graphcore claims its M2000 AI computer hits 1 petaflop

Graphcore, a U.K.-based company developing accelerators for AI workloads, this morning unveiled the second generation of its Intelligence Processing Units (IPUs), which will soon be made available in the company's M2000 IPU Machine. Graphcore claims this new GC200 chip will enable the M2000 to achieve a petaflop of processing power in an enclosure that measures the width and length of a pizza box.

AI accelerators like the GC200 are a type of specialized hardware designed to speed up AI applications, particularly artificial neural networks, deep learning, and machine learning. They're often multicore in design and focus on low-precision arithmetic or in-memory computing, both of which can boost the performance of large AI algorithms and lead to state-of-the-art results in natural language processing, computer vision, and other domains.

The M2000 is powered by four of the new 7-nanometer GC200 chips, each of which packs 1,472 processor cores (running 8,832 threads) and 59.4 billion transistors on a single die. Graphcore says the machine delivers more than eight times the processing performance of its existing IPU products. In benchmark tests, the company claims the four-GC200 M2000 ran an image classification model -- Google's EfficientNet B4 with 88 million parameters -- more than 32 times faster than an Nvidia V100-based system and over 16 times faster than the latest 7-nanometer graphics card. A single GC200 can deliver up to 250 TFLOPS, or 250 trillion floating-point operations per second.
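The headline numbers fit together: four chips at a claimed 250 TFLOPS each add up to the 1-petaflop figure, and 8,832 threads spread across 1,472 cores works out to six threads per core. The minimal sketch below only re-derives those figures from the vendor's stated specs. It assumes perfect linear scaling across the four chips, so the result is a claimed peak rather than measured throughput.

```python
# Back-of-the-envelope check of Graphcore's claimed GC200/M2000 figures.
# All inputs are the vendor's stated numbers, not measurements.

TERA = 10**12
PETA = 10**15

CORES_PER_CHIP = 1_472
THREADS_PER_CHIP = 8_832
PEAK_TFLOPS_PER_CHIP = 250  # claimed peak per GC200
CHIPS_PER_M2000 = 4


def threads_per_core(threads: int, cores: int) -> float:
    """Hardware threads available on each IPU core."""
    return threads / cores


def machine_peak_petaflops(chips: int, tflops_per_chip: float) -> float:
    """Aggregate claimed peak in petaflops, assuming ideal linear scaling."""
    return chips * tflops_per_chip * TERA / PETA


if __name__ == "__main__":
    print(f"threads per core: {threads_per_core(THREADS_PER_CHIP, CORES_PER_CHIP):.0f}")
    print(f"M2000 peak: {machine_peak_petaflops(CHIPS_PER_M2000, PEAK_TFLOPS_PER_CHIP):.1f} PFLOPS")
```

Running it prints six threads per core and a 1.0-PFLOPS peak for the four-chip enclosure.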

Beyond the M2000, Graphcore says customers will be able to connect as many as 64,000 GC200 chips for up to 16 exaflops of computing power and petabytes of memory, theoretically supporting AI models with trillions of parameters. That's made possible by Graphcore's IPU-Fabric interconnect technology, which supports low-latency data transfers at rates of up to 2.8Tbps and links IPU-based systems either directly or through Ethernet switches.
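The 16-exaflop ceiling follows from multiplying the claimed 250-TFLOPS per-chip peak by 64,000 chips. The sketch below repeats that arithmetic and, as an illustrative assumption, counts how many four-chip M2000 enclosures such a cluster would take if every chip were housed in one. Real clusters lose efficiency to communication overhead, so this is an upper bound, not a prediction.

```python
# Hypothetical scale-out estimate from Graphcore's claimed figures.
# Assumes ideal linear scaling and that every chip sits in a four-chip M2000.

TERA = 10**12
EXA = 10**18


def cluster_peak_exaflops(chips: int, tflops_per_chip: float) -> float:
    """Claimed aggregate peak in exaflops under ideal linear scaling."""
    return chips * tflops_per_chip * TERA / EXA


def machines_needed(chips: int, chips_per_machine: int = 4) -> int:
    """Number of enclosures needed to host `chips` IPUs (rounded up)."""
    return -(-chips // chips_per_machine)


if __name__ == "__main__":
    print(f"{cluster_peak_exaflops(64_000, 250):.0f} exaflops")  # prints "16 exaflops"
    print(machines_needed(64_000), "M2000 units")                # prints "16000 M2000 units"
```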

The GC200 and M2000 are designed to work with Graphcore's bespoke Poplar, a graph toolchain optimized for AI and machine learning. It integrates with Google's TensorFlow framework and the Open Neural Network Exchange (an ecosystem for interchangeable AI models), in the latter's case providing a full training runtime. Preliminary compatibility with Facebook's PyTorch arrived in Q4 2019, with full feature support following in early 2020. The newest version of Poplar -- version 1.2 -- introduced exchange memory management features intended to take advantage of the GC200's unique hardware and architectural design with respect to memory and data access.

Graphcore, which was founded in 2016 by Simon Knowles and Nigel Toon, has raised over $450 million to date at a $1.95 billion valuation. Its backers include Robert Bosch Venture Capital, Samsung, Dell Technologies Capital, BMW, Microsoft, and AI luminaries such as Arm cofounder Hermann Hauser and DeepMind cofounder Demis Hassabis. Its first commercial product was a 16-nanometer PCI Express card -- C2 -- that became available in 2018, and it's this package that launched on Microsoft Azure in November 2019. (Microsoft is also using Graphcore's products internally for various AI initiatives.)

Earlier this year, Graphcore announced the availability of the DSS8440 IPU Server in partnership with Dell and launched Cirrascale IPU-Bare Metal Cloud, an IPU-based managed service offering from cloud provider Cirrascale. More recently, Graphcore revealed some of its other early customers -- among them Citadel Securities, Carmot Capital, the University of Oxford, J.P. Morgan, Lawrence Berkeley National Laboratory, and European search engine company Qwant -- and open-sourced libraries on GitHub for building and executing apps on IPUs.

Graphcore might have momentum on its side, but it faces competition in a market that's anticipated to reach $91.18 billion by 2025. In March, Hailo, a startup developing hardware designed to speed up AI inferencing at the edge, nabbed $60 million in venture capital. California-based Mythic has raised $85.2 million to develop its own custom in-memory architecture. Mountain View-based Flex Logix in April launched an inference coprocessor it claims delivers up to 10 times the throughput of existing silicon. And last November, Esperanto Technologies secured $58 million for its 7-nanometer AI chip technology.