Graphcore's AI accelerator chips launch on Microsoft Azure

Graphcore, a U.K.-based company developing accelerators for AI workloads, this morning announced a milestone: Its Intelligence Processing Units (IPUs) have launched on Azure. It marks the first time a large-scale cloud vendor -- Microsoft -- has made Graphcore's chips publicly available.

IPUs on Azure are open for customer sign-up, Graphcore says, with access prioritized for those "focused on pushing the boundaries of [natural language processing]" and "developing new breakthroughs in machine intelligence."

By way of refresher, Graphcore -- which was founded in 2016 by Simon Knowles and Nigel Toon -- has raised $310 million to date at a $1.5 billion valuation. The funding comes from Robert Bosch Venture Capital, Samsung, Dell Technologies Capital, Amadeus Capital Partners, C4 Ventures, Draper Esprit, Foundation Capital, Pitango Capital, and AI luminaries Arm cofounder Hermann Hauser and DeepMind cofounder Demis Hassabis. Graphcore's first commercial product was a 16-nanometer PCI Express card -- C2 -- that became available in 2018, and it's this package that's launching on Azure.

"Microsoft and Graphcore have been collaborating closely for over two years. Over this period, the Microsoft team, led by Marc Tremblay, distinguished engineer, has been developing systems for Azure and has been enhancing advanced machine vision and natural language processing models on IPUs," said Toon. "We have been working extensively with a number of leading early-access customers and partners for some time to ensure that [these products are] ready for general release."

The C2 features two interlinked Colossus IPUs, each of which packs 1,216 cores and 23.6 billion transistors. Each of a chip's 1,216 IPU cores can hit over 100 GFLOPS (where one GFLOPS equals one billion floating point operations per second), and each chip is paired with 300MB of memory and can run up to 10,000 programs in parallel. The per-chip memory bandwidth is 45TB/s, giving the C2 a whole-card bandwidth of 90TB/s -- a theoretical maximum Graphcore claims is 100 times higher than that of HBM2 graphics chip memory.
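For readers who want to check the arithmetic behind those figures, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the function and constant names are illustrative, and because the per-core figure is stated as "over 100 GFLOPS," the compute result should be read as a floor rather than a measured peak.

```python
# Back-of-the-envelope check of the C2 figures quoted in the article.
# All names here are illustrative; only the numeric inputs come from the text.

CORES_PER_IPU = 1_216        # cores per Colossus IPU
GFLOPS_PER_CORE = 100        # "over 100 GFLOPS per core", treated as a lower bound
IPUS_PER_CARD = 2            # two interlinked IPUs per C2 card
BANDWIDTH_PER_IPU_TBS = 45   # per-chip memory bandwidth in TB/s


def peak_tflops_per_ipu(cores=CORES_PER_IPU, gflops_per_core=GFLOPS_PER_CORE):
    """Lower bound on per-chip compute in TFLOPS (1 TFLOPS = 1,000 GFLOPS)."""
    return cores * gflops_per_core / 1_000


def card_bandwidth_tbs(ipus=IPUS_PER_CARD, per_ipu=BANDWIDTH_PER_IPU_TBS):
    """Whole-card memory bandwidth in TB/s, summing the per-chip figures."""
    return ipus * per_ipu


if __name__ == "__main__":
    per_ipu = peak_tflops_per_ipu()
    print(f"Per-IPU compute floor: {per_ipu:.1f} TFLOPS")
    print(f"Per-card compute floor: {per_ipu * IPUS_PER_CARD:.1f} TFLOPS")
    print(f"Per-card memory bandwidth: {card_bandwidth_tbs()} TB/s")
```

Under these assumptions, 1,216 cores at 100 GFLOPS apiece works out to at least 121.6 TFLOPS per chip, and two chips at 45TB/s each yield the 90TB/s whole-card bandwidth the company cites.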

The C2 is designed to work with Poplar, Graphcore's bespoke graph tool chain for AI and machine learning. Poplar integrates with Google's TensorFlow framework and the Open Neural Network Exchange (an ecosystem for interchangeable AI models), in the latter's case providing a full training runtime. Preliminary compatibility with Facebook's PyTorch is anticipated to arrive in Q4 2019, with full feature support to follow in early 2020.

In a testament to the IPUs' efficiency, Graphcore says that it and Microsoft developers achieved state-of-the-art performance and accuracy with Google's Bidirectional Encoder Representations from Transformers (BERT), a language model that learns relationships between sentences by pretraining on a set of tasks. They trained one BERT variant (BERT Base) in 56 hours with a single IPU server packing eight C2 cards, and they claim customers have seen on average 3 times higher inferencing throughput and an over 20% improvement in latency.

"Natural language processing models are hugely important to Microsoft -- to run our internal AI workloads and for our AI customers on Azure," said Microsoft technical fellow Doug Burger. "We are extremely excited by the potential that this new collaboration on processors with Graphcore will deliver for our customers. The Graphcore offering extends Azure's capabilities and our efforts here form part of our strategy to ensure that Azure remains the best cloud for AI."

On the image recognition side of the equation, Graphcore says that European search engine Qwant managed to achieve gains running Facebook's modular ResNeXt architecture on IPUs. As Graphcore explains, ResNeXt comprises repeating blocks that aggregate sets of transformations the IPUs "efficiently" support. Qwant and Graphcore report 3.5 times higher performance in image searches and up to 77 times faster throughput for group convolutions (i.e., convolutions in which the input channels are split into groups that are each filtered independently).
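To make the group convolution idea concrete, the short Python sketch below counts the weights in a convolution layer with and without grouping. The channel counts, kernel size, and group count are illustrative assumptions rather than figures from Graphcore or Qwant, and the helper name is hypothetical. A smaller weight count does not by itself explain the throughput numbers above; the sketch only shows how grouping partitions the work into many small, independent pieces.

```python
# Weight count of a 2D convolution layer, with optional channel grouping.
# Layer sizes below are illustrative assumptions, not figures from the article.


def conv_weight_count(c_in, c_out, kernel, groups=1):
    """Number of weights (bias excluded) in a kernel x kernel convolution.

    With groups > 1, each output channel only sees c_in / groups input
    channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


if __name__ == "__main__":
    dense = conv_weight_count(c_in=128, c_out=128, kernel=3)
    grouped = conv_weight_count(c_in=128, c_out=128, kernel=3, groups=32)
    print(f"Standard 3x3 convolution: {dense:,} weights")
    print(f"Grouped 3x3 convolution (32 groups): {grouped:,} weights")
    print(f"Reduction factor: {dense // grouped}x")
```

In this example, splitting 128 channels into 32 groups turns one large filter bank into 32 small ones that touch four input channels each.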

"We are now extremely pleased that we are making Graphcore technology commercially available to a wider group of customers," added Toon. "We are looking forward to supporting innovators achieve the next great breakthroughs in machine intelligence on IPUs."

The launch of Graphcore's chips on Azure comes a week after Untether AI, a Toronto-based startup that's developing high-efficiency chips for AI inferencing workloads, nabbed $20 million in venture capital. California-based Mythic has raised $85.2 million to develop its own custom in-memory architecture, and there's no shortage of adjacent rivals in a market that's anticipated to reach $91.18 billion by 2025.

San Francisco-based startup AI Storm earlier this year closed a $13.2 million round for its family of AI edge computing chips, and Mountain View-based Flex Logix in April launched an inference coprocessor it claims delivers up to 10 times the throughput of existing silicon. Yet another competitor -- Xnor.ai -- recently debuted an always-on solar-powered device capable of accelerating state-of-the-art machine learning algorithms. And last November, Esperanto Technologies secured $58 million for its 7-nanometer AI chip technology.

Graphcore also announced today that its IPUs are being integrated with Dell server rack technology, which means enterprise customers will be able to build machine intelligence compute on their own premises. More details will be announced at next week's Supercomputing conference in Denver.