Graphcore claims its IPU-POD outperforms Nvidia A100 in model training

Bristol-headquartered Graphcore, a startup developing chips and systems to accelerate AI workloads, appears to be taking on category-leader Nvidia with significant improvements in performance and efficiency.

In the latest MLPerf results, Graphcore said its IPU-POD16 server outperformed Nvidia’s DGX A100 640GB server. Specifically, when the systems were tested on training the computer vision model ResNet-50, Graphcore’s unit finished about 48 seconds sooner: it took 28.3 minutes to train the model, while the DGX A100 took 29.1 minutes.
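
To put that gap in perspective, the two reported times can be converted into a time difference and a speedup ratio. The short Python sketch below does this arithmetic; the function and variable names are illustrative and are not part of MLPerf or either vendor's tooling.

```python
def time_saved_seconds(baseline_min, candidate_min):
    """Seconds saved by the candidate system relative to the baseline."""
    return (baseline_min - candidate_min) * 60.0


def speedup(baseline_min, candidate_min):
    """Ratio of baseline time to candidate time (above 1 means the candidate is faster)."""
    return baseline_min / candidate_min


# ResNet-50 time-to-train figures as reported in the article, in minutes.
DGX_A100_MIN = 29.1
IPU_POD16_MIN = 28.3

saved = time_saved_seconds(DGX_A100_MIN, IPU_POD16_MIN)
ratio = speedup(DGX_A100_MIN, IPU_POD16_MIN)

print(f"Time saved: {saved:.0f} s")
print(f"Speedup: {ratio:.3f}x")
```

With the reported figures, this works out to about 48 seconds, or a speedup of roughly 1.03x.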

Significant time-to-train improvement

The numbers, Graphcore said, represent a 24% improvement over its previous MLPerf results and can be directly attributed to software optimization. For IPU-POD64, the performance gain was 41%, with the system training ResNet-50 in just 8.50 minutes. Meanwhile, IPU-POD128 and IPU-POD256 -- the flagship scale-up systems from Graphcore -- took just 5.67 minutes and 3.79 minutes, respectively, to train ResNet-50.
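
One way to read these figures is as scaling efficiency: how much of the ideal speedup is retained as the system grows. The sketch below assumes that the number in each product name is the IPU count and uses the 28.3-minute IPU-POD16 result as the baseline; the names in the code are illustrative.

```python
# Reported ResNet-50 time-to-train in minutes, keyed by IPU count.
# Assumption: the number in each product name (IPU-POD16, ...) is the chip count.
RESNET50_MINUTES = {16: 28.3, 64: 8.50, 128: 5.67, 256: 3.79}


def scaling_efficiency(times, base_chips=16):
    """Return measured speedup divided by ideal (linear) speedup for each system."""
    base_time = times[base_chips]
    result = {}
    for chips, minutes in times.items():
        measured = base_time / minutes   # how much faster than the baseline
        ideal = chips / base_chips       # speedup if scaling were perfectly linear
        result[chips] = measured / ideal
    return result


efficiency = scaling_efficiency(RESNET50_MINUTES)
for chips, eff in efficiency.items():
    print(f"IPU-POD{chips}: {eff:.0%} of ideal scaling")
```

Under these assumptions, efficiency falls from roughly 83% at 64 IPUs to about 47% at 256.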

The MLPerf benchmark is maintained by the MLCommons Association, a consortium backed by Alibaba, Facebook AI, Google, Intel, Nvidia, and others that acts as an independent steward.

The results also detailed the Graphcore systems’ ability to handle natural language processing (NLP) workloads. In the test on the NLP model BERT, IPU-POD16’s time-to-train stood at 26.05 minutes in MLPerf’s open category (which allows flexibility in model implementation), while IPU-POD64 and IPU-POD128 took just 8.25 and 5.88 minutes, respectively.

Compared with the previous MLPerf round, however, the performance gains on BERT were smaller than those seen on ResNet-50.

Graphcore also tested its systems on other workloads to demonstrate how they would handle the newer models that customers are exploring beyond ResNet and BERT. Part of this was an experiment with EfficientNet B4, a computer vision model that trained in just 1.8 hours on the company’s IPU-POD256. On IPU-POD16, the same model was trained in 20.7 hours -- more than three times faster than Nvidia DGX A100.

The results position Graphcore as a serious rival to Nvidia, which is already shipping machines to accelerate AI workloads and has a large footprint in the segment. Other players in the space include Google and Cerebras Systems. Google’s systems have also outperformed Nvidia’s servers in MLPerf tests, although those were preview machines and not readily available in the market.

Graphcore has raised over $700 million so far and was valued at $2.77 billion following its latest fundraising.

Editor's Note: After this story was published, Nvidia reached out with the following response, "Graphcore is comparing their 16-chip to our 8-chip. We believe the correct comparison is Graphcore’s 64-chip to Nvidia’s 64-chip performance. Nvidia’s ResNet-64 is 4.534 minutes, and for Graphcore it’s 8.504 minutes. On BERT 64-chip submissions, Nvidia’s score was 3.04 minutes versus 10.6 for Graphcore." For more information, view the performance results at MLcommons.org.
