IBM unveiled a new technique today that’s supposed to drastically reduce how much time it takes to train distributed deep learning (DDL) systems by applying a ton of powerful hardware to the task. It works by optimizing data transfer between hardware components that run a deep neural network.
The key issue IBM is trying to solve is that of networking bottlenecks in distributed deep learning systems. While it’s possible to spread the computational load for training a deep neural network out over many computers, that process becomes less and less efficient as more machines are added, because of high-latency connections between the hardware doing the actual computation.
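To get a feel for why that bottleneck matters, here is a minimal sketch of a toy model, not IBM's method or measurements. It assumes each worker spends a fixed amount of time computing per training step and that communication time grows linearly with the number of workers. The function name, the linear cost model, and all of the numbers are hypothetical and chosen only for illustration; real systems use smarter communication patterns that scale differently.

```python
def scaling_efficiency(num_workers, compute_s, comm_s_per_extra_worker):
    """Fraction of each training step spent on useful computation.

    Toy assumption: communication time per step grows linearly with the
    number of additional workers. This is an illustrative model only.
    """
    comm_s = comm_s_per_extra_worker * (num_workers - 1)
    return compute_s / (compute_s + comm_s)


# Hypothetical figures: 100 ms of compute per step, plus 0.5 ms of
# communication for every extra worker that joins the job.
for n in (1, 4, 16, 64, 256):
    eff = scaling_efficiency(n, compute_s=0.100, comm_s_per_extra_worker=0.0005)
    print(f"{n:>3} workers: {eff:.0%} of each step is useful compute")
```

Under these made-up numbers, a single worker wastes nothing on communication, while a 256-worker job spends more than half of every step waiting on the network. Shrinking that communication cost is the kind of gain a library like PowerAI DDL is after.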
PowerAI DDL, a new communication library released in conjunction with an explanatory research paper, aims to improve efficiency by making sure that the systems at play take advantage of all the high-performance connections available. Using PowerAI DDL, IBM was able to train the popular Resnet-50 deep neural network on the ImageNet data set in 50 minutes, using 64 servers, each with four GPUs.
Organizations with enough hardware to really take advantage of PowerAI DDL’s capabilities could see massive improvements in how much time their data scientists have to spend waiting for experiments to run. If experiments run faster, scientists can do more of them, which should produce better results.
IBM’s communication library is being released as part of its PowerAI software package, which allows data scientists and engineers to perform machine learning tasks on the tech titan’s high-performance Power Systems servers. For testing, the company used 64 Power8 S822LC servers, which each come packed with four Nvidia Tesla P100-SXM2 GPUs.
That’s a lot of pricey hardware, but for organizations with cash to burn and a need for high-performance AI computation, it could be just what the doctor ordered.
Releasing the technology through PowerAI should make it easier for people to reap the benefits of IBM’s research, since it’s integrated with an existing piece of software that’s supposed to just run on Power Systems hardware.
However, that ease of implementation comes at a cost: IBM is only releasing PowerAI DDL for its own hardware and won’t be making the code for the system available as an open source project so that it can be reimplemented on other platforms.
That’s in contrast to Facebook’s distributed neural network optimization work, which came out earlier this month. The social networking giant released its code — which enabled the training of Resnet-50 on 256 GPUs in one hour — under an open source license.
(IBM is no stranger to contributing code to deep learning projects; it just chose not to do so in this case.)
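The two headline results quoted above can be lined up with a bit of arithmetic. The sketch below uses only the figures reported in this article; the variable names are illustrative. The two runs used different hardware and may have differed in other settings, so treat this as a rough comparison rather than a like-for-like benchmark.

```python
# Figures as reported in this article.
ibm_servers = 64
gpus_per_server = 4
ibm_minutes = 50

facebook_gpus = 256
facebook_minutes = 60

# IBM's cluster works out to the same GPU count Facebook used.
ibm_gpus = ibm_servers * gpus_per_server

# Relative difference in reported training time.
time_saved = 1 - ibm_minutes / facebook_minutes
speedup = facebook_minutes / ibm_minutes

print(f"IBM GPUs: {ibm_gpus}, Facebook GPUs: {facebook_gpus}")
print(f"IBM's reported run took {time_saved:.0%} less time ({speedup:.1f}x)")
```

In other words, both results were reported on 256 GPUs, with IBM's run finishing about 10 minutes sooner.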
Despite the differences in how the code is being distributed, both papers highlight an important frontier in deep learning research. Both companies’ work shows that there’s more to be done when it comes to improving the speed of machine learning systems. Faster training could also go on to benefit other applications, which could have greater follow-on effects.
One of the things that’s important to note in both cases is that while training Resnet is useful as a benchmark, it’s unclear how those results translate to other applications. While it seems likely that the techniques laid out in IBM’s paper should provide additional performance benefits, the company hasn’t done extensive testing yet.