AI accelerator hardware like Google’s Tensor Processing Units and Intel’s Nervana Neural Network Processor promises to speed up AI model training, but because of the way the chips are architected, earlier stages of the training pipeline (like data preprocessing) don’t benefit from the boosts. That’s why scientists at Google Brain, Google’s AI research division, propose in a paper a technique called “data echoing,” which they say reduces the computation used by earlier pipeline stages by reusing intermediate outputs from these stages.
According to the researchers, the best-performing data echoing algorithms can match the baseline’s predictive performance using less upstream processing, in some cases compensating for a four times slower input pipeline.
“Training a neural network requires more than just the operations that run well on accelerators, so we cannot rely on accelerator improvements alone to keep producing speedups in all cases,” observed the coauthors. “A training program may need to read and decompress training data, shuffle it, batch it, and even transform or augment it. These steps may exercise multiple system components, including CPUs, disks, network bandwidth, and memory bandwidth.”
In a typical training pipeline, the AI system first reads and decodes the input data and then shuffles the data, applying a set of transformations to augment it before gathering examples into batches and iteratively updating parameters to reduce error. The researchers’ data echoing approach inserts a stage in the pipeline that repeats the output data of the previous stage before the parameters update, theoretically reclaiming idle compute capacity.
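The echoing stage described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers’ implementation: the names `CountingReader`, `echo`, `batch`, and `count_updates` are hypothetical, and the upstream work (reading, decoding, augmenting) is reduced to a counter so the reuse is easy to see.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


class CountingReader:
    """Hypothetical upstream stage that counts the fresh examples it produces."""

    def __init__(self, num_examples: int) -> None:
        self.num_examples = num_examples
        self.fresh_reads = 0

    def __iter__(self) -> Iterator[int]:
        for i in range(self.num_examples):
            self.fresh_reads += 1  # stands in for the cost of reading and decoding
            yield i


def echo(stream: Iterable[T], factor: int) -> Iterator[T]:
    """Repeat each output of the previous stage `factor` times."""
    for item in stream:
        for _ in range(factor):
            yield item


def batch(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items into lists of `size`; the last batch may be smaller."""
    buf: List[T] = []
    for item in stream:
        buf.append(item)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def count_updates(num_examples: int, echo_factor: int, batch_size: int) -> Tuple[int, int]:
    """Return (parameter updates, fresh examples read) for one pass over the data."""
    reader = CountingReader(num_examples)
    updates = sum(1 for _ in batch(echo(reader, echo_factor), batch_size))
    return updates, reader.fresh_reads


baseline = count_updates(num_examples=8, echo_factor=1, batch_size=4)
echoed = count_updates(num_examples=8, echo_factor=2, batch_size=4)
```

With an echo factor of 2, the same eight fresh examples feed twice as many batches, so the accelerator has work to do while the upstream stages catch up. Whether those extra updates are as useful as updates on fresh data is the question the experiments address.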
In experiments, the team evaluated data echoing on two language modeling tasks, two image classification tasks, and one object detection task using AI models trained on open source data sets. They measured training time as the number of “fresh” training examples required to reach a target metric, and they investigated whether data echoing could reduce the number of examples needed.
The coauthors report that in all but one case, data echoing required fewer fresh examples than the baseline and reduced training time. Furthermore, they note that the earlier in the pipeline echoing is inserted (i.e., before data augmentation rather than after batching), the fewer fresh examples were needed, and that echoing occasionally performed better with larger batch sizes.
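To make the two insertion points concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not the paper’s code: `augment` is a hypothetical stand-in that adds small random noise, and the function names are invented for this example. One plausible reading of the result, which the article itself does not spell out, is that copies echoed before augmentation are each augmented differently and so look more like fresh data, while copies echoed after batching are exact repeats.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def echo(stream: Iterable[T], factor: int) -> Iterator[T]:
    """Repeat each item from the previous stage `factor` times."""
    for item in stream:
        for _ in range(factor):
            yield item


def augment(stream: Iterable[float], rng: random.Random) -> Iterator[float]:
    """Hypothetical augmentation: perturb each example with small random noise."""
    for x in stream:
        yield x + rng.uniform(-0.1, 0.1)


def batch(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items into lists of `size`; the last batch may be smaller."""
    buf: List[T] = []
    for item in stream:
        buf.append(item)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def echo_before_augment(examples, factor, size, rng):
    # Each echoed copy passes through augmentation separately.
    return list(batch(augment(echo(examples, factor), rng), size))


def echo_after_batch(examples, factor, size, rng):
    # Whole batches are repeated unchanged.
    return list(echo(batch(augment(examples, rng), size), factor))


examples = [0.0, 1.0, 2.0, 3.0]
early = echo_before_augment(examples, factor=2, size=2, rng=random.Random(0))
late = echo_after_batch(examples, factor=2, size=2, rng=random.Random(0))
```

Both variants produce four batches from four fresh examples. In `early`, every value has its own augmentation; in `late`, each batch appears twice with identical contents.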
“All data echoing variants achieved at least the same performance as the baseline for both tasks … [It’s] a simple strategy for increasing hardware utilization when the training pipeline has a bottleneck in one of the upstream stages,” wrote the team. “Data echoing is an effective alternative to optimizing the training pipeline or adding additional workers to perform upstream data processing, which may not always be possible or desirable.”