MIT aims for energy efficiency in AI model training

In a newly published paper, MIT researchers propose a system for training and running AI models that is more environmentally friendly than previous approaches. They claim it can cut the carbon emissions involved to the "low triple digits" of pounds in some cases, mainly by improving the models' computational efficiency.

Impressive feats have been achieved with AI across domains like image synthesis, protein modeling, and autonomous driving, but the technology's sustainability issues remain largely unresolved. Last June, researchers at the University of Massachusetts at Amherst released a report estimating that training and searching a certain model produces roughly 626,000 pounds of carbon dioxide emissions -- equivalent to nearly 5 times the lifetime emissions of the average U.S. car.

The researchers' solution, a "once-for-all" network, trains a large model comprising many pretrained sub-models of different sizes that can be tailored to a range of platforms. Each sub-model can operate independently at inference time without retraining, and the system identifies the best sub-model based on the accuracy and latency trade-offs that correspond to the target hardware's power and speed limits. (For instance, for smartphones the system will select larger sub-models, but with different structures depending on individual battery lifetimes and computation resources.)
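The selection step described above can be pictured as a constrained search: among the candidate sub-models, keep those that meet the device's latency limit and take the most accurate one. The following is a minimal sketch under that assumption, not the researchers' implementation. The `SubNet` class, the `pick_subnet` function, and every accuracy and latency figure are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SubNet:
    """A hypothetical sub-model with illustrative (made-up) measurements."""
    name: str
    accuracy: float    # top-1 accuracy on some benchmark, as a fraction
    latency_ms: float  # measured latency on the target device


def pick_subnet(candidates: Sequence[SubNet],
                latency_budget_ms: float) -> Optional[SubNet]:
    """Return the most accurate candidate that fits the latency budget.

    Returns None when no candidate is fast enough.
    """
    feasible = [c for c in candidates if c.latency_ms <= latency_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


# Made-up candidates; real numbers would come from profiling the device.
CANDIDATES = [
    SubNet("small", accuracy=0.70, latency_ms=12.0),
    SubNet("medium", accuracy=0.75, latency_ms=25.0),
    SubNet("large", accuracy=0.79, latency_ms=60.0),
]

if __name__ == "__main__":
    for budget in (10.0, 30.0, 100.0):
        choice = pick_subnet(CANDIDATES, budget)
        label = choice.name if choice else "none fits"
        print(f"budget {budget:.0f} ms -> {label}")
```

Because every candidate is already trained, changing the budget only changes which sub-model is picked; nothing is retrained.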

A "progressive shrinking" algorithm efficiently trains the large model to support all of the sub-models simultaneously. The large model is trained first, and then smaller sub-models are trained with the help of the large model so that they learn concurrently. In the end, all of the sub-models are supported, allowing speedy specialization based on the target platform's specifications.
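One way to read the training order described above is as a schedule in which the largest configuration is trained alone first and progressively smaller ones join in later phases, while the larger ones keep training. The sketch below only illustrates that ordering. The `progressive_schedule` function and the use of a single "depth" number to stand for sub-model size are assumptions made for this example, and no actual training happens here.

```python
from typing import Iterator, List, Sequence


def progressive_schedule(sizes: Sequence[int]) -> Iterator[List[int]]:
    """Yield, phase by phase, the sub-model sizes that are active.

    Phase 1 contains only the largest size; each later phase adds the
    next-smaller size while keeping all larger ones active.
    """
    ordered = sorted(sizes, reverse=True)
    for phase in range(1, len(ordered) + 1):
        yield ordered[:phase]


if __name__ == "__main__":
    # Hypothetical depths standing in for sub-model sizes.
    for phase, active in enumerate(progressive_schedule([2, 4, 3]), start=1):
        print(f"phase {phase}: train sub-models of depth {active}")
```

In a real system each phase would run gradient updates on sub-models sampled from the active set; the schedule itself is the only part shown.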

In experiments, the researchers found that training a computer vision model containing over 10 quintillion architectural settings with their approach was far more efficient than spending hours training each sub-model individually. Furthermore, it didn't compromise the model's accuracy or efficiency -- the model achieved state-of-the-art accuracy on mobile devices when tested against a common benchmark (ImageNet) and was 1.5 to 2.6 times faster at inference than leading classification systems.

Perhaps more impressive, the researchers claim that training the computer vision model produced roughly 1/1,300 of the carbon emissions of today's popular model search techniques. "If rapid progress in AI is to continue, we need to reduce its environmental impact," said IBM fellow and member of the MIT-IBM Watson AI Lab John Cohn, referring to the study. "The upside of developing methods to make AI models smaller and more efficient is that the models may also perform better."

The code and pretrained models are available on GitHub for devices like the Samsung Galaxy Note8, Samsung Galaxy Note10, Samsung Galaxy S7 Edge, LG G8, Google Pixel, and Pixel 2. They are also available for processors like Intel Xeon and Nvidia hardware like the GTX 1080Ti, Jetson TX2, and V100.

It's worth noting that MIT's work builds on approaches like those outlined in a 2017 paper titled "Efficient Processing of Deep Neural Networks: A Tutorial and Survey." This research laid out some of the ways to reduce the computational demands of machine learning models, including changes to hardware design, joint design of hardware and algorithms, and changes to the algorithms themselves. Other proposals have called for an industry-level energy analysis and a compute-per-watt standard for machine learning projects.
