Conventional AI development pipelines require processing power, and lots of it. By one estimate, the compute used in AI research has been doubling every few months, amounting to a roughly 300,000-fold increase from 2012 to 2018. While that growth has contributed to breakthroughs like highly dexterous robots and skilled poker-playing algorithms, its environmental costs have been enormous. One recent study found that training a single large natural language processing model (including the architecture search used to design it) can emit roughly 284 metric tons of carbon dioxide, nearly five times the lifetime emissions of an average car.
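To see how "doubling every few months" squares with a 300,000-fold increase, the short calculation below converts a total growth factor into an implied doubling time. It assumes, purely for illustration, that "2012 to 2018" spans 72 months; the actual endpoints behind the estimate may differ, which would shift the result somewhat.

```python
import math


def implied_doubling_time(growth_factor: float, span_months: float) -> float:
    """Months per doubling implied by a total growth factor over a time span."""
    doublings = math.log2(growth_factor)  # how many times the quantity doubled
    return span_months / doublings


# Assumption: treat "2012 to 2018" as a 72-month window.
doublings = math.log2(300_000)
months_per_doubling = implied_doubling_time(300_000, 72)
print(f"{doublings:.1f} doublings, one every {months_per_doubling:.1f} months")
```

A 300,000-fold increase is a little over 18 doublings, so under the 72-month assumption compute doubled about every four months, consistent with "every few months."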

That’s why scientists at the Allen Institute for AI, Carnegie Mellon University, and the University of Washington are calling for more research into green AI: AI that is more environmentally friendly and more “inclusive,” in the sense that it doesn’t shut out researchers who lack large computing budgets. They propose making efficiency a more common evaluation criterion for AI research papers, alongside accuracy and related measures, and they call for the establishment of a baseline that other researchers could improve on.

“The term green AI refers to AI research that yields novel results without increasing computational cost, and ideally reducing it,” wrote the coauthors. “Papers could be required to plot accuracy as a function of computational cost and of training set size, providing a baseline for more data-efficient research in the future.”
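Once accuracy is plotted against computational cost, a natural way to summarize the plot is to keep only the runs that no cheaper run matches or beats. The sketch below extracts that accuracy/cost Pareto frontier from a list of (cost, accuracy) pairs. This summary is an illustrative choice, not something the paper prescribes, and the run values are hypothetical.

```python
from typing import List, Tuple


def pareto_frontier(runs: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return (cost, accuracy) points not dominated by a cheaper, at-least-as-accurate run."""
    frontier: List[Tuple[float, float]] = []
    best_acc = float("-inf")
    # Sort by cost ascending; among equal costs, consider the most accurate first.
    for cost, acc in sorted(runs, key=lambda r: (r[0], -r[1])):
        if acc > best_acc:  # strictly better than every cheaper run seen so far
            frontier.append((cost, acc))
            best_acc = acc
    return frontier


# Hypothetical runs: (compute cost in arbitrary units, validation accuracy).
runs = [(1.0, 0.80), (2.0, 0.78), (3.0, 0.85), (3.0, 0.83), (5.0, 0.86)]
print(pareto_frontier(runs))
```

Here the run costing 2.0 drops out because a cheaper run is already more accurate, leaving a frontier that shows how much extra compute each gain in accuracy costs.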

The researchers point to the growing computational demands of cutting-edge AI models, citing Google’s natural language processing model BERT-large as one example. BERT-large was trained on a data set of 3 billion word-pieces using 64 tensor processing units (TPUs), Google’s custom-built AI accelerator chips, for four days. OpenAI’s best-performing text-generating model, GPT-2-XL, was trained on 40 billion words. Specialized models like DeepMind’s AlphaGo are even more dependent on powerful hardware for training and inference: AlphaGo needed 1,920 CPUs and 280 GPUs to play a single game of Go, at a cost of over $1,000 per hour.

One way to measure efficiency empirically going forward, the paper’s coauthors said, is to record the work required to generate a result in AI, which they describe as proportional to the product of three factors: the cost of processing a single example, the size of the training data set, and the number of experiments (such as hyperparameter trials) performed. As a concrete metric, they propose floating point operations (FPO), which estimate the amount of computational work a process performs. FPO is imperfect in that it ignores factors like a model’s memory consumption and implementation details, but it directly measures the work a machine does as it executes a specific instance of a model, and so it tracks the amount of energy consumed. As an added bonus, FPO is agnostic to the hardware on which models are run, it’s strongly correlated with models’ running times, and it accounts for the amount of work done at each time step.
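To make the metric concrete, the sketch below counts FPO for one forward pass of a small fully connected network and scales it by data set size and number of experiments, following the three-factor product described above. Several simplifications are assumptions for illustration only: each weight contributes 2 FPO (one multiply, one add), biases and activation functions are ignored, the forward-pass count stands in for per-example cost, and the network shape, example count, epoch count and trial count are hypothetical.

```python
from typing import List


def dense_forward_fpo(layer_sizes: List[int]) -> int:
    """FPO for one forward pass of a fully connected network.

    Assumes 2 FPO (one multiply, one add) per weight; biases and
    activation functions are ignored for simplicity.
    """
    return sum(2 * n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def result_cost(fpo_per_example: int, examples_processed: int, num_experiments: int) -> int:
    """Cost of a result, taken as proportional to E * D * H and measured in FPO."""
    return fpo_per_example * examples_processed * num_experiments


# Hypothetical setup: a 784-256-10 network, 60,000 examples x 10 epochs, 20 trials.
e = dense_forward_fpo([784, 256, 10])
d = 60_000 * 10
h = 20
print(f"E = {e:,} FPO per example")
print(f"Cost ~ {result_cost(e, d, h):.3e} FPO")
```

With these made-up numbers, E works out to 406,528 FPO per example and the full set of experiments to about 4.9 trillion FPO. Because the count depends only on the model’s arithmetic, the same figure applies whatever hardware runs it, which is the hardware-agnostic property the researchers highlight.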

The researchers conceded that FPO alone isn’t enough to foster the development of truly green AI. That’s why they encourage fellow researchers to report the budget/accuracy curves observed during model training, which they say would help developers make wiser decisions about model selection and would highlight the stability of different approaches. They also argue for making efficiency an officially recognized contribution at major AI conferences, and they support the continued public release of pretrained models to spare others the cost of retraining them.
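One way to turn a set of training runs into a budget/accuracy curve is to estimate the best validation accuracy a researcher could expect after a given number of randomly chosen hyperparameter trials. The sketch below does this by treating the observed scores as an empirical distribution and sampling trials with replacement. This estimator is an illustrative choice rather than necessarily the one the paper uses, the budget here is counted in training runs (multiplying by per-run FPO would express it as compute), and the scores are hypothetical.

```python
from typing import List


def expected_max_performance(scores: List[float], budget: int) -> float:
    """Expected best validation score after `budget` random hyperparameter trials.

    Treats the observed scores as an empirical distribution and samples
    `budget` trials from it with replacement.
    """
    ordered = sorted(scores)
    n = len(ordered)
    total = 0.0
    for i, v in enumerate(ordered, start=1):
        # Probability that the highest-ranked sampled run is the i-th smallest score.
        prob_max_is_i = (i / n) ** budget - ((i - 1) / n) ** budget
        total += v * prob_max_is_i
    return total


# Hypothetical validation accuracies from eight training runs.
scores = [0.71, 0.74, 0.69, 0.80, 0.77, 0.73, 0.79, 0.75]
for b in (1, 2, 4, 8, 16):
    print(b, round(expected_max_performance(scores, b), 4))
```

With a budget of one run the estimate equals the mean score, and it climbs toward the best observed score as the budget grows. A method whose curve rises quickly is both cheaper to tune and more stable across runs.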

“When developing a new model, much of the research process involves training many model variants on a training set and performing inference on a small development set. In such a setting, more efficient training procedures can lead to greater savings, while in a production setting more efficient inference can be more important,” wrote the researchers. “We advocate for a holistic view of computational savings which doesn’t sacrifice in some areas to make advances in others … It’s important to reiterate that we see green AI as a valuable option not an exclusive mandate … We want to increase the prevalence of green AI by highlighting its benefits [and] advocating a standard measure of efficiency.”
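The trade-off between training and inference savings can be illustrated with a simple lifetime-cost model: total work equals a one-time training cost plus a per-query inference cost times the number of queries served. The sketch below, with hypothetical FPO figures and ignoring memory, hardware and energy-mix effects, finds the query count at which a model that is costlier to train but cheaper to run starts to pay off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelCost:
    """Hypothetical compute profile of a model, measured in FPO."""
    train_fpo: float  # one-time cost of training the model
    infer_fpo: float  # cost of answering a single inference query

    def lifetime_fpo(self, num_queries: float) -> float:
        """Total work: training once, then serving num_queries queries."""
        return self.train_fpo + self.infer_fpo * num_queries


def break_even_queries(a: ModelCost, b: ModelCost) -> Optional[float]:
    """Query count at which a and b cost the same in total.

    Returns None if there is no single non-negative crossing point.
    """
    if a.infer_fpo == b.infer_fpo:
        return None
    q = (b.train_fpo - a.train_fpo) / (a.infer_fpo - b.infer_fpo)
    return q if q >= 0 else None


# Hypothetical numbers: the second model costs 3x as much to train but is 4x cheaper per query.
cheap_to_train = ModelCost(train_fpo=1e18, infer_fpo=2e10)
cheap_to_serve = ModelCost(train_fpo=3e18, infer_fpo=5e9)

q = break_even_queries(cheap_to_train, cheap_to_serve)
print(f"break-even after about {q:.2e} queries")
for n in (1e6, 1e9):
    print(n, cheap_to_train.lifetime_fpo(n), cheap_to_serve.lifetime_fpo(n))
```

Below the break-even point, as in a research setting where a model is trained and evaluated on a small development set, the cheaper-to-train model wins; above it, as in a production setting serving many queries, the cheaper-to-serve model does. Accounting for both terms is one way to take the holistic view of savings the researchers describe.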