Google-led paper pushes back against claims of AI inefficiency

Google this week pushed back against claims by earlier research that large AI models can contribute significantly to carbon emissions. In a paper coauthored by Google AI chief scientist Jeff Dean, researchers at the company say that the choice of model, datacenter, and processor can reduce carbon footprint by up to 100 times and that "misunderstandings" about the model lifecycle contributed to "miscalculations" in impact estimates.

Carbon dioxide, methane, and nitrous oxide levels are at the highest they've been in the last 800,000 years. Together with other drivers, greenhouse gases likely catalyzed the global warming that's been observed since the mid-20th century. It's widely believed that machine learning models, too, have contributed to the adverse environmental trend. That's because they require a substantial amount of computational resources and energy -- models are routinely trained for thousands of hours on specialized hardware accelerators in datacenters estimated to use 200 terawatt-hours per year. The average U.S. home consumes about 10,000 kilowatt-hours per year, roughly one twenty-millionth of that total.

In June 2019, researchers at the University of Massachusetts at Amherst released a report estimating that training and searching a certain model produces roughly 626,000 pounds of carbon dioxide emissions, equivalent to nearly 5 times the lifetime emissions of the average U.S. car. Separately, leading AI researcher Timnit Gebru coauthored a paper that spotlights the impact of large language models' carbon footprint on marginalized communities.

Gebru, who was fired from her position on an AI ethics team at Google in what she claims was retaliation, was told her work didn't meet Google's criteria for publication because it lacked reference to recent research. In an email, Dean accused Gebru and the study's other coauthors of disregarding advances that have shown greater efficiencies in training and might mitigate carbon impact.

This latest Google-led research, which was conducted with University of California, Berkeley researchers and focuses on natural language model training, defines the footprint of a model as a function of several variables. They include the choice of algorithm, the program that implements it, the number of processors that run the program, the speed and power of those processors, a datacenter's efficiency in delivering power and cooling the processors, and the energy supply mix -- for example, renewable, gas, or coal.
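A minimal sketch of how those variables might combine, assuming the footprint is simply the product of training time, processor count, average processor power, datacenter overhead (expressed here as a power usage effectiveness, or PUE, multiplier), and the carbon intensity of the energy supply. The function name, parameter names, and every input value below are hypothetical and chosen only for illustration; the choice of algorithm and program shows up indirectly through the hours and processor count.

```python
def training_footprint(hours, num_processors, avg_watts_per_processor,
                       pue, kg_co2e_per_kwh):
    """Return (energy in kWh, emissions in metric tons of CO2e).

    Assumes a simple multiplicative model; real accounting may differ.
    """
    # Watt-hours drawn by the processors, scaled by datacenter overhead,
    # then converted from watt-hours to kilowatt-hours.
    energy_kwh = hours * num_processors * avg_watts_per_processor * pue / 1000.0
    # Kilograms of CO2e from the energy mix, converted to metric tons.
    tons_co2e = energy_kwh * kg_co2e_per_kwh / 1000.0
    return energy_kwh, tons_co2e


# Hypothetical training run: none of these values come from the paper.
energy_kwh, tons_co2e = training_footprint(
    hours=100,
    num_processors=64,
    avg_watts_per_processor=300,
    pue=1.1,
    kg_co2e_per_kwh=0.4,
)
print(f"{energy_kwh:.0f} kWh, {tons_co2e:.2f} metric tons CO2e")
```

Under this assumed form, halving any single factor, such as the energy mix's carbon intensity, halves the estimated emissions, which is why the choice of datacenter and processor can matter as much as the model itself.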

The coauthors argue that Google engineers are often improving the quality of existing models rather than starting from scratch, which minimizes the environmental impact of training. For example, the paper suggests that Google's Evolved Transformer model, an improvement upon the Transformer, uses 1.6 times fewer floating point operations (FLOPs) and takes 1.1 to 1.3 times less training time. Another improvement -- sparse activation -- leads to 55 times less energy usage and reduces net carbon emissions by around 130 times compared with "dense" alternatives, according to the researchers.

The paper also makes the claim that Google's custom AI processors, called tensor processing units (TPUs), enable energy savings in the cloud far greater than previous research has acknowledged. The average cloud datacenter is roughly twice as energy efficient as an enterprise datacenter, the coauthors posit, pointing to a recent paper in Science that found that global datacenter energy consumption increased by only 6% compared with 2010, despite computing capacity increasing by 550% over the same time period.

Earlier studies, the paper says, made incorrect assumptions about model training approaches like neural architecture search, which automates the design of systems by finding the best model for a particular task. One energy consumption estimate for the Evolved Transformer ended up 18.7 times "too high" and 88 times off in emissions, in the Google-led research team's estimation. And publicly available calculators like ML Emissions and Green Algorithms estimate gross carbon dioxide emissions as opposed to net emissions, which could be up to 10 times lower, the paper says.

"Reviewers of early [research] suggested that ... any tasks run in a green datacenter simply shift other work to dirtier datacenters, so there is no net gain," the coauthors wrote. "It's not true, but that speculation reveals many seemingly plausible but incorrect fallacies: datacenters are fully utilized, cloud centers can't grow, renewable energy is fixed and can't grow, Google ... model training competes with other tasks in the datacenter, training must run in all datacenters, [and] there is no business reason to reduce carbon emissions."

The coauthors evaluated the energy usage and carbon emissions of five recent large natural language processing models, using their own formulas for the calculations. They concluded that:

T5, Google's pretrained language model, used 86 megawatt-hours and produced 47 metric tons of carbon dioxide emissions
Meena, Google's multiturn, open-domain chatbot, used 232 megawatt-hours and produced 96 metric tons of carbon dioxide emissions
GShard, a Google-developed language translation framework, used 24 megawatt-hours and produced 4.3 metric tons of carbon dioxide emissions
Switch Transformer, a Google-developed routing algorithm, used 179 megawatt-hours and produced 59 metric tons of carbon dioxide emissions
GPT-3, OpenAI's sophisticated natural language model, used 1,287 megawatt-hours and produced 552 metric tons of carbon dioxide emissions
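One way to read the figures above is to divide each model's emissions by its energy use, which gives a rough carbon intensity for the training run. The short sketch below does that, treating the energy figures as megawatt-hours (an assumption about the unit); the variable names are illustrative and not taken from the paper.

```python
# (energy, emissions) per model, copied from the list above.
# Energy is assumed to be in megawatt-hours; emissions are metric tons of CO2.
MODELS = {
    "T5": (86, 47),
    "Meena": (232, 96),
    "GShard": (24, 4.3),
    "Switch Transformer": (179, 59),
    "GPT-3": (1287, 552),
}

# Metric tons of CO2 emitted per unit of energy consumed.
intensities = {
    name: round(tons / energy, 3)
    for name, (energy, tons) in MODELS.items()
}

# Print from the cleanest run to the dirtiest.
for name, value in sorted(intensities.items(), key=lambda item: item[1]):
    print(f"{name}: {value} metric tons per unit of energy")
```

On these numbers, GShard has the lowest ratio and T5 the highest, which illustrates how much the datacenter and energy mix, rather than energy use alone, shape the final emissions figure.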

"We believe machine learning papers requiring large computational resources should make energy consumption and carbon dioxide emissions explicit when practical," the coauthors wrote. "We are working to be more transparent about energy use and carbon dioxide emissions in our future research. To help reduce the carbon footprint of machine learning, we believe energy usage and carbon dioxide emissions should be a key metric in evaluating models."

Conflict of interest

The thoroughness of the paper belies the conflict of Google's commercial interests with viewpoints expressed in third-party research. Many of the models the company develops power customer-facing products, including Cloud Translation API and Natural Language API. Revenue from Google Cloud, Google's cloud division that includes its managed AI services, jumped nearly 46% year-over-year in Q1 2021 to $4.04 billion.

While the Google-led research disputes this, at least one study shows that the amount of compute used to train the largest models for natural language processing and other applications has increased 300,000 times in 6 years -- a higher pace than Moore's law. The coauthors of a recent MIT study say that this suggests that deep learning is approaching its computational limits. "We do not anticipate [meeting] the computational requirements implied by the targets ... The hardware, environmental, and monetary costs would be prohibitive," the MIT coauthors said.

Even if the Google-led paper's figures are taken at face value, the training of Google's models produced a total of over 200 metric tons of carbon dioxide emissions. That's equivalent to the average greenhouse gas emissions from roughly 43 cars or 24 homes over the course of a year. Matching the emissions produced by training OpenAI's GPT-3 alone would require driving a passenger vehicle just over 1.3 million miles.
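The arithmetic behind these comparisons can be sketched as follows. The per-model emissions are the paper's reported figures; the per-mile emission factor is an assumption (roughly 404 grams of carbon dioxide per mile for a typical passenger vehicle), so the mileage result is approximate and shifts with whatever factor is used.

```python
# Reported emissions for the four Google models, in metric tons of CO2.
GOOGLE_MODELS_TONS = {
    "T5": 47,
    "Meena": 96,
    "GShard": 4.3,
    "Switch Transformer": 59,
}
GPT3_TONS = 552

# Assumed emission factor for a typical passenger vehicle (not from the paper).
GRAMS_CO2_PER_MILE = 404
GRAMS_PER_METRIC_TON = 1_000_000

google_total_tons = sum(GOOGLE_MODELS_TONS.values())
gpt3_equivalent_miles = GPT3_TONS * GRAMS_PER_METRIC_TON / GRAMS_CO2_PER_MILE

print(f"Google models combined: {google_total_tons:.1f} metric tons")
print(f"GPT-3 equivalent driving distance: {gpt3_equivalent_miles:,.0f} miles")
```

With the assumed factor, the four Google models sum to a little over 200 metric tons and GPT-3's training corresponds to somewhat more than 1.3 million miles of driving.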

It's been established that impoverished groups are more likely to experience significant environment-related health issues, with one study out of Yale finding that low-income communities and those composed predominantly of minorities experienced higher exposure to air pollution compared to nearby white neighborhoods. A more recent study from the University of Illinois at Urbana-Champaign shows that Black Americans are subjected to more pollution from every source, including industry, agriculture, all manner of vehicles, construction, residential sources, and even emissions from restaurants.

Gebru's work notes that while some of the energy supplying datacenters comes from renewable or carbon credit-offset sources, the majority does not, and many energy sources in the world aren't carbon neutral. Moreover, renewable energy sources are still costly to the environment, Gebru and coauthors note, and datacenters with increasing computation requirements take away from other potential uses of green energy.

"When we perform risk/benefit analyses of language technology, we must keep in mind how the risks and benefits are distributed, because they do not accrue to the same people," Gebru and coauthors wrote. "Is it fair or just to ask, for example, that the residents of the Maldives (likely to be underwater by 2100) or the 800,000 people in Sudan affected by drastic floods pay the environmental price of training and deploying ever-larger English language models, when similar large-scale models aren’t being produced for Dhivehi or Sudanese Arabic?"

The Google-led paper and prior works do align on recommendations to reduce the carbon impact of models, at least on the topic of transparency. As have others, the Google coauthors call on researchers to measure energy usage and carbon dioxide emissions and publish the data in their papers. They also argue that efficiency should be an evaluation criterion for publishing machine learning research on computationally intensive models, as well as accuracy and related metrics. Beyond this, the Google-led paper calls for researchers to publish the amount of accelerator hardware they used and how much time they took to train computationally intensive models.

"When developing a new model, much of the research process involves training many model variants on a training set and performing inference on a small development set. In such a setting, more efficient training procedures can lead to greater savings," scientists at the Allen Institute for AI, Carnegie Mellon University, and the University of Washington wrote in a recent paper. "[Increasing] the prevalence of 'green AI' [can be accomplished] by highlighting its benefits [and] advocating a standard measure of efficiency."