OpenAI begins publicly tracking AI model efficiency

OpenAI today announced it will begin tracking machine learning models that achieve state-of-the-art efficiency, an effort it believes will help identify candidates for scaling and achieving top overall performance. To kick-start things, the firm published an analysis suggesting that since 2012, the amount of compute needed to train an AI model to the same performance on classifying images in a popular benchmark -- ImageNet -- has been decreasing by a factor of 2 every 16 months.
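To make the headline figure concrete, here is a minimal sketch of what a steady halving every 16 months would imply over longer stretches. The function name and the assumption of a perfectly smooth exponential trend are illustrative only; OpenAI's analysis reports a fitted trend, not an exact law.

```python
def compute_reduction_factor(months_elapsed, halving_period_months=16.0):
    """Factor by which training compute shrinks after `months_elapsed`,
    assuming (hypothetically) a smooth halving every `halving_period_months`."""
    return 2.0 ** (months_elapsed / halving_period_months)


if __name__ == "__main__":
    # Illustrative horizons only; these are extrapolations, not measured results.
    for years in (1, 4, 7):
        factor = compute_reduction_factor(12 * years)
        print(f"after {years} years: about {factor:.1f}x less compute")
```

Under this assumption, 16 months gives a 2x reduction and 32 months gives 4x; any real trend will be noisier.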

Beyond spotlighting top-performing AI models, OpenAI says that publicly measuring efficiency -- which here refers to reducing the compute needed to train a model to perform a specific capability -- will paint a quantitative picture of algorithmic progress. It's OpenAI's assertion that this in turn will inform policy making by renewing the focus on AI's technical attributes and societal impact.

"Algorithmic improvement is a key factor driving the advance of AI. It's important to search for measures that shed light on overall algorithmic progress, even though it's harder than measuring such trends in compute," OpenAI wrote in a blog post. "Increases in algorithmic efficiency allow researchers to do more experiments of interest in a given amount of time and money. [Our] ... analysis suggests policymakers should increase funding for compute resources for academia, so that academic research can replicate, reproduce, and extend industry research."

OpenAI says that in the course of its survey, it found that Google's Transformer architecture surpassed a previous state-of-the-art model -- seq2seq, which was also developed by Google -- with 61 times less compute three years after seq2seq's introduction. DeepMind's AlphaZero, a system that taught itself from scratch how to master the games of chess, shogi, and Go, took 8 times less compute to match an improved version of the system's predecessor -- AlphaGoZero -- one year later. And OpenAI's own Dota 2-playing OpenAI Five Rerun required 5 times less training compute to surpass OpenAI Five -- the model on which it's based -- just three months later.
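For a rough sense of scale, each of these comparisons can be converted into an implied halving time, using only the factors and intervals quoted above. This is a back-of-the-envelope sketch: each figure is a single before-and-after pair, not a fitted trend, and the helper name is hypothetical.

```python
import math


def implied_halving_months(reduction_factor, months_elapsed):
    """Months per halving of compute, if a `reduction_factor`-fold drop
    over `months_elapsed` had come from a steady exponential decline."""
    return months_elapsed / math.log2(reduction_factor)


# (reduction factor, months between the two systems), as quoted in the article.
comparisons = {
    "Transformer vs. seq2seq": (61, 36),
    "AlphaZero vs. AlphaGoZero": (8, 12),
    "OpenAI Five Rerun vs. OpenAI Five": (5, 3),
}

if __name__ == "__main__":
    for name, (factor, months) in comparisons.items():
        halving = implied_halving_months(factor, months)
        print(f"{name}: compute halved roughly every {halving:.1f} months")
```

An 8x reduction in 12 months, for instance, corresponds to three halvings, or one every four months.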

OpenAI speculates that algorithmic efficiency might outpace gains from Moore's law, the observation that the number of transistors in an integrated circuit doubles about every two years. "New capabilities ... typically require a significant amount of compute expenditure to obtain, then refined versions of those capabilities ... become much more efficient to deploy due to process improvements," OpenAI wrote. "Our results suggest that for AI tasks with high levels of investment [in] researcher time and/or compute, algorithmic efficiency might outpace ... hardware efficiency."
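The comparison can be sketched by putting the two doubling periods side by side: 16 months for the ImageNet efficiency trend and 24 months for Moore's law. Treating transistor-count doubling as a stand-in for hardware efficiency is a simplifying assumption made here for illustration, as are the constant and function names.

```python
ALGORITHMIC_DOUBLING_MONTHS = 16.0  # ImageNet efficiency trend reported by OpenAI
MOORES_LAW_DOUBLING_MONTHS = 24.0   # "about every two years"; used as a proxy


def cumulative_gain(months_elapsed, doubling_period_months):
    """Multiplicative gain after `months_elapsed`, assuming steady doubling."""
    return 2.0 ** (months_elapsed / doubling_period_months)


if __name__ == "__main__":
    # Illustrative horizons; both curves are idealized exponentials.
    for months in (24, 48, 96):
        algo = cumulative_gain(months, ALGORITHMIC_DOUBLING_MONTHS)
        hardware = cumulative_gain(months, MOORES_LAW_DOUBLING_MONTHS)
        print(f"{months} months: algorithms ~{algo:.1f}x, hardware ~{hardware:.1f}x")
```

Over 48 months, for example, the idealized algorithmic curve reaches 8x while the hardware curve reaches 4x, and the gap widens as the horizon grows.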

As part of its benchmarking effort, OpenAI says it will start with vision and translation efficiency benchmarks -- specifically ImageNet and WMT14 -- and that it will consider adding more benchmarks over time. (Original authors and collaborators will receive credit.) No human captioning, other images, or other data will be allowed, but there won't be any restrictions on training data used for translation or augmentation.

"Industry leaders, policymakers, economists, and potential researchers are all trying to better understand AI progress and decide how much attention they should invest and where to direct it," OpenAI wrote. "Measurement efforts can help ground such decisions."

OpenAI isn't the first to propose publicly benchmarking the efficiency of AI models, it's worth noting. Last year, scientists at the Allen Institute for AI, Carnegie Mellon University, and the University of Washington advocated for making efficiency a more common evaluation criterion for AI academic papers, alongside accuracy and related measures. Other proposals have called for an industry-level energy analysis and a compute-per-watt standard for machine learning projects.
