ARK Invest: AI training costs dropped 100-fold between 2017 and 2019

Machine learning systems are cheaper to train now than ever before. That's the assertion of ARK Invest, which today published a meta-analysis indicating the cost of training is improving at 50 times the pace of Moore's law, the principle that computer hardware performance doubles every two years.

In its report, ARK found that while computing devoted to training doubled in line with Moore's law from 1960 to 2010, training compute complexity -- measured in petaflop/s-days, where a petaflop/s is a quadrillion operations per second -- has increased tenfold yearly since 2010. Coinciding with this, training costs over the past three years declined tenfold yearly; in 2017, the cost to train an image classifier like ResNet-50 on a public cloud was around $1,000, while in 2019 it was around $10.
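The tenfold-per-year figure can be checked against the two ResNet-50 price points with a few lines of Python. This is a minimal sketch, not ARK's methodology: the function name `projected_cost` is hypothetical, and it assumes a constant decline factor applied once per year.

```python
def projected_cost(initial_cost, annual_decline_factor, years):
    """Cost after `years` if it shrinks by `annual_decline_factor` each year.

    Assumption: the decline is a constant multiplicative factor per year.
    """
    return initial_cost / annual_decline_factor ** years


# Figures from the article: about $1,000 in 2017, declining tenfold yearly.
COST_2017 = 1000.0
ANNUAL_DECLINE = 10

for year in range(2017, 2020):
    cost = projected_cost(COST_2017, ANNUAL_DECLINE, year - 2017)
    print(f"{year}: ${cost:,.2f}")
```

Two years of tenfold declines compound to the 100-fold drop in the headline, taking the 2017 figure of $1,000 down to $10 in 2019.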

That's surely music to the ears of startups competing with well-financed firms like Google's DeepMind, which last year recorded losses of $572 million and took on a billion-dollar debt. While some experts believe labs outmatched by tech giants are empowered by their limitations to pursue new research, it's also true that training is an unavoidable expenditure in AI work -- whether within the enterprise, academia, or otherwise.

The findings would appear to agree with -- and indeed source from -- those in a recent OpenAI report, which suggested that since 2012, the amount of compute needed to train an AI model to the same performance on classifying images in the ImageNet benchmark has been decreasing by a factor of 2 every 16 months. According to OpenAI, Google's Transformer architecture surpassed a previous state-of-the-art model -- seq2seq, which was also developed by Google -- with 61 times less compute three years after seq2seq's introduction. And DeepMind's AlphaZero, a system that taught itself from scratch how to master the games of chess, shogi, and Go, took 8 times less compute to match an improved version of the system's predecessor -- AlphaGoZero -- one year later.
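To put the three OpenAI figures on a common scale, each can be converted into an implied halving time. The sketch below assumes smooth exponential improvement, which is a simplification; the function names are hypothetical and are not taken from the OpenAI report.

```python
import math


def compute_reduction_factor(months, halving_period_months=16.0):
    """Factor by which required compute shrinks after `months`,
    assuming it halves every `halving_period_months`."""
    return 2 ** (months / halving_period_months)


def implied_halving_period_months(reduction_factor, months):
    """Halving time implied by a total `reduction_factor` over `months`,
    assuming a constant exponential rate."""
    return months / math.log2(reduction_factor)


# ImageNet trend from the article: compute halves every 16 months.
print(compute_reduction_factor(16))

# Transformer vs. seq2seq: 61 times less compute after three years.
print(implied_halving_period_months(61, 36))

# AlphaZero vs. AlphaGoZero: 8 times less compute after one year.
print(implied_halving_period_months(8, 12))
```

Under these assumptions, AlphaZero's eightfold reduction in one year corresponds to a halving time of 4 months, and the Transformer's 61-fold reduction over three years to roughly 6 months. Both are faster than the 16-month ImageNet trend.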

ARK posits the decline in costs is attributable to breakthroughs on both the hardware and software sides. For example, Nvidia's V100 graphics card, which was released in 2017, is about 1,800% faster than its K80, which launched three years earlier. (Graphics cards are commonly used to train large AI systems.) And between 2018 and 2019, training performance on the V100 improved by roughly 800% thanks to software innovations from MIT, Google, Facebook, Microsoft, IBM, Uber, and others.

ARK predicts that at the current rate of improvement, the cost of training ResNet-50 should fall to $1. And it anticipates that the price of inference -- running a trained model in production -- will drop alongside this, settling this year at around $0.03 to run a model that can classify a billion images. (Two years ago, it would've cost $10,000.)
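The inference figures give only two endpoints, so the yearly rate has to be derived from them. The following is a rough sketch that assumes the decline was spread evenly over the two years; `annualized_decline_factor` is a hypothetical helper, not something from ARK's report.

```python
def annualized_decline_factor(start_cost, end_cost, years):
    """Constant yearly factor that takes `start_cost` down to `end_cost`
    over `years`. Assumes a steady geometric decline."""
    return (start_cost / end_cost) ** (1 / years)


# Figures from the article: classifying a billion images cost about
# $10,000 two years ago and about $0.03 this year.
factor = annualized_decline_factor(10_000, 0.03, 2)
print(f"Implied yearly decline in inference cost: about {factor:,.0f}x")
```

The same helper applied to the training figures ($1,000 to $10 over two years) gives a factor of 10 per year, matching the tenfold yearly decline ARK reports for training.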

"Based on the pace of its cost decline, AI is in very early days," ARK analyst James Wang wrote. "During the first decade of Moore's Law, transistor count doubled every year -- or at twice the rate of change seen during decades thereafter. The 10 times to 100 times cost declines we are witnessing in both AI training and AI inference suggest that AI is nascent in its development, perhaps with decades of slower but sustained growth ahead."

It's worth noting that while the expense of model training generally appears to be on the decline, developing sophisticated machine learning models in the cloud remains prohibitively expensive. According to a recent Synced report, the University of Washington's Grover, which is tailored for both the generation and detection of fake news, cost $25,000 to train over the course of two weeks. OpenAI reportedly spent a whopping $12 million training its GPT-3 language model, and Google spent an estimated $6,912 training BERT, a bidirectional transformer model that redefined the state of the art for 11 natural language processing tasks.
