When it comes to AI, algorithmic innovations are substantially more important than hardware — at least where the problems involve billions to trillions of data points. That’s the conclusion of a team of scientists at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), who conducted what they claim is the first study on how fast algorithms are improving across a broad range of examples.
Algorithms tell software how to make sense of text, visual, and audio data so that it can, in turn, draw inferences from that data. For example, OpenAI’s GPT-3 was trained on webpages, ebooks, and other documents to learn how to write papers in a humanlike way. The more efficient the algorithm, the less work the software has to do. And as algorithms are enhanced, less computing power should be needed, at least in theory. But this isn’t settled science. AI research and infrastructure startups like OpenAI and Cerebras are betting that algorithms will have to increase in size substantially to reach higher levels of sophistication.
The CSAIL team was led by MIT research scientist Neil Thompson, who previously coauthored a paper showing that algorithms were approaching the limits of modern computing hardware. The team analyzed data from 57 computer science textbooks and more than 1,110 research papers to trace when and where algorithms improved. In total, they looked at 113 “algorithm families,” or sets of algorithms that solve the same problem, that had been highlighted as most important by the textbooks.
The team reconstructed the history of the 113 families, tracking each time a new algorithm was proposed for a problem and making special note of those that were more efficient. Covering the period from the 1940s to the present, the team found an average of eight algorithms per family, of which a couple improved on efficiency.
For large computing problems, 43% of algorithm families had year-on-year improvements that were equal to or larger than the gains from Moore’s law, the principle that the speed of computers roughly doubles every two years. In 14% of problems, the performance improvements vastly outpaced those that came from improved hardware, with the gains from better algorithms being particularly meaningful for big data problems.
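To put a number on "year-on-year improvements equal to or larger than Moore's law," a doubling period can be converted into an equivalent annual rate. The sketch below is illustrative arithmetic only, not the CSAIL team's methodology; the helper name `annual_growth_factor` is hypothetical, and the two-year doubling period is the rule of thumb quoted above.

```python
def annual_growth_factor(doubling_period_years: float) -> float:
    """Return the per-year multiplier implied by a given doubling period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    # Doubling every T years means multiplying by 2**(1/T) each year.
    return 2.0 ** (1.0 / doubling_period_years)


# Assumption: Moore's law modeled as a doubling every two years.
moore_factor = annual_growth_factor(2.0)
moore_percent = (moore_factor - 1.0) * 100.0
print(f"Moore's law baseline: about {moore_percent:.1f}% improvement per year")
```

Under that assumption, an algorithm family would need to improve by roughly 41% per year, compounded, to keep pace with hardware.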
The new MIT study adds to a growing body of evidence that the size of algorithms matters less than their architectural complexity. For example, earlier this month, a team of Google researchers published a study claiming that a model much smaller than GPT-3 — fine-tuned language net (FLAN) — bests GPT-3 by a large margin on a number of challenging benchmarks. And in a 2020 survey, OpenAI found that since 2012, the amount of compute needed to train an AI model to the same performance on classifying images in a popular benchmark, ImageNet, has been decreasing by a factor of two every 16 months.
There are findings to the contrary. In 2018, OpenAI researchers released a separate analysis showing that from 2012 to 2018, the amount of compute used in the largest AI training runs grew more than 300,000 times with a 3.5-month doubling time, exceeding the pace of Moore’s law. But assuming algorithmic improvements receive greater attention in the years to come, they could solve some of the other problems associated with large language models, like environmental impact and cost.
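The gap between a 3.5-month doubling time and Moore's law is easier to appreciate with a quick calculation. The following is a back-of-the-envelope sketch, not a reproduction of OpenAI's analysis; the function names are hypothetical, and Moore's law is assumed to mean a doubling every 24 months.

```python
import math


def doublings_needed(growth_factor: float) -> float:
    """Number of doublings required to reach a given overall growth factor."""
    return math.log2(growth_factor)


def growth_over(months: float, doubling_months: float) -> float:
    """Overall growth after `months` at a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


# Figures quoted in the article: 300,000x growth, 3.5-month doubling time.
doublings = doublings_needed(300_000)
elapsed_months = doublings * 3.5

# Assumption: Moore's law as one doubling every 24 months.
moore_equivalent = growth_over(elapsed_months, 24.0)

print(f"Doublings needed for 300,000x: {doublings:.1f}")
print(f"Time at a 3.5-month doubling: {elapsed_months / 12:.1f} years")
print(f"Moore's law growth over the same span: {moore_equivalent:.1f}x")
```

Under these assumptions, hardware improving at the Moore's law pace would account for only a single-digit multiple over the period in which the largest training runs grew 300,000-fold.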
In June 2019, researchers at the University of Massachusetts at Amherst released a report estimating that training and searching a certain model produces roughly 626,000 pounds of carbon dioxide emissions, equivalent to nearly 5 times the lifetime emissions of the average U.S. car. GPT-3 alone used 1,287 megawatt-hours of energy during training and produced 552 metric tons of carbon dioxide emissions, a Google study found. That is about the same amount emitted by 100 average homes’ electricity usage over a year.
On the expenses side, a Synced report estimated that the University of Washington’s Grover fake news detection model cost $25,000 to train; OpenAI reportedly racked up $12 million training GPT-3; and Google spent around $6,912 to train BERT. While AI training costs dropped 100-fold between 2017 and 2019, according to one source, these amounts far exceed the computing budgets of most startups and institutions — let alone independent researchers.
“Through our analysis, we were able to say how many more tasks could be done using the same amount of computing power after an algorithm improved,” Thompson said in a press release. “In an era where the environmental footprint of computing is increasingly worrisome, this is a way to improve businesses and other organizations without the downside.”