OpenAI's massive GPT-3 model is impressive, but size isn't everything

Last week, OpenAI published a paper detailing GPT-3, a machine learning model that achieves strong results on a number of natural language benchmarks. At 175 billion parameters, the learned values that determine how much weight the model gives different parts of its input when making a prediction, it's the largest of its kind. And with a memory size exceeding 350GB, it's one of the priciest, costing an estimated $12 million to train.
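
The 350GB figure is consistent with simple arithmetic if each of the 175 billion parameters is stored as a 16-bit (2-byte) floating-point number. That storage format is an assumption of this rough sketch, not a detail stated above, and the estimate covers only the weights themselves; optimizer state and activations add considerably more memory during training.

```python
# Rough weight-storage estimate for a dense model: parameters x bytes per parameter.
# Assumption: only the weights are counted; optimizer state, activations,
# and framework overhead are ignored.

PARAMS = 175_000_000_000  # GPT-3 parameter count, as reported

# Bytes needed to store one parameter in a few common numeric formats.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "int8": 1,
}


def weight_storage_gb(num_params: int, dtype: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_storage_gb(PARAMS, dtype):,.0f} GB")
```

Under these assumptions the loop reports 700 GB for 32-bit floats, 350 GB for 16-bit floats, and 175 GB for 8-bit integers, so only the 16-bit case lines up with the reported figure.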

A system with over 350GB of memory and $12 million in compute credits isn't hard to swing for OpenAI, a well-capitalized company that teamed up with Microsoft to develop an AI supercomputer. But it's potentially beyond the reach of AI startups like Agolo, which in some cases lack the capital required. Fortunately for them, experts believe that while GPT-3 and similarly large systems are impressive with respect to their performance, they don't move the ball forward on the research side of the equation. Rather, they're prestige projects that simply demonstrate the scalability of existing techniques.

"I think the best analogy is with some oil-rich country being able to build a very tall skyscraper," Guy Van den Broeck, an assistant professor of computer science at UCLA, told VentureBeat via email. "Sure, a lot of money and engineering effort goes into building these things. And you do get the 'state of the art' in building tall buildings. But ... there is no scientific advancement per se. Nobody worries that the U.S. is losing its competitiveness in building large buildings because someone else is willing to throw more money at the problem. ... I'm sure academics and other companies will be happy to use these large language models in downstream tasks, but I don't think they fundamentally change progress in AI."

Indeed, Denny Britz, a former resident on the Google Brain team, believes companies and institutions without the compute to match OpenAI, DeepMind, and other well-funded labs are well-suited to other, potentially more important research tasks like investigating correlations between model sizes and precision. In fact, he argues that these labs' lack of resources might be a good thing because it forces them to think deeply about why something works and come up with alternative techniques.

"There will be some research that only [tech giants can do], but just like in physics [where] not everyone has their own particle accelerator, there is still plenty of other interesting work," Britz said. "I don't think it necessarily creates any imbalance. It doesn't take opportunities away from the small labs. It just adds a different research angle that wouldn't have happened otherwise. ... Limitations spur creativity."

OpenAI represents the other side of the argument. It has long asserted that immense computational horsepower in conjunction with reinforcement learning is a necessary step on the road to AGI, or AI that can learn any task a human can. But luminaries like Mila founder Yoshua Bengio and Facebook VP and chief AI scientist Yann LeCun argue that AGI is impossible to create, which is why they're advocating for techniques like self-supervised learning and neurobiology-inspired approaches that leverage high-level semantic language variables. There's also evidence that efficiency improvements might offset the mounting compute requirements; OpenAI's own surveys suggest that since 2012, the amount of compute needed to train an AI model to the same performance on a popular image classification benchmark (ImageNet) has been decreasing by a factor of two every 16 months.
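
To get a feel for what halving every 16 months implies, the minimal sketch below simply compounds that rate. It assumes the halving period stays constant, whereas the survey describes a historical trend rather than a guarantee; the function name is illustrative.

```python
# Compound a fixed "compute halves every N months" trend.
# Assumption: the 16-month halving period holds constant over the whole span.

HALVING_MONTHS = 16


def compute_reduction_factor(months_elapsed: float,
                             halving_months: float = HALVING_MONTHS) -> float:
    """Return how many times less compute is needed after `months_elapsed` months."""
    return 2 ** (months_elapsed / halving_months)


for years in (1, 4, 7):
    factor = compute_reduction_factor(12 * years)
    print(f"{years} year(s): {factor:.1f}x less compute")
```

At that rate one year buys roughly a 1.7x reduction, four years an 8x reduction, and seven years (about 2012 to 2019) roughly 38x. These are extrapolations from the stated rate, not figures reported by OpenAI.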

The GPT-3 paper, too, hints at the limitations of merely throwing more compute at problems in AI. While GPT-3 completes tasks from generating sentences to translating between languages with ease, it fails to perform much better than chance on a test -- adversarial natural language inference -- that tasks it with discovering relationships between sentences. "A more fundamental [shortcoming] of the general approach described in this paper -- scaling up any ... model -- is that it may eventually run into (or could already be running into) the limits of the [technique]," the authors concede.

"State-of-the-art (SOTA) results in various subfields are becoming increasingly compute-intensive, which is not great for researchers who are not working for one of the big labs," Britz continued. "SOTA-chasing is bad practice because there are too many confounding variables, SOTA usually doesn't mean anything, and the goal of science should be to accumulate knowledge as opposed to results in specific toy benchmarks. There have been some initiatives to improve things, but looking for SOTA is a quick and easy way to review and evaluate papers. Things like these are embedded in culture and take time to change."

That isn't to suggest pioneering new techniques is easy. A 2019 meta-analysis of information retrieval algorithms used in search engines concluded the high-water mark was actually set in 2009. Another study in 2019 reproduced seven neural network recommendation systems and found that six failed to outperform much simpler, non-AI algorithms developed years before, even when the earlier techniques were fine-tuned. Yet another paper found evidence that dozens of loss functions -- the parts of algorithms that mathematically specify their objective -- had not improved in terms of accuracy since 2006. And a study presented in March at the 2020 Machine Learning and Systems conference found that over 80 pruning algorithms in the academic literature showed no evidence of performance improvements over a 10-year period.

But Mike Cook, an AI researcher and game designer at Queen Mary University of London, points out that discovering new solutions is only a part of the scientific process. It's also about sussing out where in society research might fit, which small labs might be better able to determine because they're unencumbered by the obligations to which privately backed labs, corporations, and governments are beholden. "We don't know if large models and computation will always be needed to achieve state-of-the-art results in AI," Cook said. "[In any case, we] should be trying to ensure our research is cheap, efficient, and easily distributed. We are responsible for who we empower, even if we're just making fun music or text generators."
