GPT-3's free alternative GPT-Neo is something to be excited about

Updated 7/13/21 to correct the release date of The Pile.

The advent of Transformers in 2017 completely changed the world of neural networks. Ever since, the core concept of Transformers has been remixed, repackaged, and rebundled in several models. The results have surpassed the state of the art in several machine learning benchmarks. In fact, currently all top benchmarks in the field of natural language processing are dominated by Transformer-based models. Some of the Transformer-family models are BERT, ALBERT, and the GPT series of models.

In any machine learning model, the most important components of the training process are:

The code of the model -- the components of the model and its configuration
The data to be used for training
The available compute power

With the Transformer family of models, researchers finally arrived at a way to keep increasing the performance of a model, seemingly without limit: You just increase the amount of training data and compute power.

This is exactly what OpenAI did, first with GPT-2 and then with GPT-3. Being a well-funded ($1 billion+) company, it could afford to train some of the biggest models in the world. A private corpus of 500 billion tokens was used for training GPT-3, and approximately $50 million was spent in compute costs.

While the code for most of the GPT language models is open source, the model is impossible to replicate without the massive amounts of data and compute power. And OpenAI has chosen to withhold public access to its trained models, making them available via API to only a select few companies and individuals. Further, its access policy is undocumented, arbitrary, and opaque.

Genesis of GPT-Neo

Connor Leahy, Stella Biderman, Leo Gao, Sid Black, Phil Wang, and others formed EleutherAI with the idea of making AI technology that would be open source to the world. One of the first problems the team chose to tackle was making a GPT-like language model that would be accessible to all.

As mentioned before, most of the code for such a model was already available, so the core challenges were to find the data and the compute power. The Eleuther team set out to generate an open source data set of a scale comparable to what OpenAI used for its GPT language models. This led to the creation of The Pile. The Pile, released in January 2021, is an 825GB data set specifically designed to train language models. It contains data from 22 diverse sources, including academic sources (arXiv, PubMed, FreeLaw, etc.), Internet webpages (StackExchange, Wikipedia, etc.), dialogs from subtitles, GitHub, and more.

The Pile paper, Arxiv.

For compute, EleutherAI was able to use idle compute from TPU Research Cloud (TRC). TRC is a Google Cloud initiative that supports research projects with the expectation that the results of the research will be shared with the world via open source code, models, etc.

On March 22, 2021, after months of painstaking research and training, the EleutherAI team released two trained GPT-style language models, GPT-Neo 1.3B and GPT-Neo 2.7B. The code and the trained models are open sourced under the MIT license, and the models can be used for free through Hugging Face’s Transformers library.
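As a minimal sketch of what using the models looks like in practice, the snippet below loads a GPT-Neo checkpoint through the Transformers `pipeline` API and samples one completion. It assumes the `transformers` library and a backend such as PyTorch are installed, and that the checkpoint is available on the Hugging Face Hub under the identifier `EleutherAI/gpt-neo-1.3B`. The helper name `generate` and the sampling settings are my own illustrative choices, not something prescribed by EleutherAI.

```python
# Assumed Hub identifier for the smaller model; the larger one is
# expected to be "EleutherAI/gpt-neo-2.7B".
MODEL_ID = "EleutherAI/gpt-neo-1.3B"


def generate(prompt, max_length=100, model_id=MODEL_ID):
    """Return one sampled completion of `prompt` (hypothetical helper)."""
    # Imported lazily so this file can be loaded without the heavy dependency.
    from transformers import pipeline

    # The first call downloads several gigabytes of model weights.
    generator = pipeline("text-generation", model=model_id)

    # max_length counts tokens, including the tokens of the prompt itself.
    outputs = generator(prompt, do_sample=True, max_length=max_length)
    return outputs[0]["generated_text"]
```

Calling `generate("EleutherAI has")` should return the prompt followed by the model's continuation. Because sampling is enabled, the text will differ from run to run.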

Comparing GPT-Neo and GPT-3

Let’s compare GPT-Neo and GPT-3 with respect to the model size and performance benchmarks and finally look at some examples.

Model size. In terms of model size, the largest GPT-Neo model consists of 2.7 billion parameters. In comparison, the GPT-3 API offers four models, ranging from 2.7 billion parameters to 175 billion parameters.

Caption: GPT-3 model sizes as estimated here, and GPT-Neo as reported by EleutherAI.

As you can see, GPT-Neo is bigger than GPT-2 and comparable to the smallest GPT-3 model.

Performance benchmark metrics. EleutherAI reports that GPT-Neo outperformed the closest comparable GPT-3 model (GPT-3 Ada) on all NLP reasoning benchmarks.

GPT-Neo outperformed GPT-3 Ada on Hellaswag and Piqa. Hellaswag is a multiple-choice sentence completion benchmark in which each item has a context paragraph and four candidate endings. Piqa measures common sense reasoning: the machine has to pick whichever of two sentences makes the most sense. GPT-Neo also outperformed GPT-3 Ada on Winogrande, a benchmark that requires common sense to resolve ambiguous pronouns in a sentence.
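To make the multiple-choice format concrete, here is a minimal sketch of how a language model can be scored on such a task: each candidate ending is rated against the context (typically by the model's log-likelihood), and the highest-rated ending counts as the model's answer. The `toy_score` function is a made-up stand-in for a real model, and the example item is invented for illustration. This is not EleutherAI's evaluation code.

```python
from typing import Callable, Sequence


def pick_ending(context: str, endings: Sequence[str],
                score: Callable[[str, str], float]) -> int:
    """Return the index of the ending the scorer rates highest."""
    scores = [score(context, ending) for ending in endings]
    return max(range(len(endings)), key=lambda i: scores[i])


def accuracy(items, score) -> float:
    """Fraction of items where the top-scored ending is the labeled one."""
    correct = sum(
        pick_ending(item["context"], item["endings"], score) == item["label"]
        for item in items
    )
    return correct / len(items)


def toy_score(context: str, ending: str) -> float:
    """Hypothetical scorer: counts words the ending shares with the context.

    A real evaluation would use the model's log-likelihood of the ending
    given the context instead.
    """
    context_words = set(context.lower().split())
    return float(sum(word in context_words for word in ending.lower().split()))


# Invented two-choice item in the style of a sentence completion benchmark.
items = [
    {
        "context": "She filled the kettle with water and",
        "endings": [
            "put the kettle on the stove.",
            "drove the car to work.",
        ],
        "label": 0,
    },
]

print(accuracy(items, toy_score))
```

Swapping `toy_score` for a function that queries a real model is all it takes to turn this into an actual benchmark loop. The accuracy figures that get reported are this same fraction computed over the full test set.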

However, GPT-3 Davinci, the largest version of GPT-3, with about 65 times as many parameters, comfortably beats GPT-Neo in all the benchmarks, as you would expect.
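The "about 65 times" figure follows directly from the parameter counts quoted earlier, as this quick check shows. The variable names are illustrative only.

```python
GPT_NEO_PARAMS = 2.7e9       # largest GPT-Neo model, as stated in the article
GPT3_DAVINCI_PARAMS = 175e9  # largest GPT-3 model, as stated in the article

# How many times larger Davinci is than GPT-Neo 2.7B.
ratio = GPT3_DAVINCI_PARAMS / GPT_NEO_PARAMS
print(f"GPT-3 Davinci has about {ratio:.0f}x the parameters of GPT-Neo 2.7B")
```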

Caption: Model metrics as reported by EleutherAI, except GPT-3 175B, which is from OpenAI’s GPT-3 paper.

Examples. Let’s look at a few side-by-side examples of generated text from the largest GPT-3 model (from various GPT-3 Davinci examples found online) and GPT-Neo (that I generated using HuggingFace’s GPT-Neo 2.7B Transformers implementation).

The first example we will look at is completion of ELI-5 format sentences, where the text in italics was the prompt given to the model.

I would say both GPT-Neo and GPT-3 worked equally well in this example.

The next experiment is to check if GPT-Neo can correct grammatically incorrect sentences like GPT-3 can. Again, the italicized parts are the prompt provided to the model.

Again, GPT-Neo worked great in correcting “eat” to “ate,” especially considering the model was not specifically trained to do this.

Finally, the famous “discovery of English-speaking unicorns” piece (GPT-2 version here and GPT-3 version here) reimagined and written by GPT-Neo. The prompt given to the model is in italics and bold.

In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. The only evidence was a single word: "hello." "The unicorns are very good listeners, so they learned to speak English from watching humans," said Dr. David Orme, chief executive of the Zoological Society of London, in a press release. "In fact, they also learned to speak Spanish from watching humans. They said hello to us in both languages." While the discovery is stunning in and of itself, the news gets even more mind-blowing when you consider that the unicorns were there for thousands of years. They were "living fossils," Orme said, which is a fancy way of saying that their origins have yet to be determined. "It is extremely exciting that we now have evidence of a population of unicorns, that we know to be around and that is living," Dr. John Fitzpatrick, a conservation and anthropology professor at the University of California, said in a press release. "This shows that they are not just a myth and that humans are not the only species on Earth to have developed language." The findings have been published in a new study in the journal Proceedings of the Royal Society B: Biological Sciences. A 'unicorn' from the Andean mountains in Colombia. (Image: The Royal Society) The discovery was made this summer in a remote but stunning valley in the Andean Mountains in Colombia called Bureta. It's believed the unicorns were in their 20s. "It’s a very unusual place to find these animals and at the moment there is no evidence that humans have been there before," Orme said. The scientists said the unicorns had been living in that valley as long as their species has, which is estimated at at least 200,000 years. This means the area's rich history of megafauna, including dinosaurs, pterosaurs and saber-toothed cats, is still far from over. 
"If it is true in a relatively isolated valley near Bureta Colombia that is more than 200,000 years old and now also having a population of these animals, then Bureta is truly a unique and special place," Fitzpatrick said.

Once again, GPT-Neo was able to generate a coherent, almost-believable article without missing out on the central themes -- unicorn discovery, the English-speaking aspect, the Andes, etc.

All in all, the performance metrics of GPT-Neo 2.7B in NLP benchmarks are better than those of GPT-3 2.7B (Ada), but much worse than those of GPT-3 175B (Davinci). Qualitatively, though, GPT-Neo 2.7B's completions and writing were as good as those of GPT-3 175B (Davinci), the largest GPT-3 model.

The bottom line here is: GPT-Neo is a great open source alternative to GPT-3, especially given OpenAI’s closed access policy.

Abhishek Iyer is the founder of FreeText AI, a company specializing in text mining and review analysis.
