MIT researchers develop self-learning language models that outperform larger counterparts

Researchers at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) have reported a notable advance in language modeling, a field currently dominated by large language models (LLMs).

The CSAIL team's approach challenges the conventional belief that smaller models have limited capabilities. The research introduces a scalable, self-learning model that outperforms counterparts up to 500 times its size on specific language understanding tasks, all without relying on human-generated annotations.

The algorithm developed by the MIT team, named “SimPLE” (Simple Pseudo-Label Editing), uses self-training, a technique that allows the model to learn from its own predictions, thereby eliminating the need for additional annotated training data. The algorithm was devised to tackle a known weakness of self-training: the model can generate inaccurate labels for itself.

Notably, the research team claims that this approach significantly enhances the model’s performance across various tasks, surpassing notable models such as Google’s LaMDA and FLAN, as well as GPT models.

A revolution (but limited in scope)

In their paper, “Entailment as Robust Self-Learners,” the MIT research team argues that while recent advancements in language generation with LLMs have brought about a revolution, these models have a distinct limitation when it comes to understanding tasks.

“Digital calculators are better than GPT-4 in arithmetic because they are designed based on arithmetic principles,” Hongyin Luo, MIT CSAIL postdoctoral associate and research lead author, told VentureBeat. “Our small model is trained to grasp the core principle of language understanding: contextual entailment, while LLMs do not explicitly learn about it. With a clear goal of learning contextual entailment, the parameter efficiency of our model is much higher than LLMs, thus achieving good performance on NLU tasks.”

The research also states that, simply put, a competent contextual entailment model must also excel as a natural language understanding (NLU) model.

Moreover, the CSAIL team believes that the implications of this research go beyond mere enhancements in performance. It challenges the prevailing notion that larger models are inherently superior, highlighting the potential of smaller models as equally powerful and environmentally sustainable alternatives.

Enhancing language model understanding through textual entailment

The MIT CSAIL team focused on textual entailment to enhance the model’s comprehension of diverse language tasks. Textual entailment denotes the connection between two sentences, whereby if one sentence (the premise) is true, it is probable that the other sentence (the hypothesis) is also true.

By training a model to recognize these relationships, the researchers were able to write prompts that test whether specific information is entailed by a given sentence or phrase in various tasks. This zero-shot adaptation, in which the model takes on new tasks without additional training, significantly enhanced its versatility and adaptability.
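As a rough illustration of the idea, the sketch below recasts a topic classification task as a set of entailment checks. It is not the researchers' code: the template wording, the function names and the keyword-based `toy_entailment_score` are all assumptions made for this example, and the toy scorer merely stands in for a trained entailment model.

```python
from typing import Callable, Dict, List

# Hypothetical cue words; a real system would use a trained entailment model.
TOY_CUES: Dict[str, List[str]] = {
    "sports": ["match", "team", "goal"],
    "business": ["market", "shares", "profit"],
}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer: share of the label's cue words found in the premise."""
    words = set(premise.lower().split())
    for label, cues in TOY_CUES.items():
        if label in hypothesis.lower():
            return sum(cue in words for cue in cues) / len(cues)
    return 0.0


def classify_by_entailment(
    text: str,
    labels: List[str],
    template: str,
    score: Callable[[str, str], float],
) -> str:
    """Turn each label into a hypothesis and pick the best-entailed one."""
    hypotheses = {label: template.format(label=label) for label in labels}
    scores = {label: score(text, hyp) for label, hyp in hypotheses.items()}
    return max(scores, key=scores.get)


prediction = classify_by_entailment(
    "The team scored a late goal to win the match",
    ["sports", "business"],
    "This text is about {label}.",
    toy_entailment_score,
)
print(prediction)
```

Swapping in a different template and label set (for example, "This review is {label}." with "positive" and "negative") would reuse the same scorer for a different task, which is the sense in which no additional training is needed.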

MIT’s Luo told VentureBeat that although LLMs have showcased impressive abilities in generating language, art and code, they carry considerable computational costs and privacy risks when handling sensitive data. Conversely, smaller models have historically fallen behind their larger counterparts in multi-tasking and weakly supervised tasks.

To address these challenges, the MIT CSAIL researchers employed a natural language-based logical inference dataset to develop smaller models that outperformed much larger models. In addition, by incorporating the concept of textual entailment, researchers endowed the models with the ability to comprehend a broad spectrum of tasks.

Adapting without additional training

These models underwent training to ascertain whether specific information was entailed by a given sentence or phrase, thereby enabling them to adapt to various tasks without requiring additional training.

“The benefit of self-training is that the model can automatically label a large amount of data (create pseudo-labels), but the risk is that the pseudo-labels contain wrong predictions, which might mislead the model or cause overfitting,” said Luo. “Our SimPLE method outperforms all self-training baselines. The method combines two classic AI strategies for robustness: Uncertainty estimation and voting, and provides a more accurate set of predictions.”
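The combination Luo describes can be pictured with a small sketch. This is not the SimPLE algorithm from the paper: it assumes, for illustration only, that each unlabeled example has already received several independent predictions (for instance from repeated stochastic passes of the same model), takes a majority vote, and treats low agreement as a sign of uncertainty. The function name and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def vote_pseudo_label(
    votes: List[str], min_agreement: float = 0.7
) -> Optional[Tuple[str, float]]:
    """Return (label, agreement) for a confident majority, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    # Low agreement is used here as a simple proxy for model uncertainty.
    if agreement < min_agreement:
        return None
    return label, agreement


# Made-up predictions for two unlabeled examples.
votes_per_example: Dict[str, List[str]] = {
    "great phone, love it": ["positive"] * 5,
    "it is fine I guess": ["positive", "negative", "negative", "positive", "neutral"],
}

pseudo_labels: Dict[str, str] = {}
for text, votes in votes_per_example.items():
    result = vote_pseudo_label(votes)
    if result is not None:
        pseudo_labels[text] = result[0]

print(pseudo_labels)
```

In this toy run only the first example keeps its pseudo-label. The second is discarded because no label reaches the agreement threshold, which is the kind of filtering intended to keep wrong predictions from misleading the model.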

Luo explained that language model training traditionally requires either manual data annotation by humans or the use of LLM APIs. However, human annotators often label sensitive data, which compromises privacy. Additionally, transmitting data to third-party annotators or OpenAI’s API may result in the inadvertent exposure of highly sensitive information.

"Our method allows data annotation without seeing the data," he explained. "An annotator only needs to write a template that describes the task. With this template, our system predicts the relationship between the response and the question, generating high-quality labels. By doing this, the dataset is annotated without sharing any data with the annotator."

Redefining AI model development through self-training

MIT’s research team asserts that the collection of smaller models exhibits versatility across a wide array of AI tasks, ranging from sentiment classification to news categorization, and demonstrates exceptional proficiency in discerning the relationship between two textual components.

The models can also infer sentiment from statements and ascertain the subject matter of news articles based on their content. The researchers achieved remarkable outcomes by reimagining various NLU tasks as entailment tasks.

According to Luo, the self-trained entailment models, which have 350 million parameters, outperform supervised language models with 137 to 175 billion parameters. He firmly believes that this pioneering work has the potential to redefine the AI and ML landscape, providing a language modeling solution that is more scalable, dependable and cost-effective.

"The core of the model is predicting entailment relations, while LLMs predict 'how to make things read similar to the training data,'" Luo said.

"This makes our model more suitable and efficient for language understanding," Luo added. "Our model performs better than LLMs and traditional BERT-based models trained with human-generated labels."

Paving the way for cost-efficient language model training

The paper that outlines this research, authored by Luo, James Glass and Yoon Kim, is scheduled to be presented in July at the Annual Meeting of the Association for Computational Linguistics in Toronto, Canada. The project received support from the Hong Kong Innovation AI program.

With its pioneering approach, the research strives to establish the groundwork for future AI technologies that prioritize scalability, privacy preservation and sustainability.

Luo said that the model has only 1/500th as many parameters as GPT-3-175B, which makes it significantly easier to deploy and results in faster inference. The CSAIL team emphasized that, thanks to the research, organizations would now be able to deploy efficient, robust multi-task models without compromising data privacy or relying on expensive computational resources.

"Our next step involves employing the entailment models in various language-related tasks," said Luo. "Currently, we are engaged in co-training with LLMs to leverage their advantages and further enhance the capabilities of our efficient self-trained models. Additionally, we are working on applying entailment models to measure the alignment between a claim and fact/moral principles, which benefits detecting machine and human-generated misinformation, hate speech and stereotypes."