Facebook open-sources RAG, an AI model that retrieves documents to answer questions

Facebook and AI startup Hugging Face today open-sourced Retrieval Augmented Generation (RAG), a natural language processing model that finds and interprets contextual information to complete a range of tasks. Facebook says that RAG can be tuned to attain state-of-the-art results by altering or supplementing its internal knowledge on the fly, enabling researchers to control what the model knows and doesn't know without wasting time or compute power retraining it.

Beginning today, RAG is available as a component of the Hugging Face Transformers library. It integrates with the new Datasets library to provide the indexed knowledge source that RAG relies on.

Cutting-edge work in natural language understanding has produced general-purpose models that, while often flawed, generalize across tasks. But most efforts to date have applied these models to tasks a human could solve without background knowledge, like sentiment analysis.

By contrast, RAG uses its input to retrieve a relevant set of documents from a knowledge source like Wikipedia. For instance, given the prompt "When did the first mammal appear on Earth?" RAG might surface documents for "Mammal," "History of Earth," and "Evolution of Mammals." These are concatenated as context with the input and then fed into the model to produce the output text.
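The retrieve-then-concatenate step can be sketched in a few lines of Python. This is a toy illustration and not Facebook's implementation: it scores documents with bag-of-words cosine similarity, an assumption made here so the example runs without any model. The document bodies, the `retrieve` and `build_inputs` helpers, and the separator format are all hypothetical.

```python
import math
import re
from collections import Counter

# Toy knowledge source. Titles echo the article's example; bodies are placeholders.
DOCS = {
    "Mammal": "mammals are vertebrate animals that nurse their young",
    "History of Earth": "the history of earth covers the planet from its formation to today",
    "Evolution of Mammals": "traces when the first mammal lineages appear in the fossil record",
    "Sentiment analysis": "sentiment analysis labels text as positive or negative",
}

def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, docs, k=3):
    """Return the titles of the k documents most similar to the question."""
    q = Counter(tokens(question))
    scored = [(cosine(q, Counter(tokens(title + " " + body))), title)
              for title, body in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]

def build_inputs(question, docs, k=3):
    """Concatenate each retrieved document with the question, one string per document."""
    return [f"{title}: {docs[title]} | {question}" for title in retrieve(question, docs, k)]

question = "When did the first mammal appear on Earth?"
for line in build_inputs(question, DOCS):
    print(line)
```

Each resulting string pairs one retrieved document with the question; in a real system those strings would be passed to the generator.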

According to Facebook, RAG leverages a form of "late fusion" to integrate knowledge from retrieved documents, meaning it makes answer predictions for document-question pairs before aggregating the final prediction scores. When it has access to documents containing clues to the answer but where the answer isn't stated verbatim, RAG's performance improves further. And RAG even generates answers in certain situations where the answer is not contained in any of the retrieved documents.
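One way to read the "late fusion" description is as a weighted sum: the model scores candidate answers separately for each document-question pair, then combines those scores using weights derived from how relevant each document looked to the retriever. The sketch below assumes that reading; the function names and all numbers are hypothetical and are not taken from Facebook's results.

```python
import math
from collections import defaultdict

def softmax(scores):
    """Turn raw retrieval scores into a probability distribution over documents."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(retrieval_scores, per_doc_answers):
    """Combine per-document answer probabilities, weighted by document relevance.

    retrieval_scores: one raw relevance score per retrieved document.
    per_doc_answers: one dict per document mapping candidate answer -> probability.
    """
    doc_probs = softmax(retrieval_scores)
    fused = defaultdict(float)
    for p_doc, answers in zip(doc_probs, per_doc_answers):
        for answer, p_answer in answers.items():
            fused[answer] += p_doc * p_answer
    return dict(fused)

# Hypothetical scores for three retrieved documents and two candidate answers.
retrieval_scores = [2.0, 1.0, 0.0]
per_doc_answers = [
    {"answer A": 0.7, "answer B": 0.3},
    {"answer A": 0.4, "answer B": 0.6},
    {"answer A": 0.5, "answer B": 0.5},
]
fused = late_fusion(retrieval_scores, per_doc_answers)
print(max(fused, key=fused.get))
```

Because the most relevant document carries the most weight, its preferred answer dominates the fused score even though the second document disagrees.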

When benchmarked on open-domain datasets like NaturalQuestions, which contains questions from Google Search users, Facebook says that RAG showed a knack for generating correct answers even when the answer wasn't contained in any of the retrieved documents. It also excelled at knowledge-intensive natural language generation, which Facebook explored by having it generate questions inspired by Jeopardy. The Jeopardy questions RAG generated were more specific, diverse, and factual than those from comparable models, perhaps owing to RAG's ability to synthesize responses using disparate pieces of information drawn from multiple sources.

While RAG isn't being used in production at Facebook, according to research manager Sebastian Riedel, the team behind it is actively iterating to mitigate potential bias. They've restricted documents in the training dataset to Wikipedia, which they consider safer than the web crawls many of today’s language models are trained on. They're exploring a version of RAG that minimizes remaining risks so they can get to a point where the outputs are consistently safe. And they're looking into how they can scale RAG, make it multimodal, and have it operate using multiple knowledge sources at once.

"RAG's true strength lies in its flexibility. Changing what a pre-trained language model knows entails retraining the entire model with new documents. With RAG, we control what it knows simply by swapping out the documents it uses for knowledge retrieval," Facebook wrote. "We obtained very strong results on NaturalQuestions, CuratedTrec, and WebQuestions with RAG, demonstrating that state-of-the-art machine reading performance can be achieved with a generative, rather than extractive, reader."

Facebook sees broad potential for RAG, which it asserts will free researchers to deploy solutions to knowledge-intensive tasks with just a few lines of code. "We foresee the potential for future research into knowledge-intensive tasks that are just as easy and accessible as light-knowledge tasks like sentiment analysis today," Facebook wrote. "RAG allows NLP models to bypass the retraining step, accessing and drawing from up-to-date information and then using a ... generator to output the results."
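The idea of changing what a model knows by swapping its documents, rather than retraining it, can be illustrated with a toy stand-in. The `ToyRag` class below is hypothetical and is not the Hugging Face API: it "answers" by returning the stored passage that shares the most words with the question, and its `weights_version` field merely stands in for trained parameters that stay untouched.

```python
import re

def words(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class ToyRag:
    """Toy retrieval-augmented model: fixed 'weights', swappable documents."""

    def __init__(self, documents):
        self.weights_version = "v1"  # stands in for trained parameters; never changed below
        self.documents = list(documents)

    def swap_documents(self, documents):
        """Replace the knowledge source without touching the 'weights'."""
        self.documents = list(documents)

    def answer(self, question):
        """Return the passage with the largest word overlap with the question."""
        q = words(question)
        return max(self.documents, key=lambda d: len(q & words(d)))

# Hypothetical knowledge sources that disagree on one fact.
old_docs = ["The tallest building in the example city is Tower A.",
            "The example city has two rivers."]
new_docs = ["The tallest building in the example city is Tower B.",
            "The example city has two rivers."]

model = ToyRag(old_docs)
question = "What is the tallest building in the example city?"
before = model.answer(question)
model.swap_documents(new_docs)
after = model.answer(question)
print(before)
print(after)
```

The response changes from the Tower A passage to the Tower B passage while `weights_version` stays the same, which mirrors the retraining-free update the quote describes.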
