
In a paper published on the preprint server arXiv, Facebook researchers describe Multilingual Autoencoder that Retrieves and Generates (MARGE). It’s a language model that generates words, sentences, and paragraphs by retrieving related words, sentences, and paragraphs in different languages and identifying patterns within them.

The researchers claim MARGE learns to paraphrase, translate, and summarize text without any fine-tuning, a potential step toward systems that can perform any text task from pretraining alone.

In machine learning, pretraining involves training an AI model on a large amount of data before it’s fine-tuned on a narrow data set tailored to a particular task, like summarization. Masked models, which pretrain by removing parts of an input text and then reconstructing them, are widely used in the language domain. But by design, they have to memorize a vast amount of encyclopedic knowledge to achieve strong performance.
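
For readers unfamiliar with masked pretraining, here is a minimal Python sketch of the corruption step: it hides a random subset of tokens behind a placeholder and records the originals the model would have to reconstruct. The 15 percent default mirrors BERT’s masking rate; the `<mask>` symbol, the `mask_tokens` name, and the simple per-token coin flip are illustrative assumptions (BERT itself also sometimes swaps in random or unchanged tokens instead of the mask).

```python
import random

MASK = "<mask>"  # hypothetical placeholder symbol


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens behind MASK.

    Returns the corrupted sequence plus a dict mapping each masked
    position to the original token the model must reconstruct.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        # Each token is independently masked with probability mask_prob.
        if rng.random() < mask_prob:
            corrupted[i] = MASK
            targets[i] = tok
    return corrupted, targets


tokens = "the cat sat on the mat".split()
corrupted, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(corrupted)
print(targets)
```

A masked model is trained to predict `targets` from `corrupted`, which is why it tends to absorb factual knowledge: the missing word often can only be recovered by remembering it.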


Above: A demonstration of MARGE’s translation skills.

MARGE, by contrast, emphasizes paraphrasing while reducing the amount of knowledge it needs to store. During pretraining, it ingests batches of “evidence” documents and target documents and learns to reconstruct each target conditioned on the evidence, which amounts to paraphrasing, translating, or summarizing related text, while simultaneously learning how relevant each piece of evidence is to each target.

MARGE first computes a relevance score between every pair of target and evidence documents, and these scores steer the model to attend more closely to the most relevant evidence. It then computes the likelihood of reconstructing each target using a modified sequence-to-sequence (seq2seq) model, a general-purpose encoder-decoder architecture for language processing. Finally, it uses the same relevance model for retrieval, constructing batches so that the evidence documents in each batch are relevant to its targets.
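
The scoring and retrieval steps can be sketched in a few lines of Python. This sketch assumes each document has already been encoded as a fixed-size, nonzero vector (plain lists of floats here) and uses cosine similarity between those vectors as the relevance score. The softmax over evidence documents is an illustrative stand-in for how relevance steers the model’s attention, not a reproduction of MARGE’s modified attention layers, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relevance_matrix(targets, evidence):
    """Relevance score for every (target, evidence) pair."""
    return [[cosine(t, e) for e in evidence] for t in targets]


def softmax(scores, temperature=1.0):
    """Turn one target's relevance scores into weights that sum to 1.

    Subtracting the max first keeps exp() numerically stable.
    """
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def top_k_evidence(relevance_row, k):
    """Indices of the k most relevant evidence documents for one target,
    i.e. the retrieval step used to group targets with useful evidence."""
    order = sorted(range(len(relevance_row)),
                   key=lambda j: relevance_row[j], reverse=True)
    return order[:k]


# Toy example: two targets, three evidence documents, 2-D "embeddings".
targets = [[1.0, 0.0], [0.0, 1.0]]
evidence = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
relevance = relevance_matrix(targets, evidence)
for row in relevance:
    print(top_k_evidence(row, k=2), [round(w, 3) for w in softmax(row)])
```

Using one scoring function for both weighting and retrieval is what lets the model improve its own batches: as the document vectors get better, so does the evidence it is handed.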

During experiments, the researchers created a 960-million-parameter Transformer model dubbed MARGE-NEWS and trained it for 550,000 steps on 2,048 “workers,” each processing sub-batches of four documents (two evidence and two target documents). They then pretrained it for a further 100,000 steps on Wikipedia data, rebuilding the retrieval index every 10,000 steps. On average, each target document was linked to four monolingual and four cross-lingual evidence documents. (The documents spanned 26 languages in total.)
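
The reported training figures can be collected into a small config to make the arithmetic explicit. This is a sketch under stated assumptions: the `MargeNewsRun` class and its methods are hypothetical, it assumes each worker handles one four-document sub-batch per step, that the Wikipedia phase follows the news phase, and that the index is built once at the start of a phase and then refreshed. The article doesn’t say which phases the rebuild schedule covers, so the helper takes the phase length as an argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MargeNewsRun:
    """Hypothetical container for the training figures in the article."""
    workers: int = 2048
    docs_per_worker: int = 4          # two evidence + two target documents
    news_steps: int = 550_000
    wikipedia_steps: int = 100_000
    index_rebuild_every: int = 10_000

    def docs_per_step(self) -> int:
        # Assumes every worker processes one sub-batch per optimizer step.
        return self.workers * self.docs_per_worker

    def total_steps(self) -> int:
        # Assumes the Wikipedia phase runs after the news phase.
        return self.news_steps + self.wikipedia_steps

    def index_rebuild_steps(self, phase_steps: int) -> list[int]:
        # Build the retrieval index at step 0, then refresh it every
        # `index_rebuild_every` steps for the rest of the phase.
        return list(range(0, phase_steps, self.index_rebuild_every))


run = MargeNewsRun()
print(run.docs_per_step())                                # 8192
print(run.total_steps())                                  # 650000
print(len(run.index_rebuild_steps(run.wikipedia_steps)))  # 10
```

Rebuilding the index periodically, rather than every step, trades some staleness in the retrieved evidence for not having to re-encode the whole corpus constantly.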

The researchers report that on the task of cross-lingual sentence retrieval, MARGE outperformed all other unsupervised models (i.e., models that look for patterns in unlabeled data sets) on one benchmark (BUCC) and performed comparably to Facebook’s leading XLM-R model on another (Tatoeba). And on BLEU, a metric of machine translation quality, MARGE scored 35.8 for German-to-English translation, among the highest scores for a system without fine-tuning.
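
To make the BLEU metric concrete, here is a simplified Python sketch: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty when the hypothesis is shorter than the reference, reported on the conventional 0 to 100 scale. It scores a single sentence against a single reference without smoothing, whereas published results are typically computed at corpus level with standard tools such as sacreBLEU, so treat it as an illustration rather than a way to reproduce reported numbers. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed BLEU for one hypothesis/reference pair, on a 0-100 scale."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        total = sum(hyp_counts.values())
        # Clipped matches: a hypothesis n-gram is credited at most as many
        # times as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, any zero precision zeroes BLEU
        log_precision_sum += math.log(matches / total) / max_n
    c, r = len(hypothesis), len(reference)
    # Penalize hypotheses shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return 100 * brevity_penalty * math.exp(log_precision_sum)


ref = "the cat sat on the mat".split()
print(sentence_bleu(ref, ref))                                      # 100.0
print(round(sentence_bleu("the cat sat on the".split(), ref), 2))  # 81.87
```

The second example shows the brevity penalty at work: every n-gram in the shorter hypothesis matches, yet it loses points for omitting a word.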

MARGE also edged out state-of-the-art models when tasked with determining whether two sentences are paraphrases and answering questions about documents in Chinese. In some cases it struggled to generate text in languages other than English, particularly those with non-Latin alphabets, though the researchers report that English-to-French translation worked well.

“MARGE exhibits strong performance on a range of discriminative and generative tasks in many languages, both with and without fine-tuning … We show that fine-tuning gives strong performance on a range of discriminative and generative tasks in many languages, making MARGE the most generally applicable pre-training method to date,” the coauthors wrote. “Future work should scale MARGE to more domains and languages, and study how to more closely align pre-training objectives with different end tasks.”

It should be noted that the researchers don’t appear to have tested MARGE on data sets designed to uncover gender, racial, ethnic, and other biases, like StereoSet. This is somewhat concerning considering Facebook’s poor ethical track record as of late. A spokesperson recently told VentureBeat the company doesn’t tally diversity statistics by teams like Facebook AI Research, the group that produced this work. And in a recent Twitter exchange, Facebook chief AI scientist Yann LeCun suggested data alone leads to prejudicial AI systems, a position with which scholars like Google ethical AI co-lead Timnit Gebru took issue.
