Modern techniques in natural language processing (NLP), a branch of artificial intelligence that helps computers interpret human language, are capable of surprising nuance. One example is Facebook’s new NLP integration for Pages, which can automatically ingest the text from a business’s Facebook Page and generate answers to common customer questions. But even cutting-edge NLP algorithms share a problem: They’re highly optimized for a specific task.

“Deep learning models are often pretty fragile,” Bryan McCann, a research scientist at Salesforce, told VentureBeat in a briefing. “You can have a model that works for translation, but it might not do well on sentiment analysis or summarization.”

Undeterred, scientists at Salesforce Research, led by chief scientist Richard Socher, took a two-pronged stab at the problem. They developed both a 10-task natural language processing challenge, the Natural Language Decathlon (decaNLP), and a model that can solve it, the Multitask Question Answering Network (MQAN). The model is implemented in PyTorch, an open source machine learning library for the Python programming language.

“We designed a general model that can do [lots] of different natural language processing tasks,” McCann said.

DecaNLP puts the MQAN through a veritable gauntlet of linguistic tests, including question-answering (in which the model receives a question and a context that contains the information necessary to arrive at an answer), and machine translation (which has the model translate an input document from one language to another). There’s a document summarization test, a natural language inference test, a sentiment analysis test, a semantic role labeling test, a relation extraction test, a goal-oriented dialog test, a query generation test, and a pronoun resolution test.
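As the model's name suggests, decaNLP frames every task as question answering over a context. The sketch below illustrates that framing for three of the ten tasks. The `Example` class, the `to_model_input` helper, and the sample sentences are hypothetical and written for this article; they are not taken from the Salesforce code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One training example in a question/context/answer format (hypothetical)."""
    task: str
    question: str
    context: str
    answer: str


# Illustrative examples only; the sentences are made up for this sketch.
EXAMPLES = [
    Example(
        task="question answering",
        question="Who led the research team?",
        context="The team was led by chief scientist Richard Socher.",
        answer="Richard Socher",
    ),
    Example(
        task="machine translation",
        question="What is the translation from English to German?",
        context="Hello, world.",
        answer="Hallo, Welt.",
    ),
    Example(
        task="sentiment analysis",
        question="Is this sentence positive or negative?",
        context="The demo was a delight to use.",
        answer="positive",
    ),
]


def to_model_input(example: Example) -> str:
    """Join question and context into one string (an assumed input format)."""
    return f"question: {example.question} context: {example.context}"


if __name__ == "__main__":
    for ex in EXAMPLES:
        print(ex.task, "|", to_model_input(ex), "|", ex.answer)
```

Because every task shares the same three fields, a single model can in principle be trained on all of them without task-specific input handling.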

Above: An illustration of MQAN training.

Image Credit: Salesforce

To judge the model’s performance, the researchers normalized the results of each test and added them together to arrive at a number between 0 and 1000 — the decaScore.
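The arithmetic behind that score can be sketched in a few lines. This is a minimal illustration, assuming each of the 10 task metrics has already been normalized to a 0 to 100 scale so that the sum lands between 0 and 1000. The function name `deca_score` and the placeholder values are hypothetical, not results reported by the researchers.

```python
from typing import Mapping

# The ten decaNLP tasks, as listed in the article.
TASKS = (
    "question answering",
    "machine translation",
    "summarization",
    "natural language inference",
    "sentiment analysis",
    "semantic role labeling",
    "relation extraction",
    "goal-oriented dialog",
    "query generation",
    "pronoun resolution",
)


def deca_score(task_scores: Mapping[str, float]) -> float:
    """Sum ten per-task scores, each assumed to lie on a 0-100 scale."""
    missing = set(TASKS) - set(task_scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for task in TASKS:
        score = task_scores[task]
        if not 0.0 <= score <= 100.0:
            raise ValueError(f"{task!r} score {score} is outside 0-100")
    return sum(task_scores[task] for task in TASKS)


if __name__ == "__main__":
    # Placeholder values for illustration only, not real model results.
    placeholder = {task: 50.0 for task in TASKS}
    print(deca_score(placeholder))  # 10 tasks x 50.0 = 500.0
```

Summing equally weighted, normalized scores means no single task can dominate the total; a model has to do reasonably well across the board to score highly.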

The researchers found that the MQAN, when jointly trained on all 10 tests without any task-specific modules or parameters, performed at least as well as 10 MQANs trained on each test separately. And in some domains (specifically, transfer learning for machine translation and named entity recognition, domain adaptation for sentiment analysis and natural language inference, and zero-shot capabilities for text classification), it showed improvement compared to single-task models.

“One of the training tasks involved natural language question translation into a database query language,” McCann said. “We didn’t explicitly optimize for that, but we actually have state-of-the-art performance. We’ve lowered the difficulty bar for anybody trying to solve an NLP problem.”

Socher said the model’s ability to perform well in tasks it hasn’t been trained to do could pave the way for more robust, natural chatbots that are better able to infer meaning from human users’ questions.

Code for obtaining the datasets and training the model will be released today, along with a leaderboard of top decaScores. Model training will take a “few days” on a modern GPU, according to the team.