Baidu, the Beijing conglomerate behind the eponymous Chinese search engine, invests heavily in natural language processing (NLP) research. In October, it debuted an AI model capable of beginning a translation just a few seconds after a speaker starts talking and finishing seconds after the end of a sentence, and in 2016 and 2017, it launched SwiftScribe, a web app powered by its DeepSpeech platform, and TalkType, a dictation-centric Android keyboard.
Building on that and other prior work, Baidu this week detailed ERNIE (Enhanced Representation through kNowledge IntEgration), a natural language model built on its PaddlePaddle deep learning platform. The company claims the model achieves “high accuracy” on a range of language processing tasks, including natural language inference, semantic similarity, named entity recognition, sentiment analysis, and question-answer matching, and that it is state-of-the-art in Chinese language understanding.
The source code and pretrained models are available on GitHub.
“In recent years, unsupervised pre-trained language models have made great progress on various NLP tasks,” Baidu explained in a blog post. “[But] early work in this field focused on context-independent word embedding. [T]hese models mainly focused on the original language signals, not on semantic units in the text … We considered that if the model can learn the implicit knowledge from texts, its performances on various tasks will be further improved.”
Toward that end, the character-based ERNIE was architected to learn the semantic representation of concepts by ingesting paragraphs in which some words are masked out. It’s a versatile approach: Baidu says that unlike systems that rely on word-level modeling to suss out relationships among parts of speech, ERNIE is able to comprehend the “compositional meaning” of sequential characters like “红色, 蓝色, 绿色,” which mean red, blue, and green, respectively.
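To make the masking idea concrete, here is a minimal Python sketch contrasting two strategies: masking individual characters independently versus masking whole multi-character units at once. It assumes the sentence has already been segmented into words or phrases (segmentation is not shown), and the `[MASK]` token string, the masking rate, and the function names `char_level_mask` and `span_level_mask` are hypothetical. It illustrates the general technique only and is not Baidu’s ERNIE code.

```python
import random
from typing import List

MASK = "[MASK]"  # hypothetical mask token string


def char_level_mask(chars: List[str], rate: float, rng: random.Random) -> List[str]:
    """Mask each character independently with probability `rate`."""
    return [MASK if rng.random() < rate else c for c in chars]


def span_level_mask(spans: List[str], rate: float, rng: random.Random) -> List[str]:
    """Mask whole multi-character units (words, phrases, entities) at once.

    `spans` is a pre-segmented sentence. When a span is chosen, every one of
    its characters is replaced, so a model must recover the full unit from
    the surrounding context rather than from a visible neighboring character.
    """
    out: List[str] = []
    for span in spans:
        if rng.random() < rate:
            out.extend([MASK] * len(span))
        else:
            out.extend(list(span))
    return out
```

For example, `span_level_mask(["红色", "，", "蓝色", "，", "绿色"], 0.3, random.Random(0))` can only hide an entire color word (both characters together), whereas `char_level_mask` might mask 红 while leaving 色 visible, letting a model guess the missing character from its partner alone.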
Furthermore, ERNIE uses a dialogue language model, along with a technique called dialogue response loss, to tackle question-answer scenarios. Essentially, it takes adjacency pairs (two utterances by two different speakers, one following the other) and encodes them mathematically to identify the speakers’ roles and learn the implicit relationships in the exchange.
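The sketch below shows one way an utterance pair could be laid out so that a model can tell the two speakers apart, plus a simple way to generate training examples for a response-level objective. The article does not spell out ERNIE’s input format, so the `[CLS]`/`[SEP]` layout (borrowed from BERT-style models), the 0/1 role ids, the random-substitution negatives, and the names `DialogueExample`, `encode_pair`, and `make_training_pairs` are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DialogueExample:
    tokens: List[str]
    role_ids: List[int]  # assumed scheme: 0 = first speaker, 1 = second speaker
    label: int           # assumed scheme: 1 = genuine reply, 0 = substituted reply


def encode_pair(query: List[str], response: List[str], label: int) -> DialogueExample:
    """Lay out one utterance pair as a single sequence with per-token role ids."""
    tokens = ["[CLS]"] + query + ["[SEP]"] + response + ["[SEP]"]
    # [CLS], the query, and its [SEP] are tagged as speaker 0;
    # the response and the final [SEP] are tagged as speaker 1.
    role_ids = [0] * (len(query) + 2) + [1] * (len(response) + 1)
    return DialogueExample(tokens, role_ids, label)


def make_training_pairs(
    pairs: List[Tuple[List[str], List[str]]], rng: random.Random
) -> List[DialogueExample]:
    """Emit each genuine pair plus a negative whose reply comes from another pair."""
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to draw substitute replies")
    examples: List[DialogueExample] = []
    for i, (query, response) in enumerate(pairs):
        examples.append(encode_pair(query, response, 1))
        j = rng.choice([k for k in range(len(pairs)) if k != i])
        examples.append(encode_pair(query, pairs[j][1], 0))
    return examples
```

A model trained on these examples would embed each token’s role id alongside its token id and learn to distinguish genuine replies (label 1) from substituted ones (label 0). Treating “dialogue response loss” as this kind of real-versus-substituted classification is one plausible reading, not a confirmed description of Baidu’s objective.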
To validate ERNIE’s design, the researchers fed it online encyclopedia articles, news clippings, and forum threads, and had it infer knowledge omitted from sample paragraphs. It correctly filled in prompts like “Relativity is a theory about space-time and gravity, which was founded by _________” (ERNIE’s answer: “Einstein”) and “The surface area of the Earth is 510 million square kilometers, of which 71 percent are ________, 29 percent are land” (ERNIE: “ocean”). More impressively, on the Cross-lingual Natural Language Inference (XNLI) benchmark devised by Facebook and New York University researchers, it outperformed Google’s BERT on Chinese data.
Baidu says it plans to integrate ERNIE with “a variety of products.” One likely beneficiary is DuerOS, a suite of software development kits (SDKs), APIs, and turnkey solutions that enable original equipment manufacturers to build Baidu’s voice platform into smart speakers, refrigerators, washing machines, set-top boxes, and more. To date, more than 200 companies have launched 110 DuerOS-powered products, and Baidu announced in November that DuerOS was installed on over 150 million devices and had more than 35 million monthly active users.