Chinese tech giant Baidu today introduced ERNIE 2.0, a pretraining framework and language understanding model that works in Chinese and English. ERNIE 2.0 outperforms Google's BERT and XLNet on a range of English language understanding benchmarks and achieves state-of-the-art results on 9 Chinese natural language tasks.
ERNIE stands for Enhanced Representation through Knowledge Integration, and like Google’s BERT, ERNIE 2.0 relies on a transformer encoder and the BookCorpus data set for training.
ERNIE 2.0 outperformed other high-ranking NLP models on tasks such as sentiment analysis of movie reviews (SST-2) and natural language inference, which asks whether one sentence entails, contradicts, or is neutral toward another (MNLI).
ERNIE 2.0 builds on the first version of ERNIE, a Chinese language understanding model open-sourced by Baidu earlier this year. Both the original and ERNIE 2.0 achieve state-of-the-art results in a variety of Chinese language benchmarks.
ERNIE 2.0 applies multitask learning across a series of pretraining tasks, such as predicting whether a word is capitalized (capitalized words are often proper nouns), learning relationships between sentences, and capturing semantic relationships.
All pretraining tasks use self-supervised or weakly supervised signals that can be obtained from massive amounts of data without human labels.
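To make the idea of a training signal "obtained without human labels" concrete, here is a minimal sketch of how capitalization-prediction targets could be derived directly from raw text. The function name `capitalization_labels` and the whitespace tokenization are hypothetical choices for illustration; Baidu's actual preprocessing for ERNIE 2.0 is not described here.

```python
def capitalization_labels(text: str) -> list[tuple[str, int]]:
    """Hypothetical helper: pair each whitespace-separated token with 1 if it
    starts with an uppercase letter, else 0. The labels are read off the raw
    text itself, so no human annotation is required."""
    pairs = []
    for token in text.split():
        # token[:1] is the first character; isupper() is False for digits/punctuation
        label = 1 if token[:1].isupper() else 0
        pairs.append((token, label))
    return pairs


print(capitalization_labels("Baidu released ERNIE in Beijing"))
```

The signal is "weak" rather than clean: the first word of every sentence is capitalized whether or not it is a proper noun, so some labels are noisy, but the data costs nothing to produce at scale.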
ERNIE 2.0 also relies on something Baidu refers to as continual pretraining.
“The process of continual pre-training contains two steps. Firstly, we continually construct unsupervised pre-training tasks with big corpus and/or priori knowledge available. Secondly, we incrementally update the ERNIE model via multi-task learning,” a paper on the subject published Monday on arXiv reads.
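The two steps in that quote can be illustrated with a toy schedule. The sketch below is an assumption-laden illustration, not ERNIE 2.0's training code: each hypothetical `QuadraticTask` stands in for a pretraining task and pulls a single shared parameter toward its own target, whereas the real model trains a transformer encoder. At each stage a newly constructed task is added (step one), and the model is updated starting from the previous stage's parameters using the average gradient over all tasks seen so far (step two).

```python
from dataclasses import dataclass


@dataclass
class QuadraticTask:
    """Hypothetical stand-in for a pretraining task with loss 0.5 * (w - target)**2
    on one shared parameter w."""
    name: str
    target: float

    def grad(self, w: float) -> float:
        # Derivative of 0.5 * (w - target)**2 with respect to w
        return w - self.target


def multitask_update(w: float, tasks: list[QuadraticTask], lr: float = 0.1, steps: int = 100) -> float:
    """Multitask learning: step along the average gradient of all active tasks."""
    for _ in range(steps):
        g = sum(t.grad(w) for t in tasks) / len(tasks)
        w -= lr * g
    return w


def continual_pretrain(task_stream: list[QuadraticTask], w: float = 0.0):
    """Add tasks one at a time, updating incrementally instead of retraining from scratch."""
    active: list[QuadraticTask] = []
    history: list[tuple[str, float]] = []
    for new_task in task_stream:
        active.append(new_task)          # step 1: construct/add a new pretraining task
        w = multitask_update(w, active)  # step 2: update from the previous w on all tasks
        history.append((new_task.name, w))
    return w, history


tasks = [
    QuadraticTask("capitalization", 1.0),
    QuadraticTask("sentence_relation", 3.0),
    QuadraticTask("semantic_relation", 5.0),
]
final_w, history = continual_pretrain(tasks)
```

Because every stage trains on all tasks added so far rather than only the newest one, the shared parameter settles near a compromise between tasks (here the mean of the targets) instead of drifting to whichever task arrived last, which is one plausible reason to pair continual task construction with multitask updates.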
In other news, earlier this month at its annual AI developer conference, Baidu announced that its Apollo autonomous vehicle system has driven more than 1 million miles in Chinese cities, and shared plans to collaborate with Intel on Intel's Nervana Neural Network Processor project.
Advancing a goal it set two years ago to make its Duer conversational platform available to everyone, Baidu also said Duer now powers voice commands on 400 million devices, up from 100 million a year ago.