Date: 2019-08-13

Nvidia today said it’s trained the world’s largest language model, one of a series of updates the GPU maker announced to advance conversational AI.

To achieve this, Nvidia uses model parallelism, a technique that splits a neural network into pieces so that it’s possible to create models too big to fit within the memory of a single GPU. The model uses 8.3 billion parameters and is 24 times larger than BERT and 5 times larger than OpenAI’s GPT-2.
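The idea behind model parallelism can be illustrated with a minimal sketch. It is not Nvidia’s code: plain Python lists stand in for GPU memory, and the helper names (`split_rows`, `parallel_matvec`) and the tiny 4x3 layer are hypothetical, chosen only for illustration. The sketch splits one layer’s weights into shards, lets each simulated device compute its slice of the output, and checks that the combined result matches the unsplit layer.

```python
# Illustrative sketch of model parallelism: one layer's weights are split
# across several simulated "devices" so no single device holds all of them.
# Plain Python lists stand in for GPU memory; this is an assumption-laden
# toy, not Nvidia's implementation.

def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def split_rows(weights, num_devices):
    """Partition the output rows of a layer into contiguous shards."""
    shard_size = (len(weights) + num_devices - 1) // num_devices
    return [weights[i:i + shard_size] for i in range(0, len(weights), shard_size)]


def parallel_matvec(shards, x):
    """Each shard computes its slice of the output; slices are concatenated."""
    output = []
    for shard in shards:  # on real hardware the shards would run concurrently
        output.extend(matvec(shard, x))
    return output


# A hypothetical 4x3 layer and input vector, chosen only for illustration.
WEIGHTS = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [3.0, 1.0, 1.0],
    [0.5, 0.5, 0.5],
]
X = [1.0, 2.0, 3.0]

full_output = matvec(WEIGHTS, X)
shards = split_rows(WEIGHTS, num_devices=2)
sharded_output = parallel_matvec(shards, X)
print(full_output == sharded_output)
```

Each shard holds only half of the layer’s weights, yet the concatenated output is identical to the single-device result. Real systems also have to move activations between devices, which this toy example leaves out.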

Also announced today: the fastest training and inference times for Bidirectional Encoder Representations from Transformers (BERT), a popular model that was state-of-the-art when it was open-sourced by Google in 2018.

Nvidia trained BERT-Large in 53 minutes using optimized PyTorch software and a DGX SuperPOD of more than 1,000 GPUs.

“Without this kind of technology, it can take weeks to train one of these large language models,” Nvidia applied deep learning VP Bryan Catanzaro said in a conversation with reporters and analysts.

Nvidia also claims it has achieved the fastest BERT inference time, 2.2 milliseconds, by running on a Tesla T4 GPU with TensorRT 5.1 optimized for data center inference. BERT inference takes up to 40 milliseconds when served by CPUs, while many conversational AI operations shoot for 10 milliseconds today, Catanzaro said.
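To put those latency figures side by side, here is a small back-of-the-envelope calculation. It uses only the three numbers quoted above; the function names are hypothetical, and the CPU figure is the quoted upper bound rather than a typical value.

```python
# Latency figures quoted in the article, in milliseconds.
GPU_LATENCY_MS = 2.2   # BERT inference on a GPU with TensorRT
CPU_LATENCY_MS = 40.0  # upper bound quoted for CPU serving
TARGET_MS = 10.0       # budget many conversational AI operations aim for


def speedup(baseline_ms, improved_ms):
    """How many times faster the improved latency is than the baseline."""
    return baseline_ms / improved_ms


def headroom(target_ms, latency_ms):
    """Budget left after inference; a negative value means over budget."""
    return target_ms - latency_ms


gpu_speedup = speedup(CPU_LATENCY_MS, GPU_LATENCY_MS)
print(f"GPU speedup over the CPU upper bound: {gpu_speedup:.1f}x")
print(f"GPU headroom against target: {headroom(TARGET_MS, GPU_LATENCY_MS):.1f} ms")
print(f"CPU headroom against target: {headroom(TARGET_MS, CPU_LATENCY_MS):.1f} ms")
```

On these numbers, GPU inference is roughly 18 times faster than the CPU upper bound and leaves most of the 10-millisecond budget free for the rest of the pipeline, while the CPU upper bound overshoots the budget by 30 milliseconds.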

GPUs have also enabled gains for Microsoft’s Bing, which has used Nvidia hardware to cut latency in half.

Each of the advances introduced today is meant to underline the performance gains that Nvidia’s GPUs can provide for language understanding. Code for each of the above feats was open-sourced today to help AI practitioners and researchers explore the creation of large language models or speed up training or inference with GPUs.

Alongside a sharp decline in word error rates, reduced latency has been a major driver of adoption for popular AI assistants like Amazon’s Alexa, Google Assistant and Baidu’s Duer.

Exchanges with an AI assistant that feel like there’s little to no delay lead to conversations with machines that are more comparable to human-to-human conversations, which happen at the speed of thought.

Like multi-turn dialogue features introduced for Microsoft’s Cortana, Alexa, and Google Assistant this year, real-time exchanges with an assistant make the back and forth feel more natural.

Progress in the state of the art for conversational AI systems has largely revolved around Google’s Transformer-based language model, introduced in 2017, and BERT, introduced in 2018.

Since then, BERT has been surpassed by Microsoft’s MT-DNN, Google’s XLNet, and Baidu’s ERNIE, each of which builds upon BERT. Also derived from BERT, Facebook’s RoBERTa was introduced in July and is currently ranked atop the GLUE benchmark leaderboard, with the best score on 4 of 9 language tasks. Each of the models outperforms the human baseline on GLUE tasks.