Researchers release data sets to train coronavirus chatbots

A preprint paper published by researchers at the University of California, San Diego; Carnegie Mellon University; and the University of California, Davis proposes AI chatbots that generate responses to patient questions about the coronavirus. The team trained the models underpinning these chatbots on a data set in English and one in Chinese. The data sets contained conversations between doctors and patients talking about the coronavirus, and the researchers claim experiments demonstrate that their approach to meaningful medical dialogues is "promising."

As the coronavirus rages on around the world, some hospitals are discouraging unnecessary visits to reduce the risk of cross-infection. Telemedicine apps and services have consequently been overwhelmed by an influx of patients. In March, virtual health consultations grew by 50%, according to Frost and Sullivan research. Against this backdrop, autonomous chatbots designed for coronavirus triage seem primed to help relieve the burden on health providers.

The researchers trained several dialogue models on the data sets, collectively called CovidDialog, which they scraped from iCliniq, Healthcare Magic, HealthTap, Haodf, and other online health care forums. The English data set contained 603 consultations, while the Chinese data set had 1,088 consultations. Each consultation starts with a short description of the patient's medical condition, followed by a conversation between that patient and a doctor, and it optionally includes diagnoses and treatment suggestions provided by the doctor.
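To make that structure concrete, here is a minimal Python sketch of how one consultation might be represented and turned into training examples. The class names, field names, and the `to_training_pairs` helper are illustrative assumptions, not the format of the released files, and the sample dialogue is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Utterance:
    speaker: str  # "patient" or "doctor" (hypothetical labels)
    text: str


@dataclass
class Consultation:
    description: str                      # short description of the patient's condition
    dialogue: List[Utterance] = field(default_factory=list)
    diagnosis: Optional[str] = None       # optional, as the article notes
    suggestions: Optional[str] = None     # optional treatment suggestions


def to_training_pairs(c: Consultation) -> List[Tuple[str, str]]:
    """Build (conversation history, doctor response) pairs.

    Assumption: each doctor turn is a target, and everything said
    before it (including the description) is the model input.
    """
    pairs = []
    history = [c.description]
    for utt in c.dialogue:
        if utt.speaker == "doctor":
            pairs.append((" ".join(history), utt.text))
        history.append(utt.text)
    return pairs


# Invented sample data, for illustration only.
example = Consultation(
    description="Dry cough and mild fever for three days.",
    dialogue=[
        Utterance("patient", "Should I get tested?"),
        Utterance("doctor", "Yes, please arrange a test."),
        Utterance("patient", "Should I stay home until then?"),
        Utterance("doctor", "Yes, stay home while you wait."),
    ],
)
pairs = to_training_pairs(example)
```

Under these assumptions, a consultation with two doctor turns yields two training pairs, and the second pair's input contains the whole exchange up to that point.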

The coauthors trained their models based on:

Google's Transformer architecture, an encoder and decoder architecture that takes the conversation history as inputs and generates the response. Self-attention is used to capture the long-range dependency among words.
OpenAI's GPT, a language model based on the Transformer decoder. When generating a response, GPT predicts the next word using its context, including the already-decoded words in this response and the conversation history.
BERT-GPT, an encoder-decoder architecture, where the pretrained BERT is used to encode the conversation history and GPT is used to decode the response.
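The self-attention step that all three architectures rely on can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the researchers' code: it uses a single attention head and skips the learned query, key, and value projections that real Transformer layers apply, so each token vector serves as its own query, key, and value.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections.

    Every position attends to every other position, which is how
    distant words in the conversation can influence each other.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Three toy 2-dimensional "token embeddings" (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attended = self_attention(tokens)
```

Because the first and third toy tokens are identical, they receive identical outputs, regardless of how far apart they sit in the sequence.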

Because training the models directly on the relatively small data sets would result in poor generalization, the team leveraged transfer learning, which involves pretraining models on large corpora and then fine-tuning them on the CovidDialog data sets. The pretraining corpora were largely text from Reddit users, Wikipedia, Chinese chatbots, news, books, stories, and miscellaneous web texts.

In experiments post-training, the Transformer, GPT, and BERT-GPT models were scored with common automatic metrics, including ones borrowed from machine translation as well as perplexity (which is used to judge the quality and "smoothness" of generated responses) and entropy and dist (which are used to measure lexical diversity). The models scored poorly overall, but one of them, the BERT-GPT model, produced responses to patient questions that were more relevant, informative, and humanlike compared with the baselines, with correct grammar and semantics.
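For readers unfamiliar with these metrics, the following Python sketch shows one common way to compute perplexity, dist-n, and entropy-n. The function names and exact formulas are assumptions based on how these metrics are usually defined; the paper's own evaluation scripts may differ in details such as tokenization or the logarithm base.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the average negative log-probability the model gave each token.

    Lower is better: the model was less "surprised" by the reference text.
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def dist_n(responses: List[List[str]], n: int) -> float:
    """Fraction of n-grams across all responses that are distinct."""
    grams = [g for r in responses for g in ngrams(r, n)]
    return len(set(grams)) / len(grams) if grams else 0.0


def entropy_n(responses: List[List[str]], n: int) -> float:
    """Shannon entropy (in nats) of the n-gram frequency distribution."""
    counts = Counter(g for r in responses for g in ngrams(r, n))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Toy inputs, made up for illustration.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
generated = [["a", "b", "a", "b"]]
d1 = dist_n(generated, 1)
h1 = entropy_n(generated, 1)
```

A model that assigns every token a probability of 0.25 has a perplexity of 4, as if it were choosing uniformly among four words. Higher dist and entropy values indicate more varied wording across generated responses.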

"In this work, we make the first attempt to develop dialogue systems that can provide medical consultations about [coronavirus]," wrote the researchers. "Experimental results show that these trained models are promising in generating clinically meaningful and linguistically high-quality consultations for [coronavirus]."

Both the data sets and the code are available as open source.
