In a preprint paper published this week on Arxiv.org, a team of researchers from the Mayo Clinic and Nference, a startup developing tech that analyzes text from biomedical publications, reports that it has used AI to isolate phenotypes characteristic of the coronavirus. The researchers claim that a specific combination of cough and diarrhea, along with anosmia (loss of smell, often accompanied by altered taste) and excessive sweating, constitutes one of the earliest electronic medical record-derived signatures of COVID-19, appearing 4 to 7 days before testing.

The coauthors’ approach could be used to spot and triage early cases of coronavirus, perhaps lightening the load on overwhelmed hospitals. While there’s no cure for COVID-19 yet, preliminary studies suggest that early diagnosis can dramatically improve health outcomes.

To conduct their analysis, the team employed a natural language processing system designed to automate the recognition of diseases, drugs, phenotypes, and other entities; quantify the strength of contextual associations between those entities; and classify each association as “positive,” “negative,” or “other.” It incorporates Google’s Transformer architecture, a neural network design in which neurons (mathematical functions) arranged in layers transmit signals derived from the input data, and the strength (weight) of each connection is adjusted during training. Most neural networks learn to make predictions this way, but Transformers add an attention mechanism that connects every output element to every input element, with the weightings between them calculated dynamically.
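The attention idea described above can be sketched in a few lines. This is a minimal, standard-library illustration of scaled dot-product attention, not the researchers' system: the three toy token vectors and the function names `softmax` and `attention` are assumptions made for the example, and a real Transformer would also learn projection matrices for queries, keys, and values.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this output position against every input position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)  # computed dynamically from the data
        # Each output is a weighted mix of all the input values.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy "token" vectors, used as queries, keys, and values alike.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Every row of `weights` has one entry per input token, which is what "every output element is connected to every input element" means in practice.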

The system ingested 8,229,092 clinical notes from Mayo Clinic electronic medical records covering 14,967 patients who had undergone PCR testing, a form of test that detects the virus’s genetic material (272 patients in the data set were confirmed to have COVID-19). Symptoms and putative symptoms were extracted from notes written in the weeks before and after the date of each patient’s PCR test.

The AI-extracted information reveals that diarrhea occurred in 43 COVID-19-positive patients (15.8%) in the week prior to PCR testing, compared with 822 COVID-19-negative patients (5.6%). An altered or diminished sense of taste or smell was also more common in COVID-19 patients, as were, to a lesser degree, excessive sweating (31 patients, or 11.4%), fatigue (37, or 13.6%), headache (35, or 12.9%), and cough. Interestingly, despite evidence to the contrary, fever and chills were found to be somewhat nonspecific to those with COVID-19, at least in this patient population: 24.6% of COVID-19-positive patients had a fever in the week prior to the PCR test versus 18.6% of COVID-19-negative patients.
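The percentages above can be checked against the raw counts. The sketch below assumes the denominators are the full cohorts reported in the article, 272 positive patients and 14,967 minus 272 negative patients; the names `rate`, `POSITIVE`, and `NEGATIVE` are illustrative only.

```python
POSITIVE = 272                  # confirmed COVID-19 patients (from the article)
NEGATIVE = 14_967 - POSITIVE    # assumed: everyone else who was PCR tested

def rate(count, total):
    """Share of a cohort with a symptom, as a percentage to one decimal."""
    return round(100 * count / total, 1)

positive_rates = {
    "diarrhea": rate(43, POSITIVE),
    "sweating": rate(31, POSITIVE),
    "fatigue": rate(37, POSITIVE),
    "headache": rate(35, POSITIVE),
}
diarrhea_negative = rate(822, NEGATIVE)

# Ratio of the reported fever rates: a value close to 1 means the symptom
# does little to separate positive from negative patients.
fever_ratio = 24.6 / 18.6
```

Under these assumptions the computed rates match the reported figures, and the fever ratio of roughly 1.3 illustrates why fever is described as nonspecific in this population.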

In a further analysis of the data, out of 351 possible pairwise conjunctions of 27 phenotypes compared between COVID-19-positive and COVID-19-negative patients, two pairings, (1) cough and diarrhea and (2) sweating and diarrhea, were found to be “particularly significant.” Cough and diarrhea co-occurred in 36 patients with COVID-19 (13.2%) and in 486 patients without COVID-19 (3.3%), indicating a 4-fold amplification, while diaphoresis (excessive sweating) and diarrhea co-occurred in 21 COVID-19 patients (7.7%) versus 204 patients without COVID-19 (1.4%).
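A pairwise analysis of this kind can be sketched as follows. The toy patient records are hypothetical and only show how co-occurrence is counted; the fold-change figures use the counts reported in the article, with the negative cohort size of 14,695 assumed to be 14,967 minus 272. The number of unordered pairs among 27 phenotypes is computed rather than hard-coded, and the paper's own significance testing is not reproduced here.

```python
from itertools import combinations
from math import comb

# Hypothetical toy records: each patient is a set of extracted symptoms.
patients = [
    {"cough", "diarrhea"},
    {"cough", "diarrhea", "sweating"},
    {"fever"},
    {"sweating", "diarrhea"},
]
phenotypes = sorted(set().union(*patients))

# Count the patients who show both symptoms of each unordered pair.
pair_counts = {
    pair: sum(1 for symptoms in patients if set(pair) <= symptoms)
    for pair in combinations(phenotypes, 2)
}

# Unordered pairs among n phenotypes: n * (n - 1) / 2.
pairs_for_27 = comb(27, 2)

def fold(pos_count, pos_total, neg_count, neg_total):
    """How many times more frequent a pairing is in the positive cohort."""
    return (pos_count / pos_total) / (neg_count / neg_total)

cough_diarrhea = fold(36, 272, 486, 14_695)
sweating_diarrhea = fold(21, 272, 204, 14_695)
```

With the reported counts, the cough and diarrhea pairing comes out at about 4-fold, and the sweating and diarrhea pairing at roughly 5.5-fold.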

“Our findings from the EHR analysis of COVID-19 progression can aid in a human pathophysiology enabled summary of the experimental therapies being investigated for COVID-19,” concluded the coauthors. “A caveat of relying solely on [electronic medical record] inference is that mild phenotypes that may not lead to a presentation for clinical care, such as anosmia, may go unreported in otherwise asymptomatic patients. As at-home serology-based tests for COVID-19 with high sensitivity and specificity are approved, capturing these symptoms will become increasingly important in order to facilitate the continued development and refinement of disease models. EHR-integrated digital health tools may help address this need.”

The work was part of the Mayo Clinic’s ongoing collaboration with Cambridge-based Nference, a participant in Mayo’s Clinical Data Analytics Platform program. Since January, Nference’s chief focus has been identifying targets and biomarkers for new drugs, matching patients with therapeutic regimens, and devising applications such as label expansion, postmarketing surveillance, and drug repurposing.