Supp AI uses machine learning to identify supplement interactions

In 2015, the Allen Institute for Artificial Intelligence -- the research organization founded by the late Microsoft cofounder Paul Allen -- released Semantic Scholar, a public AI search engine capable of extracting figures from over 173 million computer science and biomedicine journal papers. It received a warm reception, but researchers at the Institute wondered if its underlying algorithms might be adapted to solve other problems in the field of medical research.

To this end, the Allen Institute this week launched Supp AI, a web portal that lets consumers of supplements like vitamins, minerals, enzymes, and hormones identify the products or pharmaceutical drugs with which they might adversely interact. Using a no-frills search bar, they're able to type in trade names for common drugs (e.g., Prozac and Sarafem) and names of active drug ingredients (fluoxetine) to bubble up sentences from research papers supporting interactions alongside links to each source.

A search for the supplement ginkgo, for instance, yields 140 possible interactions with things like warfarin and nitric oxide.

Supp AI not only surfaces all chemicals or drugs that might interact with a queried supplement, but it helpfully sorts the evidence sentences and prioritizes source papers based on associated metadata. Factors that play into the ultimate ordering include (but aren't limited to) non-retracted studies, clinical trials, human studies, and recency.
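To make the ordering idea concrete, here is a minimal sketch of how evidence might be sorted on the metadata factors named above. The `Evidence` class, its field names, and the relative priority of the factors are assumptions made for illustration; the article does not describe how Supp AI actually weights them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record for one evidence sentence and its paper metadata."""
    sentence: str
    retracted: bool
    clinical_trial: bool
    human_study: bool
    year: int


def rank_evidence(items):
    """Order evidence: non-retracted first, then clinical trials,
    then human studies, then most recent. The priority order is assumed."""
    return sorted(
        items,
        key=lambda e: (e.retracted, not e.clinical_trial, not e.human_study, -e.year),
    )


# Made-up sample records, used only to exercise the sort.
samples = [
    Evidence("A", retracted=True, clinical_trial=True, human_study=True, year=2019),
    Evidence("B", retracted=False, clinical_trial=False, human_study=True, year=2018),
    Evidence("C", retracted=False, clinical_trial=True, human_study=True, year=2010),
    Evidence("D", retracted=False, clinical_trial=False, human_study=False, year=2019),
]
ranked = [e.sentence for e in rank_evidence(samples)]
print(ranked)
```

A tuple sort key keeps the tie-breaking explicit: recency only matters among sentences that agree on every earlier factor.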

Lucy Lu Wang, a PhD student in biomedical and health informatics at the University of Washington and a lead author on the paper, said that one of Supp AI's audiences is the estimated 88% of adults aged 65 and older who take dietary supplements. According to a recent study published in JAMA Internal Medicine, as many as 15% are at risk for potential major drug interactions.

"We [set out to build a system] that can perform massive retrieval of [drug-supplement interaction] evidence over the scientific literature, and then organize it and make it available for consumers, physicians, for researchers -- for anyone who's looking to discover information about supplements at scale," Wang told VentureBeat in a phone interview earlier on Tuesday. "[Our model] looks at the texts from the site of the articles and retrieves sentences that support these interactions."

As Wang and colleagues explain in a paper detailing Supp AI's creation, studies on supplement-drug interactions largely rely on manual curation of the literature, and their results are often difficult to aggregate. Consumer-facing websites like the NIH Office of Dietary Supplements and WebMD often feature incomplete information about common supplements, while drug databases like DrugBank, RxNorm, and the National Drug File Reference Terminology contain insufficient coverage of dietary supplement terminology.

In pursuit of a better alternative, the team tapped an AI model to extract evidence for supplement-drug interactions and supplement-supplement interactions from roughly 22 million papers in PubMed, a free search engine maintained by the U.S. National Institutes of Health, which they accessed via Semantic Scholar. Leveraging similarities between supplement-drug and supplement-supplement interactions, they used labeled data for categorizing drug-drug interactions to train and fine-tune a supplement interactions evidence extractor dubbed BERT-DDI.

BERT-DDI incorporates Google's Bidirectional Encoder Representations from Transformers, or BERT, which accesses context from both past and future directions in an unsupervised fashion (meaning it can ingest data that’s neither classified nor labeled) to model relationships among sentences. Trained on the annotated drug-drug interaction data, the model learned to classify sentences which Supp AI could collate and deliver to users.

Isolating the 1,923 supplements and 2,727 drugs Supp AI can recognize was a painstaking process that necessitated manual review, so that non-supplements like foods, weeds, and herbicides could be removed from the corpus. Equally arduous was compiling a list of 15,252 drug concept unique identifiers (CUIs), or identifiers for the National Library of Medicine's clinical biomedical dictionary.

Prior to Supp AI's deployment, 29.5 million automatically labeled sentences were fed to the BERT-DDI model. The curated lists of supplement and drug CUIs were used to remove irrelevant sentences and to group together related evidence, and each sentence was annotated with source paper metadata such as the title, author, publication venue, retraction status, and year of publication.
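The filtering and grouping step might look something like the sketch below. It is an illustration under stated assumptions, not the team's pipeline: the identifiers are placeholder strings rather than real CUIs, the dictionary keys (`cui_a`, `cui_b`, `text`) are hypothetical, and the rule that a pair must include at least one supplement is inferred from the tool's focus on supplement interactions.

```python
from collections import defaultdict

# Placeholder identifiers for illustration only; these are not real CUIs.
SUPPLEMENT_CUIS = {"S_GINKGO"}
DRUG_CUIS = {"D_WARFARIN", "D_ASPIRIN"}


def group_evidence(sentences):
    """Drop sentences whose entity pair is not on the curated lists,
    then group the rest by unordered entity pair."""
    known = SUPPLEMENT_CUIS | DRUG_CUIS
    grouped = defaultdict(list)
    for s in sentences:
        a, b = s["cui_a"], s["cui_b"]
        if a == b or a not in known or b not in known:
            continue  # entity missing from the curated lists
        if a not in SUPPLEMENT_CUIS and b not in SUPPLEMENT_CUIS:
            continue  # drug-drug pair, outside the assumed scope
        # Sort the pair so (a, b) and (b, a) land in the same group.
        grouped[tuple(sorted((a, b)))].append(s)
    return dict(grouped)


# Made-up sentences, used only to exercise the function.
sentences = [
    {"cui_a": "S_GINKGO", "cui_b": "D_WARFARIN", "text": "sentence 1"},
    {"cui_a": "D_WARFARIN", "cui_b": "S_GINKGO", "text": "sentence 2"},
    {"cui_a": "S_GINKGO", "cui_b": "X_FOOD", "text": "sentence 3"},
    {"cui_a": "D_WARFARIN", "cui_b": "D_ASPIRIN", "text": "sentence 4"},
]
groups = group_evidence(sentences)
print({pair: len(items) for pair, items in groups.items()})
```

Paper metadata such as title or retraction status could ride along as extra keys on each sentence dictionary, since the grouping leaves the records untouched.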

To evaluate the quality of the extracted sentences, Wang and colleagues sampled 200 and manually labeled them for mentions of supplement interactions. They report that the BERT-DDI model achieved an accuracy of 87%, precision of 77% (the ratio of correctly predicted positive observations to the total predicted positive observations), and recall of 96% (the ability of the classifier to find all the positive samples).
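For readers who want the arithmetic behind those three metrics, a short sketch follows. The confusion-matrix counts are hypothetical: they were chosen only because they sum to 200 and are consistent with the rounded percentages above, and they should not be read as the paper's actual tallies.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all sentences labeled correctly
        "precision": tp / (tp + fp),     # share of predicted positives that are correct
        "recall": tp / (tp + fn),        # share of true positives that were found
    }


# Hypothetical counts (assumption): true positives, false positives,
# false negatives, and true negatives across 200 sampled sentences.
metrics = classification_metrics(tp=77, fp=23, fn=3, tn=97)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The pattern of high recall and lower precision means such a classifier rarely misses a real interaction sentence but flags some sentences that do not describe one.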

Wang said the near-term goal is to regularly update Supp AI with the latest information extracted from new papers as they're incorporated into the Semantic Scholar corpus. Future releases might indicate the severity of an interaction, or they might suggest supplements that enhance the effects of other supplements or drugs and list scenarios in which those supplements or drugs have been shown to be effective.

"We've reached out to a number of clinicians and health care researchers for feedback, and it's been largely positive," said Wang. "I think one of the best attributes of this tool is that it's freely available for everyone who wants to use it. It opens up the field of supplement interactions and information to a broader audience, and it also opens it up to people in low-resource settings."
