Researchers affiliated with the University of Washington and the Allen Institute for Artificial Intelligence say they’ve developed an AI system, VeriSci, that can automatically fact-check scientific claims. Ostensibly, the system can not only identify study abstracts that support or refute a claim, but can also provide rationales for its predictions in the form of evidence extracted from those abstracts.

Automated fact-checking could help to address the reproducibility crisis in scientific literature, in which it’s been found that many studies are difficult (or impossible) to replicate. A 2016 poll of 1,500 scientists reported that 70% of them had tried but failed to reproduce at least one other scientist’s experiment. And in 2009, 2% of scientists surveyed admitted to falsifying studies at least once, and 14% admitted to personally knowing someone who had.

The Allen Institute and University of Washington team sought to tackle the problem with a corpus, SciFact, containing (1) scientific claims, (2) abstracts supporting or refuting each claim, and (3) annotations with justifying rationales. They curated it with a labeling technique that makes use of citation sentences, a source of naturally occurring claims in the scientific literature, after which they trained a BERT-based model to identify rationale sentences and label each claim.

The SciFact data set comprises 1,409 scientific claims fact-checked against a corpus of 5,183 abstracts, which were collected from a publicly available database (S2ORC) of millions of scientific articles. To ensure that only high-quality articles were included, the team filtered out articles with fewer than 10 citations or with only partial text, then randomly sampled from a collection of well-regarded journals spanning domains from basic science (e.g., Cell, Nature) to clinical medicine.

To label SciFact, the researchers recruited a team of annotators, who were shown a citation sentence in the context of its source article and asked to write three claims based on the content while ensuring the claims conformed to the researchers’ definition of a scientific claim. Because the annotators didn’t see the article’s abstract at the time they wrote the claims, the result was a set of so-called “natural” claims.

A scientific natural language processing expert created claim negations to obtain examples where an abstract refutes a claim. (Claims that couldn’t be negated without introducing obvious bias or prejudice were skipped.) Annotators labeled claim-abstract pairs as Supports, Refutes, or Not Enough Info, as appropriate, identifying all rationales in the case of Supports or Refutes labels. The researchers also introduced distractors: for each citation sentence, they sampled articles that were cited in the same document as the sentence but in a different paragraph.
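The annotation scheme described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class and field names (`Label`, `Annotation`, `ClaimRecord`, `rationales`) are hypothetical and do not reflect the actual SciFact file format, and the validation rule is an assumption drawn from the description that rationales accompany Supports and Refutes labels.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Label(Enum):
    # The three labels named in the article.
    SUPPORTS = "Supports"
    REFUTES = "Refutes"
    NOT_ENOUGH_INFO = "Not Enough Info"


@dataclass
class Annotation:
    """One claim-abstract pair (hypothetical layout, not the real schema)."""
    label: Label
    # Each rationale is a list of sentence indices into the abstract.
    rationales: List[List[int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        has_rationale = bool(self.rationales)
        # Assumption: Supports/Refutes need evidence, Not Enough Info has none.
        if self.label is Label.NOT_ENOUGH_INFO and has_rationale:
            raise ValueError("Not Enough Info should carry no rationales")
        if self.label is not Label.NOT_ENOUGH_INFO and not has_rationale:
            raise ValueError(f"{self.label.value} needs at least one rationale")


@dataclass
class ClaimRecord:
    claim: str
    # Maps a (hypothetical) abstract id to its annotation for this claim.
    annotations: Dict[str, Annotation] = field(default_factory=dict)


# Made-up example data, for illustration only.
record = ClaimRecord(
    claim="Drug X lowers blood pressure.",
    annotations={
        "abstract-1": Annotation(Label.SUPPORTS, rationales=[[2], [4, 5]]),
        "abstract-2": Annotation(Label.NOT_ENOUGH_INFO),
    },
)
```

Keeping rationales as lists of sentence indices, rather than copied text, is one way to tie each piece of evidence back to its abstract.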

Above: Results of VeriSci on several claims concerning COVID-19. In some cases, the label is predicted given the wrong context; the third evidence sentence for the first claim is a finding about lopinavir, but for the wrong disease (MERS-CoV).

The model trained on SciFact, VeriSci, consists of three parts: Abstract Retrieval, which retrieves abstracts with the highest similarity to a given claim; Rationale Selection, which identifies rationales for each candidate abstract; and Label Prediction, which makes the final label prediction. The researchers say that in experiments, it correctly identified Supports or Refutes labels and provided reasonable evidence to justify the decision about half of the time (46.5%).
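The three-stage design can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the researchers’ implementation: retrieval here uses a simple TF-IDF cosine similarity as one possible similarity measure, and the rationale selector and label predictor are crude lexical stand-ins for the BERT-based models the article describes. All function names, the overlap threshold, and the negation word list are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Abstracts = Dict[str, List[str]]  # abstract id -> list of sentences


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_abstracts(claim: str, abstracts: Abstracts, k: int = 3) -> List[str]:
    """Stage 1: return the ids of the k abstracts most similar to the claim."""
    ids = list(abstracts)
    docs = [Counter(tokenize(" ".join(abstracts[i]))) for i in ids]
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + len(docs)) / (1 + c)) + 1.0 for t, c in doc_freq.items()}

    def weigh(counts: Counter) -> Dict[str, float]:
        return {t: tf * idf.get(t, 0.0) for t, tf in counts.items()}

    query = weigh(Counter(tokenize(claim)))
    scored = [(cosine(query, weigh(doc)), i) for doc, i in zip(docs, ids)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


def select_rationales(claim: str, sentences: List[str], threshold: float = 0.5) -> List[str]:
    """Stage 2 (toy stand-in): keep sentences sharing enough terms with the claim."""
    claim_terms = set(tokenize(claim))
    if not claim_terms:
        return []
    return [
        s for s in sentences
        if len(claim_terms & set(tokenize(s))) / len(claim_terms) >= threshold
    ]


NEGATION_CUES = {"no", "not", "never", "fails", "failed"}  # illustrative only


def predict_label(claim: str, rationale: List[str]) -> str:
    """Stage 3 (toy stand-in): compare negation cues in claim and rationale."""
    in_rationale = any(t in NEGATION_CUES for s in rationale for t in tokenize(s))
    in_claim = any(t in NEGATION_CUES for t in tokenize(claim))
    return "Refutes" if in_rationale != in_claim else "Supports"


def verify(claim: str, abstracts: Abstracts, k: int = 3) -> Dict[str, Tuple[str, List[str]]]:
    """Run the three stages and return a label plus rationale per abstract."""
    results = {}
    for doc_id in retrieve_abstracts(claim, abstracts, k):
        rationale = select_rationales(claim, abstracts[doc_id])
        label = predict_label(claim, rationale) if rationale else "Not Enough Info"
        results[doc_id] = (label, rationale)
    return results


# Made-up corpus, for illustration only.
corpus = {
    "a": ["Aspirin reduces the risk of heart attack.", "The trial enrolled 500 adults."],
    "b": ["Zebrafish fins regenerate after injury."],
}
result = verify("Aspirin reduces heart attack risk", corpus)
```

In this sketch, an abstract with no selected rationale falls back to Not Enough Info, so the label for each abstract depends only on the evidence sentences chosen for it.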

To demonstrate VeriSci’s generalizability, the team conducted an exploratory experiment on a data set of scientific claims about COVID-19. They report that VeriSci’s output for a majority of the COVID-related claims (23 out of 36) was deemed plausible by a medical student annotator, suggesting the model could successfully retrieve and classify evidence.

The researchers concede that VeriSci is far from perfect, chiefly because it can be confused by context and because it doesn’t perform evidence synthesis, the task of combining information across different sources to inform decision-making. That said, they assert their study demonstrates how fact-checking might work in practice while shedding light on the challenge of scientific document understanding.

“Scientific fact-checking poses a set of unique challenges, pushing the limits of neural models on complex language understanding and reasoning. Despite its small size, training VeriSci on SciFact leads to better performance than training on fact-checking datasets constructed from Wikipedia articles and political news,” wrote the researchers. “Domain-adaptation techniques show promise, but our findings suggest that additional work is necessary to improve the performance of end-to-end fact-checking systems.”

The publication of VeriSci and SciFact follows the Allen Institute’s release of Supp AI, an AI-powered web portal that lets consumers of supplements like vitamins, minerals, enzymes, and hormones identify the products or pharmaceutical drugs with which they might adversely interact. More recently, the nonprofit updated its Semantic Scholar tool to search across 175 million academic papers.