Microsoft claims its AI framework spots fake news better than state-of-the-art baselines

In a study published this week on the preprint server arXiv.org, Microsoft and Arizona State University researchers propose an AI framework -- Multiple sources of Weak Social Supervision (MWSS) -- that leverages engagement and social media signals to detect fake news. They say that after training and testing the model on a real-world data set, it outperforms a number of state-of-the-art baselines for early fake news detection.

If the system's accuracy is as claimed and it makes its way into production, it could help combat the spread of false and misleading information about U.S. presidential candidates and other controversial topics. A survey conducted in 2018 by the Brookings Institution found that 57% of U.S. adults saw fake news during the 2018 elections and that 19% believe it influenced their vote.

Many fake news classifiers in the academic literature rely on signals that require a long time to aggregate, making them unsuitable for early detection, the paper's coauthors explain. Moreover, some rely solely on signals that are easily influenced by biased or inauthentic user feedback.

In contrast, the researchers' system employs supervision from multiple sources involving users and their respective social engagements. Specifically, it taps a small amount of manually annotated data and a large amount of weakly annotated data -- i.e., data with a lot of noise -- for joint training in a meta-learning AI framework.

A module dubbed label weighting network (LWN) models the weight of the weak labels, which regulates the learning process of the fake news classifier. It takes what the researchers refer to as an instance -- for example, a news piece -- and its label as input, and it outputs a value representing the importance weight for the pair, which determines the influence of the instance in training the fake news classifier. To allow information sharing among different weak signals, a shared feature extractor works alongside the LWN to learn a common representation, with separate functions mapping those shared features to the different weak label sources.
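
To make the weighting idea concrete, here is a minimal Python sketch of a loss in which each weakly labeled instance is scaled by an importance weight. It is an illustration only, not the researchers' implementation: the single linear layer standing in for the LWN, the function names, and the `params` and `bias` values are all assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def label_weight(features, weak_label, params, bias):
    """Hypothetical stand-in for the LWN: one linear layer over the
    instance features concatenated with the weak label, squashed to (0, 1)."""
    inputs = list(features) + [float(weak_label)]
    return sigmoid(sum(p * x for p, x in zip(params, inputs)) + bias)


def bce(prob, label):
    """Binary cross-entropy for one predicted probability and one 0/1 label."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1 - prob + eps))


def weighted_weak_loss(batch, classifier_probs, params, bias):
    """Average classifier loss over weakly labeled instances, with each
    instance's loss scaled by its importance weight."""
    total = 0.0
    for (features, weak_label), prob in zip(batch, classifier_probs):
        weight = label_weight(features, weak_label, params, bias)
        total += weight * bce(prob, weak_label)
    return total / len(batch)


# Two weakly labeled instances: (features, weak label), 1 = fake.
batch = [([0.2, 0.7], 1), ([0.9, 0.1], 0)]
# Assumed fake-news probabilities from the classifier for those instances.
classifier_probs = [0.8, 0.3]
loss = weighted_weak_loss(batch, classifier_probs,
                          params=[0.5, -0.3, 1.0], bias=0.0)
```

An instance whose weight is near zero contributes almost nothing to the loss, which is how a noisy weak label can be kept from steering the classifier. In a meta-learning setup like the one described, the weighting parameters would themselves be learned, guided by the small clean data set, rather than fixed by hand as they are here.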

To benchmark their system, the researchers tapped the open source FakeNewsNet data set, which contains news content (including meta attributes like body text) with labels annotated by experts from the fact-checking websites GossipCop and PolitiFact, along with social context information such as tweets about news articles. They enhanced it with a corpus of 13 sources, including mainstream British news outlets, such as the BBC and Sky News, and English-language versions of Russian news outlets like RT and Sputnik, with content mostly related to politics.

To generate weak labels, the researchers measured the sentiment scores for users sharing pieces of news and then determined the variance between those scores, such that articles for which the sentiments widely varied were labeled as fake. They also produced sets of people with known public biases and calculated scores based on how closely a user's interests matched with those sets, operating on the theory that news shared by biased users was more likely to be fake. Lastly, they measured credibility by clustering users based on their meta-information on social media so that users who formed big clusters (which might indicate a bot network or malicious campaign) were considered less credible.
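
Two of these heuristics can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the paper's procedure: the thresholds (`threshold`, `min_cluster_size`, `majority`), the function names, and the rule that turns user-level credibility into an article-level label are all hypothetical.

```python
from collections import Counter
from statistics import pvariance


def sentiment_weak_label(sentiment_scores, threshold=0.25):
    """Weak label from sentiment spread: 1 (fake) if the sentiment scores of
    the users sharing an article vary widely, else 0 (real).
    The threshold is an assumed value chosen for illustration."""
    if len(sentiment_scores) < 2:
        return 0  # not enough engagement to measure variance
    return 1 if pvariance(sentiment_scores) > threshold else 0


def credibility_weak_label(sharers, user_cluster, min_cluster_size=3,
                           majority=0.5):
    """Weak label from user credibility: users in large clusters are treated
    as less credible, and an article is labeled 1 (fake) when more than
    `majority` of its sharers are low-credibility users (an assumed rule)."""
    cluster_sizes = Counter(user_cluster.values())
    low_cred = sum(
        1 for user in sharers
        if cluster_sizes[user_cluster[user]] >= min_cluster_size
    )
    return 1 if low_cred / len(sharers) > majority else 0


# Hypothetical cluster assignment: users a, b, c share one cluster; d is alone.
user_cluster = {"a": 0, "b": 0, "c": 0, "d": 1}

polarized = sentiment_weak_label([0.9, -0.8, 0.7, -0.9])
consistent = sentiment_weak_label([0.50, 0.55, 0.60])
mostly_clustered = credibility_weak_label(["a", "b", "d"], user_cluster)
```

Labels produced this way are noisy by design; each heuristic will mislabel plenty of articles on its own, which is why the framework learns how much to trust each weak label instead of taking it at face value.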

In tests, the researchers say the best-performing model, which incorporated Facebook's RoBERTa natural language processing model and was trained on a combination of clean and weak data, accurately detected fake news in GossipCop and PolitiFact 80% and 82% of the time, respectively. That's upwards of 7 percentage points better than the baseline models.

The team plans to explore other techniques in future work, like label correction methods for obtaining high-quality weak labels. They also hope to extend their framework to consider other types of weak supervision signals from social networks, leveraging the timestamps of engagements.

These researchers aren't the only ones attempting to combat the spread of fake news with AI, of course. In a recent study, MIT's Computer Science and Artificial Intelligence Laboratory developed an AI system to spot misleading news articles. Jigsaw late last year released Assembler, an AI-powered suite of fake news-spotting tools for media organizations. AdVerif.ai, a software-as-a-service platform that launched in beta last year, parses articles for misinformation, nudity, malware, and other problematic content and cross-references a regularly updated database of thousands of fake and legitimate news items. For its part, Facebook has experimented with deploying AI tools that "identify accounts and false news."