MIT CSAIL's AI can detect fake news and political bias

Fake news continues to rear its ugly head. In March of this year, half of the U.S. population reported seeing deliberately misleading articles on news websites. A majority of respondents to a recent Edelman survey, meanwhile, said they couldn't judge the veracity of media reports. And given that fake news has been shown to spread faster than real news, it's no surprise that almost seven in 10 people are concerned it might be used as a "weapon."

Researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Lab (CSAIL) and the Qatar Computing Research Institute believe they've engineered a partial solution. In a study that will be presented later this month at the 2018 Empirical Methods in Natural Language Processing (EMNLP) conference in Brussels, Belgium, they describe an artificially intelligent (AI) system that can determine whether a source is accurate or politically prejudiced.

The researchers used it to create an open source dataset of more than 1,000 news sources annotated with "factuality" and "bias" scores. They claim it's the largest of its kind.

"A [promising] way to fight 'fake news' is to focus on their source," the researchers wrote. "While 'fake news' [posts] are spreading primarily on social media, they still need a 'home', i.e., a website where they would be posted. Thus, if a website is known to have published non-factual information in the past, it is likely to do so in the future."

The novelty of the AI system lies in its broad contextual understanding of the outlets it evaluates. Rather than extracting features (the variables on which the machine learning model trains) from news articles in isolation, it considers crowdsourced encyclopedias, social media, and even the structure of URLs and web traffic data in determining trustworthiness.

It's built on a Support Vector Machine (SVM) -- a supervised system commonly used for classification and regression analysis -- that was trained to evaluate factuality and bias on a three-point (low, mixed, and high) and seven-point scale (extreme-left, left, center-left, center, center-right, right, extreme-right), respectively.
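To make the classification step concrete, here is a minimal sketch of a one-vs-rest linear SVM for the three factuality labels, trained with plain subgradient descent on the hinge loss. It is not the researchers' implementation: the two features, the toy data, and the function names (train_binary_svm, train_one_vs_rest, predict) are all assumptions for illustration, and the kernel and settings used in the study are not described here.

```python
def train_binary_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Linear SVM: full-batch subgradient descent on the regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary SVM per class (that class vs. all others)."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_binary_svm(X, y)
    return models


def predict(models, x):
    """Return the class whose binary SVM gives the highest score."""
    def score(cls):
        w, b = models[cls]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


# Toy, made-up data: two hypothetical source-level features per news source.
X = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],   # labelled "low"
    [4.0, 0.1], [3.8, 0.0], [4.1, 0.2],   # labelled "mixed"
    [0.1, 4.0], [0.0, 3.9], [0.2, 4.2],   # labelled "high"
]
labels = ["low"] * 3 + ["mixed"] * 3 + ["high"] * 3

models = train_one_vs_rest(X, labels)
print(predict(models, [3.9, 0.1]))
```

The same one-vs-rest pattern would extend to the seven bias labels by training one binary classifier per label.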

According to the team, the system needs only 150 articles to reliably determine whether a new source can be trusted. It's 65 percent accurate at detecting whether a news source has a high, mixed, or low level of "factuality" and 70 percent accurate at detecting whether it's left-leaning, right-leaning, or moderate.

On the articles front, it applies a seven-prong test to the copy and headline, analyzing not just structure, sentiment, and engagement (in this case, the number of shares, reactions, and comments on Facebook), but also topic, complexity, bias, and morality (based on Moral Foundations Theory, a social psychological theory intended to explain the origins of and variations in human moral reasoning). It calculates a score for each feature and then averages that score over a set of articles.
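The averaging step described above can be sketched in a few lines. This is an illustration under assumptions, not the team's code: the function name source_feature_vector, the feature names, and the scores are all hypothetical, and each article is assumed to carry one numeric score per feature.

```python
from statistics import mean


def source_feature_vector(article_scores):
    """Average each per-article feature score over all of a source's articles."""
    feature_names = sorted(article_scores[0])
    return {
        name: mean(article[name] for article in article_scores)
        for name in feature_names
    }


# Hypothetical per-article scores for a single news source.
articles = [
    {"sentiment": 0.25, "complexity": 0.5},
    {"sentiment": 0.75, "complexity": 1.0},
]

source_vector = source_feature_vector(articles)
print(source_vector)
```

The resulting dictionary holds one value per feature for the source as a whole, which is the kind of input a source-level classifier would consume.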

Wikipedia and Twitter also feed into the system's predictive models. As the researchers note, the absence of a Wikipedia page may indicate that a website isn't credible, or a page might mention that the source in question is satirical or expressly left-leaning. Moreover, they point out that publications without verified Twitter accounts, or those with recently created accounts that obfuscate their location, are less likely to be impartial.

The last two vectors the model takes into account are the URL structure and web traffic. It detects URLs that attempt to mimic those of credible news sources (e.g., "foxnews.co.cc" rather than "foxnews.com") and considers a website's Alexa Rank, a metric calculated by the number of overall pageviews it receives.
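As a rough illustration of URL-based signals, the sketch below counts special characters and subdirectory depth and flags hosts that reuse a trusted outlet's name under a different domain. The allow-list TRUSTED_DOMAINS, the function url_features, and the matching rule are assumptions made for this example; the study's actual URL features may differ.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of known, credible domains.
TRUSTED_DOMAINS = {"foxnews.com"}


def url_features(url):
    """Extract a few simple, illustrative features from a URL."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Non-empty path segments, i.e. how deep the subdirectories go.
    segments = [s for s in parsed.path.split("/") if s]

    # Characters other than letters, digits, dots, and slashes.
    special = sum(
        1 for ch in host + parsed.path if not (ch.isalnum() or ch in "./")
    )

    # Same leading name as a trusted domain, but not the trusted domain itself.
    name = host.split(".")[0]
    mimics = host not in TRUSTED_DOMAINS and any(
        name == trusted.split(".")[0] for trusted in TRUSTED_DOMAINS
    )

    return {
        "path_depth": len(segments),
        "special_chars": special,
        "mimics_trusted": mimics,
    }


print(url_features("foxnews.co.cc"))
print(url_features("https://www.foxnews.com/politics"))
```

A real system would need a much larger list of credible domains and a fuzzier comparison (for example, tolerating swapped or repeated letters) than this exact-name check.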

The team trained the system on 1,066 news sources from Media Bias/Fact Check (MBFC), a website with human fact-checkers who manually annotate sites with accuracy and bias data. To produce the aforementioned dataset, they set it loose on 10-100 articles per website (a total of 94,814).

As the researchers painstakingly detail in their report, not every feature was a useful predictor of factuality and/or bias. For example, some websites without Wikipedia pages or established Twitter profiles were unbiased, and news sources ranked highly in Alexa weren't consistently less biased or more factual than their less-trafficked competitors.

Interesting patterns emerged. Articles from fake news websites were more likely to use hyperbolic and emotional language, and left-leaning outlets were more likely to mention fairness and reciprocity. Publications with longer Wikipedia pages, meanwhile, were generally more credible, as were those with URLs containing a minimal number of special characters and complicated subdirectories.

In the future, the team intends to explore whether the system can be adapted to other languages (it was trained exclusively on English), and whether it can be trained to detect region-specific biases. And they plan to launch an app that'll automatically respond to news items with articles "that span the political spectrum."

“If a website has published fake news before, there’s a good chance they’ll do it again,” said Ramy Baly, lead author on the paper and a postdoctoral associate. “By automatically scraping data about these sites, the hope is that our system can help figure out which ones are likely to do it in the first place.”

Of course, they're not the only ones attempting to combat the spread of fake news with AI.

Delhi-based startup MetaFact taps natural language processing algorithms to flag misinformation and bias in news stories and social media posts. And AdVerify.ai, a software-as-a-service platform that launched in beta last year, parses articles for misinformation, nudity, malware, and other problematic content and cross-references a regularly updated database of thousands of fake and legitimate news items.

Facebook, for its part, has experimented with deploying AI tools that "identify accounts and false news," and it recently acquired London-based startup Bloomsbury AI to aid in its fight against misleading stories.

Some experts aren't convinced that AI's up to the task. Dean Pomerleau, a Carnegie Mellon University Robotics Institute scientist who helped organize the Fake News Challenge, a competition to crowdsource bias detection algorithms, told The Verge in an interview that AI lacked the nuanced understanding of language necessary to suss out untruths and false statements.

"We actually started out with a more ambitious goal of creating a system that could answer the question 'Is this fake news, yes or no?'" he said. "We quickly realized machine learning just wasn't up to the task."

Human fact-checkers aren't necessarily better. This year, Google suspended Fact Check, a tag that appeared next to stories in Google News that "include information fact-checked by news publishers and fact-checking organizations," after conservative outlets accused it of exhibiting bias against them.

Whatever the ultimate solution -- whether AI, human curation, or a mix of both -- it can't come fast enough. Gartner predicts that by 2022, if current trends hold, a majority of people in the developed world will see more false than true information.