In a paper highlighted today in a Facebook blog post, engineers describe an algorithm called SybilEdge that detects fake accounts that evade Facebook’s anti-abuse filters at registration time but haven’t yet friended enough people to perpetrate abuse. The goal is to limit the accounts’ ability to launch attacks against other users, in part by comparing the way users add friends to their extended social networks.
SybilEdge — which can detect fake Facebook accounts less than a week old with fewer than 20 friend requests — has immediate application for platforms dealing with a wave of misleading information about the coronavirus pandemic. An analysis published by the Reuters Institute for the Study of Journalism at the University of Oxford found that 33% of people have seen some form of misinformation about COVID-19 on social networks like Twitter, Facebook, and YouTube.
In architecting SybilEdge, the development team noted that abusers need to connect to targets in order to launch abuse. That is, they need to find targets, send them friend requests, and have those requests accepted. Perhaps unsurprisingly, internal Facebook studies revealed that abusers differ from non-abusers in both their selection of friends and those friends’ responses to their friend requests: Fake accounts’ requests were rejected more often than real users’ requests. In addition, fake accounts were often careful when picking their friend request targets, likely to maximize the probability of their requests being accepted.
Facebook created a corpus with which to train SybilEdge by segmenting users into two groups: those more likely to accept friend requests from real accounts and those more likely to accept requests from fake accounts. If a user in the former group rejects an incoming request, it signals that the requester is likely fake. Likewise, if a user who tends to accept fake accounts’ requests accepts a request, it indicates that the requester is likely fake.
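The grouping described above can be illustrated with a minimal Python sketch. It is not Facebook’s implementation: the function name `estimate_accept_rates`, the `(target_id, requester_is_fake, accepted)` record layout, and the additive smoothing are all assumptions made for illustration. The sketch estimates, for each target user, how often that user accepts requests from real versus fake accounts.

```python
from collections import defaultdict


def estimate_accept_rates(history, smoothing=1.0):
    """Estimate per-target acceptance rates for real and fake requesters.

    history: iterable of (target_id, requester_is_fake, accepted) tuples.
             This record layout is hypothetical, chosen for the example.
    smoothing: additive (Laplace) smoothing so rates never hit 0 or 1.
    Returns {target_id: {"real": rate, "fake": rate}}.
    """
    # counts[target][kind] = [accepted_count, total_count]
    counts = defaultdict(lambda: {"real": [0, 0], "fake": [0, 0]})
    for target_id, requester_is_fake, accepted in history:
        kind = "fake" if requester_is_fake else "real"
        counts[target_id][kind][1] += 1
        if accepted:
            counts[target_id][kind][0] += 1

    rates = {}
    for target_id, by_kind in counts.items():
        rates[target_id] = {
            kind: (acc + smoothing) / (total + 2 * smoothing)
            for kind, (acc, total) in by_kind.items()
        }
    return rates
```

In this sketch, a target whose "real" rate is well above its "fake" rate behaves like the first group in the article, and a target with a comparatively high "fake" rate behaves like the second.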
SybilEdge works in two stages. First, it’s trained by observing the aforementioned samples over time, after which it leverages outputs from Facebook’s behavioral and content classifiers that flag accounts based on actual abuse. This training phase provides the model with all the parameters it needs (i.e., configuration variables estimated from data and required by the model when making predictions). Second, the model runs in real time, updating the probability that the requester is fake after each friend request and response.
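One way to picture the real-time stage is as a per-response probability update. The Python sketch below applies a plain Bayes’ rule step and is only an illustration under stated assumptions: the function `update_fake_probability` and its parameters are hypothetical, and the published algorithm may weigh target selection and responses differently.

```python
def update_fake_probability(p_fake, accept_rate_fake, accept_rate_real, accepted):
    """Apply one Bayes' rule update to P(requester is fake).

    p_fake: current probability that the requester is fake.
    accept_rate_fake: assumed probability that this target accepts a fake account.
    accept_rate_real: assumed probability that this target accepts a real account.
    accepted: True if the target accepted the request, False if it was rejected.
    """
    if accepted:
        like_fake, like_real = accept_rate_fake, accept_rate_real
    else:
        like_fake, like_real = 1.0 - accept_rate_fake, 1.0 - accept_rate_real

    numerator = like_fake * p_fake
    denominator = numerator + like_real * (1.0 - p_fake)
    # Leave the estimate unchanged if the observation has zero likelihood
    # under both hypotheses.
    return numerator / denominator if denominator > 0 else p_fake


# Usage: start from a prior and fold in each response as it arrives.
p = 0.5
for rate_fake, rate_real, accepted in [(0.2, 0.8, False), (0.6, 0.3, True)]:
    p = update_fake_probability(p, rate_fake, rate_real, accepted)
```

Here a rejection by a target who usually accepts real accounts, followed by an acceptance from a target who often accepts fake ones, both push the estimate upward, which mirrors the two signals described earlier.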
Facebook claims that SybilEdge is above 90% accurate at detecting fake accounts with 15 or fewer friend requests on average and 80% accurate at detecting fake accounts with 5 friend requests. Moreover, unlike the baselines with which it was compared, its performance doesn’t degrade with more friend requests (over 45).
“SybilEdge helps us identify abusers quickly and in a way that can be explained and analyzed. In the near future, we plan to look at additional ways that can further speed up the detection of abusive accounts and help make confident decisions even faster than SybilEdge. We plan to accomplish this by mixing feature-based and behavior-based models,” wrote Facebook.
Facebook is broadly moving toward an AI training technique called self-supervised learning, in which unlabeled data is used in conjunction with small amounts of labeled data to produce an improvement in learning accuracy. Facebook’s deep entity classification (DEC) machine learning framework was responsible for a 20% reduction in abusive accounts on the platform in the two years since it was deployed. And in a separate experiment, Facebook researchers were able to train a language understanding model that made more precise predictions with just 80 hours of data compared with 12,000 hours of manually labeled data.