Researchers propose AI for detecting fraudulent crowdfunding campaigns

Crowdfunding has become the de facto way to support individual ventures and philanthropic efforts. But as crowdfunding platforms have risen to prominence, they've also attracted malicious actors who take advantage of unsuspecting donors. Last August, a report from The Verge investigated the Dragonfly Futurefön, a decade-long fraud operation that cost victims nearly $6 million and caught the attention of the FBI. Two years ago, the U.S. Federal Trade Commission announced it was looking into a campaign for a Wi-Fi-enabled, battery-powered backpack whose creators disappeared with more than $700,000.

GoFundMe previously said fraudulent campaigns make up less than 0.1% of all those on its platform, but with millions of new projects launching each year, many bad actors are able to avoid detection. To help catch them, researchers at University College London, Telefonica Research, and the London School of Economics devised an AI system that takes into account textual and image-based features to classify fraudulent crowdfunding behavior at the moment of publication. They claim it's up to 90.14% accurate at distinguishing between fraudulent and legitimate crowdfunding behavior, even without any user or donation activity.

While two of the largest crowdfunding platforms on the web -- GoFundMe and Kickstarter -- employ forms of automation to spot potential fraud, neither claims to take the AI-driven approach advocated by the study coauthors. A spokesperson for GoFundMe told VentureBeat the company relies on the "dedicated experts" on its trust and safety team, who use technology "on par with the financial industry" and community reports to spot fraudulent campaigns. To do this, they look at things like:

Whether the campaign abides by the terms of service
Whether it provides enough information for donors
Whether it's plagiarized
Who started the campaign
Who is withdrawing funds
Who should be receiving funds

Kickstarter says it doesn't use AI or machine learning tools to prevent fraud, apart from proprietary automated tools, and that the majority of its investigative work is performed manually by looking at what signals surface and analyzing them to guide any action taken. A spokesperson told VentureBeat that in 2018 Kickstarter's team suspended 354 projects and 509,487 accounts and banned 5,397 users for violating the company's rules and guidelines -- eight times as many as it suspended in 2017.

The researchers would argue those efforts don't go far enough. "We find that fraud is a small percentage of the crowdfunding ecosystem, but an insidious problem. It corrodes the trust ecosystem on which these platforms operate, endangering the support that thousands of people receive year on year," they wrote. "[Crowdfunding platforms aren't properly] incentivized to combat fraud among users and the campaigns they launch: On the one hand, a platform's revenue is directly proportional to the number of transactions performed (since the platform charges a fixed amount per donation); on the other hand, if a platform is transparent with respect to how much fraud it has, it may discourage potential donors from participating."

To build a corpus that could be used to "teach" the above-mentioned system to pick out fraudulent campaigns, the researchers sourced entries from GoFraudMe, a resource that aims to catalog fraudulent cases on the platform. They then created two manually annotated data sets focusing on the health domain, where the monetary and emotional stakes tend to be high. One set contained 191 campaigns from GoFundMe's medical category, while the other contained 350 campaigns from different crowdfunding platforms (Indiegogo, GoFundMe, MightyCause, Fundrazr, and Fundly) that were directly related to organ transplants.

Human annotators labeled each of the roughly 700 campaigns in the corpora as "fraud" or "not-fraud" according to guidelines that included factors like evidence of contradictory information, a lack of engagement on the part of donors, and participation of the creator in other campaigns. Next, the researchers examined different textual and visual cues that might inform the system's analysis:

Sentiment analysis: The team extracted the sentiments and tones expressed in campaign descriptions using IBM's Watson natural language processing service. They computed the sentiment as a probability across five emotions (sadness, joy, fear, disgust, and anger) before analyzing confidence scores for seven possible tones (frustration, satisfaction, excitement, politeness, impoliteness, sadness, and sympathy).
Complexity and language choice: Operating on the assumption that fraudsters prefer simpler language and shorter sentences, the researchers checked language complexity and word choice in the campaign descriptions. They looked at both a series of readability scores and language features like function words, personal pronouns, and average syllables per word, as well as the total number of characters.
Form of the text: The coauthors examined the visual structure of campaign text, looking at things like whether the letters were all lowercase or all uppercase and the number of emojis in the text.
Word importance and named-entity recognition: The team computed word importance for the text in the campaign description, revealing similarities (and dissimilarities) among campaigns. They also identified proper nouns, numeric entities, and currencies in the text and assigned them to a finite set of categories.
Emotion representation: The researchers repurposed a pretrained AI model to classify campaign images as evoking one of eight emotions (amusement, anger, awe, contentment, disgust, excitement, fear, and sadness) by fine-tuning it on 23,000 emotion-labeled images from Flickr and Instagram.
Appearance and semantic representation: Using another AI model, the researchers extracted image appearance representations that provided a description of each image, like dominant colors, the textures of the edges of segments, and the presence of certain objects. They also used a face detector algorithm to estimate the number of faces present in each image.
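
A few of the text-side cues described above (character count, syllables per word, personal pronouns, all-uppercase or all-lowercase text) are simple enough to sketch in plain Python. The snippet below is an illustration only, not the researchers' code: the function names, the pronoun list, and the vowel-run syllable heuristic are all assumptions, and the emoji count and readability scores mentioned in the study are left out.

```python
import re

# Hypothetical helpers for illustration; none of these names come from the study.
VOWEL_RUNS = re.compile(r"[aeiouy]+", re.IGNORECASE)
WORDS = re.compile(r"[A-Za-z']+")
SENTENCE_END = re.compile(r"[.!?]+")
# Assumed, non-exhaustive pronoun list.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "she", "they", "them"}


def count_syllables(word):
    """Crude heuristic: one syllable per run of vowels, minimum one."""
    return max(1, len(VOWEL_RUNS.findall(word)))


def text_features(description):
    """Return a few surface-level features of a campaign description."""
    words = WORDS.findall(description)
    sentences = [s for s in SENTENCE_END.split(description) if s.strip()]
    letters = [c for c in description if c.isalpha()]
    n_words = max(1, len(words))  # avoid division by zero on empty text
    return {
        "num_chars": len(description),
        "avg_syllables_per_word":
            sum(count_syllables(w) for w in words) / n_words,
        "avg_words_per_sentence": len(words) / max(1, len(sentences)),
        "personal_pronoun_ratio":
            sum(w.lower() in PRONOUNS for w in words) / n_words,
        "all_uppercase": bool(letters) and all(c.isupper() for c in letters),
        "all_lowercase": bool(letters) and all(c.islower() for c in letters),
    }


features = text_features("PLEASE HELP US NOW")
```

Each campaign description would yield one such dictionary, which could then sit alongside the sentiment and image features as input to a classifier.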

After boiling many thousands of possible features down to 71 textual and 501 visual variables, the researchers used them to train a machine learning model to automatically detect fraudulent campaigns. Arriving at this ensemble model required building sub-models to classify images and text as fraudulent or not fraudulent and combining the results into a single score for each campaign.
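
The article does not say how the text and image sub-models' outputs are merged, so the following is only a sketch of one common option: a weighted average of the two fraud probabilities, followed by a threshold. The function names, the default weight, and the 0.5 cutoff are assumptions for illustration, not details from the study.

```python
def combine_scores(text_prob, image_prob, text_weight=0.5):
    """Blend two sub-model fraud probabilities into a single score.

    A weighted average is assumed here; the study may combine them differently.
    """
    for value in (text_prob, image_prob, text_weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities and weight must lie in [0, 1]")
    return text_weight * text_prob + (1.0 - text_weight) * image_prob


def classify(text_prob, image_prob, threshold=0.5, text_weight=0.5):
    """Return a label and the combined score for one campaign."""
    score = combine_scores(text_prob, image_prob, text_weight)
    label = "fraud" if score >= threshold else "not-fraud"
    return label, score


# Hypothetical sub-model outputs for a single campaign.
label, score = classify(text_prob=0.8, image_prob=0.4)
```

Raising `text_weight` would make the text model's verdict count for more, which might suit campaigns that have few or no images.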

The coauthors claim their approach revealed peculiar trends, like the fact that legitimate campaigns are more likely than fraudulent ones to have images with at least one face. Fraudulent campaigns, on the other hand, are generally more desperate in their appeals, in contrast with legitimate campaigns' descriptiveness and openness about circumstances.

"In recent years, crowdfunding has emerged as a means of making personal appeals for financial support to members of the public ... The community trusts that the individual who requests support, whatever the task, is doing so without malicious intent," the researchers wrote. "However, time and again, fraudulent cases come to light, ranging from fake objectives to embezzlement. Fraudsters often fly under the radar and defraud people of what adds up to tens of millions, under the guise of crowdfunding support, enabled by small individual donations. Detecting and preventing fraud is thus an adversarial problem. Inevitably, perpetrators adapt and attempt to bypass whatever system is deployed to prevent their malicious schemes."

It's possible that the system might be latching onto certain features in making its predictions, exhibiting a bias that's not obvious at first glance. That's why the coauthors plan to improve it by taking into account sources of labeling bias and by testing its robustness against unlabeled medically related campaigns across crowdfunding platforms.

"This is a significant step in building a system that is preemptive (e.g., a browser plugin) as opposed to reactive," they wrote. "We believe our method could help build trust in this ecosystem by allowing potential donors to vet campaigns before contributing."
