One of the biggest challenges (and there are many) facing the artificial intelligence (AI) realm today is bias introduced by limited training data. Researchers have already shown, for example, that Amazon’s facial analysis software identifies gender less accurately for certain ethnicities than competing services do, while Democratic presidential hopeful Senator Elizabeth Warren has called on federal agencies to address questions around algorithmic bias, such as how the Federal Reserve handles lending discrimination.

Against this backdrop, “in-the-wild” software-testing company Applause is looking to “reinvent” AI testing with a new service that better detects AI bias by crowdsourcing larger training data sets.

By way of a brief recap, Massachusetts-based Applause, formerly known as uTest, offers companies like Google and Uber a different kind of app-testing platform, one that taps hundreds of thousands of “vetted” real-world users across the globe to squash bugs and iron out usability issues. The idea is to harness the power of the crowd rather than run tests entirely in contrived laboratory settings. The company had raised north of $115 million before it was acquired by investment firm Vista Equity Partners in 2017.

Real-world results

A key facet of the Applause platform is not only the sheer number of crowd testers in its community but also their demographic diversity, spanning language, race, gender, location, culture, hobbies, and more. This will likely be among the main selling points as Applause repurposes its technology to offer companies access to diverse AI training data.

“Not only will this improve AI experiences for consumers everywhere, the breadth of the community also has the potential to mitigate bias concerns and make AI more representative of the real world,” said Applause product VP Kristin Simonini.

Applause’s AI training and testing service is offered across five core AI types covering voice, optical character recognition (OCR), image recognition, biometrics, and chatbots. If, for example, a company needs to quickly source varied training data for a virtual voice assistant, Applause users in various locales could be called upon to record and submit specific utterances. Equally, they could submit photos of objects or places or interact with chatbots to iron out any bias. They could even be asked to submit selfies and fingerprints if they’re testing biometric-based security products.

Perhaps more importantly, Applause promises speed and scale for both gathering training data and testing the outputs, allowing companies to get rapid, iterative feedback from end users in real time. This could work as an ongoing feedback loop, with the gathered data used to improve AI algorithms that are then retested by the Applause community.

“Users want AI to be more natural, more human,” Simonini added. “Applause’s crowdsourced approach delivers what AI has been missing: a diverse and large collection of real human interactions prior to release.”

Similar services currently available include Amazon’s Mechanical Turk, which can be used to crowdsource data for machine learning experiments; DefinedCrowd, which helps create bespoke data sets for AI model training; and Germany’s Clickworker, which specifically focuses on machine vision and conversational AI.

Thanks to more than a decade of software testing with some of the biggest tech companies in the world, however, Applause is well-positioned to harness its existing presence in the developer community and offer vetted crowd testers to improve AI applications by reducing bias.