
How video games could be used to generate AI training data

New Super NES controller for Switch.

Image Credit: Nintendo

AI, like humans, learns from examples. Given enough data and time, an AI model can make sense of the statistical relationships well enough to generate predictions. That’s how OpenAI’s GPT-3 writes text from poetry to computer code, and how apps like Google Lens recognize objects such as lampshades in photos of bedrooms.

Historically, the data to train as well as test AI has come mostly from public sources on the web. But these sources are flawed. For example, Microsoft quietly removed a dataset with more than 10 million images of people after it came to light that some subjects weren’t aware that they’d been included. Datasets created from local TV news segments are likely to negatively portray Black men because the news often covers crime in a sensationalized, racist way. And the data used to train AI to detect people’s expressed emotions from their faces have been found to contain more happy faces than sad ones because users tend to post happier images of themselves on social media.

Because AI systems tend to amplify biases, dodgy data has led to algorithms that perpetuate poor medical treatment, sexist recruitment and hiring, ageist ad targeting, erroneous grading and comment moderation, and racist recidivism and loan approval. Prejudicial data also fed photo-cropping apps that disfavored darker-skinned individuals and image-recognition algorithms that labeled Black users as “gorillas,” not to mention APIs that identified thermometers held by Black people as “guns.”

As the AI community grapples with the issues around — and the consequences of — using public data, researchers have begun exploring potentially less problematic ways of creating AI datasets. Some of the proposals gamify the collection process, while others monetize it. But while there isn’t consensus on approach, there’s a growing recognition of the harm perpetuated by data collection in the past — and the need to address it.


“With diversified data sources and with quality control, datasets could be sufficiently representative and AI bias could be minimized. They should be the goals and they are achievable,” Chuan Yue, an associate professor at the Colorado School of Mines, told VentureBeat via email. “Crowdsourcing data is viable and is often indispensable not only for researchers in many disciplines but also for AI applications. [But while it] can be ethical, many things must be done in the long run to make it ethical.”

Underrepresented data

Web data doesn’t reflect the world’s diversity. To take one example, the languages represented in Wikipedia-based datasets, which are used to train systems like GPT-3, vary not only in size but also in the number of edits they contain. (Obviously, not all speakers of a language are literate or have access to Wikipedia.) Beyond Wikipedia, ebooks, another popular data source, are in some languages more commonly available as scanned images than as text. Those scans must be processed with optical character recognition (OCR) tools, whose accuracy can dip as low as 70%.

Researchers have in recent years attempted to crowdsource more diverse datasets — with mixed results. Contributors to Hugging Face’s open-access BigScience project produced a catalog of nearly 200 resources for AI language model training. But Common Voice, Mozilla’s effort to build an open collection of transcribed speech, has vetted only dozens of languages since its 2017 launch.

The hurdles have led experts like Aditya Ponnada, a research scientist at Spotify, to investigate different ways to gamify the data collection process. As a Ph.D. student at Northeastern University’s Personal Health Informatics program, he helped design games that encouraged people to volunteer wellness data by solving game-like puzzles.

“One of the focuses of our lab is to [develop] personalized algorithms for detecting everyday physical activity and sedentary behavior using wearable sensors,” Ponnada told VentureBeat via email. “A part of this process is … labeling and annotating the sensor data or activities (e.g., walking, running, cycling) in tandem (or as close as possible in time to the actual activity). [This] motivated us to come up with ways in which we can get annotations on such large datasets … to build robust algorithms. We wanted to explore the potential of using games to gather labels on large scale noisy sensor data to build robust activity recognition algorithms.”

Most AI learns to make predictions from annotations attached to data like text, photos, videos, and audio recordings. These “supervised” models are trained until they can detect the relationships between inputs (e.g., a picture of a bird) and their annotations (e.g., the caption “bird”). During training, the model compares its outputs with the annotations and adjusts its parameters, repeating the process until its predictions reach a target accuracy.
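To make that training loop concrete, here is a minimal sketch in Python: a tiny logistic-regression classifier trained with gradient descent on a handful of made-up labeled examples. The data, function names, learning rate, and target accuracy are illustrative assumptions, not details of any system mentioned in this article. Each pass measures the model's accuracy against the annotations and, if it falls short of the target, nudges the weights to reduce the error.

```python
import math
from typing import List, Tuple

# A labeled example: (input features, annotation). Class 1 might stand for
# "bird" and class 0 for "not bird"; the numbers below are invented.
Example = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: List[float], bias: float, x: List[float]) -> float:
    """Model output: estimated probability that x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def accuracy(weights: List[float], bias: float, data: List[Example]) -> float:
    """Share of examples where the thresholded output matches the annotation."""
    hits = sum(1 for x, y in data if (predict_proba(weights, bias, x) >= 0.5) == (y == 1))
    return hits / len(data)


def train(data: List[Example], lr: float = 0.5, target_acc: float = 1.0,
          max_epochs: int = 1000) -> Tuple[List[float], float]:
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        # Measure outputs against the annotations; stop once the target is met.
        if accuracy(weights, bias, data) >= target_acc:
            break
        # Otherwise, run one pass of stochastic gradient descent on the log loss.
        for x, y in data:
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


data: List[Example] = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
weights, bias = train(data)
print(accuracy(weights, bias, data))  # 1.0 on this separable toy set
```

Real systems use far larger models and datasets, but the cycle of predicting, comparing with annotations, and adjusting is the same idea.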

Games have been used to crowdsource data in the recent past, particularly in domains like protein molecule folding, RNA behavior, and complex genome sequencing. In 2017, the Computational Linguistics and Information Processing Laboratory at the University of Maryland launched a platform dubbed Break It, Build It, which let researchers submit models to users tasked with coming up with examples to defeat them. A 2019 paper described a setup where trivia enthusiasts were instructed to craft questions for AI models validated via live human-computer matches. And Meta (formerly Facebook) maintains a platform called Dynabench that has users “fool” models designed to analyze sentiment, answer questions, detect hate speech, and more.

In Ponnada’s research, he and colleagues tested two games: an “endless runner-type” level similar to Temple Run and a pattern-matching puzzle akin to (but not exactly like) Phylo. The team found that players, who were recruited through Amazon Mechanical Turk, labeled sensor data better with the puzzles, perhaps because the puzzles let players solve problems at their own pace.

“[Large groups of players have] a lot of creative potential to solve complex problems. This is where games create an environment where the complex problems feel less like a monotonous task and more like a challenge, an appeal to intrinsic motivation,” Ponnada said. “[Moreover,] games enable novel interfaces or ways of interacting with computers. For instance, in pattern matching games, it is the mechanics and the finger swipe interactions (or other drag or toss interactions on the smartphones) that make the experience more engaging.”
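When several players label the same stretch of sensor data, their answers have to be combined into a single label. The article doesn't say how Ponnada's team did this; a common baseline is majority voting, sketched below in Python. The window IDs, activity labels, and the 0.6 agreement threshold are hypothetical. Items without enough agreement come back as `None` so they can be sent to an expert or another round of play.

```python
from collections import Counter
from typing import Dict, List, Optional


def aggregate_labels(votes: Dict[str, List[str]],
                     min_agreement: float = 0.6) -> Dict[str, Optional[str]]:
    """Majority vote per item; None when the winning label's share is below min_agreement."""
    result: Dict[str, Optional[str]] = {}
    for item_id, labels in votes.items():
        if not labels:
            result[item_id] = None  # nobody has labeled this item yet
            continue
        label, count = Counter(labels).most_common(1)[0]
        result[item_id] = label if count / len(labels) >= min_agreement else None
    return result


# Hypothetical player answers for three windows of wearable-sensor data.
votes = {
    "window-001": ["walking", "walking", "running"],
    "window-002": ["cycling", "running", "walking"],
    "window-003": ["running", "running", "running", "walking"],
}
print(aggregate_labels(votes))
# {'window-001': 'walking', 'window-002': None, 'window-003': 'running'}
```

Raising `min_agreement` trades coverage for confidence: fewer windows get a label, but the ones that do are backed by stronger consensus.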

Building on this idea, Synesis One, a platform founded by Mind AI CEO Paul Lee, aims to develop games that on the backend create datasets to train AI. According to Lee, Synesis One — which reportedly raised $9.5 million in an initial coin offering in December — will be used to bolster some of the natural language models that Mind AI, an “AI-as-a-service” provider, already offers to customers.

Synesis One is scheduled to launch in early 2022 with Quantum Noesis, a “playable graphic novel” that has players use “wits and creativity” to solve word puzzles. Quantum Noesis requires virtual currency called Kanon to access. But in a twist on the usual pay-to-play formula, Kanon can also earn players rewards as they complete various challenges in the game and contribute data. For example, Kanon holders that purchase non-fungible tokens of words in the puzzles will earn income whenever the words are used by one of Mind AI’s enterprise customers, Lee claims.

A screenshot from Quantum Noesis.

“Humans don’t like banal work. Any rote work of this nature needs to be transcended, and gamifying work allows us to do just that. We’re creating a new way to work — one that’s more engaging and more fun,” Lee told VentureBeat via email. “With a little extra work on our side, the point of gamification is to attract more and different users than an interface like Wikipedia has, which is all business, no pleasure. There might be bright young minds out there who would not be attracted to the traditional crowdsourcing platforms, so this strategy provides a method to drive interest.”


But for all the advantages games offer when it comes to dataset collection, it’s not clear they can overcome all the shortcomings of existing, non-game crowdsourcing platforms. Wired last year reported on the susceptibility of Amazon Mechanical Turk to automated bots. Bots aside, people bring problematic biases to the table. In a study led by the Allen Institute for AI, scientists found that labelers are more likely to annotate phrases in the African American English (AAE) dialect as toxic than their general American English equivalents, even though AAE speakers understand those phrases as non-toxic. (AAE, a dialect associated with the descendants of slaves in the South, is primarily, but not exclusively, spoken by Black Americans.)
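One simple way to check a labeling pool for the kind of skew the Allen Institute study describes is to compare how often annotators mark text from each dialect as toxic. The Python sketch below does only that; the annotations are invented, and a real audit would also need matched phrases with equivalent meaning so that the comparison isolates dialect rather than content.

```python
from typing import Dict, List, Tuple


def toxic_label_rates(annotations: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each dialect to the share of its annotations marked toxic."""
    totals: Dict[str, int] = {}
    toxic: Dict[str, int] = {}
    for dialect, labeled_toxic in annotations:
        totals[dialect] = totals.get(dialect, 0) + 1
        toxic[dialect] = toxic.get(dialect, 0) + int(labeled_toxic)
    return {dialect: toxic[dialect] / totals[dialect] for dialect in totals}


# Invented annotations: (dialect of the phrase, did the annotator mark it toxic?)
annotations = [
    ("AAE", True), ("AAE", True), ("AAE", False), ("AAE", False),
    ("general", True), ("general", False), ("general", False), ("general", False),
]
rates = toxic_label_rates(annotations)
print(rates)  # {'AAE': 0.5, 'general': 0.25}
print(rates["AAE"] / rates["general"])  # 2.0: AAE phrases flagged twice as often here
```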

Beyond the bias issue, high-quality annotations require domain expertise. As Synced noted in a recent piece, most labelers can’t handle “high-context” data such as legal contract classification, medical images, or scientific literature. Games, like crowdsourcing platforms, require a computer or mobile device and an internet connection to play, which shuts out people who lack them. And they threaten to depress wages in a field where pay tends to be extremely low. The annotators of the widely used ImageNet computer vision dataset made a median wage of $2 per hour, one study found, and only 4% made more than $7.25 per hour, the U.S. federal minimum wage.

Being human, people also make mistakes, sometimes major ones. In an MIT analysis of popular datasets, researchers found mislabeled images (like one breed of dog confused for another), mislabeled text sentiment (like Amazon product reviews labeled negative when they were actually positive), and mislabeled audio from YouTube videos (like an Ariana Grande high note categorized as a whistle).
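Errors like these can sometimes be surfaced automatically by comparing each given label with a trained model's predicted probabilities. The Python sketch below is not the procedure from the MIT analysis; it is a simplified threshold heuristic that assumes you already have out-of-sample probabilities for each example (for instance, from cross-validation). The reviews, probabilities, and 0.2 threshold are hypothetical, and flagged items are candidates for human review, not confirmed errors.

```python
from typing import Dict, List, Sequence


def flag_suspect_labels(given_labels: Sequence[str],
                        pred_probs: Sequence[Dict[str, float]],
                        threshold: float = 0.2) -> List[int]:
    """Return indices where the model gives the assigned label low probability
    and prefers a different class."""
    suspects: List[int] = []
    for i, (label, probs) in enumerate(zip(given_labels, pred_probs)):
        predicted = max(probs, key=probs.get)
        if probs.get(label, 0.0) < threshold and predicted != label:
            suspects.append(i)
    return suspects


# Hypothetical sentiment labels for three product reviews, plus a model's
# out-of-sample probabilities for each review.
given = ["negative", "positive", "negative"]
probs = [
    {"positive": 0.92, "negative": 0.08},  # labeled negative, model says positive
    {"positive": 0.85, "negative": 0.15},
    {"positive": 0.30, "negative": 0.70},
]
print(flag_suspect_labels(given, probs))  # [0]
```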

“The immediate challenges are fairness, representativeness, biases, and quality control,” Yue said. “A better method of data collection is to collect data from multiple sources including multiple crowdsourcing platforms, local communities, and some special targeted populations. In other words, data sources should be diversified. Meanwhile, quality control is crucial in the entire process. Here quality control should be interpreted from a broader viewpoint including if the data are responsibly provided, if the data integrity is ensured and if data samples are sufficiently representative.”
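Part of the quality control Yue describes, checking whether samples are sufficiently representative, can be approximated by comparing each group's share of a dataset with its share of a reference population. The Python sketch below assumes such reference shares are known; the group names, counts, target shares, and 5-percentage-point tolerance are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def representation_gaps(sample_groups: List[str],
                        target_shares: Dict[str, float],
                        tolerance: float = 0.05) -> Dict[str, float]:
    """Return {group: observed share - target share} for groups outside the tolerance."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    gaps: Dict[str, float] = {}
    for group, target in target_shares.items():
        observed = counts[group] / total if total else 0.0
        diff = observed - target
        if abs(diff) > tolerance:
            gaps[group] = round(diff, 3)  # positive means overrepresented
    return gaps


# Hypothetical dataset of 100 samples tagged by (made-up) demographic group.
samples = ["group_a"] * 70 + ["group_b"] * 20 + ["group_c"] * 10
targets = {"group_a": 0.50, "group_b": 0.38, "group_c": 0.12}
print(representation_gaps(samples, targets))
# {'group_a': 0.2, 'group_b': -0.18}
```

A negative gap points to a group that additional sources, such as the local communities or targeted populations Yue mentions, could help fill.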

Accounting for the potential pitfalls, Ponnada believes that games are only suited for certain dataset collection tasks, like solving puzzles that researchers can then verify for their applications. “Games for crowdsourcing” make most sense for players who have a motivation to either just play games or play games specifically to support science, he asserts.

“While I agree that fair pay is very important for crowd workers, games appeal more to a very specific division within the crowd workers who are motivated to play games — especially games with a purpose,” Ponnada said. “I believe designing [mobile-first] games for these casual game players in the crowd might yield results faster for complex problems. [G]ames built for crowdsourcing purposes have [historically] appealed to a specific age group that can adapt to games faster [and] the goal has been to play the games in spare time. [But] it is possible that there is an untapped potential gaming audience (e.g., older adults) that can contribute to games.”

Lee agrees with the notion that games could attract a more diverse pool of annotators — assuming that corporate interests don’t get in the way. But he points to a major challenge in designing games for scientific discovery: making them easy to understand — and fun. If the tutorials aren’t clear and the gameplay loops aren’t appealing, the game won’t accomplish what it was intended to do, he says.

“Some [dataset collection efforts] we’ve seen may have been gamified, but they’ve been done so poorly that no one wants to play. You can see [other] examples in some kids’ educational video games. That’s the real challenge — to do it well is an art. And different topics lend themselves more or less of a challenge to gamify well,” Lee said. “[That’s why] we’re going to create a number of titles that reach people with diverse interests. We believe that we’ll be able to create a new way of working that appeals to a really large crowd.”
