The Transform Technology Summits start October 13th with Low-Code/No Code: Enabling Enterprise Agility. Register now!
Many real-world problems require complex coordination between multiple agents — e.g., people or algorithms. A machine learning technique called multi-agent reinforcement learning (MARL) has shown success with respect to this, mainly in two-team games like Go, DOTA 2, StarCraft, hide-and-seek, and capture the flag. But the human world is far messier than games. That’s because humans face social dilemmas at multiple scales, from the interpersonal to the international, and they must decide not only how to cooperate but when to cooperate.
To address this challenge, researchers at OpenAI propose training AI agents with what they call randomized uncertain social preferences (RUSP), an augmentation that expands the distribution of environments in which reinforcement learning agents train. During training, agents share varying amounts of reward with each other; however, each agent has an independent degree of uncertainty over their relationships, creating “asymmetry” that the researchers hypothesize pressures agents to learn socially reactive behaviors.
To demonstrate RUSP’s potential, the coauthors had agents play Prisoner’s Buddy, a grid-based game where agents receive a reward for “finding a buddy.” On each timestep, agents act by either choosing another agent or deciding to choose no one and sitting out the round. If two agents mutually choose each other, they each get a reward of +2. If agent Alice chooses Bob but the choice isn’t reciprocated, Alice receives -2 and Bob receives +1. Agents that choose no one receive 0.
Three top investment pros open up about what it takes to get your video game funded.
The coauthors also explored preliminary team dynamics in a much more complex environment called Oasis. It’s physics-based and tasks agents with survival; their reward is +1 for every timestep they remain alive and a large negative reward when they die. Their health decreases with each step, but they can regain health by eating food pellets and can attack others to reduce their health. If an agent is reduced below 0 health, it dies and respawns at the edge of the play area after 100 timesteps.
There’s only enough food to support two of the three agents in Oasis, creating a social dilemma. Agents must break symmetry and gang up on the third to secure the food source to stay alive.
RUSP agents in Oasis performed much better than a “selfish” baseline in that they achieved higher reward and died less frequently, the researchers report. (For agents trained with high uncertainty levels, up to 90% of the deaths in an episode were attributable to a single agent, indicating that two agents learned to form a coalition and mostly exclude the third from the food source.) And in Prisoner’s Buddy, RUSP agents successfully partition into teams that tended to be stable and maintained throughout an episode.
The researchers note that RUSP is inefficient — with the training setup in Oasis, 1,000 iterations corresponded to roughly 3.8 million episodes of experience. This being the case, they argue that RUSP and techniques like it warrant further exploration. “Reciprocity and team formation are hallmark behaviors of sustained cooperation in both animals and humans,” they wrote in a paper submitted to the 2020 NeurIPS conference. “The foundations of many of our social structures are rooted in these basic behaviors and are even explicitly written into them — almost 4,000 years ago, reciprocal punishment was at the core of Hammurabi’s code of laws. If we are to see the emergence of more complex social structures and norms, it seems a prudent first step to understanding how simple forms of reciprocity may develop in artificial agents.”
GamesBeatGamesBeat's creed when covering the game industry is "where passion meets business." What does this mean? We want to tell you how the news matters to you -- not just as a decision-maker at a game studio, but also as a fan of games. Whether you read our articles, listen to our podcasts, or watch our videos, GamesBeat will help you learn about the industry and enjoy engaging with it. How will you do that? Membership includes access to:
- Newsletters, such as DeanBeat
- The wonderful, educational, and fun speakers at our events
- Networking opportunities
- Special members-only interviews, chats, and "open office" events with GamesBeat staff
- Chatting with community members, GamesBeat staff, and other guests in our Discord
- And maybe even a fun prize or two
- Introductions to like-minded parties