Deep reinforcement learning — an AI training technique that employs rewards to drive software policies toward goals — has been tapped to model the impact of social norms, create AI that’s exceptionally good at playing games, and program robots that can recover from nasty spills. But despite its versatility, reinforcement learning (or “RL,” as it’s typically abbreviated) has a showstopping shortcoming: It’s inefficient. Training a policy requires lots of interactions within a simulated or real-world environment — far more than the average person needs to learn a task.
To partly remedy this in the video game domain, researchers at Google recently proposed a new algorithm, Simulated Policy Learning (SimPLe for short), which uses learned game models to train high-quality policies for selecting actions. They describe it in a newly published preprint paper (“Model-Based Reinforcement Learning for Atari”) and in documentation accompanying the open-sourced code.
“At a high-level, the idea behind SimPLe is to alternate between learning a world model of how the game behaves and using that model to optimize a policy (with model-free reinforcement learning) within the simulated game environment,” wrote Google AI scientists Łukasz Kaiser and Dumitru Erhan. “The basic principles behind this algorithm are well established and have been employed in numerous recent model-based reinforcement learning methods.”
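The alternating loop Kaiser and Erhan describe can be illustrated on a toy problem. This is a minimal sketch, not SimPLe itself: `ToyGame`, `TabularWorldModel`, and `simple_style_loop` are hypothetical names, the "world model" is a lookup table rather than a neural video-prediction network, and tabular Q-learning stands in for whatever model-free RL algorithm the real system uses. What it keeps is the structure: gather a little real experience, fit a model to it, then improve the policy using only transitions the model simulates.

```python
import random
from collections import Counter, defaultdict

ACTIONS = (-1, +1)  # hypothetical "left" / "right" commands


class ToyGame:
    """Hypothetical stand-in for a real game: positions 0..4, reward 1.0 while at position 4."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = max(0, min(4, self.pos + action))
        return self.pos, (1.0 if self.pos == 4 else 0.0)


class TabularWorldModel:
    """Learned simulator: predicts the most frequently observed outcome of (state, action)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, transitions):
        self.counts = defaultdict(Counter)  # refit from scratch on all real data
        for s, a, s_next, r in transitions:
            self.counts[(s, a)][(s_next, r)] += 1

    def step(self, s, a):
        outcomes = self.counts.get((s, a))
        if not outcomes:
            return s, 0.0  # never observed: assume nothing happens
        return outcomes.most_common(1)[0][0]


def greedy(q, s, rng):
    best = max(q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(s, a)] == best])  # random tie-breaking


def train_policy_in_model(model, q, start_states, rng,
                          rollouts=200, horizon=10, alpha=0.5, gamma=0.9, eps=0.2):
    """Model-free Q-learning, except every transition comes from the learned model."""
    for _ in range(rollouts):
        s = rng.choice(start_states)
        for _ in range(horizon):
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(q, s, rng)
            s_next, r = model.step(s, a)
            target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next


def simple_style_loop(iterations=4, episodes=20, steps=30, seed=0):
    rng = random.Random(seed)
    env, model, q = ToyGame(), TabularWorldModel(), defaultdict(float)
    real_data = []
    for _ in range(iterations):
        # 1. Collect a small batch of real experience with the current policy.
        for _ in range(episodes):
            s = env.reset()
            for _ in range(steps):
                a = rng.choice(ACTIONS) if rng.random() < 0.3 else greedy(q, s, rng)
                s_next, r = env.step(a)
                real_data.append((s, a, s_next, r))
                s = s_next
        # 2. Refit the world model on all real experience gathered so far.
        model.fit(real_data)
        # 3. Improve the policy with model-free RL inside the learned model only.
        start_states = sorted({s for s, _, _, _ in real_data})
        train_policy_in_model(model, q, start_states, rng)
    return {s: greedy(q, s, rng) for s in range(5)}
```

Calling `simple_style_loop()` should yield a policy that moves right from every position, even though the Q-learning updates never call `ToyGame` directly; the real environment is used only to collect data for the model.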
As the two researchers further explain, the core challenge is learning a model that predicts the game’s next frame given a sequence of observed frames and commands (e.g., “left,” “right,” “forward,” “backward”). A successful model, they point out, can produce trajectories for training a game-playing agent’s policy, which reduces the need to rely on costly sequences of real in-game interaction.
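Framing next-frame prediction as supervised learning largely comes down to slicing recorded play into input/target pairs. The sketch below assumes a trajectory stored as parallel lists in which `actions[t]` and `rewards[t]` describe the step from `frames[t]` to `frames[t + 1]`; the function name and that data layout are hypothetical, and the default context of four frames follows SimPLe’s four-frame input.

```python
def make_training_examples(frames, actions, rewards, context=4):
    """Turn one recorded trajectory into ((context frames, action), (next frame, reward)) pairs.

    Assumes actions[t] and rewards[t] belong to the transition frames[t] -> frames[t + 1].
    """
    if context < 1:
        raise ValueError("context must be at least one frame")
    if not (len(actions) == len(rewards) == len(frames) - 1):
        raise ValueError("expected exactly one action and one reward per transition")
    examples = []
    # t is the index of the most recent observed frame in each input window.
    for t in range(context - 1, len(frames) - 1):
        inputs = (tuple(frames[t - context + 1 : t + 1]), actions[t])
        targets = (frames[t + 1], rewards[t])
        examples.append((inputs, targets))
    return examples
```

Each example pairs what the model sees (the recent frames plus the command) with what it must predict; a trained network would be fit to minimize the error between its predicted frame and reward and these targets.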
SimPLe does exactly this. Its world model takes four frames as input and predicts the next frame along with the reward; once trained, it produces “rollouts” (sample sequences of actions, observations, and outcomes) that are used to improve policies. (Kaiser and Erhan note that SimPLe uses only medium-length rollouts to limit the buildup of prediction errors.)
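The rollout step can be sketched as a sliding four-frame window over a learned model. In this hedged example, `world_model` is any callable mapping (last four frames, action) to (next frame, reward); `toy_world_model` is a hypothetical stand-in in which each “frame” is a single integer, and starting from a window of real frames with a fixed `horizon` are assumptions of the sketch rather than details taken from the paper.

```python
from collections import deque

CONTEXT = 4  # the world model conditions on the last four frames


def generate_rollout(world_model, policy, start_frames, horizon):
    """Roll a learned world model forward for `horizon` steps from real starting frames."""
    if len(start_frames) != CONTEXT:
        raise ValueError(f"expected {CONTEXT} starting frames")
    window = deque(start_frames, maxlen=CONTEXT)  # oldest frame drops off automatically
    rollout = []
    for _ in range(horizon):
        context = tuple(window)
        action = policy(context)
        next_frame, reward = world_model(context, action)  # a prediction, not a real game step
        rollout.append((context, action, next_frame, reward))
        window.append(next_frame)  # the prediction becomes part of the next input
    return rollout


def toy_world_model(context, action):
    """Hypothetical stand-in for a trained network: each "frame" is an integer position."""
    next_frame = context[-1] + action
    return next_frame, (1.0 if next_frame == 10 else 0.0)


# Usage: an always-move-right policy, starting from four "real" frames.
steps = generate_rollout(toy_world_model, lambda ctx: 1, [0, 1, 2, 3], horizon=8)
```

Keeping `horizon` modest reflects the medium-length rollouts mentioned above: every predicted frame is fed back in as input, so a model’s small errors compound the further a rollout runs from real data.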
In experiments limited to the equivalent of two hours of gameplay (100,000 interactions), agents with SimPLe-trained policies achieved the maximum score in two test games (Pong and Freeway), and the world model generated “near-perfect predictions” up to 50 steps into the future. The model occasionally failed to capture “small but highly relevant” objects in games, resulting in failure cases, and Kaiser and Erhan concede that SimPLe doesn’t yet match the performance of standard RL methods. But SimPLe was up to two times more efficient in terms of training, and the research team expects future work to improve its performance measurably.
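The “two hours” figure can be checked with back-of-the-envelope arithmetic. Assuming (the article does not say this) that each interaction spans four emulator frames, the common Atari frame-skip setting, and that the console runs at 60 frames per second:

```latex
100{,}000 \times 4 = 400{,}000 \text{ frames}, \qquad
\frac{400{,}000 \text{ frames}}{60 \text{ frames/s}} \approx 6{,}667 \text{ s} \approx 1.85 \text{ h}
```

which rounds to roughly two hours of real-time play.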
“The main promise of model-based reinforcement learning methods is in environments where interactions are either costly, slow or require human labeling, such as many robotics tasks,” they wrote. “In such environments, a learned simulator would enable a better understanding of the agent’s environment and could lead to new, better and faster ways for doing multi-task reinforcement learning.”