DeepMind's AI system finds its way around simulated cities it hasn't seen before

DeepMind says it has designed a system that draws on prior knowledge to solve tasks while also exploring to gather new knowledge, which it then uses to plan when faced with new tasks. In a paper accepted to the Conference on Computer Vision and Pattern Recognition (CVPR) 2020, researchers at the company describe an AI "planning module" that operates over episodic memories (memories of everyday events that can be explicitly stated), which they say outperforms the nearest baseline by two to three times with respect to planning and exploring.

A grand challenge in AI is architecting a model that's able to enter unfamiliar environments and get to work immediately. For example, the paragon household robot would use general knowledge about homes to find cleaning supplies and acquire information it anticipates will be useful, like the location of clothes hampers in the rooms it passes. It could then use that newfound knowledge (i.e., hamper locations) to plan faster solutions to future tasks (e.g., doing the laundry).

Unfortunately, even state-of-the-art episodic memory models can explore but not plan, potentially because they lack a mechanism for planning over their memories. DeepMind claims to have remedied this with a novel module, the episodic planning network (EPN), that prompts AI agents to explore and plan effectively in unfamiliar environments.

EPN leverages self-attention, a method for computing relationships among an arbitrary number of items that doesn't assume any particular structure among them. EPN begins with episodic memories that reflect experience in a scenario so far, with each memory containing representations of the current observation, the previous action, and the previous observation.
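To make the idea concrete, here is a minimal NumPy sketch of a single self-attention pass over a handful of episodic memories. It is an illustration under stated assumptions, not DeepMind's implementation: the helper names (`make_memory`, `self_attention`), the vector sizes, and the random weights are all hypothetical, and the paper's module may differ in its details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def make_memory(current_obs, prev_action, prev_obs):
    # Hypothetical encoding: one flat vector per memory, holding the three
    # pieces the article lists (current observation, previous action,
    # previous observation).
    return np.concatenate([current_obs, prev_action, prev_obs])

def self_attention(memories, w_q, w_k, w_v):
    # Standard scaled dot-product self-attention: every memory attends to
    # every other memory, with no assumed ordering or grid structure.
    q = memories @ w_q
    k = memories @ w_k
    v = memories @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Toy data: sizes and values are assumptions chosen only for illustration.
rng = np.random.default_rng(0)
obs_dim, n_actions, d_model, n_memories = 4, 3, 8, 5
memories = np.stack([
    make_memory(
        rng.normal(size=obs_dim),
        np.eye(n_actions)[rng.integers(n_actions)],  # one-hot previous action
        rng.normal(size=obs_dim),
    )
    for _ in range(n_memories)
])
mem_dim = memories.shape[1]
w_q, w_k, w_v = (rng.normal(size=(mem_dim, d_model)) for _ in range(3))

updated, weights = self_attention(memories, w_q, w_k, w_v)
```

Because the operation has no notion of position, shuffling the memories simply shuffles the outputs in the same way, which is what "doesn't assume any particular structure" means in practice.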

In an experiment that brings to mind the New York City-navigating AI that Facebook open-sourced two years ago, the DeepMind researchers trained EPN-based software agents in One-Shot StreetLearn, a simulation where environments are sampled as neighborhoods from Google's StreetLearn data set of real-world street-level imagery. In One-Shot StreetLearn, each task is defined by selecting a position and orientation that the agent must navigate to from its current position.

Given only an image showing the current location, an image representing the goal location, and the ability to move left, right, or forward, the EPN-based agents successfully reached 28.7 goals per episode (averaged over 100 consecutive episodes) in places unfamiliar to them, according to the coauthors. They also matched the minimum number of steps to complete new tasks after only 15-20 tasks, and they generalized well to larger neighborhoods containing a greater number of intersections, with performance reaching 77% success with nine intersections as opposed to five in the original tasks.
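For a sense of what "the minimum number of steps" means here, the sketch below computes a shortest path with breadth-first search on a toy street graph. It is only an illustration: the graph, the node names, and the `min_steps` helper are hypothetical, and it ignores the agent's orientation and turn actions, which the real environment includes.

```python
from collections import deque

def min_steps(graph, start, goal):
    # Breadth-first search: the first time the goal is reached, the number
    # of edges traversed is the minimum possible.
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal is unreachable from start

# Hypothetical five-intersection neighborhood (not taken from StreetLearn).
toy_neighborhood = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

steps = min_steps(toy_neighborhood, "A", "E")
```

An agent that "matches the minimum" would reach each goal in as few moves as a search like this finds, despite never having been given the map.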

"In the current experiments, the agent could succeed by planning over observed states," the researchers wrote. "However, there is nothing preventing EPNs from being used to plan over belief states, a potential critical ability for operating in dynamic partially-observed environments ... Future work may approach [problems] with broader task distributions ... and test the extent to which EPNs are effective in solving broader classes of tasks."

EPN builds on DeepMind's existing city-navigation work and Dreamer, which internalizes a world model and plans ahead to select actions by "imagining" their long-term outcomes. More recently, the lab detailed Agent57, a system that uses episodic memory to learn a family of policies for exploring and exploiting. (Agent57 is the first deep reinforcement learning agent to score above the human baseline on all 57 Atari games in the Arcade Learning Environment benchmark.)