In a study accepted earlier this year to the Genetic and Evolutionary Computation Conference (GECCO) 2020, Google researchers investigate the properties of AI software agents that employ self-attention bottlenecks. They claim that these agents not only demonstrate an aptitude for solving challenging vision-based tasks, but also cope better with slight modifications to those tasks, because they are blind to details that might otherwise confuse them.

Inattentional blindness is the phenomenon that causes a person to miss things in plain sight; it’s a consequence of selective attention, a mechanism believed to enable humans to condense information into a form compact enough for decision-making. Luminaries like Yann LeCun assert that selective attention can inspire the design of AI systems that better mimic the elegance and efficiency of biological organisms.

The Google researchers’ proposed agent — AttentionAgent — aims to devote most of its attention to task-relevant elements while ignoring distractions. To achieve this, the system segments input images into patches and relies on a self-attention architecture to “vote” on the patches and elect a subset. The elected patches guide AttentionAgent’s actions as it tracks changes in the input data, following how the important elements evolve over time.
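The voting scheme described above can be sketched in a few lines of NumPy. This is a simplified illustration, not Google’s implementation: the patch size, stride, key dimension, and top-K value are assumptions, and the projection matrices are random here, whereas AttentionAgent learns them (via neuroevolution). Each patch scores every other patch with softmax(QKᵀ/√d), and a patch’s importance is the total vote it receives.

```python
# Illustrative sketch of self-attention "voting" over image patches,
# in the spirit of AttentionAgent. Hyperparameters (patch=16, stride=8,
# d_k=4, top_k=10) are assumptions for the example, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(image, patch=16, stride=8):
    """Slide a window over the image and flatten each patch to a vector."""
    h, w, c = image.shape
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(image[y:y + patch, x:x + patch].ravel())
    return np.stack(patches)  # shape: (n_patches, patch * patch * c)

def patch_votes(patches, d_k=4, top_k=10):
    """Score patches with one self-attention head and elect the top-K.

    The attention matrix is softmax(Q K^T / sqrt(d_k)); a patch's
    importance is the sum of votes it receives (its column sum).
    """
    d_in = patches.shape[1]
    w_q = rng.normal(0, 0.1, (d_in, d_k))  # learned in the real agent;
    w_k = rng.normal(0, 0.1, (d_in, d_k))  # random here, for illustration
    q, k = patches @ w_q, patches @ w_k
    scores = q @ k.T / np.sqrt(d_k)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    importance = attn.sum(axis=0)                # votes received per patch
    return np.argsort(importance)[::-1][:top_k]  # indices of elected patches

image = rng.random((96, 96, 3))  # stand-in for a game frame
elected = patch_votes(extract_patches(image))
print(elected)  # indices of the patches the agent would attend to
```

Because only the elected patch locations are passed downstream, the controller never sees the discarded regions — which is precisely the “blindness” that the researchers credit for the agent’s robustness to background changes.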


Above: VizDoom: The AttentionAgent is trained in the environment with no modifications (left). It is able to adapt to changes in the environment, such as a higher wall (middle left), a different floor texture (middle right), or floating text (right).

In experiments, the team showed that AttentionAgent learned to attend to a range of regions in the images. For instance, they trained it to survive on a level within VizDoom, a digital research environment built on the first-person shooter game Doom, even in environments with walls, floor textures, and signage that it hadn’t encountered before. And on the CarRacing game within OpenAI’s Gym, a toolkit for developing and comparing reinforcement learning algorithms, AttentionAgent learned to drive during a sunny day and transfer its skills to driving at night, on a rainy day, in a different car, with brighter or darker scenery, and in the presence of visual artifacts. Perhaps more impressively, training in CarRacing required 1,000 times fewer parameters — the variables internal to the system that inform its predictions — than conventional methods that fail to generalize.


Above: CarRacing: No modification (left); color perturbation (middle left); vertical bars on left and right (middle right); added red blob (right).

Despite the encouraging results, the researchers note AttentionAgent has serious limitations. It doesn’t generalize to cases involving “dramatic” background changes, for example: an agent trained on CarRacing with a green grass background failed to generalize when the background was replaced with distracting YouTube videos. When the background was replaced with uniform noise, the agent attended to random patches of noise. And while training an agent from scratch with the noisy background enabled it to get around the track, its performance was mediocre.


Above: AttentionAgent fails to generalize to drastically modified environments.

To motivate future work on improved selective attention, the researchers released a suite of car racing tasks that involve environmental modifications. It’s now available in open source on GitHub. “The simplistic method we use to extract information from important patches may be inadequate for more complicated tasks,” wrote coauthors Yujin Tang, a research software engineer at Google, and David Ha, a staff research scientist at Google Research in Tokyo. “How we can learn more meaningful features, and perhaps even extract symbolic information from the visual input will be an exciting future direction.”