In a new study published this week on the preprint server Arxiv.org, scientists at the University of Toronto and the Vector Institute, an independent nonprofit dedicated to advancing AI, propose BabyAI++, a platform to study whether descriptive texts help AI to generalize across dynamic environments. Both it and several baseline models will soon be available on GitHub.

One of the most powerful techniques in machine learning, reinforcement learning, which entails spurring software agents toward goals via rewards, is also one of the most flawed. It’s sample inefficient, meaning an agent needs a very large number of interactions with its environment (and correspondingly many compute cycles) to learn a task, and without additional data to cover variations, it adapts poorly to environments that differ from the training environment.

It’s theorized that prior knowledge of tasks through structured language could be combined with reinforcement learning to mitigate its shortcomings, and BabyAI++ was designed to put this theory to the test. To this end, the platform builds upon an existing reinforcement learning framework — BabyAI — to generate various dynamic, color tile-based environments along with texts that describe their layouts in detail.

Above: Environments in BabyAI++.

BabyAI++’s levels consist of objects that can be picked up and dropped, doors that can be unlocked and opened, and various tasks that the agents must undertake. Like the environments themselves, the tasks are randomly generated, and they’re communicated to the agent through “Baby-Language,” a compositional language that uses a subset of English vocabulary.

The descriptive texts reveal which types of tiles are in use and which color is matched to each tile type. Because the pairing between color and tile type is randomized, the agent must understand the description to navigate the map properly.
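To make the randomized pairing concrete, here is a minimal sketch of how a color-to-tile-type assignment and its matching description might be generated. The tile types, colors, sentence template, and function names are assumptions made for illustration; they are not taken from the BabyAI++ code.

```python
import random

# Hypothetical tile types and colors, chosen for illustration only.
TILE_TYPES = ["trap", "slippery", "sticky"]
COLORS = ["red", "green", "blue"]


def sample_pairing(rng):
    """Randomly assign one distinct color to each tile type."""
    colors = rng.sample(COLORS, k=len(TILE_TYPES))
    return dict(zip(colors, TILE_TYPES))


def describe(pairing):
    """Render the pairing as one short sentence per color (assumed template)."""
    return [
        f"{color} tiles are {tile_type}"
        for color, tile_type in sorted(pairing.items())
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pairing = sample_pairing(rng)
description = describe(pairing)
print(description)
```

Because the mapping changes from one sampled environment to the next, an agent that ignores the description has no reliable way to know what a given color means.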

Within BabyAI++, every level is partitioned into two configurations: training and testing. In the training configuration, the agent is exposed to all tile types and colors in the level, but some color-type pairs are held out. In the testing configuration, all color-type pairs are enabled, forcing the agent to use language grounding to associate the type of a tile with its color.
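The training and testing split described above can be sketched as follows. This is an illustration under assumed names: the colors, tile types, held-out pairs, and the `split_pairs` helper are hypothetical and not drawn from the platform itself.

```python
from itertools import product

# Hypothetical colors and tile types, chosen for illustration only.
COLORS = ["red", "green", "blue"]
TILE_TYPES = ["trap", "slippery", "sticky"]


def split_pairs(colors, tile_types, held_out):
    """Return (train_pairs, test_pairs) for a level.

    Training omits the held-out color-type pairs but must still show
    every color and every tile type at least once. Testing enables all pairs.
    """
    all_pairs = set(product(colors, tile_types))
    train_pairs = all_pairs - set(held_out)

    seen_colors = {color for color, _ in train_pairs}
    seen_types = {tile_type for _, tile_type in train_pairs}
    if seen_colors != set(colors) or seen_types != set(tile_types):
        raise ValueError("held-out pairs hide a color or tile type entirely")

    return train_pairs, all_pairs


# Assumed held-out combinations: never seen in training, possible in testing.
HELD_OUT = [("red", "trap"), ("green", "slippery")]
train_pairs, test_pairs = split_pairs(COLORS, TILE_TYPES, HELD_OUT)
```

With this setup, an agent can meet a red trap tile only at test time, so it has to rely on the description rather than on a memorized color.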

The paper describes several experiments conducted with the baseline models. One of them, attention-fusion, uses an attention mechanism to assign relevant text embeddings (mathematical representations of the text) to locations on a scene embedding feature map (a grid of learned features describing each position in the scene). On the most difficult level, this attention-fusion model had a 16.2% higher testing success rate (around 60% after 5 steps, or actions) than the second-best model, and it completed the level using fewer frames (around 65 compared with 75).

Above: The attention model’s architecture.
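As a rough illustration of the idea behind attention-based fusion, the sketch below lets each location in a scene feature map attend over a set of sentence embeddings and then appends the attended text vector to that location's features. It assumes plain dot-product attention and hand-written toy vectors; the paper's model uses learned embeddings, and its exact architecture may differ from this.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(scene, sentences):
    """Fuse text into a scene feature map (assumed dot-product attention).

    scene: grid (list of rows) of feature vectors, one per location.
    sentences: list of sentence embeddings with the same dimension.
    Returns a grid where each vector is [scene features + attended text].
    """
    dim = len(sentences[0])
    fused = []
    for row in scene:
        fused_row = []
        for feat in row:
            # How relevant is each sentence to this location?
            weights = softmax([dot(feat, s) for s in sentences])
            # Weighted sum of sentence embeddings for this location.
            context = [
                sum(w * s[i] for w, s in zip(weights, sentences))
                for i in range(dim)
            ]
            fused_row.append(list(feat) + context)
        fused.append(fused_row)
    return fused


# Toy example: a 1x2 scene and two sentence embeddings.
scene = [[[1.0, 0.0], [0.0, 1.0]]]
sentences = [[5.0, 0.0], [0.0, 5.0]]
fused = attention_fuse(scene, sentences)
```

In this toy case, the first location weights the first sentence most heavily and the second location favors the second, which is the per-location assignment of text to scene that the article describes.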

The coauthors assert that this shows descriptive texts help agents generalize across environments with variable dynamics by learning language grounding. “We believe the proposed BabyAI++ platform, with its public code and baseline implementations, will further spur research development in this area,” they wrote in the paper.

