The winners are Alex Nichols, the Compscience.org team, and Songbin Choi. Unity, maker of the Unity3D game engine, has also announced that it has open-sourced Obstacle Tower for the research community to extend for its own needs.
The challenge started in February as a way to help foster research in the AI community by providing a challenging new benchmark built in Unity. The benchmark, called Obstacle Tower, was developed to be difficult for current machine learning algorithms to solve.
It pushed the boundaries of what was possible in the field by focusing on procedural generation. Key to that was only allowing participants access to one hundred instances of the Obstacle Tower, and evaluating their trained agents on a set of unique procedurally generated towers they had never seen before.
In this way, agents had to be able not only to solve the versions of the environment they had seen before but also do well on unexpected variations, a key property of intelligence referred to as generalization.
Once Unity created Obstacle Tower, it performed preliminary benchmarking using two of the state-of-the-art algorithms at the time. Unity's learned agents solved an average of a little over three floors on the unseen instances of the tower used for evaluation.
Since the start of the contest, Unity received close to 3,000 submitted agents. The top six final agents submitted by participants were able to solve over 10 floors of unseen versions of the tower, with the top entry solving an average of nearly 20 floors.
Alex Nichols, under the name unixpickle, scored first place with 19 floors completed. Unity hadn't expected anyone to get past 10 floors. Second place went to a team at Compscience.org at the Universitat Pompeu Fabra under the name giadefa, followed in third place by Songbin Choi, a biomedical engineer based in Seoul, South Korea.
Honorable mentions included Joe Booth (joe_booth), Doug Meng (dougm), and UEFDL (Miffyli).
Nichols has been programming since he was 11 years old. As a senior in high school, he became very interested in AI and taught himself about it. He studied at Cornell for three semesters before leaving to pursue AI full-time and ultimately joining OpenAI (he has since left but still maintains strong interest in AI).
Nichols trained his agent in several steps. First, he trained a classifier to identify objects (boxes, doors, etc.). This classifier was used throughout the process to tell the agent what objects it had seen in the previous 50 time steps. Then, he used behavioral cloning to train an agent to imitate human demonstrations.
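Behavioral cloning reduces imitation to supervised learning: fit a policy that predicts the human's action from the observation. A minimal sketch in NumPy, assuming toy feature vectors in place of real game observations and a simple linear softmax policy (not Nichols' actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for (observation, human action) pairs from recorded demos.
# Real observations would be image features; here we use 8-dim vectors.
n_demos, obs_dim, n_actions = 256, 8, 4
obs = rng.normal(size=(n_demos, obs_dim))
actions = rng.integers(0, n_actions, size=n_demos)  # demonstrator's choices

W = np.zeros((obs_dim, n_actions))  # linear policy: logits = obs @ W

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(W):
    # Negative log-likelihood of the human actions under the policy.
    p = softmax(obs @ W)
    return -np.log(p[np.arange(n_demos), actions]).mean()

lr = 0.5
losses = [nll(W)]
for _ in range(100):
    p = softmax(obs @ W)
    onehot = np.eye(n_actions)[actions]
    grad = obs.T @ (p - onehot) / n_demos  # cross-entropy gradient
    W -= lr * grad
    losses.append(nll(W))
```

After training, the policy assigns higher probability to the actions the demonstrator took; the cloned policy then serves as the starting point for reinforcement-learning fine-tuning.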
Lastly, Nichols used a variant of PPO which he calls “prierarchy” to fine-tune his behavioral cloned agent based on the game’s reward function. This variant of PPO replaces the entropy term with a KL term that keeps the agent close to the original behavior cloned policy.
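The modified objective can be sketched for a single sampled action: the usual clipped PPO surrogate, plus a KL penalty toward the frozen behavior-cloned policy in place of the entropy bonus. The distributions, coefficients, and single-state setup below are illustrative assumptions, not Nichols' actual code:

```python
import numpy as np

# Toy action distributions over 4 discrete actions for one state.
pi_bc  = np.array([0.70, 0.15, 0.10, 0.05])  # frozen behavior-cloned prior
pi_old = np.array([0.60, 0.20, 0.12, 0.08])  # policy that collected the rollout
pi_new = np.array([0.55, 0.25, 0.12, 0.08])  # policy being updated
action, advantage = 1, 2.0   # sampled action and its estimated advantage
eps, beta = 0.2, 0.01        # PPO clip range and KL coefficient (assumed)

# Standard clipped PPO surrogate for the sampled action.
ratio = pi_new[action] / pi_old[action]
surrogate = min(ratio * advantage,
                np.clip(ratio, 1 - eps, 1 + eps) * advantage)

# KL(pi_new || pi_bc): replaces PPO's entropy bonus, pulling the
# fine-tuned policy back toward the behavior-cloned prior.
kl = np.sum(pi_new * np.log(pi_new / pi_bc))

loss = -surrogate + beta * kl  # minimized by the optimizer
```

Where standard PPO's entropy term pushes the policy toward uniform randomness, the KL term instead anchors exploration around the demonstrated behavior, which is useful when the cloned policy is already competent.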
He tried a few other approaches that didn’t quite pan out: GAIL for more sample-efficient imitation learning, CMA-ES to learn a policy from scratch, and stacking last-layer features from the classifier and feeding it into the agent (instead of using the classifier’s outputs for the state).
Unity said that all of the source code for Obstacle Tower is now available under the Apache 2.0 license.
“We waited to make this release until the contest was completed to prevent anyone from reverse-engineering the task or evaluation process. Now that it is over, we hope researchers and users are able to take things apart to help learn how to solve the task better, as well as modify the Obstacle Tower for your own needs,” said the contest organizers, Arthur Juliani and Jeffrey Shih, in a blog post.
The Obstacle Tower was built to be highly modular, and relies heavily on procedural generation of multiple aspects of the environment, from the floor layout to the item and module placement in each room.
“We expect that this modularity will make it easy for researchers to define their own custom tasks using the pieces and tools we’ve built,” Juliani and Shih said.
The focus of the Obstacle Tower Challenge is what Unity called weak generalization (sometimes called within-distribution generalization). For the challenge, agents had access to one hundred towers, and were tested on an additional five towers. Importantly, all of these towers were generated using the same set of rules. As such, there were no big surprises for the agents.
Also of interest is a different kind of generalization, what Unity called the strong kind (sometimes called out-of-distribution). In this scenario, the agent would be tested on a version of Obstacle Tower generated using a different set of rules than the training set.
Unity had a separate visual theme for the evaluation phase, which used different textures, geometry, and lighting.
“We think that benchmarks like these can be an even better measure of progress in artificial intelligence. We look forward to the community extending our work and proposing their own using this open source release,” Juliani and Shih said.
Collaborators on the project also included Julian Togelius and Ahmed Khalifa. Google Cloud provided GCP credits and AICrowd tech for hosting the challenge.
There's still progress to make. Each instance of Obstacle Tower contains 100 floors, which means roughly 80% of the tower remains unsolved. Unity said it is hiring AI experts.