Intel AI researchers combine reinforcement learning methods to teach 3D humanoid how to walk

Researchers from Intel's AI Lab and the Collaborative Robotics and Intelligent Systems Institute at Oregon State University have combined a number of methods to make better-performing reinforcement learning systems that can be applied to things like robotic control, systems governing autonomous vehicle function, and other complex AI tasks.

The combined approach, called Collaborative Evolutionary Reinforcement Learning (CERL), can achieve better performance on OpenAI Gym benchmarks such as Humanoid, Hopper, Swimmer, HalfCheetah, and Walker2D than gradient-based or evolutionary reinforcement learning algorithms can on their own. Using CERL, the researchers were able to make a 3D humanoid agent walk upright in the Humanoid benchmark.

Those results come in part from training systems that explore more of the training environment as they seek a reward and learn to complete a specific task.

Environment exploration is important to ensure that a diverse range of experiences is documented and that multiple courses of action are considered. Issues related to environment exploration have emerged, particularly as deep reinforcement learning has grown more popular for challenging real-world tasks, the researchers said in a paper explaining how CERL works. "Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner," the paper reads.

CERL combines policy gradient-based reinforcement learning with evolutionary algorithms, then selects the top-performing neural nets in each batch, or generation, of trained systems. That way, researchers can use the strongest neural nets to create new generations of systems, and they can direct compute resources to the algorithms that achieve the best performance.
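The selection and compute-allocation steps can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' implementation: the function names (`select_elites`, `next_generation`, `allocate_workers`), the Gaussian mutation, and the UCB-style scoring rule are all assumptions chosen for the example, and each policy is represented as a plain list of weights.

```python
import math
import random


def select_elites(population, fitness, num_elites):
    """Return the num_elites individuals with the highest fitness."""
    ranked = sorted(zip(population, fitness), key=lambda pair: pair[1], reverse=True)
    return [individual for individual, _ in ranked[:num_elites]]


def mutate(weights, rng, sigma=0.1):
    """Return a copy of a weight vector with Gaussian noise added (assumed operator)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def next_generation(population, fitness, num_elites, rng):
    """Keep the elites unchanged and refill the population with mutated elite copies."""
    elites = select_elites(population, fitness, num_elites)
    num_children = len(population) - num_elites
    children = [mutate(rng.choice(elites), rng) for _ in range(num_children)]
    return elites + children


def allocate_workers(mean_returns, visit_counts, total_workers, c=1.0):
    """Hand out rollout workers one at a time to the learner with the best
    UCB-style score (assumed rule). Every visit count must be at least 1."""
    counts = list(visit_counts)
    allocation = [0] * len(mean_returns)
    for _ in range(total_workers):
        total = sum(counts)
        scores = [
            mean + c * math.sqrt(math.log(total) / n)
            for mean, n in zip(mean_returns, counts)
        ]
        best = scores.index(max(scores))
        allocation[best] += 1
        counts[best] += 1  # each assignment lowers that learner's exploration bonus
    return allocation
```

In this sketch, learners with higher average returns receive more workers, while the exploration term (scaled by `c`) keeps rarely tried learners from being starved entirely.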

CERL also merges the learners' replay buffers, which store each learner's experience in an environment, into a single shared buffer. Sharing experiences between systems this way yields higher sample efficiency than previous methods.
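A shared replay buffer of this kind might look roughly like the following. It is a hypothetical sketch: the class name `SharedReplayBuffer`, the transition layout, and the uniform sampling are assumptions for illustration, not details taken from the paper.

```python
import random
from collections import deque


class SharedReplayBuffer:
    """One fixed-size buffer that every learner writes to and samples from."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest transition once capacity is reached
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, learner_id, state, action, reward, next_state, done):
        """Store one transition, tagged with the learner that produced it."""
        self._storage.append((learner_id, state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random batch, regardless of which learner collected it."""
        k = min(batch_size, len(self._storage))
        return self._rng.sample(list(self._storage), k)

    def __len__(self):
        return len(self._storage)
```

Because every learner samples from the same pool, a transition collected by one learner can be reused by all the others, which is where the gain in sample efficiency comes from.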

The CERL paper, published on arXiv, was accepted for oral presentation at the International Conference on Machine Learning (ICML), which takes place this week in Long Beach, California. Intel is presenting another paper at ICML that defines an approach to AI model compression that doesn't compromise accuracy.
