Researchers from the Stanford AI Lab (SAIL) have devised a method for handling data and environments that change over time, and it outperforms some leading approaches to reinforcement learning. Lifelong Latent Actor-Critic, aka LILAC, uses latent variable models and a maximum entropy policy to leverage past experience for better sample efficiency and performance in dynamic environments.
“On a variety of challenging continuous control tasks with significant non-stationarity, we observe that our approach leads to substantial improvement compared to state-of-the-art reinforcement learning methods,” they wrote in a paper about LILAC. Reinforcement learning that adapts to changing environments could, for example, let robots or autonomous vehicles keep operating when the weather shifts and rain or snow sets in.
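As a hypothetical illustration of the maximum-entropy idea (this is not the authors' code, and the function names are invented; LILAC additionally uses latent variables, which this sketch omits), a policy can be scored by its expected value plus an entropy bonus, so that, all else being equal, broader action distributions are preferred:

```python
import numpy as np

def entropy(probs):
    # Shannon entropy of a discrete action distribution.
    return -np.sum(probs * np.log(probs))

def max_entropy_objective(probs, q_values, alpha=0.1):
    # Expected Q-value under the policy plus a weighted entropy bonus;
    # alpha trades off reward maximization against exploration.
    return np.dot(probs, q_values) + alpha * entropy(probs)

# Two candidate action distributions over three actions.
greedy = np.array([0.98, 0.01, 0.01])
spread = np.array([0.5, 0.3, 0.2])
q = np.array([1.0, 0.9, 0.8])

# With similar Q-values, the entropy bonus favors the broader policy,
# which keeps the agent exploring as the environment drifts.
print(max_entropy_objective(greedy, q))
print(max_entropy_objective(spread, q))
```

With `alpha=0` the objective reduces to plain expected value and the greedy policy wins; the entropy term is what tilts the comparison toward broader behavior.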
The authors conducted four tests in dynamic reinforcement learning environments, including a Sawyer robot from the Meta-World benchmark, a Half-Cheetah in OpenAI Gym, and a 2D navigation task. They found that in all domains, LILAC attains higher and more stable returns than top reinforcement learning approaches such as Soft Actor-Critic (SAC), introduced by Berkeley AI Research (BAIR) in 2018, and Stochastic Latent Actor-Critic (SLAC), which UC Berkeley researchers introduced earlier this year.
Stanford researchers Annie Xie, James Harrison, and Chelsea Finn published a paper on LILAC two weeks ago in the preprint repository arXiv. Lead author Xie also worked with UC Berkeley professor Sergey Levine on SAC and SLAC.
“In contrast to these methods, LILAC infers how the environment changes in future episodes and steadily maintains high rewards over the training procedure, despite experiencing persistent shifts in the environment in each episode,” the paper read.
The authors say the LILAC approach shares similarities with lifelong learning and online learning algorithms. Meta-learning and meta-reinforcement learning algorithms also attempt to quickly adapt to new settings.
In other recent reinforcement learning news, AI researchers from Google Brain, Carnegie Mellon University, the University of Pittsburgh, and UC Berkeley — including Levine again — recently introduced a new approach to domain adaptation, the technique of changing the reward function for agents in reinforcement learning environments. Like other domain adaptation methods, the approach attempts to make a source domain, such as a simulator, more like a target domain, such as the real world.
“The agent is penalized for taking transitions which would indicate whether the agent is interacting with the source or target domain,” according to the domain adaptation paper released last week. “Experiments on a range of control tasks show that our method can leverage the source domain to learn policies that will work well in the target domain, despite observing only a handful of transitions from the target domain.”
Researchers modified the reward function with classifiers trained to distinguish between source and target domain transitions. They tested their approach on three tasks in OpenAI Gym. Benjamin Eysenbach of Google Brain and CMU is the lead author of that paper.
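To make the classifier-based reward modification concrete, here is a hypothetical sketch (not the paper's code; the function name is invented): the reward is shifted by the classifier's log-odds that a transition came from the target domain, so transitions that reveal the agent is in the source domain are penalized:

```python
import numpy as np

def shaped_reward(reward, p_target):
    # p_target: classifier's probability that the transition belongs to
    # the target domain. The log-odds term is zero when the classifier
    # cannot tell the domains apart (p_target == 0.5), leaving the
    # original reward unchanged.
    return reward + np.log(p_target) - np.log(1.0 - p_target)

# A transition the classifier flags as clearly source-only is penalized...
print(shaped_reward(1.0, 0.1))   # 1.0 + log(0.1) - log(0.9), about -1.197
# ...while an ambiguous transition keeps its original reward.
print(shaped_reward(1.0, 0.5))   # exactly 1.0
```

Under this shaping, the agent maximizes return only by taking transitions that could plausibly have occurred in the target domain, which matches the paper's description of penalizing domain-revealing behavior.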
Also in recent reinforcement learning news, in May UC Berkeley researchers open-sourced RAD, a technique meant to improve any reinforcement learning algorithm.