Google researchers have developed an AI system that learns from the motions of animals to give robots greater agility, according to a preprint paper and blog post published this week. The coauthors believe their approach could bolster the development of robots that can complete tasks in the real world, for instance transporting materials between multilevel warehouses and fulfillment centers.
The team’s framework takes a motion capture clip of an animal (a dog, in this case) and uses reinforcement learning, a training technique that spurs software agents to complete goals via rewards, to train a control policy. Providing the system with different reference motions enabled the researchers to “teach” a four-legged Unitree Laikago robot to perform a range of behaviors, they say, from fast walking (at a speed of up to 2.6 miles per hour) to hops and turns.
To validate their approach, the researchers first compiled a data set of real dogs performing various skills. (Training largely took place in a physics simulation so that the pose of the reference motions could be closely tracked.) Then, by using the different motions in the reward function (which describes how agents ought to behave), the researchers used about 200 million samples to train a simulated robot to imitate motion skills.
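To make the idea of using a reference motion in the reward function concrete, here is a minimal sketch of a pose-tracking reward. It is an illustration under stated assumptions, not the researchers' actual formulation: the function name `pose_tracking_reward`, the exponential form, and the `scale` parameter are all hypothetical choices, and a pose is assumed to be a flat list of joint angles.

```python
import math


def pose_tracking_reward(robot_pose, reference_pose, scale=5.0):
    """Return a reward in (0, 1] that is highest when the poses match.

    Hypothetical sketch: both poses are flat sequences of joint angles
    (radians), and `scale` is an assumed sharpness parameter.
    """
    if len(robot_pose) != len(reference_pose):
        raise ValueError("poses must have the same number of joints")
    # Sum of squared differences between simulated and reference joints.
    squared_error = sum(
        (q - q_ref) ** 2 for q, q_ref in zip(robot_pose, reference_pose)
    )
    # Map the error to a reward: zero error gives 1.0, and larger
    # errors decay smoothly toward 0.
    return math.exp(-scale * squared_error)


if __name__ == "__main__":
    reference = [0.0, 0.5, -1.0]          # one frame of a reference motion
    close_pose = [0.05, 0.45, -1.0]       # nearly matches the reference
    far_pose = [0.8, -0.2, 0.3]           # far from the reference
    print(pose_tracking_reward(close_pose, reference))
    print(pose_tracking_reward(far_pose, reference))
```

Summed over the frames of a clip, a reward of this shape pays the simulated robot more the more closely it tracks the dog's motion, which is the behavior the reinforcement learning step then optimizes for.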
But simulators generally provide only a coarse approximation of the real world. To address this issue, the researchers employed an adaptation technique that randomized the dynamics in the simulation by, for example, varying physical quantities like the robot’s mass and friction. These values were mapped using an encoder to a numerical representation — i.e., an encoding — that was passed as an input to the robot control policy. When deploying the policy to a real robot, the researchers removed the encoder and searched directly for a set of variables that allowed the robot to successfully execute skills.
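The adaptation step described above can be sketched roughly as follows. This is a toy illustration, not the team's implementation: the parameter ranges are made up, `encode` is a fixed rescaling standing in for the learned encoder, and plain random search stands in for whatever search procedure the researchers actually used, which the description above does not specify.

```python
import random

# Hypothetical randomization ranges; the real values are not given here.
DYNAMICS_RANGES = {"mass_kg": (15.0, 30.0), "friction": (0.3, 1.2)}


def sample_dynamics(rng):
    """Draw one randomized set of simulation dynamics."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in DYNAMICS_RANGES.items()}


def encode(dynamics):
    """Stand-in for the learned encoder: rescale each value to [-1, 1]."""
    return [
        2.0 * (dynamics[name] - lo) / (hi - lo) - 1.0
        for name, (lo, hi) in DYNAMICS_RANGES.items()
    ]


def search_encoding(evaluate, dim, trials, rng):
    """Deployment-time search: skip the encoder and look directly for the
    encoding that scores best, here via simple random search (an assumed
    stand-in). `evaluate` would run the policy on the real robot."""
    best_z, best_score = None, float("-inf")
    for _ in range(trials):
        z = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        score = evaluate(z)
        if score > best_score:
            best_z, best_score = z, score
    return best_z, best_score


if __name__ == "__main__":
    rng = random.Random(0)
    # Training time: the policy would receive encode(dynamics) as an input.
    dynamics = sample_dynamics(rng)
    print(dynamics, encode(dynamics))
    # Deployment time: a toy score that peaks at an unknown "true" encoding.
    true_z = [0.2, -0.4]
    toy_score = lambda z: -sum((a - b) ** 2 for a, b in zip(z, true_z))
    print(search_encoding(toy_score, dim=2, trials=50, rng=rng))
```

The design point is that the policy never needs to know the real robot's mass or friction. It only needs an encoding under which it performs well, and that can be found with a small number of real-world trials.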
The team says they were able to adapt a policy to the real world using under eight minutes of real-world data across approximately 50 trials. Moreover, they demonstrated that the real-world robot learned to imitate various motions from a dog, including pacing and trotting, as well as artist-animated keyframe motions like a dynamic hop-turn.
“We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire [of] behaviors for legged robots,” wrote the coauthors in the paper. “By incorporating sample efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment.”
The control policy wasn’t perfect. Owing to algorithmic and hardware limitations, it couldn’t learn highly dynamic behaviors like large jumps and runs, and it wasn’t as stable as the best manually designed controllers. (In five episodes for a total of 15 trials per method, the real-world robot fell, on average, after six seconds while pacing; after five seconds while backward trotting; after nine seconds while spinning; and after 10 seconds while hop-turning.) The researchers leave to future work improving the robustness of the controller and developing frameworks that can learn from other sources of motion data, such as video clips.