NeurIPS 2019 featured robot curling players and coffee makers

Last week marked Neural Information Processing Systems (NeurIPS), one of the largest AI and machine learning conferences in the world. NeurIPS 2017 and NeurIPS 2018 received 3,240 and 4,854 research paper submissions, respectively, but this year's event -- which took place from December 8 to December 14 in Vancouver, Canada -- handily broke those records with around 6,600 submissions. More than 4,200 people queued in the registration line on Sunday afternoon, and all told, over 13,000 people attended, up 40% from the prior conference.

One particularly active category of research this year was robotics, which saw workshop and paper contributions from Intel, the University of California at Berkeley, and other leaders. Perhaps the most intriguing of these were novel approaches to training a team of machines to jointly solve a problem, and a multi-stage learning technique that uses pixel-level translation of human videos to train robots to complete tasks.

Multi-stage task learning

Researchers at Berkeley's department of electrical engineering and computer sciences designed a system that aims to reduce human burden, at least where defining a task and resetting an environment are concerned. Their framework -- AVID -- translates human demonstrations of each step into robot demonstrations via CycleGAN, a technique for training image-to-image translation models on collections of images from two domains that need not be paired.

In practice, robots internalize tasks one stage at a time, automatically discovering how to reset stages and retry them without human intervention. This makes the learning process largely automatic, from the intuitive specification of tasks via videos to training.

Better still, the researchers say that in experiments, AVID successfully learned tasks such as operating a coffee machine and retrieving a cup directly from raw image observations. Training required only 20 minutes of human demonstrations and about 180 minutes of robot interaction with the environment, and in one of the tasks, AVID outperformed behavioral cloning trained on real robot demonstrations, even though AVID itself relied on videos of human demonstrations.

They leave to future work amortizing the cost of training the CycleGAN models for specific tasks, perhaps by reusing trained CycleGAN models to translate demonstrations for other, somewhat related tasks. The researchers believe training could be generalized with a large data set involving multiple different human and robot behaviors in an environment, enabling new tasks to be learned with just a few human demonstrations.

Teaching robots teamwork

Researchers at Intel sought to tackle two longstanding problems in machine learning -- a disinclination to explore environments and a high sensitivity to the choice of hyperparameters, or parameters whose values are set before the learning process begins -- with a framework dubbed CERL, or collaborative evolutionary reinforcement learning. It's a collection of optimized algorithms that together achieve greater sample efficiency, and it dynamically distributes computational resources to favor the best-performing models of the bunch.
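To make the idea of favoring the best-performing models concrete, here is a minimal sketch of one way a fixed pool of compute could be divided among learners. The softmax weighting, the `allocate_workers` function, and the `temperature` parameter are assumptions chosen for illustration; they are not taken from the CERL paper and may differ from how Intel's framework actually allocates resources.

```python
import math

def allocate_workers(scores, total_workers, temperature=1.0):
    """Split total_workers among learners in proportion to softmax(scores).

    Hypothetical helper: higher recent scores earn a larger share of workers.
    """
    best = max(scores)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp((s - best) / temperature) for s in scores]
    norm = sum(weights)
    shares = [w / norm * total_workers for w in weights]

    # Start from the integer part of each share.
    alloc = [int(share) for share in shares]
    # Give leftover workers to the largest fractional remainders.
    leftover = total_workers - sum(alloc)
    by_remainder = sorted(
        range(len(scores)), key=lambda i: shares[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc

# Three learners with made-up recent scores sharing ten workers.
print(allocate_workers([1.0, 2.0, 3.0], total_workers=10))
```

A lower `temperature` concentrates workers on the current leader, while a higher one spreads them more evenly, which is one simple way to trade exploitation against exploration.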

Learning objectives in CERL are split into two optimization processes that operate simultaneously. The system constructs a population of model "teams," and evaluates each team on its performance on the actual task. Following these evaluations, strong teams stay together, while a mutation step breaks up weak teams and reforms models into new teams.

Importantly, each model gets a shared replay buffer, or a data repository where it can store its experiences as it explores. CERL constructs as many shared buffers as there are team positions, so a team member can learn from the experiences of all of its versions across all of the teams. And it's this split-level approach that enables CERL to achieve state-of-the-art performance on a number of difficult benchmarks, including training a 3D humanoid model to walk from scratch.
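The team-and-buffer arrangement described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Intel's implementation: models are represented by plain string labels, and `PositionBuffer`, `evolve`, and `keep_fraction` are hypothetical names introduced here. It shows one buffer per team position and a step that keeps strong teams intact while regrouping the members of weak ones.

```python
import random
from collections import deque

class PositionBuffer:
    """Replay buffer shared by every model that occupies one team position."""

    def __init__(self, capacity=10_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, k, rng):
        # Never ask for more items than the buffer holds.
        return rng.sample(list(self.data), min(k, len(self.data)))

def evolve(teams, fitness, rng, keep_fraction=0.5):
    """Keep the strongest teams together; break up the rest and regroup them."""
    ranked = sorted(teams, key=fitness, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    survivors, weak = ranked[:n_keep], ranked[n_keep:]
    if not weak:
        return survivors

    n_positions = len(teams[0])
    # Pool the weak teams' members by position, then shuffle each pool so
    # that members are recombined into new teams without changing position.
    pools = []
    for p in range(n_positions):
        pool = [team[p] for team in weak]
        rng.shuffle(pool)
        pools.append(pool)
    reformed = [
        tuple(pools[p][i] for p in range(n_positions)) for i in range(len(weak))
    ]
    return survivors + reformed

rng = random.Random(0)
teams = [("a0", "b0"), ("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
task_score = {team: score for team, score in zip(teams, [4, 3, 2, 1])}

# One shared buffer per team position (two positions here).
buffers = [PositionBuffer() for _ in range(len(teams[0]))]
buffers[0].add(("state", "action", 1.0, "next_state"))

print(evolve(teams, task_score.get, rng))
```

Because every model in position 0 writes to and samples from `buffers[0]`, experience gathered by one team's member is available to its counterparts on every other team.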

In the future, the team plans to investigate similar problems involving multi-task learning in scenarios that have no well-defined reward feedback. They also hope to explore the role of communication in solving such tasks, which they note represents a class of problems that are a step up from simple perception.

Bonus round: Curling robots

Who knew robots could curl so well? In a paper, a team hailing from Korea University and the Berlin Institute of Technology describes a machine -- nicknamed Curly -- that holds its own on real-world curling ice. An AI-based curling strategy and simulation engine guide the thrower robot, which autonomously drives and recognizes the field configuration thanks to a combination of traction control, cameras, and machine vision.

As the researchers note, curling ice sheets are traditionally covered with pebbles, whose condition changes over time depending on factors like temperature, humidity, ice makers, elapsed time since the maintenance ended, and amount of sweeping during the game. The trajectory of the stones varies over time as a result.

Curly contends with this by deploying a physics-based simulator designed to adjust parameters including throw angle, velocity, and curl direction until an optimal strategy is discovered. The robot's thrower component performs this strategy on the ice sheet while holding and rotating a curling stone, which it releases by unfolding a gripper arm. A skip component keeps tabs on the locations and trajectories of stones while accounting for variability.
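As a rough illustration of searching throw parameters in simulation, the sketch below runs a grid search over angle, speed, and curl direction against a deliberately crude physics model. Everything here is an assumption made for the example: `simulate_throw`, `best_throw`, the constant-friction sliding model, and the `friction` and `curl_gain` values are invented and bear no relation to the simulator the Curly researchers built.

```python
import math
from itertools import product

def simulate_throw(angle_deg, speed, curl_dir, friction=0.01, curl_gain=0.05):
    """Toy model: return the (x, y) resting point of a stone, in meters.

    The stone decelerates under constant friction, so it slides
    speed**2 / (2 * friction * g) before stopping, and it drifts
    sideways in proportion to that distance (curl_dir is -1 or +1).
    """
    g = 9.81
    distance = speed ** 2 / (2 * friction * g)
    angle = math.radians(angle_deg)
    x = distance * math.sin(angle) + curl_dir * curl_gain * distance
    y = distance * math.cos(angle)
    return x, y

def best_throw(target, angles, speeds, curl_dirs=(-1, 1)):
    """Return the (angle, speed, curl_dir) whose simulated stop is nearest target."""
    def miss(params):
        x, y = simulate_throw(*params)
        return math.hypot(x - target[0], y - target[1])
    return min(product(angles, speeds, curl_dirs), key=miss)

# Aim for a point 28 m down the sheet and 1 m to the right (made-up target).
angles = [a / 2 for a in range(-10, 11)]        # -5.0 to 5.0 degrees
speeds = [2.0 + 0.05 * i for i in range(21)]    # 2.0 to 3.0 m/s
print(best_throw((1.0, 28.0), angles, speeds))
```

In a real system the friction term would have to be re-estimated as the ice changes, which is the variability the article describes; here it is simply a fixed keyword argument.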

According to the researchers, Curly performed well in on-the-ice experiments -- namely, in classical game situations and when playing against human opponents like a top-ranked Korean amateur high school team. For future research, they plan to use explainable AI techniques to gain a better understanding of critical shot impacts, allowing the robot to better learn from its mistakes.