Deep reinforcement learning systems are among the most capable in AI, particularly in the robotics domain. In the real world, however, these systems often encounter situations and behaviors they weren't exposed to during training.

In a step toward systems that can collaborate with humans to help them accomplish their goals, researchers at Microsoft; the University of California, Berkeley; and the University of Nottingham developed a methodology for applying unit testing to human-AI collaboration and demonstrated it in a simplified version of the game Overcooked. In Overcooked, players control chefs in kitchens filled with obstacles and hazards, preparing meals to order under a time limit.

The team asserts that Overcooked, although not necessarily designed with robustness benchmarking in mind, can test edge cases along two axes: the states a system should be able to handle and the partners it should be able to play with. For example, systems in Overcooked must contend with plates accidentally left on counters and with partners who stay put for a while because they're thinking or away from their keyboard.
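To make the unit-testing idea concrete, here is a minimal Python sketch, assuming a heavily simplified kitchen where a state is a flat dictionary and an agent is a function from a state to a named action. The `UnitTest` class, `run_suite`, `naive_agent`, the state keys, and the action names are all hypothetical illustrations, not the researchers' code or the Overcooked environment's API. Each test hand-builds one of the edge cases described above and checks the action the agent picks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: a kitchen state is a flat dict of named features.
State = Dict[str, object]
Agent = Callable[[State], str]


@dataclass
class UnitTest:
    name: str
    state: State                   # hand-built edge-case state
    passed: Callable[[str], bool]  # predicate on the agent's chosen action


def run_suite(agent: Agent, tests: List[UnitTest]) -> float:
    """Run every test once, print PASS/FAIL per test, return the fraction passed."""
    results = {t.name: t.passed(agent(t.state)) for t in tests}
    for name, ok in results.items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
    return sum(results.values()) / len(results)


TESTS = [
    # State edge case: a plate was accidentally left on a counter.
    UnitTest("plate left on counter",
             {"plate_on_counter": True, "partner_idle_steps": 0, "holding": None},
             lambda action: action == "pick_up_plate"),
    # Partner edge case: the partner has stayed put (thinking or away from keyboard).
    UnitTest("partner idle (thinking or AFK)",
             {"plate_on_counter": False, "partner_idle_steps": 20, "holding": "soup"},
             lambda action: action == "deliver"),
]


def naive_agent(state: State) -> str:
    # Hypothetical agent that delivers whatever it holds but never notices stray plates.
    if state["holding"] == "soup":
        return "deliver"
    return "wait"


print(f"score: {run_suite(naive_agent, TESTS):.2f}")
# FAIL  plate left on counter
# PASS  partner idle (thinking or AFK)
# score: 0.50
```

Unlike a single validation reward, a suite like this reports which specific situations an agent fails, which is the kind of extra information the researchers say their tests provide.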

Above: Screen captures from the researchers’ test environment.

The researchers investigated several techniques for improving system robustness, including training a system with a diverse population of other collaborative systems. Over the course of their Overcooked experiments, they observed whether the test systems could recognize when to get out of the way (for example, when a partner was carrying an ingredient) and when to pick up and deliver orders after a partner had been idling for a while.
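Why training against a diverse population can help is easiest to see in a toy setting. The Python sketch below swaps in a deliberately tiny one-step task rather than Overcooked: the agent observes whether its partner is carrying an ingredient or idling and chooses to `wait` or `go`. Everything here (the reward values, the partner types `carrier`, `idler`, and `mixed`, and the tabular epsilon-greedy learner) is a hypothetical illustration, not the researchers' training setup. The only idea taken from the text is that the partner is resampled from a population at the start of every episode.

```python
import random

ACTIONS = ["wait", "go"]  # hypothetical action set for a shared-counter toy task
STATES = ("carrying", "idle")


def reward(partner_status: str, action: str) -> float:
    """Toy reward: going while the partner carries an ingredient collides (-1),
    going while the partner idles delivers the order (+1), waiting scores 0."""
    if action == "go":
        return -1.0 if partner_status == "carrying" else 1.0
    return 0.0


# Hypothetical partner policies: each returns the status the agent observes.
def carrier(rng: random.Random) -> str:
    return "carrying"


def idler(rng: random.Random) -> str:
    return "idle"


def mixed(rng: random.Random) -> str:
    return rng.choice(["carrying", "idle"])


def greedy(q: dict, status: str) -> str:
    # max() returns the first maximal action, so an unvisited state defaults to "wait".
    return max(ACTIONS, key=lambda a: q[status][a])


def train(population, episodes=500, alpha=0.1, epsilon=0.2, seed=0) -> dict:
    """Tabular epsilon-greedy value learning on a one-step task, with the
    partner resampled from the population at the start of every episode."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(episodes):
        partner = rng.choice(population)  # diverse-population sampling
        status = partner(rng)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = greedy(q, status)    # exploit
        q[status][action] += alpha * (reward(status, action) - q[status][action])
    return q


single = train([carrier])
diverse = train([carrier, idler, mixed])
print(greedy(single, "idle"))   # wait  (never saw an idle partner)
print(greedy(diverse, "idle"))  # go
```

An agent trained only with the always-carrying partner never observes an idle partner, so its values for that state stay at their initial zeros and it keeps waiting, which is the kind of failure an idle-partner test is meant to catch. Sampling from a mixed population exposes the agent to both situations.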

According to the researchers, current deep reinforcement learning agents aren't very robust, at least as measured by Overcooked. None of the systems they tested scored above 65% on the Overcooked tests, which the researchers say suggests Overcooked can serve as a useful human-AI collaboration benchmark in the future.

“We emphasize that our primary finding is that our [Overcooked] test suite provides information that may not be available by simply considering validation reward, and our conclusions for specific techniques are more preliminary,” the researchers wrote in a paper describing their work. “A natural extension of our work is to expand the use of unit tests to other domains besides human-AI collaboration … An alternative direction for future work is to explore meta learning, in order to train the agent to adapt online to the specific human partner it is playing with. This could lead to significant gains, especially on agent robustness with memory.”

VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.