Amazon’s re:Mars conference last June featured a carnival of robotics and AI. Disney showed a demo of its stunning robot acrobats, while others showed off delivery robots, dextrous robotic hands, and robotic snakes that can weave through the cracks of buildings after a disaster. Boston Dynamics’ four-legged Spot Mini was there, as well as robots built for space. To start the event, Robert Downey Jr. announced the creation of a new foundation to clean up the planet … with help from robots.

But when an Amazon employee asked CEO Jeff Bezos onstage about his vision for the next 10 years, Bezos talked first about more seemingly mundane applications — robotic arms and grasping objects. Like getting autonomous vehicle systems on public roads, robotic grasping remains one of the grand AI challenges poised to upend the economy and change human lives in the years ahead. But like the self-driving car field, sometimes there’s disagreement about the best way to measure progress among companies spinning out of robotic research labs at schools like MIT and UC Berkeley.

“I think if you went back in time 30 or 40 years and asked roboticists and computer scientists, people working on machine learning at that time, which problem would be harder to solve — machine vision, natural language understanding, or grasping — I think most people would have predicted that we would solve grasping first,” Bezos said. “And, of course, it’s turned out to be an incredibly difficult problem, probably in part because we’re starting to solve [grasping] with machine vision.”

Above: Amazon and Blue Origin CEO Jeff Bezos at the Amazon re:Mars conference in Las Vegas

Image Credit: Khari Johnson / VentureBeat

Today, in Amazon fulfillment centers, picking (the act of moving individual items for orders into a box) is done by people, but grasping robots could replace those workers, removing an entire layer of human labor in ecommerce. Former fulfillment center employees say Amazon treated them like robots, and the company continues to expand the role of robots in its fulfillment centers, an effort that started in 2012 with the acquisition of Kiva Systems and the creation of Amazon Robotics.

Robotic arms with more refined grasping capabilities will have applications in home robotics (something Amazon is reportedly working on) and a range of tasks in other fields, and could support Bezos’ plan to build on and near the moon with Blue Origin.

In an interview with VentureBeat, Covariant CEO Peter Chen said his company considers mean picks per hour (MPPH) a “retired metric,” even though some still consider it a primary way to measure robotic grasping system performance. He said the metric should be retired because he no longer considers achieving human rates of picking with a robotic arm to be a challenge.

MPPH takes into account the average number of grasping attempts a robot makes in an hour, as well as mean grasp reliability, or the probability that each grasp attempt will be successful. But Chen argues the number of mistakes that require human intervention per hour is a better measurement, because how a robot performs on that metric can determine how much human oversight it demands.
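As a rough illustration of the two measurements, the sketch below assumes MPPH is simply attempts per hour multiplied by grasp reliability, and that Chen's preferred metric is a plain count of interventions divided by hours of operation. Neither formula is spelled out by the companies quoted here, and the function names and sample numbers are hypothetical.

```python
def mean_picks_per_hour(attempts_per_hour: float, grasp_reliability: float) -> float:
    """Assumed MPPH: grasp attempts per hour scaled by the chance each one succeeds."""
    if not 0.0 <= grasp_reliability <= 1.0:
        raise ValueError("grasp_reliability must be a probability between 0 and 1")
    return attempts_per_hour * grasp_reliability


def interventions_per_hour(interventions: int, hours: float) -> float:
    """Assumed intervention rate: human interventions divided by hours of operation."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return interventions / hours


# Hypothetical shift: 600 attempts per hour at 75% reliability,
# with 3 human interventions over 8 hours.
print(mean_picks_per_hour(600, 0.75))
print(interventions_per_hour(3, 8))
```

Two systems with the same MPPH can differ widely on the second number, which is the distinction Chen is drawing.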

He draws a comparison to the way we evaluate autonomous driving systems.

“[Mean picks per hour] is kind of like, ‘Can you drive down a block on a sunny day?’ That’s analogous to the self-driving situation. Everyone can do that. That’s no longer a test. What is a real test is how long you can sustain that. That becomes what matters,” Chen said.

“What we measure much more is the reliability of the system. This is similar to how in self driving, people measure how often a [human] driver needs to engage. Because that basically measures when AI fails to make decisions on its own, and that’s the same thing for us, and that’s almost, I would say, the most important measure in terms of value creation.”

Chen said he’s not aware of any other company focused on mean interventions per hour as a key metric, but he said that reflects Covariant’s maturity in the robotic manipulation space.

Covariant launched in 2017 but only came out of stealth last month, with support from deep learning luminaries like Geoffrey Hinton, Jeff Dean, and Yann LeCun. Covariant cofounders include Chen, UC Berkeley Robot Learning Lab director and Berkeley AI Research (BAIR) codirector Pieter Abbeel, and others who met while working together at OpenAI.

Covariant — a startup whose system is currently being used in a factory in Germany — recently claimed it had reached a new milestone. The company said its machines can pick and pack some 10,000 different items with greater than 99% accuracy.

In a test last year, robotics company ABB invited 20 companies from the U.S. and Europe to take part in a challenge involving picking and sorting random items. In the end, Covariant was the only company able to complete all the tasks and do so at speeds comparable to a human.

An ABB spokesperson declined to comment on which companies participated in the competition (the company agreed not to share details about participants) but said the test included 26 common items like apples, toys, bottles, and clamshell packs. ABB uses a formula that combines metrics like pick rate and mistakes — such as double picks or failed picks — to measure the performance of robotic grasping systems.

This week, ABB announced a partnership with Covariant to bring AI-enabled grasping robots to warehouses for ecommerce.

How to measure success

In a 2018 IEEE op-ed, 19 members of the robotics community across academia, industry, and standards bodies (including leaders at organizations like NASA’s Jet Propulsion Lab, Nvidia’s robotics unit, and the National Institute of Standards and Technology, or NIST) called for open discussion of benchmarks and metrics to measure progress in robotic grasping. The paper makes no explicit call for a single recommended success metric, but the primary metric mentioned is mean picks per hour.

Lael Odhner, cofounder and CTO of RightHand Robotics, which makes piece-picking systems for robotic arms, signed the 2018 op-ed.

He says there may be some nuance in how companies and researchers calculate mean picks per hour, but it’s a number intended to factor in range, rate, and reliability. Here, range is the percentage of customer inventory robots can pick, rate is the time it takes to pick any given item, and reliability is the amount of time spent handling exception cases, like items lost due to breakage or the need for manual intervention.

“Once all of these components are taken together, the result will be measured as an average number of picks per hour, but it will clearly take into account much more than the robot’s speed,” he said.

“I think Peter [Chen]’s focus on eliminating manual intervention is a good first step, since this is a significant risk to productivity in any automation. However, at some point, the value of automation in a production environment has to be measured in terms of total throughput, since the customer has a budget of so many cents for handling an item, and the overall cost of these has to add up to a reasonable number to pay for the robot,” Odhner said.

Alberto Rodriguez, who led Team MIT-Princeton in the Amazon Robotics Challenge between 2015 and 2017 and is now director of MCube Lab at MIT, also signed the op-ed. Rodriguez said he believes that the most advanced AI for bin-picking robots is now found in startup and corporate development, not academia.

“They have brought the performance of technology much farther in terms of reliability and speed, with better engineering of both the algorithms and the hardware than what can be done in an academic environment,” he said.

Peter Yu spent three years competing in the Amazon Robotics Challenge with Rodriguez at MIT. Today, he’s the CTO of XYZ Robotics, a robotic systems startup with customers in China and the United States.

Yu said that back in 2017, grasping systems averaged close to 30 picks per hour, while the MIT-Princeton team reached levels near 120 picks per hour. Today, he said, XYZ Robotics can achieve 900 picks per hour in a varied random item scenario.

Yu said metrics that track the rate of picks over time, like MPPH, are still important for manufacturers since a robotic arm must maintain speeds in keeping with people and machines in the rest of a warehouse’s supply chain.

“The best way, or the most real way [to test grasping systems] is [to go] to one of the deployment sites and then time the robot performance. And, as you know, different items can result in different speed because of the weight and size,” Yu told VentureBeat.

Why robotic grasping is hard

Ken Goldberg is a cocreator of the Dexterity Network (Dex-Net), a system for robotic grasping developed at AUTOLAB in affiliation with Berkeley AI Research, the CITRIS People and Robots Initiative, and the Real-Time Intelligent Secure Execution (RISE) Lab, with support from Amazon Robotics, Google, Intel, Samsung, and Toyota Research. He’s also CEO of Ambidextrous Robotics, a company that has raised funding but still considers itself in stealth mode. He also signed the 2018 IEEE letter.

Before Jeff Bezos took the stage at re:Mars last year, Goldberg talked about robotic grasping and how deep learning and simulation data are advancing the field. Control of actuators, friction between grippers, interpretation of perception from sensors, varying centers of mass, and noisy data can make robotic grasping a challenge. But Goldberg said Dex-Net is capable of achieving 400 picks per hour on objects it’s never seen before. A 2016 analysis clocks human performance at roughly 400 to 600 mean picks per hour.

Like XYZ Robotics, Dex-Net claims its systems offer grasping abilities nearly on par with human performance, but the two express this claim in different ways. Chen said 400 picks per hour is incredibly low for logistics customers but also said picking rates can get as high as 900-1,200 picks per hour.

In an interview with VentureBeat last month following a speech at the Re-Work Deep Learning Summit in San Francisco, Goldberg declined to respond to questions about Covariant but talked about the mean picks per hour metric.

“I think everybody’s doing certain deployments, but the question is if it’s in production … that’s where the rubber meets the road. Some of us are working 24 hours a day — that’s where it’s really exciting, and I think [there’s more work in warehouses] starting to happen,” he said.

In addition to picks per hour, Goldberg said companies should consider metrics like double picks — when a robotic grasper picks up two items at once — and the number of items left in bins.

“Under certain circumstances, if we have nice objects and you have a very fast robot, you can get there [human picking rates],” Goldberg told VentureBeat last month. “But they say humans are like 650 per hour; that’s an amazing level. It’s very hard to beat humans. We’re very good. We’ve evolved over millions of years.”

Metrics used to measure progress in robotic grasping can vary based on the task. For example, for robots operating in a mission-critical environment like space, accuracy matters above all.

Whatever success metrics companies use to measure progress in robotic grasping, both Chen and Goldberg agree a continued focus on adversarial examples — the kind that continually stump systems — can lead to great progress.

“We actually built adversarial objects that are extremely hard to grasp,” Goldberg told VentureBeat.

In work published last year, Goldberg and co-authors from Berkeley AI Research and AUTOLAB intentionally designed adversarial cubes and other objects. In the case of one adversarial cube, Dex-Net achieved a 0% success rate.

Above: Adversarial objects created by roboticists at UC Berkeley

Chen declined to share specifics about how Covariant approaches adversarial learning, but he said the best learning possibilities lie in hunting for outliers.

“Let’s say the long-tail hard cases normally only occur 1% of the time,” he said. “If you adversarially train for it, then you can make those occur much more often and essentially accelerate your training and make that more efficient.”
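Chen's point about frequency can be illustrated with simple oversampling. The sketch below is not Covariant's method, which the company did not disclose, and it is not adversarial training in the strict sense. It only shows how a training batch can be built so that hard cases appear far more often than their natural rate of roughly 1%. The function name, parameters, and sample data are all hypothetical.

```python
import random


def sample_training_batch(easy_cases, hard_cases, batch_size, hard_fraction, rng):
    """Build a batch in which hard cases make up a chosen fraction.

    Stand-in for adversarial training: hard cases are drawn with
    replacement so they can appear more often than they occur naturally.
    """
    n_hard = round(batch_size * hard_fraction)
    n_easy = batch_size - n_hard
    batch = rng.choices(hard_cases, k=n_hard) + rng.choices(easy_cases, k=n_easy)
    rng.shuffle(batch)
    return batch


# Hypothetical data: hard cases are 1% of what the robot sees in practice.
easy = [f"easy_{i}" for i in range(99)]
hard = ["hard_0"]

batch = sample_training_batch(easy, hard, batch_size=100, hard_fraction=0.3,
                              rng=random.Random(0))
hard_share = sum(item.startswith("hard") for item in batch) / len(batch)
print(hard_share)
```

Here the hard case fills 30% of the batch instead of 1%, so the model sees its failure cases about 30 times as often per training step.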