Will this crisis help set autonomous AI on the right course?

The COVID-19 pandemic accelerates an automated future that’s already on its way. It serves as a wake-up call to all AI, robotics, and driverless car startups: stop building eye-dazzling demos and talking about the future possibility of general-use AI. Instead, focus on deploying real-world solutions that can run 24 hours a day with minimum human intervention and deliver true value to users.

Thousands of Americans have started to work from home amidst the current pandemic. Retailers have struggled with supply while nervous consumers are hoarding everything from toilet paper to hand soap. Across the globe, Chinese e-commerce giant JD began testing a level 4 autonomous delivery robot in Wuhan and running its automated warehouses 24 hours a day to cope with a surge in demand.

Suddenly, autonomous machines need to be better than just proof of concept. They must be robust enough to work independently across various real-life situations.

In some ways, the epidemic accelerates an automated future that’s already on its way. It has exposed problems that have long existed in the AI venture scene: buzzwords and hype cloud people’s judgment, making it difficult to see real progress.

The industry needs to take on much-needed reforms towards real-world autonomous systems in the following three areas:

1. Rethink metrics

As more autonomous AI machines are deployed in the real world, conventional metrics such as speed, cycle time, or success rate can no longer represent the full picture. We need to measure the reliability of the system under uncertainties with robustness metrics such as the average number of human interventions. We need more tools and industry standards to evaluate overall system performance across a wide range of scenarios because real life, unlike a controlled environment, is unpredictable.

If a delivery robot can reach a max speed of 4 mph but cannot complete a single delivery without onsite human support, the robot is not creating much value for its users.
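To make the contrast concrete, here is a minimal sketch of how a robustness metric could sit next to a conventional success rate. The `Run` record, the `robustness_summary` function, and the sample numbers are all hypothetical, made up for illustration rather than taken from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One delivery attempt (hypothetical record format)."""
    completed: bool      # did the delivery reach the customer?
    interventions: int   # how many times a human had to step in


def robustness_summary(runs):
    """Report success rate alongside human-intervention metrics."""
    if not runs:
        raise ValueError("need at least one run")
    total = len(runs)
    completed = sum(r.completed for r in runs)
    # A run only counts as unassisted if it finished with zero human help.
    unassisted = sum(r.completed and r.interventions == 0 for r in runs)
    interventions = sum(r.interventions for r in runs)
    return {
        "success_rate": completed / total,
        "unassisted_success_rate": unassisted / total,
        "interventions_per_run": interventions / total,
    }


# Illustrative data: most deliveries succeed, but none without help.
runs = [Run(True, 1), Run(True, 2), Run(True, 1), Run(False, 3)]
summary = robustness_summary(runs)
print(summary)
```

In this made-up sample the success rate looks respectable, yet the unassisted success rate is zero, which is exactly the gap a speed or success-rate metric alone would hide.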

DevOps emerged a few years ago to shorten the development cycle and continuously deliver high-quality software. Compared with software engineering, AI and ML are much less mature: 87% of ML projects never go into production. Recently, however, we have started to see MLOps and AIOps appear more and more.

This marks a crucial transition from AI/ML research to actual products that are used and tested every day. It requires a significant change in mindset to focus on quality assurance instead of state-of-the-art ML models. I'm not saying we can't have both at the same time, but to date we've seen more emphasis on the latter.

2. Redesign error handling and communication

The recent shutdown of Starsky Robotics reminds us that we are still years away from fully autonomous solutions. However, that doesn’t mean that AI robotics cannot bring immediate value to humans. As mentioned in my previous article, even if humans need to handle edge cases 15% of the time, companies can still significantly reduce labor and integration costs.

That’s why it’s important to measure the number of human interventions required as mentioned above. More importantly, we need to design a better way to handle and communicate errors. For example, showing the confidence level of machine learning model predictions or framing your predictions as suggestions instead of decisions are ways to build trust with users.
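As a rough sketch of what this could look like in practice, the snippet below turns a model prediction into one of three user-facing outcomes based on its confidence. The `Prediction` type, the `present` function, and both thresholds are assumptions chosen for illustration; real thresholds would need to be tuned per use case.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A model output with its confidence (hypothetical format)."""
    label: str
    confidence: float  # expected to lie between 0.0 and 1.0


def present(pred, act_threshold=0.9, suggest_threshold=0.6):
    """Return (mode, message) for the user; thresholds are illustrative."""
    if not 0.0 <= pred.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    pct = f"{pred.confidence:.0%}"
    if pred.confidence >= act_threshold:
        # High confidence: act, but still show the confidence level.
        return ("act", f"Proceeding with '{pred.label}' ({pct} confident).")
    if pred.confidence >= suggest_threshold:
        # Medium confidence: frame the prediction as a suggestion.
        return ("suggest", f"Suggested: '{pred.label}' ({pct} confident). Confirm?")
    # Low confidence: hand the decision to a human.
    return ("handoff", f"Not sure ({pct} confident). Requesting human help.")


for confidence in (0.95, 0.70, 0.30):
    mode, message = present(Prediction("pick item A", confidence))
    print(mode, "->", message)
```

The point of the design is that the confidence level is always visible to the user, and the system only decides on its own when it is well above the bar.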

It’s also crucial to have two-way communication so that users can flag unknown unknowns, errors that systems cannot detect, especially major errors that need immediate human intervention before the system can resume operation.

Error handling is the first step. It’s about identifying the cases that machines cannot handle by themselves. The next step is to ensure seamless handoff and collaboration between machines and humans to address edge cases and optimize the overall performance.

3. Redefine human-machine interaction

We are used to guiding robots or giving commands to machines. But as machines keep getting smarter, should we humans always make the final call?

For example, who should be controlling an autonomous robotaxi? The car itself? The human safety driver? Someone who monitors a fleet of robotaxis remotely? Or the passengers? Under what circumstances? Do we have the right tools and technology to pass all the relevant information to that decision maker promptly?

In addition to technology, there are also trust issues. Even though research shows that autonomous cars are safer, nearly half of Americans still prefer not to use a self-driving car.

How do we design human-centered AI to make sure that autonomous machines make our lives better, not worse? How do we automate the right use cases to augment humans? How do we build a hybrid team that delivers better outcomes and allows humans and machines to learn from each other?

There are still a lot of questions that we need to answer. But the good news is that we have started to do so. And we seem to be heading in the right direction.

Bastiane Huang is a product manager at OSARO, an AI/robotics startup based in San Francisco backed by Peter Thiel and Jerry Yang’s AME Cloud. She previously worked for Amazon Alexa.