The challenges of developing autonomous vehicles during a pandemic

In the months since the coronavirus prompted shelter-in-place orders around the world, entire companies, industries, and economies have been decimated. One market that appeared poised to escape the impact was autonomous vehicles, particularly in the case of companies whose cars can be used to transport supplies to health care workers. But it seems even they aren't immune.

In March, many high-profile startups and spinoffs, including Uber, Cruise, Aurora, Argo AI, and Lyft, suspended real-world testing of their autonomous vehicle fleets, citing safety concerns and a desire to limit contact between drivers and riders. Waymo went one step further, announcing that it would pause its commercial Waymo One operations in Phoenix, Arizona -- including its fully driverless cars, which don't require human operators behind the wheel -- until further notice.

The interruptions present a mammoth engineering challenge: how to replicate the data collected by real-world cars while the fleets remain grounded for months or longer. This problem has never been tackled before, and some experts believe it's insurmountable. Even Waymo CEO John Krafcik has said that real-world experience is "unavoidable" in driverless car development.

But some of the industry's biggest players -- including Waymo -- are trying anyway.

Data is the new oil

Virtually every autonomous vehicle development pipeline leans heavily on logs from sensors mounted to the outsides of cars, including lidar sensors, cameras, radar, inertial measurement units (IMUs), odometry sensors, and GPS. This data is used to train the family of machine learning models that underpin driverless vehicles' perception, prediction, and motion planning capabilities. These systems make sense of the world and the objects within it and dictate the paths vehicles ultimately take.

For instance, Tesla compiled a corpus of tens of thousands of occluded stop signs to teach a model to recognize similar signs in the wild. And Cruise used a combination of synthetic and real-world audio and visual data to train a system that detects police cars, fire trucks, ambulances, and other emergency vehicles.

Real-world data collection also entails mapping, which in the context of autonomous vehicles refers to the creation of 3D, high-definition, centimeter-level maps of roads, buildings, vegetation, and other static objects in the world. Ahead of testing in a new location, companies like Waymo and Cruise deploy sensor-equipped, manually driven cars to map the routes driverless vehicles could possibly take. These maps help the vehicles localize themselves in the world, and they also provide valuable contextual information, like speed limits and the location of traffic lanes and pedestrian crossings.

With fleets grounded and no new real-world data coming in, autonomous vehicle companies must rely on the data they've collected to date -- and perturbations or modifications of that data -- for system development and evaluation. Fortuitously, many of these companies have invested in simulation to scale testing beyond what's possible in the real world.

Simulation

Waymo

Waymo says it drives 20 million miles a day in its Carcraft simulation platform -- the equivalent of over 100 years of real-world driving on public roads. Moreover, the company says that Waymo Driver, its autonomous vehicle software suite, has accumulated over 15 billion simulated autonomous miles to date. That's up from 10 billion simulated autonomous miles as of July 2019.

"There's a lot of information in [Carcraft]," Jonathan Karmel, Waymo's product lead for simulation and automation, told VentureBeat. "That's why we use a range of tools internally to extract the most important signals -- the most interesting miles and useful information."

Using web-based interfaces to interact with Carcraft simulations, Waymo engineers leverage real-world data to prepare for edge cases and explore ideas, selecting encounters from Waymo's more than 20 million autonomous miles on roads in 25 cities. As both the software and scene evolve, keeping the environment around Waymo Driver up to date maintains realism. That entails modeling agent behavior and using reactive agents (such as other cars, cyclists, and pedestrians) who respond to the new position of the virtual cars.

Waymo says it also synthesizes realistic sensor data for cars and models scenes in updated environments. As its virtual cars drive through the same scenarios Waymo vehicles experience in the real world, engineers modify the scenes and evaluate possible situations. They also manipulate those scenes by virtually adding new agents into the situation, such as cyclists, or by modulating the speed of oncoming traffic to gauge how the Waymo Driver would have reacted.

Over time, the simulation scenarios are amplified through a large number of variations to assess the desired behavior of Waymo Driver. That information is used to improve both safety and performance. "I look at sensor simulation work as being able to [augment] our real-world miles," said Karmel. "We have the capabilities to incrementally understand the real world as things change, and as we continue to make changes to improve the state of [our systems'] performance, we continue to [create] new challenges in simulation."

In addition to constructing scenarios informed by real-world driving data, Waymo deploys never-before-tested synthetic scenarios captured from its private test track. The company says this enables it to continue to expand the number of miles it can simulate. The majority of learning and development is done in simulation, according to Karmel -- well before updated versions of Waymo Driver hit real-world roads.

An oft-overlooked aspect of these learning and development processes is comfort. Waymo says it evaluates multiple "comfort metrics," like the ways people respond to vehicles' various driving behaviors. This on-road testing feedback is used to train AI models and run them in simulation to validate how different scenarios influence rider comfort, from figuring out the ideal braking speed to ensuring the car drives smoothly.

"We're ... beginning to better understand the components that make a ride comfortable," explained Karmel. "Some of the key components are things like acceleration and deceleration, and we want to receive that information into simulation to predict what we think a rider or driver reaction would have been in the real world. There's a machine learning model to predict what those reactions are in [Carcraft]."

Beyond Carcraft, Waymo's engineers tap tools like Content Search, Progressive Population-Based Augmentation (PPBA), and Population-Based Training (PBT) to support various development, testing, and validation efforts. Content Search draws on tech similar to that powering Google Photos and Google Image Search to let data scientists locate objects in Waymo's driving history and logs. PBT -- which was architected in collaboration with Alphabet's DeepMind -- starts with multiple machine learning models and replaces underperforming members with "offspring," an approach that reduced false positives by 24% in pedestrian, bicyclist, and motorcyclist recognition tasks. As for PPBA, it bolsters object classifiers while decreasing costs and accelerating the training process, chiefly because it only needs annotated lidar data for training.
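The replace-the-underperformers idea behind PBT can be illustrated with a toy loop. This is a minimal sketch, not Waymo's or DeepMind's implementation: the names `toy_score` and `run_pbt` are hypothetical, a single number stands in for each model's hyperparameters, and a made-up scoring function stands in for a real validation metric. Actual PBT trains the models in parallel and copies weights along with hyperparameters.

```python
import random


def toy_score(hparam):
    # Hypothetical stand-in for a validation metric; it peaks at hparam = 0.3.
    return -(hparam - 0.3) ** 2


def run_pbt(population_size=8, generations=20, seed=0):
    """Toy population-based search over one hyperparameter."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    population = [rng.uniform(0.0, 1.0) for _ in range(population_size)]
    history = []  # best score seen at the start of each generation

    for _ in range(generations):
        # Rank members from best to worst.
        ranked = sorted(population, key=toy_score, reverse=True)
        history.append(toy_score(ranked[0]))

        # Keep the top half; the bottom half is discarded.
        cutoff = population_size // 2
        survivors = ranked[:cutoff]

        # Each "offspring" copies a random survivor, then perturbs the copy.
        offspring = []
        for _ in range(population_size - cutoff):
            parent = rng.choice(survivors)
            child = parent * rng.choice([0.8, 1.2])
            offspring.append(min(1.0, max(0.0, child)))

        population = survivors + offspring

    return population, history


if __name__ == "__main__":
    final_population, best_scores = run_pbt()
    print(max(final_population, key=toy_score), best_scores[-1])
```

Because the best member always survives, the best score in this sketch can never get worse from one generation to the next.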

Cruise

Cruise also runs lots of simulations -- about 200,000 hours of compute jobs each day in Google Cloud Platform. One of its simulators is an end-to-end, three-dimensional Unreal Engine environment that Cruise employees call The Matrix. It enables engineers to build any kind of situation they're able to dream up, and to synthesize sensor inputs like camera footage, lidar, and radar feeds for autonomous virtual cars.

"Handling the long tail is the reason autonomous vehicles are one of the most difficult and exciting AI problems on the planet, and also the fact that we expect extremely high performance levels from autonomous vehicles and their underlying models," Cruise head of AI Hussein Mehanna told VentureBeat. "When you look at the training data, you have thousands of lidar scan points, high-resolution images, radar data, and information from all sorts of other sensors. All of that requires a significant amount of infrastructure."

Cruise spins up 30,000 instances daily across over 300,000 processor cores and 5,000 graphics cards. Each instance loops through a single drive's worth of scenarios, and together they generate 300 terabytes of results. (It's basically like having 30,000 virtual cars driving around at the same time.) Among other testing approaches, the company employs replay, which involves extracting real-world sensor data, playing it back against the car's software, and comparing the performance with human-labeled ground truth data. It also leverages planning simulation, which lets Cruise create up to hundreds of thousands of variations of a scenario by tweaking variables like the speed of oncoming cars and the space between them.
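To make the idea of scenario variations concrete, here is a small sketch of how a grid of variations might be generated from two variables. It is illustrative only and does not reflect Cruise's tooling: the function names, the dictionary keys, and the sample values are all assumptions.

```python
from itertools import product


def scenario_variations(oncoming_speeds_mps, gaps_m):
    """Build one scenario per (speed, gap) combination."""
    return [
        {"oncoming_speed_mps": speed, "gap_m": gap}
        for speed, gap in product(oncoming_speeds_mps, gaps_m)
    ]


def time_gap_s(scenario):
    # Seconds until the next oncoming car reaches the current position
    # of the car ahead of it, given the spacing and the shared speed.
    return scenario["gap_m"] / scenario["oncoming_speed_mps"]


if __name__ == "__main__":
    # Hypothetical sample values: 3 speeds x 4 gaps = 12 variations.
    variations = scenario_variations([5, 10, 15], [10, 20, 30, 40])
    for scenario in variations:
        print(scenario, round(time_gap_s(scenario), 2))
```

The count grows multiplicatively with each added variable, which is how a single base scenario can fan out into a very large number of tests.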

According to Cruise VP of simulation Tom Boyd, engineers make choices about which elements of scenarios to model and the granularity at which to model them. For instance, they weigh whether simulating a tire slip -- which is dependent on a car's mileage, road conditions, and even the metal used on the axle -- is more important than modeling lidar reflections off of car windshields and rearview mirrors or radar multipath returns.

Another way Cruise manages simulation trade-offs is through frameworks that serve the different levels of accuracy tests require. Those that don't need 3D graphics can run on commodity hardware up to 100 times real-time. "No vehicle dynamics software model is perfectly accurate," said Boyd. "They can get complex, and it would be easy for us to spend months of development effort to resolve extremely minor differences between how the [autonomous vehicle] acts in simulation and on the road."

Tools within Cruise's engineering suite include the web-based Webviz, which has its roots in a hackathon project and which is now used by roughly a thousand monthly active employees. The latest production version lets engineers save configurations, share various parameters, and watch vehicle simulations as they run on remote servers. There's also Worldview, a lightweight and extensible 2D/3D scene renderer that lets engineers quickly build custom visualizations.

Aurora

Aurora, the self-driving car company cofounded by former Waymo engineer Chris Urmson, says that the virtual cars within its Virtual Testing Suite platform complete over a million tests per day on average. This platform and other tools enable the company's engineers to quickly identify, review, categorize, and convert the majority of events and interesting on-road scenarios into virtual tests, and to run thousands of tests to evaluate a single change to the master codebase.

The Virtual Testing Suite comprises a mix of codebase tests, perception tests, manual driving evaluations, and simulations. Engineers write both unit tests (e.g., seeing if a method to calculate velocity gives the right answer) and integration tests (e.g., seeing whether that same method works well with other parts of the system). New work must pass all relevant tests before it's merged with the larger code, thereby allowing engineers to identify and fix any issues.
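The difference between the two kinds of tests can be sketched in a few lines. This is a hypothetical illustration rather than Aurora's code: `velocity_mps` and `stopping_distance_m` are invented names, and the stopping-distance formula assumes constant deceleration.

```python
def velocity_mps(distance_m, elapsed_s):
    """Average velocity over an interval, in meters per second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return distance_m / elapsed_s


def stopping_distance_m(speed_mps, decel_mps2):
    """Distance needed to stop from speed_mps at constant deceleration."""
    return speed_mps ** 2 / (2 * decel_mps2)


def test_velocity_unit():
    # Unit test: does the velocity method give the right answer by itself?
    assert velocity_mps(100.0, 10.0) == 10.0


def test_velocity_feeds_stopping_distance():
    # Integration test: does the velocity output work with another component?
    speed = velocity_mps(100.0, 10.0)
    assert stopping_distance_m(speed, 5.0) == 10.0


if __name__ == "__main__":
    test_velocity_unit()
    test_velocity_feeds_stopping_distance()
    print("all tests passed")
```

A merge gate of the kind described above would run tests like these automatically and block the change if any of them fail.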

A series of specialized perception tests in simulation is created from real-world log data, and Aurora says it's developing "highly realistic" sensor simulations so that it can generate tests for uncommon and high-risk scenarios. Other experiments the company regularly runs in the Virtual Testing Suite assess how well Aurora Driver -- Aurora's full-stack driverless platform -- performs across a range of driving benchmarks.

No matter the nature of the test, custom-designed tools automatically extract information from Aurora's log data (e.g., how fast a pedestrian is walking) and plug it into various simulation models, a process designed to save engineers time.
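As a rough illustration of that kind of extraction, the sketch below estimates a pedestrian's walking speed from a timestamped position track. The log format, a list of (time in seconds, x in meters, y in meters) tuples, and the function name are assumptions made for the example, not a description of Aurora's log data or tools.

```python
import math


def pedestrian_speed_mps(track):
    """Average speed along a track of (t_s, x_m, y_m) samples."""
    if len(track) < 2:
        raise ValueError("need at least two samples")

    # Sum the straight-line distance between consecutive samples.
    distance_m = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(track, track[1:])
    )
    elapsed_s = track[-1][0] - track[0][0]
    if elapsed_s <= 0:
        raise ValueError("timestamps must increase")
    return distance_m / elapsed_s


if __name__ == "__main__":
    # Hypothetical samples: 10 meters covered in 4 seconds.
    sample_track = [(0, 0, 0), (2, 3, 4), (4, 6, 8)]
    print(pedestrian_speed_mps(sample_track))
```

A value like this could then be handed to a simulated pedestrian so that it moves at the pace observed on the road.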

The company says that in the months since Aurora halted all real-world testing, its vehicle operators have joined forces with its triage and labeling teams to mine manual and autonomous driving data for on-road events that can be turned into simulated virtual tests. Aurora also says it's building new tools, such as a web app designed to make crafting simulations even easier for engineers, and that it's enhancing existing pipelines that will support the creation of new testing scenarios.

Elsewhere, Aurora engineers are continuing to build and refine the company's vehicle maps -- the Aurora Atlas -- in areas where Aurora Driver will operate when it resumes on-road testing, a spokesperson tells VentureBeat. They're adding new maps to Cloud Atlas, the versioned database specially designed to hold Atlas data, tapping into machine learning models that automatically generate annotations like traffic lights.

Advancements in AI and machine learning have made it easier to teach car-driving agents to navigate never-before-seen roads within simulations. In a recent technical paper, researchers at MIT's Computer Science and Artificial Intelligence Laboratory describe an approach not unlike Aurora's that involves Virtual Image Synthesis and Transformation for Autonomy (VISTA), a photorealistic simulator that uses only a real-world corpus to synthesize viewpoints from potential vehicle trajectories. VISTA was able to train a model that navigated a car through previously unseen streets -- even when the car was positioned in ways that mimicked near-crashes.

"We don't anticipate that COVID-19 will delay our progress in the long term, largely due to our investments in virtual testing, but it has demonstrated the urgency for self-driving transportation that can move people and goods safely and quickly without the need of a human driver," said Urmson in a statement. "That's why we're more committed to our mission than ever and continue to hire experts in all disciplines, pay everyone at the company, and find ways to advance development on the Aurora Driver. As our industry comes together, ingenuity, dedication, and thoughtful leadership will get us through these challenging times."

Uber

Uber's Advanced Technologies Group (ATG), the division spearheading Uber's autonomous vehicles projects, retains a team that continuously expands the test set within Uber's simulator based on test track and road behavior data. Every time any adjustment is made to the self-driving system's software, it's automatically re-run against the full suite of simulation tests, ATG head of systems engineering and testing Adrian Thompson told VentureBeat.

ATG engineers use tools like DataViz, a web-based interface that ATG developed in collaboration with Uber's Data Visualization Team, to see how cars in simulation interpret and perceive the virtual world. DataViz offers realistic representations of elements like cars, ground imagery, lane markers, and signs. It also offers abstract representations (by way of color and geometric coding) of algorithmically generated information such as object classification, prediction, planning, and lookaheads. Together, these views enable employees to inspect and debug information collected from offline and online testing, as well as to explore information in the process of creating new scenarios.

Thompson says that Uber's decision to ramp up development of its modeling and simulation tools over the past two years is paying dividends. In something of a case in point, the company is now using over 2 million miles of sensor logs augmented with simulations to accomplish the "vast majority" of its AI training and validation, he said.

"We have experienced very little disruption to our AI model development trajectory due to the absence of road operations," said Thompson. "Our test track testing is meant to validate our models, so we're able to maintain, if not accelerate, our pace of development during this time."

Perhaps unsurprisingly, Thompson also says that the virtual cars in Uber's simulation environment are driving more miles than before the pandemic. He doesn't attribute this to the health crisis per se, but he says that COVID-19 provided an opportunity to continue scaling simulations.

"We have well-established strategic plans in place to expand our simulated mileage further. It's part serendipity that our model-based development approach has made our operations more robust to situations like this pandemic," he added. "We will continue this rapid expansion of our simulation capability for the foreseeable future, and have no plans to reduce simulated miles driven even after the pandemic is behind us."

Lyft

Lyft was in the midst of developing a new vehicle platform when it was forced to halt all real-world testing. Nevertheless, Jonny Dyer, director of engineering at Lyft's Level 5 self-driving division, tells VentureBeat that the company is "doubling down" on simulation by leveraging data from the roughly 100,000 miles its real-world autonomous cars have driven and calibrating its simulation environment ahead of validation.

Specifically, Lyft is refining the techniques it uses in simulation to direct agents (such as virtual pedestrians) to react realistically to vehicles, in part with AI and machine learning models. It's also building out tools like a benchmarking framework that enables engineers to compare and improve the performance of behavior detectors, as well as a dashboard that dynamically updates visualizations to help create diversified simulation content.

Dyer says that Lyft isn't so much focused on challenges like simulating camera, lidar, and radar sensor data, but instead on traditional physics-based mechanisms, as well as methods that help identify the right sets of parameters to simulate. "It's not a game of scale with simulation models -- it's really more about simulating the right miles with high fidelity," he said. "We're focusing on that fidelity aspect and getting simulation closer to telling us the types of things that real-world driving does. It's the question of not just simulating a large number of miles, but simulating the right miles."

Lyft also reworked its validation strategy to lean more heavily on things like structural and dynamic simulation in light of the pandemic, according to Dyer. The company had planned to perform real-world testing prior to these steps -- and it still will in some capacity -- but the shutdown forced its hardware engineers to pivot toward simulation.

For example, a senior computer engineer mounted the high-performance server that runs Lyft's autonomous vehicle technology stack -- which contains eight graphics cards and a powerful x86 processor -- in her bedroom with four desk fans blowing on it to keep it cool. Another engineer built an electrolytic corrosion setup in his garage with a Raspberry Pi and circuit boards he'd purchased on eBay. Yet another engineer converted the cricket pitch in his backyard into a lidar sensor range, with full-sized street signs he's using to perform calibration for the new sensors Lyft plans to integrate.

Industry challenges

Despite the Herculean efforts of the autonomous vehicle companies grounded by COVID-19, it seems likely that a few will emerge from the pandemic worse for wear. Simulation is no substitute for testing on real roads, some experts assert.

A longstanding challenge in simulations involving real data is that every scene must respond to a self-driving car's movements -- even those that might not have been recorded by the original sensor. Whatever angle or viewpoint isn't captured by a photo or video has to be rendered or simulated using predictive models, which is why simulation has historically relied on computer-generated graphics and physics-based rendering that somewhat crudely represents the world. (Tellingly, even Wayve, a U.K.-based startup that trains self-driving models solely in simulation, relies on feedback from safety drivers to fine-tune those models.)

A paper published by researchers at Carnegie Mellon outlines the other challenges with simulation that impede real-world hardware development:

The reality gap: Simulated environments don't always adequately represent physical reality -- for example, a simulation lacking an accurate tire model might not account for realistic car behaviors when cornering at high speeds.
Resource costs: The computational overhead of simulation requires specialized hardware like graphics cards, which drives high cloud costs. According to a recent Synced report, training a state-of-the-art machine learning model like the University of Washington's Grover, which generates and detects fake news, can cost in excess of $25,000 over a two-week period.
Reproducibility: Even the best simulators can contain non-deterministic elements that make reproducing tests impossible.

Indeed, Yandex -- which continues to operate its self-driving cars on public roads in locations where it's allowed, such as Moscow -- says that while simulation can aid autonomous vehicle development, public testing remains critical. Shifting to a full simulation program without on-road tests will slow the progress of autonomous vehicle development in the short term, the company asserts, because developing a simulation with 100% accuracy and complexity might require as much problem-solving and resources as developing self-driving technology itself.

"[Without real-world testing,] self-driving companies won't be able to collect critical real-world driving data," a Yandex spokesperson told VentureBeat. "[Additionally,] driving simulations and running vehicles on test tracks can help prove that vehicles meet specific requirements in a laboratory environment. Driving on public roads presents much more complex, real-world dynamics that the self-driving platforms need to face, including different weather conditions and a variety of pedestrian and driver behavior."

Beyond exposing autonomous driving systems to these complex dynamics, Ars Technica's Timothy B. Lee notes that testing ensures sensors and other hardware have a low failure rate; that the cars will use safe passenger pickup and dropoff locations; and that fleet operators are well-trained to handle any contingency. It also allows companies to identify issues that might crop up, like whether there are enough vehicles available for rush-hour service.

Dyer doesn't disagree with these sentiments entirely, but he's generally more optimistic about the prospect of simulated testing. Simulation is well-suited for structured and functional testing on test track data, he says, which makes up a large slice of Lyft's autonomous vehicles roadmap.

"The reality is that all simulation is somewhat limited in that you have to calibrate it and validate it against reality. ... It's not going to replace driving on the roads anytime soon [because] you can't do everything in simulation. But I do think that we're making tremendous progress in our simulation environment," he said. "In this respect, the pandemic hasn't been a setback at all. There's a lot of basic stuff that comes out of these big engineering projects like tech debt and infrastructural things that you want to fix, but that becomes hard to fix when you're in the middle of an operational program. Investing in these will, in my opinion, pay off in a big way once we're back."

Skeptics like Boston Consulting Group senior partner and managing director Brian Collie anticipate the pandemic will delay the commercialization of driverless car technology by at least three years. As if on cue, Ford today announced it will delay plans to launch an autonomous vehicle service to 2022; the automaker had been collaborating with Argo AI and testing its go-to-market strategy through pilot programs with Postmates, Walmart, Domino's, and local partners.

Karmel concedes that there might be bumps in the road -- particularly with Waymo's testing paused -- but he says with confidence that the pandemic hasn't materially affected planned rollouts.

"If you just focus on synthetic miles and don't start bringing in some of the realism that you have from driving in the real world, it actually becomes very difficult to know where you are on that curve of realism," said Karmel. "That said, what we're trying to do is learn as much as we can -- we're still getting thousands of years of experience during this period of time."