DeepMind and Waymo collaborate to improve AI accuracy and speed up model training

AI models capable of reliably guiding driverless cars typically require extensive testing and fine-tuning, not to mention enormous amounts of computational power. In an effort to make AI training more effective and efficient, Waymo, a subsidiary of Google parent company Alphabet, is collaborating with DeepMind on techniques inspired by evolutionary biology, the two companies revealed in a blog post this morning.

As Waymo explains, AI algorithms self-improve through trial and error. A model is presented with a task that it learns to perform by continually attempting it and adjusting based on the feedback it receives. Performance is heavily dependent on the training regimen -- known as a hyperparameter schedule -- and finding the best regimen is commonly left to experienced researchers and engineers. They monitor AI models undergoing training by hand, culling the weakest performers and freeing resources to train new algorithms from scratch.

DeepMind devised a less labor-intensive approach in PBT (Population Based Training), which starts with multiple machine learning models initialized with random hyperparameters. The models are evaluated periodically and compete with each other in an evolutionary fashion, such that underperforming members of the population are replaced with "offspring" (copies of better-performing members with slightly mutated hyperparameters). PBT doesn't necessitate restarting training from scratch, because each offspring inherits the state of its parent network, and the hyperparameters are updated actively throughout training. The net result is that PBT spends the bulk of its resources training with "good" hyperparameter values.
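The loop described above can be sketched in a few lines of Python. This is a toy illustration, not DeepMind's or Waymo's implementation: the function name `toy_pbt`, the quadratic loss, the single learning-rate hyperparameter, the quarter-of-the-population cutoff, and the 0.8/1.2 mutation factors are all assumptions chosen to keep the example small.

```python
import random


def toy_pbt(population_size=8, steps=200, eval_every=20, seed=0):
    """Toy Population Based Training on the loss theta**2 (illustrative only)."""
    rng = random.Random(seed)

    # Each member pairs a hyperparameter (learning rate) with a model state (theta).
    members = [
        {"lr": 10 ** rng.uniform(-4, -0.5), "theta": 5.0}
        for _ in range(population_size)
    ]

    def loss(theta):
        return theta ** 2

    for step in range(1, steps + 1):
        # Train: one gradient-descent step per member (d/dtheta theta**2 = 2*theta).
        for m in members:
            m["theta"] -= m["lr"] * 2 * m["theta"]

        # Periodically evaluate and let members compete.
        if step % eval_every == 0:
            ranked = sorted(members, key=lambda m: loss(m["theta"]))
            cutoff = max(1, population_size // 4)
            top, bottom = ranked[:cutoff], ranked[-cutoff:]
            for weak in bottom:
                parent = rng.choice(top)
                # The offspring inherits the parent's state, so training continues
                # from where the parent is rather than restarting from scratch.
                weak["theta"] = parent["theta"]
                # The offspring's hyperparameter is a slightly mutated copy.
                weak["lr"] = parent["lr"] * rng.choice([0.8, 1.2])

    return min(members, key=lambda m: loss(m["theta"]))


best = toy_pbt()
print(best["lr"], best["theta"] ** 2)
```

Because weak members copy both the state and a perturbed hyperparameter from strong ones, the learning rate effectively changes over the course of training, which is what makes the result a schedule rather than a single fixed value.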

PBT isn't perfect -- it tends to optimize for the present and fails to consider long-term outcomes, disadvantaging late-blooming AI models. To mitigate this, researchers at DeepMind trained a larger population and created subpopulations called niches, in which algorithms are only allowed to compete within their own subgroups. Lastly, the team directly rewarded diversity by providing more unique models an edge in the competition.
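One way to picture the niche idea is a selection step that only compares members sharing a niche label. The sketch below is an assumption-laden illustration rather than the researchers' method: the `exploit_within_niches` function, the dictionary fields, and the mutation factors are hypothetical, and the diversity reward mentioned above is not modeled.

```python
import copy
import random


def exploit_within_niches(members, num_niches, rng):
    """Replace the weakest member of each niche with a mutated copy of that
    niche's strongest member. Higher 'score' is better (hypothetical fields)."""
    for niche_id in range(num_niches):
        niche = [m for m in members if m["niche"] == niche_id]
        if len(niche) < 2:
            continue  # nothing to compete against
        niche.sort(key=lambda m: m["score"], reverse=True)
        best, worst = niche[0], niche[-1]
        # Inherit the state of the best member in the SAME niche only.
        worst["state"] = copy.deepcopy(best["state"])
        worst["hparams"] = {
            name: value * rng.choice([0.8, 1.2])
            for name, value in best["hparams"].items()
        }
    return members


population = [
    {"niche": 0, "score": 0.9, "state": "a", "hparams": {"lr": 0.10}},
    {"niche": 0, "score": 0.1, "state": "b", "hparams": {"lr": 0.50}},
    {"niche": 1, "score": 0.5, "state": "c", "hparams": {"lr": 0.01}},
    {"niche": 1, "score": 0.2, "state": "d", "hparams": {"lr": 0.02}},
]
exploit_within_niches(population, num_niches=2, rng=random.Random(0))
```

In this example the weakest member of niche 1 inherits from the best member of niche 1, even though a member of niche 0 scores higher overall, so a slower-starting subgroup is not immediately overwritten by the current global leader.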

In several recent studies, DeepMind and Waymo applied PBT to pedestrian, bicyclist, and motorcyclist recognition tasks with the goal of investigating whether it could improve recall (the fraction of in-scene obstacles that the model identifies) and precision (the fraction of detected obstacles that are actually obstacles and not false positives). Ultimately, the companies sought to train a single AI model to maintain recall of over 99% while reducing false positives.
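The two metrics follow directly from detection counts. The snippet below is a minimal sketch; the function name `precision_recall` and the counts passed to it are made-up illustrative values, not figures from the studies.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw detection counts."""
    detected = true_positives + false_positives   # everything the model flagged
    actual = true_positives + false_negatives     # everything really in the scene
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical scene totals: 1,000 real obstacles, 990 found, 50 false alarms.
precision, recall = precision_recall(
    true_positives=990, false_positives=50, false_negatives=10
)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Holding recall fixed, every false positive removed raises precision, which is why the goal is framed as keeping recall above a threshold while cutting false positives.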

Waymo reports that these experiments informed a "realistic" framework for evaluating real-world model robustness, which in turn informed PBT's algorithm-selecting competition. They also say the experiments revealed the need for fast evaluation to support evolutionary competition; PBT models are evaluated every 15 minutes. (DeepMind said it employed parallelization across "hundreds" of distributed machines in Google's datacenters to achieve this.)

The results are impressive. PBT algorithms managed to achieve higher precision, reducing false positives by 24% compared to their hand-tuned equivalents, while maintaining a high recall rate, Waymo claims. Moreover, they saved time and resources -- the hyperparameter schedule discovered with PBT-trained algorithms took half the training time and half the computational resources.

Waymo says it has incorporated PBT directly into its technical infrastructure, enabling researchers from across the company to apply it with a button click. "Since the completion of these experiments, PBT has been applied to many different Waymo models and holds a lot of promise for helping to create more capable vehicles for the road," wrote the company. "Traditionally, [AI] can only be trained using simple and smooth loss functions, which act as a proxy for what we really care about. PBT enabled us to go beyond the update rule used for training neural nets, and toward the more complex metrics optimizing for features we care about."