Google researchers recently published a paper describing a framework — SEED RL — that scales AI model training to thousands of machines. They say that it could facilitate training at millions of frames per second on a machine while reducing costs by up to 80%, potentially leveling the playing field for startups that couldn’t previously compete with large AI labs.

Training sophisticated machine learning models in the cloud remains prohibitively expensive. According to a recent Synced report, the University of Washington’s Grover, which is tailored for both the generation and detection of fake news, cost $25,000 to train over the course of two weeks. OpenAI racked up $256 per hour to train its GPT-2 language model, and Google spent an estimated $6,912 training BERT, a bidirectional transformer model that redefined the state of the art for 11 natural language processing tasks.

SEED RL, which is based on Google’s TensorFlow 2.0 framework, features an architecture that takes advantage of graphics cards and tensor processing units (TPUs) by centralizing model inference. To avoid data transfer bottlenecks, inference runs centrally on a learner component, which also trains the model using input collected from distributed actors. The target model’s variables and state information are kept local to the learner, while observations are sent to it at every environment step. Latency is kept to a minimum thanks to a network library based on gRPC, the open source universal RPC framework.
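The data flow described above can be illustrated with a toy, single-process sketch. It is not SEED RL's code: `CentralLearner`, `Actor`, and `run` are hypothetical names, the "model" is a random stand-in for a forward pass, and the network hop between actors and learner is replaced by a plain function call.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CentralLearner:
    """Hypothetical learner: owns the model and serves inference for all actors."""
    num_actions: int
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    # actor_id -> list of (observation, action) pairs kept for training
    trajectories: dict = field(default_factory=dict)

    def infer_batch(self, observations: dict) -> dict:
        """Take {actor_id: observation} and return {actor_id: action}."""
        actions = {}
        for actor_id, obs in observations.items():
            # Stand-in for a batched forward pass on a GPU or TPU.
            action = self.rng.randrange(self.num_actions)
            self.trajectories.setdefault(actor_id, []).append((obs, action))
            actions[actor_id] = action
        return actions


@dataclass
class Actor:
    """Hypothetical actor: steps a toy environment and holds no model weights."""
    actor_id: int
    state: int = 0

    def observe(self) -> int:
        return self.state

    def step(self, action: int) -> None:
        # Toy environment dynamics, assumed purely for illustration.
        self.state += action + 1


def run(num_actors: int = 4, num_steps: int = 3) -> CentralLearner:
    learner = CentralLearner(num_actions=3)
    actors = [Actor(actor_id=i) for i in range(num_actors)]
    for _ in range(num_steps):
        # At every environment step, each actor sends its observation to the learner.
        batch = {a.actor_id: a.observe() for a in actors}
        # In a distributed setup this call would cross the network.
        actions = learner.infer_batch(batch)
        for a in actors:
            a.step(actions[a.actor_id])
    return learner
```

Calling `run()` leaves the learner holding one trajectory per actor, which is the point of the design: the actors only ever exchange observations and actions, never model parameters.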


SEED RL’s learner component can be scaled across thousands of cores (e.g., up to 2,048 on Cloud TPUs), and the number of actors, which iterate between taking steps in the environment and running inference on the model to predict the next action, can scale up to thousands of machines. Of the algorithms it implements, V-trace predicts an action distribution from which an action can be sampled, while R2D2 selects an action based on the predicted future value of that action.
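The difference between the two action-selection styles can be shown in a few lines. This is a minimal sketch of the general idea only, not an implementation of V-trace or R2D2; the function names are hypothetical, and exploration and training logic are omitted.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution over actions."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_from_policy(logits, rng):
    """Policy-style selection: sample an action from the predicted distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_from_values(q_values):
    """Value-style selection: pick the action with the highest predicted value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
sampled_action = sample_from_policy([0.5, 1.5, -0.2], rng)
greedy_action = greedy_from_values([0.1, 2.0, -1.0])
```

With the first approach, repeated calls on the same input can return different actions; with the second, the same value estimates always yield the same action.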

To evaluate SEED RL, the research team benchmarked it on the commonly used Arcade Learning Environment, several DeepMind Lab environments, and the Google Research Football environment. They say that they managed to solve a previously unsolved Google Research Football task and that they achieved 2.4 million frames per second with 64 Cloud TPU cores, an 80-fold improvement over the previous state-of-the-art distributed agent.
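For a sense of scale, the reported figures can be broken down with simple arithmetic. The per-core and baseline numbers below are derived from the figures quoted above, assuming throughput divides evenly across cores; they are not measurements reported in the text.

```python
seed_fps = 2_400_000  # reported frames per second
tpu_cores = 64        # reported Cloud TPU cores
speedup = 80          # reported improvement factor

# Average throughput per TPU core, assuming an even split.
fps_per_core = seed_fps / tpu_cores

# Throughput of the previous agent implied by the 80x figure.
implied_baseline_fps = seed_fps / speedup

print(f"{fps_per_core:,.0f} frames/s per core")         # 37,500 frames/s per core
print(f"{implied_baseline_fps:,.0f} frames/s baseline")  # 30,000 frames/s baseline
```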

“This results in a significant speed-up in wall-clock time and, because accelerators are orders of magnitude cheaper per operation than CPUs, the cost of experiments is reduced drastically,” wrote the coauthors of the paper. “We believe SEED RL, and the results presented, demonstrate that reinforcement learning has once again caught up with the rest of the deep learning field in terms of taking advantage of accelerators.”