Google today made available TensorFlow Runtime (TFRT), a new runtime for its TensorFlow machine learning framework that provides a unified, extensible infrastructure layer with high performance across a range of hardware. Its open source release on GitHub follows a preview earlier this year during a session at the 2020 TensorFlow Dev Summit, where TFRT was shown to speed up core loops in a key benchmarking test.
TFRT is intended to address the needs of data scientists looking for faster model iteration time and better error reporting, Google says, as well as app developers looking for improved performance while training and serving models in production. Tangibly, TFRT could reduce the time it takes to develop, validate, and deploy an enterprise-scale model, which surveys suggest can range from weeks to months (or years). And it might beat back Facebook’s encroaching PyTorch framework, which continues to see rapid uptake among companies like OpenAI, Preferred Networks, and Uber.
TFRT is responsible for executing kernels, the low-level math functions that implement operations, on targeted hardware devices. At this stage of development, TFRT invokes a set of kernels that call directly into the underlying hardware, with a focus on low-level efficiency.
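To make the idea of a runtime "invoking kernels" concrete, here is a toy sketch (not TFRT's actual dispatch mechanism; the registry, function names, and naive CPU kernel are all illustrative assumptions) of how a runtime might map an operation plus a target device to the low-level function that does the math:

```python
# Toy sketch: a runtime resolves (operation, device) to a kernel and calls it.
# In a real runtime the registered kernels would be hand-tuned or
# compiler-generated native code for CPUs, GPUs, TPUs, and so on.

def matmul_cpu(a, b):
    # Naive CPU kernel: plain Python loops stand in for optimized native code.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Hypothetical kernel registry keyed by (operation name, device).
KERNELS = {("matmul", "cpu"): matmul_cpu}

def dispatch(op, device, *args):
    # Look up the device-specific kernel and invoke it on the inputs.
    return KERNELS[(op, device)](*args)

result = dispatch("matmul", "cpu", [[1, 2], [3, 4]], [[1, 0], [0, 1]])
# Multiplying by the identity matrix returns the original matrix.
assert result == [[1, 2], [3, 4]]
```

The point of the indirection is that the same program can run on different hardware simply by registering different kernels, which is the role a low-level runtime like TFRT plays beneath TensorFlow.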
Compared with TensorFlow’s existing runtime, which was built for graph execution (executing a graph of operations, constants, and variables) and training workloads, TFRT is optimized for inference and eager execution, where operations are executed as called from a Python script. TFRT leverages common abstractions across eager and graph execution; to achieve even better performance, its graph executor supports the concurrent execution of operations and asynchronous API calls.
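The distinction between the two modes, and why a graph executor can run independent operations concurrently, can be sketched with a toy example (this is an illustration in plain Python, not TFRT's or TensorFlow's actual API):

```python
# Toy illustration of eager vs. graph execution.
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

def mul(x, y):
    return x * y

# Eager style: every operation runs the moment Python reaches it.
eager = add(mul(2, 3), mul(4, 5))

# Graph style: operations are first recorded as nodes. The two mul nodes
# have no dependency on each other, so a graph executor is free to run
# them in parallel, as TFRT's graph executor does for independent ops.
graph = [
    (mul, (2, 3)),   # node 0
    (mul, (4, 5)),   # node 1, independent of node 0
]

with ThreadPoolExecutor() as pool:
    n0, n1 = pool.map(lambda node: node[0](*node[1]), graph)
lazy = add(n0, n1)   # dependent node runs once its inputs are ready

# Both strategies compute the same value; the graph form exposes parallelism.
assert eager == lazy == 26
```

Eager mode gives the immediate, debuggable behavior data scientists want from Python, while recording a graph first lets the runtime reorder and parallelize work, which is where the performance gains come from.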
Google says that in a performance test, TFRT improved the inference time of a trained ResNet-50 model (a popular algorithm for image recognition) by 28% on a graphics card compared with TensorFlow’s current runtime. “These early results are strong validation for TFRT, and we expect it to provide a big boost to performance,” wrote TFRT product manager Eric Johnson and TFRT tech lead Mingsheng Hong in a blog post. “A high-performance low-level runtime is a key to enable the trends of today and empower the innovations of tomorrow … TFRT will benefit a broad range of users.”
Contributions to the TFRT GitHub repository are currently limited, and TFRT isn’t yet available in the stable build of TensorFlow. But Google says that it’ll soon arrive through an opt-in flag before eventually replacing the existing runtime.