Google today introduced TensorFlow Lite 1.0, its framework for developers deploying AI models on mobile and IoT devices. Improvements include selective registration and quantization during and after training for faster, smaller models. Quantization has led to 4 times compression of some models.

“We are going to fully support it. We’re not going to break things and make sure we guarantee its compatibility. I think a lot of people who deploy this on phones want those guarantees,” TensorFlow engineering director Rajat Monga told VentureBeat in a phone interview.

Lite begins with training AI models on TensorFlow; those models are then converted into Lite models that run on mobile devices. Lite was first introduced at the I/O developer conference in May 2017 and released in developer preview later that year.

The TensorFlow Lite team at Google also shared its roadmap today. It is designed to shrink and speed up AI models for edge deployment and includes model acceleration, especially for Android developers using neural nets, as well as a Keras-based connection pruning kit and additional quantization enhancements.

Other changes on the way:

  • Support for control flow, which is essential to the operation of models like recurrent neural networks
  • CPU performance optimization with Lite models, potentially involving partnerships with other companies
  • Expanded coverage of GPU delegate operations and a finalized API to make it generally available

A TensorFlow 2.0 model converter for making Lite models will be made available to help developers better understand how things go wrong in the conversion process and how to fix them.

TensorFlow Lite is deployed on more than two billion devices today, TensorFlow Lite engineer Raziel Alvarez said onstage at the TensorFlow Dev Summit being held at Google offices in Sunnyvale, California.

TensorFlow Lite increasingly makes TensorFlow Mobile obsolete, except for users who want to utilize it for training, but a solution is in the works, Alvarez said.

A variety of techniques are being explored to reduce the size of AI models and optimize them for mobile devices, such as quantization and delegates (structured layers for executing graphs on different hardware to improve inference speed).
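To give a rough sense of where a compression figure like 4 times can come from, here is a minimal sketch of 8-bit quantization in plain Python. It is an illustration only, not TensorFlow Lite's implementation: the function names `quantize` and `dequantize`, the sample weights, and the simple min/max affine scheme are all assumptions made for this example.

```python
from array import array


def quantize(weights):
    """Map float weights to unsigned 8-bit integers (hypothetical helper).

    Uses a simple affine scheme: real_value ~= (q - zero_point) * scale.
    """
    lo, hi = min(weights), max(weights)
    # Widen the range to include 0.0 so zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-lo / scale)
    q = array(
        "B",  # one unsigned byte per weight
        (min(255, max(0, round(w / scale) + zero_point)) for w in weights),
    )
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the 8-bit representation."""
    return [(v - zero_point) * scale for v in q]


# Made-up sample weights, stored as 32-bit floats before quantization.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
float_store = array("f", weights)

q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)

# Compare raw storage for the weights only (scale and zero point are extra).
float_bytes = len(float_store) * float_store.itemsize
int8_bytes = len(q) * q.itemsize
ratio = float_bytes / int8_bytes

print("bytes as float32:", float_bytes)
print("bytes as 8-bit:  ", int8_bytes)
print("compression:     ", ratio)
print("max error:       ", max(abs(a - b) for a, b in zip(weights, restored)))
```

On platforms where a C float occupies 4 bytes, each weight shrinks from 4 bytes to 1, which is where a 4 times reduction in weight storage comes from. The trade-off is a small rounding error per weight, bounded here by the quantization step size.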

Mobile GPU acceleration with delegates for a number of devices was made available in developer preview in January; it can make models run 2 to 7 times faster than on a floating point CPU. Edge TPU delegates can increase speeds to 64 times faster than a floating point CPU.

In the future, Google plans to make GPU delegates generally available, expand coverage, and finalize APIs.

Above: TensorFlow Lite speeds

Image Credit: Khari Johnson / VentureBeat

A number of native Google apps and services use TensorFlow Lite, including Gboard, Google Photos, AutoML, and Nest. Lite now carries out all computation for CPU models when Google Assistant needs to respond to queries offline.

Lite can also run on devices like Raspberry Pi and the new $150 Coral Dev Board, which was also introduced earlier today.

Also making their debut today: The alpha release of TensorFlow 2.0 for a simplified user experience; TensorFlow.js 1.0; and the version 0.2 release of TensorFlow for developers who write code in Apple’s programming language Swift.

TensorFlow Federated and TensorFlow Privacy were also released today.

Lite for Core ML, Apple’s machine learning framework, was introduced in December 2017.

Custom TensorFlow Lite models also work with ML Kit, a quick way for developers to create models for mobile devices, introduced last year for Android and iOS developers using Firebase.