Google today released the Quantization Aware Training (QAT) API, which enables developers to train and deploy models with the performance benefits of quantization, the process of mapping input values from a large set to output values in a smaller set, while retaining close to their original accuracy. The goal is to support the development of smaller, faster, and more efficient machine learning models well suited to run on off-the-shelf hardware, such as the machines in small- and medium-sized business environments where compute resources are at a premium.
Often, the process of going from higher to lower precision is noisy. That's because quantization squeezes a small range of floating-point values into a fixed number of information buckets, leading to information loss similar to the rounding errors introduced when fractional values are represented as integers. (For example, all values in the range [2.0, 2.3] might be represented by a single bucket.) Problematically, when the lossy numbers are used in several computations, the losses accumulate and need to be rescaled for the next computation.
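To make the bucketing concrete, here is a minimal sketch of uniform affine quantization, the common scheme behind 8-bit conversion. The function names and the scale and zero-point values are illustrative choices, not part of the QAT API:

```python
# Minimal illustration of uniform (affine) quantization: floats drawn
# from a continuous range map to a small set of integer buckets, so
# nearby values collapse into the same bucket (information loss).

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an int8 bucket: q = round(x / scale) + zero_point."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    """Map the bucket index back to a float; the rounding error remains."""
    return (q - zero_point) * scale

# Illustrative parameters: cover roughly [-12.8, 12.7] with 256 buckets.
scale, zero_point = 0.1, 0

for x in (2.0, 2.3, 2.34):
    q = quantize(x, scale, zero_point)
    print(x, "->", q, "->", dequantize(q, scale, zero_point))
```

With this bucket width, 2.3 and 2.34 land in the same integer bucket, so the distinction between them is lost on the way back to floating point, which is exactly the rounding-style error described above.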
The QAT API addresses this by simulating low-precision computation during training. Quantization error is introduced as noise throughout training, and the QAT API's algorithm tries to minimize it so that the model learns variables that are more robust to quantization. The training graph uses operations that convert floating-point values into low-precision values and then convert the low-precision values back into floating-point, ensuring that quantization losses are introduced into the computation and that subsequent computations emulate low-precision arithmetic.
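The quantize-then-dequantize round trip described above is often called "fake quantization." The sketch below shows the idea in plain Python; the function names, bit width, and calibrated range are illustrative assumptions, not the actual TensorFlow operations:

```python
# Sketch of the "fake quantization" round trip used during
# quantization-aware training: a value is snapped onto a low-precision
# grid and immediately mapped back to float, so the rest of the
# floating-point computation sees the quantization error as noise.

def fake_quant(x, num_bits=8, x_min=-1.0, x_max=1.0):
    """Quantize x onto a (2**num_bits - 1)-step grid over [x_min, x_max],
    then map it straight back to float."""
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    x = max(x_min, min(x_max, x))      # clamp to the calibrated range
    q = round((x - x_min) / scale)     # integer grid index
    return x_min + q * scale           # dequantize back to float

# A toy "layer": both operands pass through fake_quant, so a training
# forward pass built this way sees the same error a quantized
# deployment would see.
def quantized_multiply(x, w):
    return fake_quant(x) * fake_quant(w)

print(quantized_multiply(0.5, 0.25))
```

Because the output is still a float, gradients can flow through the surrounding graph during training while the injected error nudges the learned weights toward values that survive quantization.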
In tests, Google reports that an image classification model (MobilenetV1 224) with a non-quantized accuracy of 71.03% achieved 71.06% accuracy after quantization when tested on the open source ImageNet data set. Another classification model (Nasnet-Mobile) tested against the same data set experienced only a one-point drop in accuracy (74% to 73%) post-quantization.
Aside from emulating reduced-precision computation, the QAT API records the statistics necessary to quantize a trained model, or parts of it. This enables developers to convert a model trained with the API to a quantized integer-only TensorFlow Lite model, for example, or to experiment with various quantization strategies while simulating how quantization affects accuracy on different hardware backends.
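The key statistics are per-tensor ranges: once the minimum and maximum values a tensor takes are known, a converter can derive the scale and zero-point needed for integer-only execution. The sketch below follows the standard uniform affine scheme; it is an assumption-laden illustration, not the actual TensorFlow Lite converter code:

```python
# Sketch of how recorded range statistics become quantization
# parameters. Training records per-tensor min/max values; a converter
# then derives a scale and zero-point from them so the model can run
# in integer-only arithmetic.

def quant_params_from_stats(x_min, x_max, num_bits=8):
    """Derive (scale, zero_point) for unsigned num_bits quantization
    from observed min/max statistics (x_max must exceed x_min)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so zero is exactly representable
    # (important for zero padding and ReLU outputs).
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)
    return scale, max(qmin, min(qmax, zero_point))

# Example: a tensor observed in [0.0, 2.55] gets a step of ~0.01.
scale, zp = quant_params_from_stats(0.0, 2.55)
print(scale, zp)
```

Recording these ranges during training, rather than estimating them afterward, is what lets a QAT-trained model convert to an integer-only format with little additional accuracy loss.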
Google says that by default, the QAT API, which is part of the TensorFlow Model Optimization Toolkit, is configured to work with the quantized execution support available in TensorFlow Lite, Google's toolset for adapting models built on its TensorFlow machine learning framework to mobile, embedded, and internet of things devices. “We are very excited to see how the QAT API further enables TensorFlow users to push the boundaries of efficient execution in their TensorFlow Lite-powered products as well as how it opens the door to researching new quantization algorithms and further developing new hardware platforms with different levels of precision,” wrote Google in a blog post.
The formal launch of the QAT API comes after the unveiling of TensorFlow Quantum, a machine learning framework for training quantum models, at the TensorFlow Dev Summit. The QAT API was previewed during a recorded session at the conference.