Google today launched an OpenCL-based mobile GPU inference engine for its TensorFlow framework on Android. It’s available now in the latest version of the TensorFlow Lite library, and the company claims it offers roughly a 2x speedup over the existing OpenGL backend on “reasonably sized” AI models.

OpenGL, which is nearly three decades old, is a platform-agnostic API for rendering 2D and 3D vector graphics. Compute shaders were added with OpenGL ES 3.1, but the TensorFlow team says backward-compatible design decisions prevented them from reaching device GPUs’ full potential. OpenCL, by contrast, was designed for computation across a variety of accelerators from the beginning, making it a better fit for mobile GPU inference. This motivated the TensorFlow team’s investigation into — and eventual adoption of — an OpenCL-based mobile inference engine.

The new TensorFlow inference engine features an optimizer that chooses the right workgroup size for each kernel, yielding up to a 50% speedup over the average configuration on hardware like Qualcomm Adreno GPUs. It supports FP16 natively and requires accelerators to specify which data types they support, reducing memory and bandwidth usage and speeding up computation. (Google notes that some older GPUs, like the circa-2012 Adreno 305, can now operate at their full capabilities thanks to FP16 support.) OpenCL also greatly outperforms OpenGL by exploiting physical constant memory, a hardware feature in chips like Adreno GPUs that reserves memory for storing constant arrays and variables.
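The workgroup-size optimizer described above is essentially an autotuning loop: time the kernel under several candidate workgroup sizes and keep the fastest. The sketch below illustrates that general pattern in Python; `run_kernel`, the candidate sizes, and the timings are all hypothetical stand-ins, not TensorFlow Lite's actual implementation.

```python
import time

def autotune_workgroup_size(run_kernel, candidate_sizes, warmup=1, repeats=5):
    """Return the candidate workgroup size with the lowest average latency.

    `run_kernel` is a hypothetical callable that launches the GPU kernel
    once with the given workgroup size; only the tuning loop is real here.
    """
    best_size, best_time = None, float("inf")
    for size in candidate_sizes:
        for _ in range(warmup):  # discard warm-up runs (caches, clocks)
            run_kernel(size)
        start = time.perf_counter()
        for _ in range(repeats):
            run_kernel(size)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_size, best_time = size, elapsed
    return best_size

# Stand-in "kernel": simulated per-launch cost, with (8, 8) as the sweet spot.
def fake_kernel(size):
    cost = {(4, 4): 0.003, (8, 8): 0.001, (16, 16): 0.002}[size]
    time.sleep(cost)

print(autotune_workgroup_size(fake_kernel, [(4, 4), (8, 8), (16, 16)]))
```

A real tuner would run this once per device and model, then cache the chosen size, since the best configuration varies across GPUs.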

Above: Inference latency of MNASNet 1.3 on select Android devices with OpenCL.

Image Credit: Google

In one benchmark test, the TensorFlow team reduced the latency of MNASNet 1.3, a mobile neural network discovered via neural architecture search, from over 100 milliseconds on the Vivo Z3 with the OpenGL-based backend to 25 milliseconds with the OpenCL alternative. In another test with the object detection model SSD MobileNet v3, the team reduced latency from nearly 100 milliseconds on the Huawei Mate 20 to less than 25 milliseconds.

Above: Inference latency of SSD MobileNet v3 (large) on select Android devices with OpenCL.

Image Credit: Google

Google notes that OpenCL isn’t a part of the standard Android distribution, making it unavailable to some users. As a stopgap measure, TensorFlow Lite now checks for the availability of OpenCL at runtime so that if it isn’t available or can’t be loaded, the library falls back to the old OpenGL backend.
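The fallback works by probing for the OpenCL runtime library at startup rather than assuming it exists. The sketch below illustrates that probe-and-fall-back pattern with Python's `ctypes`; it is illustrative only — TensorFlow Lite performs the equivalent check in its native code, and the function name here is invented.

```python
import ctypes
import ctypes.util

def pick_gpu_backend():
    """Return "opencl" if the OpenCL runtime is present and loadable,
    otherwise fall back to "opengl" — the pattern described above."""
    libname = ctypes.util.find_library("OpenCL")  # e.g. libOpenCL.so
    if libname is None:
        return "opengl"      # OpenCL not shipped on this device
    try:
        ctypes.CDLL(libname)  # present, but can it actually be loaded?
    except OSError:
        return "opengl"      # unloadable: fall back gracefully
    return "opencl"

print(pick_gpu_backend())
```

Because the decision is made at runtime, the same application binary runs on devices with and without OpenCL, which is why Google can ship one library despite OpenCL's absence from the standard Android distribution.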

“While the TensorFlow Lite GPU team continuously improves the existing OpenGL-based mobile GPU inference engine, we also keep investigating other technologies,” TensorFlow software engineers Juhyun Lee and Raman Sarokin wrote in a blog post. “OpenCL brings quite a lot of features that let us optimize our mobile GPU inference engine.”
