Google today launched an OpenCL-based mobile GPU inference engine for its TensorFlow framework on Android. It’s available now in the latest version of the TensorFlow Lite library, and the company claims it runs about twice as fast as the existing OpenGL backend on “reasonably-sized” AI models.

OpenGL, which is nearly three decades old, is a platform-agnostic API for rendering 2D and 3D vector graphics. Compute shaders were added with OpenGL ES 3.1, but the TensorFlow team says backward-compatible design decisions kept them from reaching device GPUs’ full potential. OpenCL, by contrast, was designed from the beginning for computation on various accelerators, which makes it more relevant to mobile GPU inference. This motivated the TensorFlow team’s investigation into, and eventual adoption of, an OpenCL-based mobile inference engine.

The new TensorFlow inference engine features an optimizer that chooses the right workgroup size to boost performance, resulting in up to a 50% speedup over the average on hardware like Qualcomm Adreno GPUs. It supports FP16 natively and requires accelerators to specify data types’ availability. Half-precision values reduce memory and bandwidth usage and speed up computation, which cuts inference time. (Google notes that some older GPUs like the circa-2012 Adreno 305 can now operate at their full capabilities thanks to FP16 support.) And OpenCL is able to greatly outperform OpenGL by taking advantage of physical constant memory, a hardware feature in chips like Adreno GPUs that reserves RAM for storing constant arrays and variables.
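To make the FP16 point concrete, here is a minimal Python sketch of the storage arithmetic. It is not code from TensorFlow Lite. The tensor shape is a hypothetical example, and the sketch relies only on the standard library's `struct` module, whose `e` and `f` format codes are IEEE 754 half and single precision.

```python
import struct


def tensor_bytes(num_elements: int, fmt: str) -> int:
    """Bytes needed to store num_elements values in the given struct format."""
    return num_elements * struct.calcsize(fmt)


# Hypothetical tensor: one 224x224 feature map with 32 channels.
NUM_ELEMENTS = 224 * 224 * 32

fp32_bytes = tensor_bytes(NUM_ELEMENTS, "f")  # single precision, 4 bytes per value
fp16_bytes = tensor_bytes(NUM_ELEMENTS, "e")  # half precision, 2 bytes per value

# Round-tripping a value through FP16 shows the precision that is given up.
value = 0.1
as_fp16 = struct.unpack("e", struct.pack("e", value))[0]

print(f"FP32: {fp32_bytes} bytes, FP16: {fp16_bytes} bytes")
print(f"{value} stored as FP16 reads back as {as_fp16}")
```

Halving the bytes per value halves the data that has to move between memory and the GPU, at the cost of a small rounding error per value.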

Above: Inference latency of MNASNet 1.3 on select Android devices with OpenCL.

Image Credit: Google

In one benchmark test, the TensorFlow team reduced the latency of MNASNet 1.3, a model designed using neural architecture search, from over 100 milliseconds on the Vivo Z3 with the OpenGL-based backend to 25 milliseconds with the OpenCL alternative. In another test with the object detection algorithm SSD MobileNet v3, the team reduced latency from nearly 100 milliseconds on the Huawei Mate 20 to less than 25 milliseconds.

Above: Inference latency of SSD MobileNet v3 (large) on select Android devices with OpenCL.

Image Credit: Google

Google notes that OpenCL isn’t a part of the standard Android distribution, making it unavailable to some users. As a stopgap measure, TensorFlow Lite now checks for the availability of OpenCL at runtime so that if it isn’t available or can’t be loaded, the library falls back to the old OpenGL backend.
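The runtime check described above follows a common probe-and-fall-back pattern, sketched below in Python. This illustrates the idea only and is not TensorFlow Lite's actual implementation, which the article does not detail. The library name `libOpenCL.so` and the function names `can_load` and `choose_backend` are assumptions made for the example.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the shared library can be loaded at runtime."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


def choose_backend(opencl_library: str = "libOpenCL.so", loader=can_load) -> str:
    """Prefer OpenCL when its library loads; otherwise fall back to OpenGL.

    The library name is an assumed example, and `loader` is injectable so the
    decision can be exercised on machines without OpenCL installed.
    """
    return "opencl" if loader(opencl_library) else "opengl"


print(f"Selected backend: {choose_backend()}")
```

Probing at startup lets a single build of the library run on devices with and without OpenCL, which matters because OpenCL is not guaranteed to be present on Android.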

“While the TensorFlow Lite GPU team continuously improves the existing OpenGL-based mobile GPU inference engine, we also keep investigating other technologies,” TensorFlow software engineers Juhyun Lee and Raman Sarokin wrote in a blog post. “OpenCL brings quite a lot of features that let us optimize our mobile GPU inference engine.”

