Google brings cross-platform AI pipeline framework MediaPipe to the web

Roughly a year ago, Google open-sourced MediaPipe, a framework for building cross-platform AI pipelines consisting of fast inference and media processing (like video decoding). Basically, it's a quick and dirty way to perform object detection, face detection, hand tracking, multi-hand tracking, hair segmentation, and other such tasks in a modular fashion, with popular machine learning frameworks like Google's own TensorFlow and TensorFlow Lite.

MediaPipe could previously be deployed to desktop, mobile devices running Android and iOS, and edge devices like Google's Coral hardware family, but it's increasingly making its way to the web courtesy of WebAssembly, a portable binary code format for executable programs, and the XNNPack ML inference library, an optimized collection of floating-point AI inference operators. At runtime, the browser creates a virtual machine that executes the WebAssembly instructions very quickly. On the graphics and rendering side, MediaPipe now automatically taps into WebGL, a JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser.

An API facilitates communication between JavaScript and C++, allowing users to change and interact with MediaPipe graphs directly using JavaScript. And all the requisite demo assets, including AI models and auxiliary text and data files, are bundled into individual binary data packages that are loaded at runtime.
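To make the JavaScript-to-compiled-code handoff concrete, here is a minimal sketch of how a script can call a function exported by a WebAssembly module. It uses only the standard `WebAssembly` JavaScript API; the hand-assembled `add` module is a stand-in for compiled C++ and is an assumption for illustration, not part of MediaPipe or its API.

```typescript
// Hand-assembled WebAssembly module exporting add(a: i32, b: i32): i32.
// Illustrative stand-in for compiled C++ code; not MediaPipe's code or API.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm" + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Compile and instantiate synchronously; fine for a module this small.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);

// Exported functions appear as ordinary JavaScript callables.
const add = wasmInstance.exports.add as (a: number, b: number) => number;
const sum = add(2, 3);
console.log(sum);
```

Real applications usually load larger modules asynchronously, for example with `WebAssembly.instantiateStreaming`, rather than compiling on the main thread.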

"Since everything runs directly in the browser, video never leaves the user's computer and each iteration can be immediately tested on a live webcam stream (and soon, arbitrary video)," explained MediaPipe team members Michael Hays and Tyler Mullen in a blog post.

Google leveraged the above-listed components to integrate preview functionality into a web-based visualizer -- a sort of workspace for iterating on MediaPipe flow designs. The visualizer, which is hosted at viz.mediapipe.dev, enables developers to inspect MediaPipe graphs (descriptions of a pipeline's processing nodes and the data streams connecting them) by pasting graph code into the editor tab or uploading a file to the visualizer. Users can pan around and zoom into the graphical representation using a mouse and scroll wheel, and the visualization reacts to changes made within the editor in real time.
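As a rough illustration of what such a graph expresses, the sketch below models a pipeline as nodes that consume and produce named streams, then works out an order in which the nodes could run. The `GraphNode` type, the `executionOrder` function, and the node and stream names are all hypothetical; this is a toy model, not MediaPipe's graph format or scheduler.

```typescript
// Toy model only: these names are hypothetical, not MediaPipe's API.
interface GraphNode {
  calculator: string; // name of the processing step
  inputs: string[];   // streams this node consumes
  outputs: string[];  // streams this node produces
}

// Simple readiness-based scheduling (a basic topological sort):
// a node may run once every stream it consumes has been produced.
function executionOrder(graphInputs: string[], nodes: GraphNode[]): string[] {
  const available = new Set(graphInputs);
  const pending = [...nodes];
  const order: string[] = [];
  while (pending.length > 0) {
    const idx = pending.findIndex((n) => n.inputs.every((s) => available.has(s)));
    if (idx === -1) {
      throw new Error("graph has a cycle or an unconnected input stream");
    }
    const [node] = pending.splice(idx, 1);
    node.outputs.forEach((s) => available.add(s));
    order.push(node.calculator);
  }
  return order;
}

// Nodes are deliberately listed out of order to show the sort at work.
const demoGraph: GraphNode[] = [
  { calculator: "Renderer", inputs: ["input_video", "detections"], outputs: ["output_video"] },
  { calculator: "Detector", inputs: ["scaled_video"], outputs: ["detections"] },
  { calculator: "Scaler", inputs: ["input_video"], outputs: ["scaled_video"] },
];

const order = executionOrder(["input_video"], demoGraph);
console.log(order.join(" -> "));
```

A visual editor has the same information to work with: each node becomes a box, and each shared stream name becomes an edge between the node that produces it and the nodes that consume it.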

Hays and Mullen note that currently, web-based MediaPipe support is limited to the demo graphs supplied by Google. Developers must edit one of the template graphs -- they can't provide their own from scratch or add or alter assets. TensorFlow Lite inference on the GPU isn't supported, and the graph's computations must be run on a single processor thread.

A lack of compute shaders -- routines compiled for high-throughput accelerators -- available for the web is to blame for this last limitation, which Hays, Mullen, and team attempted to work around by using graphics cards for image operations where possible and the lightest-weight possible versions of all AI models. They plan to "continue to build upon this new platform" and to provide developers with "much more control" over time.
