Google’s custom tensor processing unit (TPU) chips, the latest generation of which became available to Google Cloud Platform customers last year, are tailor-made for AI inference and training tasks like image recognition, natural language processing, and reinforcement learning. To support the development of apps that tap them, the Mountain View company has steadily open-sourced architectures like BERT (a language model), MorphNet (an optimization framework), and UIS-RNN (a speaker diarization system), often along with data sets. Continuing in that vein, Google is today adding two new models for image segmentation to its library, both of which it claims achieve state-of-the-art performance when deployed on Cloud TPU pods.
The models — Mask R-CNN and DeepLab v3+ — automatically label regions in an image and support two types of segmentation. The first kind, instance segmentation, gives each instance of one or multiple object classes (e.g., people in a family photo) a unique label, while semantic segmentation annotates each pixel of an image according to the class of object or texture it represents. (A city street scene, for instance, might be labeled as “pavement,” “sidewalk,” and “building.”)
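The distinction between the two outputs can be sketched with toy arrays: a semantic mask carries one class ID per pixel, while instance segmentation yields a separate binary mask per object. (The class IDs and the tiny 4×4 "scene" below are purely illustrative, not drawn from either model.)

```python
import numpy as np

# Hypothetical class IDs for a toy street scene:
# 0 = pavement, 1 = sidewalk, 2 = building.
# Semantic segmentation: one class label per pixel.
semantic_mask = np.array([
    [2, 2, 2, 2],
    [2, 2, 2, 2],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])

# Instance segmentation: one binary mask per object instance, so two
# people in a family photo get distinct masks despite sharing a class.
person_1 = np.array([
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=bool)
person_2 = np.array([
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=bool)
instance_masks = {"person_1": person_1, "person_2": person_2}

print(semantic_mask.shape)   # (4, 4): every pixel is labeled
print(len(instance_masks))   # 2: one mask per detected instance
```

Note that the two instance masks never overlap: each pixel belongs to at most one instance, while the semantic mask covers every pixel exactly once.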
As Google explains, Mask R-CNN is a two-stage instance segmentation system that can localize multiple objects at once. The first stage extracts patterns from an input photo to identify potential regions of interest, while the second stage refines those proposals to predict object classes before generating a pixel-level mask for each.
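The two-stage flow can be illustrated schematically. This is not Google's implementation; `propose_regions` and `refine_and_mask` are stand-in functions that play the roles of the first-stage region proposal network and the second-stage classification and mask heads, respectively.

```python
import numpy as np

def propose_regions(image, num_proposals=3):
    """Stage 1 (schematic): scan the input for candidate regions of
    interest, returned as (x0, y0, x1, y1) boxes. A real RPN scores
    anchors over feature maps; here we emit fixed dummy boxes."""
    h, w = image.shape[:2]
    return [(0, 0, w // 2, h // 2) for _ in range(num_proposals)]

def refine_and_mask(image, box):
    """Stage 2 (schematic): predict an object class for the proposal,
    then generate a pixel-level mask for it. The dummy version labels
    every region 'person' and masks the whole box."""
    x0, y0, x1, y1 = box
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = True
    return {"class": "person", "box": box, "mask": mask}

image = np.zeros((64, 64, 3))
detections = [refine_and_mask(image, b) for b in propose_regions(image)]
print(len(detections))              # 3 proposals, 3 detections
print(detections[0]["mask"].sum())  # 1024 pixels masked (a 32x32 box)
```

The key point the sketch captures is that masks are produced per proposal, which is what lets Mask R-CNN localize and segment multiple objects in a single pass.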
DeepLab v3+, on the other hand, is a semantic segmentation model that prioritizes speed. Trained on the open source PASCAL VOC 2012 image corpus using Google’s TensorFlow machine learning framework on the latest-generation TPU hardware (v3), it’s able to complete training in less than five hours.
Tutorials and Colaboratory notebooks for Mask R-CNN and DeepLab v3+ are available as of this week.
TPUs — application-specific integrated circuits (ASICs) that are liquid-cooled and designed to slot into server racks — have been used internally to power products like Google Photos, Google Cloud Vision API calls, and Google Search results. The first-generation design was announced at Google I/O in May 2016, and the newest, the third generation, was detailed in May 2018. Google claims it offers up to 100 petaflops of performance, or about 8 times that of its second-generation chips.
Google isn’t the only one with cloud-hosted hardware optimized for AI. In March, Microsoft opened Brainwave, a fleet of field-programmable gate arrays (FPGAs) designed to speed up machine learning operations, to select Azure customers. (Microsoft said that this allowed it to achieve 10 times faster performance for the models that power its Bing search engine.) Meanwhile, Amazon provides its own FPGA hardware to customers, and is reportedly developing an AI chip that will accelerate its Alexa speech engine’s model training.