Facebook today is announcing that it’s open-sourcing some of its latest artificial intelligence (A.I.) software for segmenting objects within images. The DeepMask, SharpMask, and MultiPathNet tools are available now on GitHub under a BSD license.
It’s not as if Facebook is opening up about these programs for the first time. They’ve already been described in three separate academic papers. Now Facebook’s Artificial Intelligence Research (FAIR) lab is connecting the dots with an extensive blog post and is also, of course, making the software available free for others to inspect and build on.
Image segmentation goes beyond just recognizing the people, places, or things in an image, or even determining their location within the image. It’s about finding the exact pixels where they reside in the image. To do that, Facebook is using a type of A.I. called deep learning, which generally entails training artificial neural networks on lots of data and then getting them to make inferences about new data.
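To make the distinction between locating an object and segmenting it concrete, here is a minimal Python sketch using NumPy. It is an illustration only, not code from Facebook's release: the toy 6x6 mask and the `bounding_box` helper are assumptions made up for this example. It shows that a bounding box covers more pixels than the object itself, while a segmentation mask marks exactly the pixels the object occupies.

```python
import numpy as np

def bounding_box(mask):
    """Return (top, left, bottom, right) of the tightest box around a boolean mask.

    Hypothetical helper for illustration; assumes the mask has at least one True pixel.
    """
    rows = np.where(np.any(mask, axis=1))[0]
    cols = np.where(np.any(mask, axis=0))[0]
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

# Toy 6x6 "image" containing one roughly triangular object.
# True marks a pixel that belongs to the object.
mask = np.zeros((6, 6), dtype=bool)
mask[1, 3] = True
mask[2, 2:4] = True
mask[3, 1:4] = True

# Detection-style output: a box that contains the object.
box = bounding_box(mask)
box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)

# Segmentation-style output: the exact pixels of the object.
object_pixels = int(mask.sum())

print("box:", box, "covers", box_area, "pixels")
print("mask covers", object_pixels, "pixels")
```

In this toy case the box spans nine pixels but only six of them belong to the object; the remaining three are background that a box cannot exclude.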
Within Facebook, the tools work in a pipeline. “DeepMask generates initial object masks, SharpMask refines these masks, and finally MultiPathNet identifies the objects delineated by each mask,” FAIR research scientist Piotr Dollár wrote in today’s blog post.
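The three-stage flow Dollár describes (propose masks, refine them, then label each one) can be sketched roughly as follows. This is a hedged illustration of the pipeline's shape only: `run_pipeline`, `toy_propose`, `toy_refine`, and `toy_classify` are hypothetical names, and the toy stages use simple thresholding rather than the neural networks that DeepMask, SharpMask, and MultiPathNet actually are.

```python
import numpy as np

def run_pipeline(image, propose, refine, classify):
    """Chain three stages: coarse masks -> refined masks -> labels."""
    results = []
    for coarse in propose(image):
        fine = refine(image, coarse)
        results.append((fine, classify(image, fine)))
    return results

def toy_propose(image):
    # Stand-in for the proposal stage: flag whole 2x2 blocks that contain
    # any bright pixel, giving one deliberately coarse mask.
    # Assumes the image height and width are even.
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3)) > 0.5
    return [np.repeat(np.repeat(blocks, 2, axis=0), 2, axis=1)]

def toy_refine(image, coarse):
    # Stand-in for the refinement stage: keep only the pixels inside the
    # coarse mask that are bright at full resolution.
    return coarse & (image > 0.5)

def toy_classify(image, mask):
    # Stand-in for the recognition stage: label the region by its brightness.
    if not mask.any():
        return "background"
    return "bright object" if image[mask].mean() > 0.8 else "dim object"

# Toy 4x4 grayscale image with two bright pixels in the top-left block.
image = np.zeros((4, 4))
image[0, 0] = 0.9
image[1, 1] = 0.9

results = run_pipeline(image, toy_propose, toy_refine, toy_classify)
for mask, label in results:
    print(label, "covering", int(mask.sum()), "pixels")
```

Passing the stages in as functions keeps each one replaceable, which mirrors how the article describes three separate tools working in sequence.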
These are not the first Facebook A.I. systems to become broadly available. Torchnet, for example, was released in June.
Technology companies and individual academic research labs participate in image segmentation competitions, such as COCO. Better lab performance can translate into better apps, which can attract more users and more data, so releases like this can be meaningful.
Facebook has some ideas about how to make its apps better with these tools. As Dollár put it:
By enabling computers to recognize objects in photos, for instance, it will be easier to search for specific images without an explicit tag on each photo. People with vision loss, too, will be able to understand what is in a photo their friends share because the system will be able to tell them, regardless of the caption posted alongside the image.
. . . Furthermore, leveraging the segmentation technology we’ve been developing, our goal is to enable even more immersive experiences that allow users to “see” a photo by swiping their finger across an image and having the system describe the content they’re touching.
. . .
In addition, our next challenge will be to apply these techniques to video, where objects are moving, interacting, and changing over time. We’ve already made some progress with computer vision techniques to watch videos and, in real time, understand and classify what’s in them, for example, cats or food. Real-time classification could help surface relevant and important Live videos on Facebook, while applying more refined techniques to detect scenes, objects, and actions over space and time could one day allow for real-time narration.
See Dollár’s full blog post for more detail.