Facebook's AI can copy the style of text in photos from a single word

Facebook today introduced TextStyleBrush, an AI research project that can copy the style of text in a photo from just a single word. The company claims that TextStyleBrush, which can edit and replace arbitrary text in images, is the first "unsupervised" system of its kind that can recognize both typefaces and handwriting.

AI-generated images have been advancing at a breakneck pace, and they have obvious business applications, like photorealistic translation of languages in augmented reality (AR). (The AR market was anticipated to be worth $18.8 billion by the end of 2020, according to Statista.) But building a system that's flexible enough to understand the nuances of text and handwriting is difficult because it means comprehending styles not just for typography and calligraphy but also for transformations like rotations, curved text, deformations, background clutter, and image noise.

TextStyleBrush works much like the style brush tools in word processors, but for text aesthetics in images, according to Facebook. Unlike previous approaches, which rely on explicitly defined parameters like typeface or on supervision from a target style, it takes a more holistic training approach and disentangles the content of a text image from all aspects of its appearance.

Unsupervised learning

The "unsupervised" part of the system refers to unsupervised learning, in which a system is trained on "unknown" data that has no previously defined categories or labels. TextStyleBrush had to teach itself to classify data, learning from the inherent structure of the unlabeled data.
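
To make the idea of learning from unlabeled data concrete, here is a minimal sketch of a classic unsupervised method, k-means clustering, written in plain Python. It is a generic illustration only and is not how TextStyleBrush works; the function name kmeans_1d and the sample "stroke width" numbers are made up for this example.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group unlabeled numbers into k clusters using plain k-means."""
    # Deterministic start: spread the initial centers across the sorted data.
    ordered = sorted(values)
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # Nothing moved, so the grouping is stable.
        centers = new_centers
    return centers, clusters


# Hypothetical, unlabeled measurements (for example, stroke widths in pixels).
# No one tells the algorithm which values are "thin" or "thick".
stroke_widths = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
centers, clusters = kmeans_1d(stroke_widths, k=2)
print(centers)
print(clusters)
```

The algorithm is never given a category for any value; the two groups emerge from the structure of the data alone, which is the sense of "unsupervised" described above.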

As Facebook notes, training systems like TextStyleBrush typically involves annotated data that teaches the system to classify individual pixels as either "foreground" or "background" objects. But it's tough to apply this to images captured in the real world. Handwriting can be one pixel in width or less, and collecting high-quality training data requires labeling the foregrounds and backgrounds.

By contrast, given a detected "text box" containing a source style, TextStyleBrush renders new content in the style of the source text using a single sample. While it occasionally struggles with text written on metallic objects and with characters in different colors, Facebook says TextStyleBrush proves it's possible to build systems that can learn to transfer text aesthetics with more flexibility than was possible before.

"We hope this work will continue to lower barriers to photorealistic translation [and] creative self-expression," Facebook said in a blog post. "While this technology is research, it can power a variety of useful applications in the future, like translating text in images to different languages, creating personalized messaging and captions, and maybe one day facilitating real-world translation of street signs using AR."

The capabilities, methods, and results of the work on TextStyleBrush are available on Facebook's developer portal. The company says it plans to submit the work to a peer-reviewed journal in the future.
