Facebook research reveals AI tools for improving online clothes shopping

In May, the same week Facebook announced Shops, a way for businesses to set up online stores for customers across Facebook, WhatsApp, Messenger, and Instagram, the tech giant detailed the AI and machine learning systems behind its ecommerce experiences. Facebook said its goal is to one day develop an assistant that can serve up product recommendations on the fly, and that can learn preferences by analyzing images of what’s in a person’s wardrobe while allowing the person to try new items on self-replicas and sell apparel that others can preview.

A flurry of Facebook-authored papers accepted to the Conference on Computer Vision and Pattern Recognition (CVPR) 2020 suggests the company is on its way to developing the components of this assistant. One paper describes an algorithm that uncovers and quantifies fashion influences from images taken around the world. Another demonstrates an AI model that generates 3D models of people from single images. And a third proposes a system that captures clothing's affinity with different body shapes.

Ecommerce businesses like Facebook Marketplace lean on AI to automate a host of behind-the-scenes tasks, from learning preferences and body types to understanding the factors that might influence purchase decisions. McKinsey estimates that Amazon, which recently deployed AI to handle incoming shopper inquiries, generates 35% of all sales from its product recommendation engine. Beyond ranking, AI from startups like ModiFace, Vue.ai, Edited, Syte, and Adverity enables customers to try on shades of lipstick virtually, see model images in every size, and spot trends and sales over time.

Discovering fashion style influences

As an engineer at Facebook AI Research notes in one of the papers, the clothes people wear are a function of factors like comfort, taste, and occasion but also wider and subtler influences like changing social norms, art, politics, celebrities, style icons, the weather, and the "mood" of a city in which someone lives. For this reason, quantitatively pinpointing the influences in fashion remains an intractable challenge.

The Facebook researcher, then, proposes discovering influence patterns in large photo galleries and leveraging those patterns to forecast style trends. "We contend that images are exactly the right data to answer such questions," Kristen Grauman and Ziad Al-Halah, a coauthor at the University of Texas at Austin, wrote in the paper. "Unlike vendors' purchase data, other non-visual metadata, or hype from haute couture designers, everyday photos of what people are wearing in their daily life provide an unfiltered glimpse of current clothing styles 'on the ground.'"

The novel approach begins with the extraction of a vocabulary of visual styles from unlabeled, geolocated, and time-stamped images of people. (The researchers sourced from GeoStyle, a corpus of over 7.7 million images of people in 44 cities from Instagram and Flickr.) Each style is a mixture of detected visual attributes; for example, one might capture short floral dresses in bright colors while another captures preppy collared shirts. The past trajectories of the popularity of styles are recorded to help identify time precedence and novelty, where "time precedence" refers to when a city's fashion changes before an observed influence. Then, a statistical measure calculates the degree of influence between cities, while an AI model exploits the photographic relationships to anticipate future popular styles in any location.
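The time-precedence idea can be illustrated with a toy check: if one city's popularity curve for a style, shifted forward a few weeks, lines up with another city's curve, the first city is a candidate influencer. The sketch below uses a plain lagged correlation as a stand-in. It is not the statistical measure used in the paper, and the function names, the lag range, and the trajectories are all made up for illustration.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def lagged_influence(source, target, max_lag=4):
    """Return (lag, correlation) for the lag at which `source` best precedes `target`.

    Returns (0, 0.0) when no positive lagged correlation is found.
    """
    best = (0, 0.0)
    for lag in range(1, max_lag + 1):
        # Compare source at week t with target at week t + lag.
        r = pearson(source[:-lag], target[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical weekly popularity of one style in two cities.
# city_b repeats city_a's rise and fall two weeks later.
city_a = [0.1, 0.2, 0.5, 0.9, 0.6, 0.3, 0.2, 0.1]
city_b = [0.1, 0.1, 0.1, 0.2, 0.5, 0.9, 0.6, 0.3]

forward = lagged_influence(city_a, city_b)   # does A precede B?
backward = lagged_influence(city_b, city_a)  # does B precede A?
print(forward, backward)
```

In this toy data, the forward direction finds a strong match at a two-week lag while the backward direction finds none, which is the kind of asymmetry an influence measure looks for.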

In experiments, the researchers tapped attribute predictions to represent each photo with 46 attributes (e.g., colors, patterns, and garment types), and they learned 50 fashion styles based on these. For each style, they inferred its popularity trajectories in individual cities over the course of weeks using the abovementioned AI model.

The researchers say the results shed light on the spread of fashion trends across the world, revealing (1) which cities exerted the most influence on others and which received the most, (2) which most affected global trends, (3) which contributed to the prominence of a given style, and (4) how a city's degree of influence itself changed over time. For instance, their approach discovered that:

Fashion hubs like Paris and Berlin exert influence on multiple cities while at the same time being influenced by few. According to the researchers, Paris influences four cities in Europe while being influenced solely by Milan, while cities like Jakarta have a one-to-one influence relation with Manila.
While some cities like London and Rio maintain a steady influence through time, others like Austin and Johannesburg demonstrate a positive trend and are gaining more influence in fashion over time.
For some of the fashion styles, a couple of cities maintain a monopoly of influence on them, whereas others are influenced almost uniformly by multiple cities. Seoul and Bangkok strongly influence six global fashion styles, for example, while Manila and Jakarta only weakly influence them.

"Our findings hint at how computer vision can help democratize our understanding of fashion influence, sometimes challenging common perceptions about what parts of the world are driving fashion," the coauthors wrote. "In addition, we demonstrate that by incorporating influence, the ... forecasting model yields state-of-the-art accuracy for predicting the future popularity of styles."

3D people renderings

The second Facebook paper proposes an AI technique for generating 3D models of clothed people, which could become the centerpiece of a future Facebook-powered fashion assistant. The system -- Animatable Reconstruction of Clothed Humans (ARCH) -- would enable users to see how they look wearing apparel in various poses, not only standing but also walking, sitting, and crouching, in a range of environments and lighting conditions.

ARCH is an end-to-end framework for reconstructing "animation-ready" 3D clothed humans from a single view. A prediction component infers body pose and shape, allowing the system to define a semantic space and deformation field by sampling points around the body surface and assigning "skinning weights" that individually influence clothed body part transformations. (The semantic space consists of tens of thousands of 3D points where each point is associated with semantic information supporting the render, while the deformation field is represented by a mathematical operation that actually accomplishes the render.) It then learns a mathematical function that enables the generation of details like clothing wrinkles, hair style, and more, rigged in order to be used as an animatable avatar.
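For readers unfamiliar with skinning weights, the general idea (known as linear blend skinning) is that each surface point moves as a weighted average of the transformations of nearby body parts. The sketch below shows that generic technique in 2D. It is not ARCH's implementation, and the two-bone "arm," the function names, and the weights are assumptions chosen for illustration.

```python
import math


def rotate_about(point, pivot, angle_rad):
    """Rotate a 2D point counterclockwise around a pivot."""
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (pivot[0] + c * px - s * py, pivot[1] + s * px + c * py)


def blend_skin(point, bone_transforms, weights):
    """Move a point by the weighted average of each bone's transformation."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("skinning weights must sum to 1")
    x = y = 0.0
    for transform, weight in zip(bone_transforms, weights):
        tx, ty = transform(point)
        x += weight * tx
        y += weight * ty
    return (x, y)


# Hypothetical arm: the upper arm stays put, the forearm bends 90 degrees
# around an elbow located at (1, 0).
elbow = (1.0, 0.0)
upper_arm = lambda p: p
forearm = lambda p: rotate_about(p, elbow, math.pi / 2)
bones = [upper_arm, forearm]

wrist = blend_skin((2.0, 0.0), bones, [0.0, 1.0])    # follows the forearm only
sleeve = blend_skin((1.2, 0.0), bones, [0.5, 0.5])   # near the elbow, shared
print(wrist, sleeve)
```

A point on the wrist follows the forearm completely, while a point near the elbow is pulled halfway between the two bones, which is how weights let clothing bend smoothly around a joint.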

In experiments, the researchers trained the system on 375 3D scans from the RenderPeople data set and 207 scans from AXYZ, or 582 scans in all, for a total of 209,520 images. For each scan, they produced 360 images by rotating a camera around the vertical axis at 1-degree intervals (582 x 360 = 209,520), and they then used 38 environment maps to render the scans under different natural lighting conditions.

The coauthors report that their model outperformed several baselines in experiments, producing "plausible" predictions for unseen parts like hair and the back of clothing and larger wrinkles and seams in things like pants, shirts, and shoes. It's not perfect -- they say that "rare" poses not sufficiently covered in the training data sets affect ARCH's accuracy, for one -- but they plan to continue to improve it in future work. Already, related research from Facebook Reality Labs managed to generate much more detailed 3D reconstructions of clothed people than was previously possible.

Dressing for diverse body shapes

Body shape plays an important role in determining what garments will best suit a given person, a Facebook researcher co-writes in the third paper, yet today's clothing recommendation methods take a "one shape fits all" approach. To remedy this, the researcher and a coauthor introduce Visual Body-aware Embedding (ViBE), which aims to identify garments that flatter a specific body type given an image of a person.

The team began by compiling a data set from an online shopping website called Birdsnest, which provides a range of sizes (8 to 18 in Australian measurements) in most fashion styles. Each item is worn by a number of models with different body shapes, and each entry contains front and back views of the garment along with an image of the model, their body measurements, and a textual description.

After collecting a total of 958 dresses and 999 tops spanning 68 fashion models, the researchers used a pretrained model to extract visual features from the catalog images, capturing the overall color, pattern, and silhouette of the clothing. They mined the most frequent words in all descriptions for all catalog entries to build a vocabulary of attributes and then obtained an array of binary attributes for each garment, which captured localized and subtle properties like specific necklines, sleeve cuts, and fabric. Lastly, they estimated a 3D human body model from each image to capture the fine-grained shape cues.
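The attribute-mining step can be pictured as follows: count how many descriptions mention each word, keep the most common ones as the vocabulary, and mark each garment with a 1 or 0 per vocabulary word. This is a minimal sketch under stated assumptions. The tokenizer, the stopword list, the vocabulary size, and the sample descriptions are invented here and are not taken from the paper.

```python
import re
from collections import Counter

# Assumed stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "the", "with"}


def tokenize(text):
    """Lowercase word tokens, keeping hyphens and slashes (e.g. 'v-neck', '3/4')."""
    return [t for t in re.findall(r"[a-z0-9/-]+", text.lower()) if t not in STOPWORDS]


def build_vocabulary(descriptions, top_k):
    """Pick the top_k words that appear in the most descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(set(tokenize(text)))  # count each word once per description
    # Sort by frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


def encode(description, vocabulary):
    """Binary attribute vector: 1 if the vocabulary word appears, else 0."""
    tokens = set(tokenize(description))
    return [1 if word in tokens else 0 for word in vocabulary]


# Hypothetical catalog descriptions.
catalog = [
    "V-neck wrap dress with 3/4 sleeves",
    "Wrap dress with ruffles",
    "V-neck top with short sleeves",
]
vocabulary = build_vocabulary(catalog, top_k=4)
vectors = [encode(text, vocabulary) for text in catalog]
print(vocabulary)
print(vectors)
```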

The researchers also developed an automatic approach to recommending clothing to people based on their body shapes. It maps a subject's body shape into the learned representations and, leveraging trained models, takes the closest and furthest 400 clothing items as the most and least suitable garments.
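Once a body shape and the garments live in the same learned space, the recommendation step reduces to sorting garments by distance. The sketch below illustrates that idea with two-dimensional vectors and Euclidean distance. The embeddings, the garment names, and the choice of distance are assumptions for illustration, since the article does not specify them.

```python
import math


def rank_garments(body_embedding, garment_embeddings, k):
    """Return (most_suitable, least_suitable): the k closest and k furthest garments."""
    by_distance = sorted(
        (math.dist(body_embedding, vector), name)
        for name, vector in garment_embeddings.items()
    )
    most_suitable = [name for _, name in by_distance[:k]]
    # Furthest first, so the least suitable garment leads the list.
    least_suitable = [name for _, name in by_distance[-k:][::-1]]
    return most_suitable, least_suitable


# Hypothetical 2D embeddings; real ones would have many more dimensions.
body = (0.0, 0.0)
garments = {
    "wrap dress": (0.1, 0.0),
    "empire top": (0.0, 0.5),
    "boxy shirt": (1.0, 1.0),
    "maxi dress": (3.0, 4.0),
}
most, least = rank_garments(body, garments, k=2)
print(most, least)
```

With a catalog of roughly 2,000 items, a cutoff of 400 keeps the two lists from overlapping. In a toy catalog like this one, k has to stay at or below half the number of garments for the same reason.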

In experiments, for the slender subjects, ViBE recommended shorter dresses that fit or flare, which could show off their legs. For petite people, it found the most suitable attributes are waistbands and empire styles that create taller looks, as well as embroidery and ruffles that increase volume. For curvier body shapes, ViBE predicted the most suitable attributes are extended or 3/4 sleeves that cover the arms, v-necklines that create an extended slimmer appearance, and wrap or side-splits that define waists while revealing curves around the upper legs.

ViBE -- along with ARCH and the fashion influence predictor -- appears to be a meaningful step toward Facebook's fashion assistant. But judging by earlier statements, the company anticipates that advancements in language understanding, personalization, and "social-first" experiences must emerge before a truly predictive style assistant becomes a possibility.

"We envision a future in which [a] system could ... incorporate your friends' recommendations on museums, restaurants, or the best ceramics class in the city -- enabling you to more easily shop for those types of experiences," the company said in a previous statement. "Our long-term vision is to build an all-in-one AI lifestyle assistant that can accurately search and rank billions of products, while personalizing to individual tastes. That same system would make online shopping just as social as shopping with friends in real life. Going one step further, it would advance visual search to make your real-world environment shoppable. If you see something you like (clothing, furniture, electronics, etc.), you could snap a photo of it and the system would find that exact item, as well as several similar ones to purchase right then and there."
