How Google is using generative AI for virtual try-ons

Continuing its streak of product updates from the Google I/O developer conference, Google today announced that it is adding virtual try-ons to Search.

Available starting today for shoppers in the U.S., the capability will make buying clothes online a tad easier. Rather than superimposing a digital version of an outfit onto a buyer's virtual avatar, as many brands have done, the company is using generative AI to produce highly detailed portrayals of clothing on real models with different body shapes and sizes.

“Our new generative AI model can take just one clothing image and accurately reflect how it would drape, fold, cling, stretch, and form wrinkles and shadows on a diverse set of real models in various poses. We selected people ranging in sizes XXS-4XL representing different skin tones, body shapes, ethnicities and hair types,” Lilian Rincon, senior director of product management at Google, said in a blog post.

So, how is generative AI enabling virtual try-ons?

Most virtual try-on tools on the market create dressed-up avatars using techniques such as geometric warping, which deforms a clothing image to fit an image or avatar of a person. The method works, but the output is often imperfect, with visible fitting errors such as unnecessary folds.
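To make "geometric warping" concrete, here is a minimal sketch using the simplest possible warp: an affine map fitted so that three garment keypoints land on three body keypoints, then applied to every point of the garment. This is an illustrative stand-in, not any vendor's method; production tools typically use more flexible warps (thin-plate splines, for example), and the keypoint names and coordinates below are hypothetical. Because every pixel moves under the same map, the warp cannot invent the drape or wrinkles a real body and fabric would produce, which is one way fitting errors creep in.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]


def det3(m: Matrix3) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def solve3(m: Matrix3, rhs: Sequence[float]) -> List[float]:
    """Solve m @ x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; the warp is undefined")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[float, ...]:
    """Affine map (a, b, c, d, e, f) sending 3 garment keypoints onto 3 body keypoints:
    x' = a*x + b*y + c,  y' = d*x + e*y + f."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(m, [x for x, _ in dst])
    d, e, f = solve3(m, [y for _, y in dst])
    return (a, b, c, d, e, f)


def warp(points: Sequence[Point], params: Tuple[float, ...]) -> List[Point]:
    """Apply the same affine map to every garment point (pixel coordinate)."""
    a, b, c, d, e, f = params
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]


# Hypothetical keypoints: left shoulder, right shoulder, hem centre.
garment_kp = [(0.0, 0.0), (100.0, 0.0), (50.0, 120.0)]
body_kp = [(10.0, 20.0), (130.0, 20.0), (70.0, 170.0)]
params = fit_affine(garment_kp, body_kp)
warped_corners = warp([(0.0, 120.0), (100.0, 120.0)], params)  # bottom hem corners
```

Here the garment is simply stretched by 1.2x horizontally and 1.25x vertically and shifted; nothing about the fabric's behavior is modeled.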

To address this, Google developed a new diffusion-based AI model. Diffusion models are trained by gradually adding random noise to an image until it becomes unrecognizable, then learning to reverse (or denoise) that process step by step to recover the original image. Once trained, the model can start from pure random noise and iteratively denoise it into new, high-quality images.
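The noising-and-denoising idea can be illustrated with a toy sketch of the standard DDPM-style forward process, which blends an image with Gaussian noise in closed form, and its algebraic inverse. The 4-pixel "image", the linear noise schedule, and the function names are illustrative assumptions, not details of Google's model. In a real diffusion model a neural network predicts the noise and sampling proceeds over many small steps; this sketch passes in the true noise purely to show that the mapping is invertible.

```python
import math
import random
from typing import List, Tuple


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Variance of the Gaussian noise added at each step (a common linear schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): the share of original signal left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0: List[float], alpha_bar: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Forward process in closed form: x_t = sqrt(alpha_bar)*x0 + sqrt(1 - alpha_bar)*eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * e for p, e in zip(x0, eps)]
    return xt, eps


def estimate_x0(xt: List[float], eps_pred: List[float], alpha_bar: float) -> List[float]:
    """Undo the forward step given a noise estimate. A trained network would supply
    eps_pred; this toy passes in the true noise to show the algebra is invertible."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar) for x, e in zip(xt, eps_pred)]


rng = random.Random(0)
pixels = [0.2, 0.8, 0.5, 0.1]  # a 4-pixel stand-in for an image
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
noisy, true_eps = add_noise(pixels, alpha_bars[-1], rng)  # almost pure noise at the last step
restored = estimate_x0(noisy, true_eps, alpha_bars[-1])
```

By the final step almost none of the original signal remains, which is why a trained model can start generation from pure noise.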

In this case, the internet giant tapped its Shopping Graph (a comprehensive dataset of products and sellers) to train the model on images of people with a range of body shapes and sizes. Training used millions of image pairs, each showing a person wearing a garment in two different poses.
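One plausible way such two-pose pairs could be turned into supervised examples is sketched below: the garment is taken from one photo and the model is asked to reproduce the other photo, whose pose differs. The record types, file names, and the choice to use each pair in both directions are hypothetical illustrations, not Google's pipeline. The sketch only tracks file names; in practice the person input would be a version of the target photo with the garment hidden, so the model cannot simply copy it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PosePair:
    """Hypothetical record: the same person in the same garment, photographed in two poses."""
    person_id: str
    pose_a: str  # image path, e.g. standing sideways
    pose_b: str  # image path, e.g. facing forward


@dataclass(frozen=True)
class TryOnExample:
    garment_image: str  # where the garment's appearance is taken from
    person_image: str   # who to dress and in which pose (garment assumed masked out)
    target_image: str   # ground truth the model should learn to produce


def examples_from_pair(pair: PosePair) -> List[TryOnExample]:
    """Take the garment from one pose and ask for it on the other pose, in both directions."""
    return [
        TryOnExample(garment_image=pair.pose_a, person_image=pair.pose_b, target_image=pair.pose_b),
        TryOnExample(garment_image=pair.pose_b, person_image=pair.pose_a, target_image=pair.pose_a),
    ]


pair = PosePair("p001", "p001_side.jpg", "p001_front.jpg")
examples = examples_from_pair(pair)
```

Because the garment source and the target always differ in pose, the model has to learn how fabric re-drapes rather than just pasting pixels.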

Using this data and the diffusion technique, the model learned to render outfits on images of different people in different poses, whether standing sideways or facing forward. When a user exploring an outfit on Search hits the try-on button, they can select a model with a similar body shape and size and see how the outfit would fit them. The images of the chosen garment and model serve as the inputs.

“Each image is sent to its own neural network (a U-net) and shares information with [the] other [network] in a process called 'cross-attention' to generate the output: a photorealistic image of the person wearing the garment,” Ira Kemelmacher-Shlizerman, senior staff research scientist at Google, noted in a separate blog post.
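The "cross-attention" step in the quote can be sketched as standard scaled dot-product cross-attention: queries come from one network's features (here, the person) and keys and values from the other's (the garment), so each person location pulls in the garment details most relevant to it. This is a minimal single-head, one-directional sketch; the feature sizes, identity projection weights, and the choice of person-as-query are illustrative assumptions, and it does not attempt to reproduce how Google wires attention inside its two U-Nets.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out


def cross_attention(person: Matrix, garment: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Queries from person features; keys and values from garment features.
    Each output row is a weighted mix of garment values for one person location."""
    q = matmul(person, w_q)
    k = matmul(garment, w_k)
    v = matmul(garment, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose keys
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


identity = [[1.0, 0.0], [0.0, 1.0]]
person_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 person locations, 2 channels
garment_feats = [[2.0, 0.0], [0.0, 3.0]]             # 2 garment locations, 2 channels
fused = cross_attention(person_feats, garment_feats, identity, identity, identity)
```

The output has one row per person location but carries garment information, which is the sense in which the two networks "share information".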

That said, the try-on feature currently works only for women's tops from brands sold across Google. Google says that as the training data grows and the model expands, it will cover more brands and items.

Google says virtual try-on for men will launch later this year.
