Researchers find way to boost self-supervised AI models' robustness

In self-supervised learning -- an AI technique in which training labels are generated automatically from the data itself rather than by human annotators -- a model's feature extractor can exploit low-level features (known as "shortcuts") and, as a result, fail to learn useful representations. In search of a technique that might help remove these shortcuts automatically, researchers at Google Brain developed a framework -- a "lens" -- that enabled self-supervised models to outperform counterparts trained without it.

As the researchers explain in a preprint paper published this week, in self-supervised learning, automatically generated labels are used to create a pretext task that requires learning abstract, semantic features. A model pretrained on the task can then be transferred to tasks for which labels are expensive to obtain, for example by fine-tuning the model for a given target task. But defining pretext tasks is often challenging because models are biased toward exploiting the simplest features, like logos, watermarks, and color fringes caused by camera lenses.

Fortunately, the features that a model can use to solve a pretext task can be used by an adversary to make the pretext task harder. The researchers' framework -- which targets self-supervised computer vision models -- processes images with a lightweight image-to-image model called a "lens" that is trained adversarially to reduce pretext task performance. Once trained, the lens can be applied to unseen images, so it can be used when transferring the model to a new task. In addition, the lens can help visualize the shortcuts by spotlighting the differences between the input and output images, providing insights into how shortcuts differ.
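
To make the tug-of-war concrete, here is a minimal Python sketch of the two opposing objectives and of the difference map used for visualization. It is an illustration under stated assumptions, not the researchers' code: the function names are hypothetical, images are flattened to plain lists of pixel values, and the reconstruction penalty (with its `recon_weight` parameter) is an assumed safeguard that keeps the lens from simply destroying the image.

```python
import math


def cross_entropy(probs, label):
    """Pretext-task loss: negative log-probability assigned to the true label."""
    return -math.log(probs[label])


def reconstruction_error(original, lensed):
    """Mean squared difference between the input image and the lens output."""
    return sum((a - b) ** 2 for a, b in zip(original, lensed)) / len(original)


def extractor_loss(probs, label):
    # The feature extractor tries to solve the pretext task on lensed images.
    return cross_entropy(probs, label)


def lens_loss(probs, label, original, lensed, recon_weight=1.0):
    # The lens wants the opposite: a high pretext loss (hence the minus sign).
    # The reconstruction term is an assumption of this sketch; it penalizes
    # the lens for changing the image more than it needs to.
    pretext = cross_entropy(probs, label)
    return -pretext + recon_weight * reconstruction_error(original, lensed)


def difference_map(original, lensed):
    """Per-pixel absolute change made by the lens; large values hint at shortcuts."""
    return [abs(a - b) for a, b in zip(original, lensed)]
```

In a real system both the lens and the extractor would be neural networks updated by gradient descent on these two losses in alternation; the sketch only shows how the losses pull in opposite directions.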

In experiments, the researchers trained a self-supervised model on an open source data set -- CIFAR-10 -- and tasked it with predicting the correct orientation of rotated images. To test the lens, they added shortcuts to the input images with directional information that let the model solve the rotation task without having to learn object-level features. The researchers report that representations the model learned (without the lens) from the synthetic shortcuts performed poorly, while feature extractors trained with the lens performed "dramatically" better overall.
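
The following Python sketch shows why such a shortcut defeats the rotation task. It assumes rotations in quarter turns, a common formulation of the task that the article does not confirm, and it uses a single bright corner pixel (the hypothetical `MARKER` value) as a stand-in for the synthetic shortcuts, whose actual form is not described here.

```python
MARKER = 9  # hypothetical pixel value standing in for a synthetic shortcut


def rotate90(image):
    """Rotate a 2-D image (a list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_examples(image):
    """Return (rotated_image, label) pairs; label k means k clockwise quarter turns."""
    examples = []
    current = [list(row) for row in image]
    for k in range(4):
        examples.append((current, k))
        current = rotate90(current)
    return examples


def add_shortcut(image):
    """Stamp the marker into the top-left corner before any rotation is applied."""
    marked = [list(row) for row in image]
    marked[0][0] = MARKER
    return marked


def guess_from_shortcut(image):
    """Recover the rotation label from the marker's corner alone, ignoring content."""
    last_row, last_col = len(image) - 1, len(image[0]) - 1
    corners = [(0, 0), (0, last_col), (last_row, last_col), (last_row, 0)]
    for k, (r, c) in enumerate(corners):
        if image[r][c] == MARKER:
            return k
    return None
```

Because `guess_from_shortcut` never looks at anything but four corner pixels, a model that finds this cue can reach perfect pretext accuracy while learning nothing about the objects in the image.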

In a second test, the team trained a model on over a million images in the open source ImageNet corpus and had it predict the relative location of one or more patches contained within the images. They say that for all tested tasks, adding the lens led to an improvement over the baseline.
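
For readers unfamiliar with the patch-location task, here is a small Python sketch of how a training example might be built. It assumes the common eight-way version of the task, in which the model sees a center patch and one neighbor and must say which of eight surrounding positions the neighbor came from; the names `NEIGHBOR_OFFSETS`, `crop`, and `patch_pair` are hypothetical, and details such as gaps between patches are omitted.

```python
# (row, col) offsets of the eight neighbors around a center patch, in reading order.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]


def crop(image, top, left, size):
    """Cut a size x size patch whose top-left pixel is at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def patch_pair(image, size, label):
    """Return (center_patch, neighbor_patch, label) from a 3x3 grid of patches.

    The grid occupies the top-left (3 * size) x (3 * size) region of the image,
    and the label indexes into NEIGHBOR_OFFSETS.
    """
    d_row, d_col = NEIGHBOR_OFFSETS[label]
    center = crop(image, size, size, size)
    neighbor = crop(image, size + d_row * size, size + d_col * size, size)
    return center, neighbor, label
```

The label is known for free because the patches were cut from known positions, which is what makes the task self-supervised.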

"Our results show that the benefit of automatic shortcut removal using an adversarially trained lens generalizes across pretext tasks and across data sets. Furthermore, we find that gains can be observed across a wide range of feature extractor capacities," wrote the study's coauthors. "Apart from improved representations, our approach allows us to visualize, quantify, and compare the features learned by self-supervision. We confirm that our approach detects and mitigates shortcuts observed in prior work and also sheds light on issues that were less known."

In future research, they plan to explore new lens architectures and see whether the technique can be applied to further improve supervised learning algorithms.
