In a paper scheduled to be presented next week at the annual Conference on Computer Vision and Pattern Recognition (CVPR), scientists at IBM, Tel Aviv University, and the Technion describe a novel AI model architecture, Label-Set Operations (LaSO) networks, designed to combine pairs of labeled image examples (e.g., a photo of a dog annotated "dog" and a photo of a sheep annotated "sheep") into new examples that carry the seed images' combined labels (a single photo of a dog and a sheep annotated both "dog" and "sheep"). The coauthors believe that LaSO networks could eventually be used to augment corpora that lack sufficient real-world data.

“Our method is capable of producing a sample containing … labels present in two input samples,” wrote the researchers. “The proposed approach might also prove useful for the interesting visual dialog use case, where the user can manipulate the returned query results by pointing out or showing visual examples of what she [or] he likes or doesn’t like.”

LaSO networks learn to manipulate the label sets of given samples and to synthesize new samples corresponding to combined label sets. They take as input photos of different types, identify common semantic content, and can implicitly remove concepts present in one sample from another. (A "union" operation in a LaSO network will produce a synthetic example labeled "person," "dog," "cat," and "sheep," for instance, while "intersection" and "subtraction" operations will produce examples labeled "person" and "dog," or "sheep" alone, respectively.) Because the models operate directly on image representations and don't require additional inputs to control the manipulations, they can generalize to images containing categories that weren't seen during training.
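The target semantics of the three operations behave like ordinary set algebra over label sets. The sketch below illustrates only that algebra in plain Python; the actual LaSO networks perform these operations in image feature space, not on labels directly.

```python
# Conceptual sketch of the label-set algebra LaSO networks learn to mimic.
# LaSO operates on learned image feature vectors; plain Python sets are
# used here purely to illustrate the intended label-set semantics.

def union(labels_a: set, labels_b: set) -> set:
    """Labels of a synthetic example combining both inputs' content."""
    return labels_a | labels_b

def intersection(labels_a: set, labels_b: set) -> set:
    """Labels common to both input examples."""
    return labels_a & labels_b

def subtraction(labels_a: set, labels_b: set) -> set:
    """Labels of the first input with the second input's concepts removed."""
    return labels_a - labels_b

a = {"dog", "person"}
b = {"sheep", "person"}
combined = union(a, b)        # all four concepts merged into one label set
shared = intersection(a, b)   # only the concept present in both
removed = subtraction(a, b)   # first set with shared concepts stripped out
```

In the trained networks, each of these operations is a separate learned mapping from a pair of feature vectors to a new feature vector whose contents match the corresponding set operation on the labels.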

As the researchers explain, in few-shot learning — the practice of training an AI model on a very small amount of data — only one or a handful of samples per category are typically available. Most approaches in the image classification domain involve only single labels, where every training image contains one object and a single corresponding category label. A more challenging scenario — the one the team's paper investigated — is multi-label few-shot learning, where training images contain multiple objects with multiple corresponding category labels.
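The single- vs multi-label distinction above comes down to the shape of the training target: a single class index per image versus a multi-hot vector with one entry per category. A minimal sketch, with an illustrative category vocabulary not taken from the paper:

```python
# Illustrative sketch: multi-label targets as multi-hot vectors.
# The category list is a made-up example, not the paper's vocabulary.

CATEGORIES = ["person", "dog", "cat", "sheep"]

def multi_hot(labels, categories=CATEGORIES):
    """Encode a set of label strings as a multi-hot vector."""
    return [1.0 if category in labels else 0.0 for category in categories]

single_label = multi_hot({"dog"})           # exactly one active entry
multi_label = multi_hot({"dog", "sheep"})   # several active entries
```

In the multi-label few-shot setting, a model must learn these many-hot targets from only a few such images per category.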


Above: Image retrieval done on synthetic LaSO vectors.

Image Credit: IBM Research

The researchers trained several LaSO networks jointly as a single multi-task network on a corpus in which each image carries multiple labels, mapped to the objects appearing in it. They then evaluated the output examples using a classifier pre-trained on multi-label data. In a separate few-shot learning experiment, the team used the LaSO networks to generate additional examples from random pairs of the few provided training examples, and devised a novel benchmark for multi-label few-shot classification.
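The augmentation loop described above can be sketched as follows. This is a hedged illustration only: `laso_union` is a hypothetical stand-in for the trained LaSO union network, not a real API, and the averaging it performs is a placeholder rather than the learned mapping.

```python
import random

# Sketch of the few-shot augmentation loop: pair up the few available
# training examples at random and let a trained LaSO union model
# synthesize extra feature vectors carrying the merged label sets.

def laso_union(feat_a, feat_b):
    # Placeholder for the trained LaSO union network, which would map
    # two image feature vectors to a vector representing their combined
    # content. Averaging here is a stand-in only, not the learned model.
    return [(x + y) / 2 for x, y in zip(feat_a, feat_b)]

def augment(examples, n_new, rng=random):
    """examples: list of (feature_vector, label_set) pairs.

    Returns n_new synthetic (feature_vector, label_set) pairs, each
    built from a random pair of distinct provided examples.
    """
    synthetic = []
    for _ in range(n_new):
        (fa, la), (fb, lb) = rng.sample(examples, 2)
        synthetic.append((laso_union(fa, fb), la | lb))
    return synthetic
```

Each synthetic pair carries the union of its parents' label sets, which is what lets a handful of real examples be stretched into a larger multi-label training set.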

“Multi-label few-shot classification is a new, challenging and practical task. The results of evaluating the LaSO label-set manipulation with neural networks on the proposed benchmark demonstrate that LaSO holds a good potential for this task and possibly for other interesting applications,” wrote the researchers in a forthcoming blog post. “We hope that this work will inspire more researchers to look into this interesting problem.”