Google's TensorFlow Similarity helps AI models find related items


Google today announced TensorFlow Similarity, a Python package designed to train similarity models with the company's TensorFlow machine learning framework. Similarity models search for related items, such as finding similar-looking clothes or identifying a song that is currently playing.

As Google explains, many similarity models are trained using a technique called contrastive learning. Contrastive learning is closely related to clustering, in which algorithms automatically identify patterns in data on the premise that data points in the same group should share similar features.

When applied to a dataset, contrastive learning trains a model to project items into an "embedding space." In this space, the distance between two embeddings (numerical vector representations of the items) indicates how similar the original inputs are. Training with TensorFlow Similarity yields a space where similar items stay close together and dissimilar items are far apart. For instance, training a similarity model on the Oxford-IIIT Pet dataset produces clusters in which similar-looking breeds sit close together and cats and dogs are separated.
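To make the "small distance for similar, large distance for dissimilar" idea concrete, the sketch below shows the classic margin-based pairwise contrastive loss in plain Python. This is not TensorFlow Similarity's API, which ships its own loss functions. The helper name `contrastive_loss`, the use of Euclidean distance, and the default margin of 1.0 are illustrative assumptions.

```python
import math
from typing import Sequence


def contrastive_loss(a: Sequence[float], b: Sequence[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Hypothetical margin-based pairwise contrastive loss.

    a, b: embeddings of two items (assumed to be equal-length vectors).
    same_class: True if the two items should be treated as similar.
    margin: assumed minimum distance wanted between dissimilar items.
    """
    d = math.dist(a, b)  # Euclidean distance between the two embeddings
    if same_class:
        # Similar pairs are penalized by their squared distance,
        # which pulls them together during training.
        return d ** 2
    # Dissimilar pairs are penalized only while they sit closer than
    # the margin, which pushes them apart until they clear it.
    return max(0.0, margin - d) ** 2
```

For example, `contrastive_loss([0, 0], [3, 4], same_class=True)` returns `25.0`, because a "similar" pair five units apart is heavily penalized. A dissimilar pair that is already farther apart than the margin contributes zero loss. Minimizing this loss over many pairs is what shapes the embedding space described above.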

Training similarity models

Once a model is trained, TensorFlow Similarity builds an index containing the embeddings of the items to make them searchable. According to Google, the library can search over millions of indexed items and retrieve the top similar matches within a fraction of a second. Moreover, TensorFlow Similarity can add an unlimited number of new classes to the index without retraining the model. It only needs to compute embeddings for representative items of each new class.
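The indexing workflow can be illustrated with a toy, brute-force index in plain Python. This is a minimal sketch under stated assumptions, not TensorFlow Similarity's implementation: the `EmbeddingIndex` class, its `add` and `lookup` methods, and the hand-written 2-D embeddings are all hypothetical stand-ins for vectors a trained model would produce.

```python
import math
from typing import List, Sequence, Tuple


class EmbeddingIndex:
    """Hypothetical brute-force index over (label, embedding) pairs."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, List[float]]] = []

    def add(self, label: str, embedding: Sequence[float]) -> None:
        # Adding a new class only stores an embedding for a representative
        # item; the model that produced the embedding is not retrained.
        self._entries.append((label, list(embedding)))

    def lookup(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        # Linear scan: compute the Euclidean distance from the query to every
        # stored embedding, then return the k closest (label, distance) pairs.
        scored = [(label, math.dist(query, emb)) for label, emb in self._entries]
        scored.sort(key=lambda item: item[1])
        return scored[:k]


# Usage with made-up embeddings standing in for a trained model's output.
index = EmbeddingIndex()
index.add("tabby", [0.1, 0.9])
index.add("siamese", [0.2, 0.8])
index.add("beagle", [0.9, 0.1])
nearest = [label for label, _ in index.lookup([0.1, 0.85], k=2)]

# A new class becomes searchable as soon as its embedding is indexed.
index.add("poodle", [0.8, 0.3])
newest = index.lookup([0.8, 0.28], k=1)[0][0]
```

Here `nearest` is `["tabby", "siamese"]` and `newest` is `"poodle"`. The linear scan costs O(n) per query, so an index meant to search millions of items within a fraction of a second would typically rely on an approximate nearest-neighbor structure instead.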

While the initial release of the library is focused on providing components to build contrastive learning-based similarity models, Google says it'll add support for additional types of models to TensorFlow Similarity in the future. "The ability to search for related items has many real-world applications," Google's Elie Bursztein and Owen S. Vallis wrote in a blog post. "More generally, being able to quickly retrieve related items is a vital part of many core information systems such as multimedia searches, recommender systems, and clustering pipelines."

TensorFlow Similarity is available in open source via GitHub. In addition, Google has released a programming notebook with a tutorial on basic usage.

The release of TensorFlow Similarity follows the launches of other TensorFlow extensions focused on particular kinds of models and use cases. In 2019, Google debuted TensorFlow Privacy, a library intended to make it easier for developers to train AI models with "strong privacy guarantees." And last year, the company released an experimental module that tests the security of AI models.