
In a paper published this week on the preprint server arXiv.org, Amazon scientists detail a way for AI models to learn features from images that are compatible with previously computed ones. They say this spares systems from recomputing features for all previously seen images every time a new model is deployed, which could save enterprises developing computer vision-enabled applications valuable time and compute power.

As the researchers explain, visual classification is often accomplished by mapping each image onto a vector space (a collection of objects called vectors) using a machine learning model. As images of a new class become available, their vectors are used to spawn a new cluster, and an input image or set of images is then classified by finding the cluster closest to it. Over time, the data sets grow and newly trained models produce better vectors, but to reap the benefit of these new models, they must reprocess every image in the set to regenerate the vectors and rebuild the clusters.
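
The cluster-based lookup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' code: the 2-D vectors are toy stand-ins for a real model's output, and the names `centroid`, `build_index`, and `classify` are hypothetical.

```python
import math

def centroid(vectors):
    """Average a list of equal-length vectors into one cluster center."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_index(embedded_by_class):
    """Map each class label to the center of its image vectors."""
    return {label: centroid(vecs) for label, vecs in embedded_by_class.items()}

def classify(index, query_vectors):
    """Return the label whose cluster center is closest to the query."""
    q = centroid(query_vectors)
    return min(index, key=lambda label: math.dist(index[label], q))

# Toy vectors standing in for a model's output (assumed, not real data).
gallery = {
    "cat": [[0.9, 0.1], [1.0, 0.2]],
    "dog": [[0.1, 0.9], [0.2, 1.0]],
}
index = build_index(gallery)

# A new class only needs its own vectors to spawn a new cluster.
index["bird"] = centroid([[0.9, 0.9], [1.0, 1.0]])

print(classify(index, [[0.85, 0.15]]))
```

In this sketch, swapping in a new model whose vectors live in a different space would mean recomputing every entry in `gallery` before rebuilding `index`, which is the cost the researchers aim to avoid.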

By contrast, the researchers’ approach enables new models to be deployed without having to re-index existing image collections. They say that it requires modifying neither the models’ architecture nor the parameters of the old model (that is, the configuration variables internal to the model whose values are estimated from data). Perhaps more importantly, they also claim that it doesn’t sacrifice accuracy.
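
The practical difference can be sketched in Python. The two embedding functions below are hypothetical toy stand-ins, not the researchers' models: `embed_old` plays the deployed model, and `embed_new` is simply assumed to produce vectors that are directly comparable with the old ones. The "images" are pairs of numbers chosen for illustration. Under those assumptions, only the query passes through the new model, and the stored index is left untouched.

```python
import math

def embed_old(image):
    """Hypothetical old model: a coarse 2-D feature (toy stand-in)."""
    return [round(image[0], 1), round(image[1], 1)]

def embed_new(image):
    """Hypothetical new model, assumed compatible: same space, finer detail."""
    return [image[0], image[1]]

def nearest(index, query_vec):
    """Return the stored label whose vector is closest to the query."""
    return min(index, key=lambda label: math.dist(index[label], query_vec))

# Index built once with the old model; it is never recomputed below.
images = {"alice": (0.12, 0.88), "bob": (0.91, 0.14)}
old_index = {name: embed_old(img) for name, img in images.items()}

# After the upgrade, only the incoming query uses the new model.
query = (0.15, 0.80)
match = nearest(old_index, embed_new(query))
print(match)
```

How a new model is trained so that this comparison stays meaningful is the subject of the paper; the sketch only shows what the deployment side looks like once that property holds.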


In experiments, the researchers used the IMDB-Face data set (which contains about 1.7 million images of 59,000 celebrities) to train AI models and the IJB-C face recognition data set (which has around 130,000 images from 3,531 identities) to validate them. The models were then given two tasks: (1) deciding, given a pair of templates (each made up of one or more face images of the same person), whether the two belong to the same person and (2) using a template to search across a set of indexed templates.
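
Both tasks come down to comparing template vectors. The Python sketch below is illustrative only. It assumes that a template is represented by the average of its face vectors, that similarity is measured with cosine similarity, and that a threshold of 0.8 separates matches from non-matches. None of these choices, nor the names `template_vector`, `verify`, and `search`, come from the paper.

```python
import math

def template_vector(face_vectors):
    """Assumed pooling: average the vectors of all faces in a template."""
    n = len(face_vectors)
    return [sum(col) / n for col in zip(*face_vectors)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(template_a, template_b, threshold=0.8):
    """Task 1: decide whether two templates show the same person."""
    score = cosine(template_vector(template_a), template_vector(template_b))
    return score >= threshold

def search(indexed, probe, top_k=1):
    """Task 2: rank indexed template vectors by similarity to the probe."""
    p = template_vector(probe)
    ranked = sorted(indexed, key=lambda name: cosine(indexed[name], p), reverse=True)
    return ranked[:top_k]

# Toy face vectors (assumed, not real data).
person_a_first = [[1.0, 0.0], [0.9, 0.1]]
person_a_second = [[0.8, 0.2]]
person_b = [[0.0, 1.0]]

indexed = {"a": template_vector(person_a_first), "b": template_vector(person_b)}

print(verify(person_a_first, person_a_second))
print(verify(person_a_first, person_b))
print(search(indexed, person_a_second))
```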

The team says that their approach, which they call backward-compatible training (BCT), maintained a baseline level of accuracy, but they concede that it has several limitations.

“Backward compatibility is critical to quickly deploy new embedding models that leverage ever-growing large-scale training data sets and improvements in deep learning architectures and training methods, [but there’s an] accuracy gap of the new models trained with [our technique] relative to the new model oblivious of previous constraints,” they wrote. “Though the gap is reduced by slightly more sophisticated forms of BCT, there is still work to be done in characterizing and achieving the attainable accuracy limits.”
