Designing AI systems capable of accurate instance-level landmark recognition (i.e., distinguishing Niagara Falls from just any waterfall) and image retrieval (matching objects in an image to other instances of that object in a catalog) is a longstanding pursuit of Google’s AI research division. Last year, it released Google-Landmarks, a landmarks data set it claimed at the time was the world’s largest, and hosted two competitions (Landmark Recognition 2018 and Landmark Retrieval 2018) in which more than 500 machine learning researchers participated.

Today, in a significant step toward its goal of more sophisticated landmark-detecting computer vision models, Google open-sourced Google-Landmarks-v2, a new, larger landmark recognition corpus containing twice as many photos and seven times as many landmarks. Additionally, it’s launched two new challenges (Landmark Recognition 2019 and Landmark Retrieval 2019) on Kaggle, its machine learning community, and released the source code and model for Detect-to-Retrieve, a framework for regional image retrieval.

“Both instance recognition and image retrieval methods require ever-larger datasets in both the number of images and the variety of landmarks in order to train better and more robust systems,” wrote Google AI software engineers Bingyi Cao and Tobias Weyand. “We hope that this dataset will help advance the state-of-the-art in instance recognition and image retrieval.”

Above: Heatmap of the landmark locations in Google-Landmarks-v2.

Image Credit: Google

According to Cao and Weyand, Google-Landmarks-v2 contains over 5 million images of more than 200,000 different landmarks, collected from photographers around the world. The photographers labeled their own images, which depict Neuschwanstein Castle, the Golden Gate Bridge, Kiyomizu-dera, the Burj Khalifa, the Great Sphinx of Giza, Machu Picchu, and other famous sights, and submitted them for inclusion. Google researchers then supplemented them with historical and lesser-known images from Wikimedia Commons, the Wikimedia Foundation’s online repository of free-use images, sounds, and other media.

As for the Detect-to-Retrieve framework, Cao and Weyand say the published model, which was trained on a subset of 80,000 images from the original landmarks data set, leverages bounding boxes from an object detection model to give “extra weight” to image regions containing items of interest, significantly improving accuracy.
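
To make the “extra weight” idea concrete, here is a minimal sketch in plain Python. It is not Google’s Detect-to-Retrieve code, and the function names, the `boost` parameter, and the simple weighted-sum aggregation are all assumptions chosen for illustration. It only shows the general principle: local features that fall inside a detected bounding box count more toward the image’s overall descriptor than features from the background.

```python
import math


def in_box(point, box):
    """Return True if (x, y) lies inside box = (x_min, y_min, x_max, y_max)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def weighted_global_descriptor(features, boxes, boost=2.0):
    """Aggregate local features into one L2-normalized image descriptor.

    features: list of ((x, y), descriptor) pairs, descriptor being a list of floats
    boxes:    list of bounding boxes from some object detector (assumed given)
    boost:    hypothetical weight applied to features inside any box
    """
    if not features:
        raise ValueError("need at least one local feature")
    dim = len(features[0][1])
    total = [0.0] * dim
    for point, descriptor in features:
        # Features inside a detected region get the boosted weight.
        weight = boost if any(in_box(point, b) for b in boxes) else 1.0
        for i, value in enumerate(descriptor):
            total[i] += weight * value
    norm = math.sqrt(sum(v * v for v in total))
    return [v / norm for v in total] if norm > 0 else total


def similarity(a, b):
    """Dot product; equals cosine similarity when both inputs are L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


# Toy example: one background feature and one feature inside a detected box.
features = [((10, 10), [1.0, 0.0]), ((50, 50), [0.0, 1.0])]
boxes = [(40, 40, 60, 60)]
boosted = weighted_global_descriptor(features, boxes, boost=3.0)
unboosted = weighted_global_descriptor(features, [], boost=3.0)
```

In this toy case, the descriptor built with the bounding box leans toward the feature inside the box, so a query showing the same object would score higher against it via `similarity` than against the unweighted version.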

Both Landmark Recognition 2019, which tasks entrants with designing landmark-detecting AI models, and Landmark Retrieval 2019, which has competitors use an AI system to find images showing a target landmark, are open for entry. Both include cash prizes totaling $50,000, and Cao and Weyand say the winning teams will be invited to present their methods at the Second Landmark Recognition Workshop at the 2019 Conference on Computer Vision and Pattern Recognition in Long Beach, California, later this year.