
In a new study, researchers at Amazon describe a technique that uses information about the structure of knowledge graphs to perform entity alignment, the task of determining which elements of different graphs refer to the same “entities” (which might be anything from products to song titles). The goal is to improve both computational efficiency and alignment performance, speeding up graph-related tasks like product searches on Amazon and question answering via Alexa.

The work, which was accepted to the 2020 Web Conference, might also benefit graphs beyond Amazon, such as those that underpin social networks like Facebook and Twitter, as well as graphs used by enterprises to organize various digital catalogs.

As Amazon product graph applied scientist Hao Wei explains in a blog post, the advantage of knowledge graphs — mathematical objects consisting of nodes and edges — is that they can capture complex relationships more easily than conventional databases. (For example, in a movie data set, a node might represent an actor, a director, a film, or a film genre, while the edges represent who acted in what, who directed what, and so on.) Expanding a graph often involves integrating it with another knowledge graph, but different graphs might use different terms for the same entities, which can lead to errors.


Amazon’s proposed system is a graph neural network in which each node is converted to a fixed-length vector representation (an embedding) that captures information about attributes useful for entity alignment. The network considers a central node and the nodes near it, and for each of these nodes it produces a new, secondary embedding that consists of the node’s first embedding concatenated with the sum of its immediate neighbors’ embeddings. The network then produces a new embedding for the central node, which consists of that node’s embedding concatenated with the sum of the secondary embeddings of its immediate neighbors.
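The two rounds of aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Amazon's implementation: the function names, the toy movie graph, and the two-dimensional starting embeddings are all hypothetical, and the learned weights and nonlinearities a real graph neural network would apply are left out. The article does not say which of the central node's embeddings is used in the final step, so the sketch assumes the secondary one.

```python
from typing import Dict, List

Vector = List[float]


def vector_sum(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise sum of equal-length vectors (zeros if the list is empty)."""
    total = [0.0] * dim
    for vec in vectors:
        for i, value in enumerate(vec):
            total[i] += value
    return total


def aggregate(embeddings: Dict[str, Vector],
              neighbors: Dict[str, List[str]],
              node: str) -> Vector:
    """Concatenate a node's embedding with the sum of its neighbors' embeddings."""
    dim = len(embeddings[node])
    neighbor_sum = vector_sum([embeddings[n] for n in neighbors[node]], dim)
    return embeddings[node] + neighbor_sum  # list concatenation


def two_round_embedding(embeddings: Dict[str, Vector],
                        neighbors: Dict[str, List[str]],
                        center: str) -> Vector:
    """Run both aggregation rounds for one central node."""
    # Round 1: secondary embeddings for the central node and its neighbors.
    local_nodes = [center] + list(neighbors[center])
    secondary = {n: aggregate(embeddings, neighbors, n) for n in local_nodes}
    # Round 2 (assumption: starts from the central node's secondary embedding).
    return aggregate(secondary, {center: neighbors[center]}, center)


# Hypothetical toy graph: one film linked to an actor and a director.
embeddings = {"film": [1.0, 0.0], "actor": [0.0, 1.0], "director": [1.0, 1.0]}
neighbors = {"film": ["actor", "director"], "actor": ["film"], "director": ["film"]}

film_vector = two_round_embedding(embeddings, neighbors, "film")
print(film_vector)  # [1.0, 0.0, 1.0, 2.0, 1.0, 2.0, 2.0, 0.0]
```

Because each round concatenates rather than averages, the vector length doubles per round (2 to 4 to 8 here), which is one reason practical systems typically project the result back down with learned weight matrices.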

The researchers report that in tests involving the integration of two Amazon movie databases, their system improved upon the best-performing of 10 baseline systems by 10% on a metric called area under the precision-recall curve (PRAUC), which evaluates the trade-off between precision and recall. Furthermore, compared with a baseline system called DeepMatcher, which was specifically designed with scalability in mind, the Amazon system reduced training time by 95%.
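For readers unfamiliar with PRAUC, the sketch below shows one common way to estimate the area under a precision-recall curve, the step-wise sum usually called average precision. It is an illustration only: the function name, labels, and scores are made up, the researchers' exact estimator is not specified in the article, and tied scores are not handled specially.

```python
from typing import List


def pr_auc(labels: List[int], scores: List[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a true match, 0 for a non-match.
    scores: the model's confidence that each pair is a match.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Walk down the ranking from the most confident prediction to the least.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    area = 0.0
    previous_recall = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision = true_positives / rank
            recall = true_positives / total_positives
            # Each new true positive adds a rectangle of width (recall gain).
            area += precision * (recall - previous_recall)
            previous_recall = recall
    return area


# Hypothetical candidate pairs: two real matches, one ranked below a non-match.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.1]
print(round(pr_auc(example_labels, example_scores), 4))  # 0.8333
```

A ranking that places every true match above every non-match scores 1.0, so the metric rewards systems that keep precision high as recall increases.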
