Why knowledge graphs are key to working with data efficiently, powerfully

This post is by Dr. Mukta Paliwal, senior data scientist at Persistent Systems.

As many as 50% of Gartner client inquiries about artificial intelligence involve a discussion of graph technology, the market research firm said in its Top 10 Data and Analytics Trends for 2021. Every large enterprise wants to exploit its available data to gain insights for doing business at scale. To achieve this, connected data has become a necessity, because it brings context to existing organizational data and turns that data into knowledge.

Businesses have to keep pace with constantly evolving data needs. Knowledge graphs can help companies move beyond traditional databases and use natural language processing, machine learning, and semantics to make better use of their data.

What is a knowledge graph?

Knowledge graphs represent a collection of interlinked facts about a domain. Entities and relations are extracted from unstructured data and stored as triples of the form subject-predicate-object. For example, the statement “Captain Marvel is the strongest Avenger” can be broken into a subject (Captain Marvel), a predicate (is the strongest), and an object (Avenger). It can then be stored as the triple (Captain Marvel-is the strongest-Avenger) alongside other related entities in a knowledge graph of the Avengers, the popular Marvel movie characters.
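
To make the triple idea concrete, here is a minimal Python sketch that stores facts as subject-predicate-object tuples and answers simple pattern queries. The `Triple` type, the `match` helper, and the extra "Iron Man" fact are illustrative assumptions, not part of any particular knowledge graph product. Real deployments typically use an RDF store or a graph database rather than a Python list.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    """One fact in the graph: subject -predicate-> obj."""
    subject: str
    predicate: str
    obj: str

# Illustrative facts; the first one is the example sentence from the text.
kg = [
    Triple("Captain Marvel", "is the strongest", "Avenger"),
    Triple("Iron Man", "is an", "Avenger"),
]

def match(graph, subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> list:
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]

# Which entities point at "Avenger"?
print([t.subject for t in match(kg, obj="Avenger")])
# ['Captain Marvel', 'Iron Man']
```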

In short, a knowledge graph has these features: (1) it defines the real-world entities of a domain; (2) it captures the relationships between them; (3) it defines rules for the possible classes of entities and relations via a schema; and (4) it enables reasoning to infer new knowledge.
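
Features (3) and (4) can be illustrated with a toy reasoner. The sketch below is loosely modeled on two RDFS entailment rules: transitivity of `subClassOf`, and inheritance of `type` along it. It derives facts the graph never states explicitly. The class names and the `infer` function are hypothetical. Real systems express schemas in an ontology language such as RDFS or OWL and use a dedicated reasoner.

```python
def infer(triples):
    """Forward-chain two schema rules until no new facts appear:
    (C subClassOf D) and (D subClassOf E)  =>  (C subClassOf E)
    (x type C)       and (C subClassOf D)  =>  (x type D)
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        new = {(c, "subClassOf", e)
               for c, d in sub for d2, e in sub if d == d2}
        new |= {(x, "type", d)
                for x, p, c in facts if p == "type"
                for c2, d in sub if c == c2}
        if new <= facts:  # fixpoint reached: nothing left to derive
            return facts
        facts |= new

# Hypothetical schema (first two triples) and data (last triple).
schema_and_data = [
    ("Avenger", "subClassOf", "Superhero"),
    ("Superhero", "subClassOf", "Fictional character"),
    ("Captain Marvel", "type", "Avenger"),
]
derived = infer(schema_and_data) - set(schema_and_data)
print(sorted(derived))
```

Running it derives that Captain Marvel is also a Superhero and a Fictional character. Neither fact was asserted directly.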

Knowledge graphs can be auto-generated or human-curated. They may follow a rigid ontology or evolve over time, and they come in different shapes and sizes. They may be developed by a company or by an open-source community. Irrespective of these differences, they organize unstructured data so that information can be extracted easily, because the explicit relations between entities guide the extraction.

Why use knowledge graphs?

A knowledge graph is self-descriptive: it provides a single place to find the data and to understand what that data means. Because the meaning of the data is encoded alongside the data in the graph itself, knowledge graphs are described as semantic. Knowledge graphs bring additional value by providing:

Context: Knowledge graphs give algorithms context by integrating various types of information into an ontology. They also offer the flexibility to add newly derived knowledge on the fly. Most knowledge graphs can draw on many types of raw data at the same time.
Efficiency: Once the desired entities and relations are in place, knowledge graphs allow stored data to be queried efficiently, which makes it easier to use that data to generate insights.
Explainability: Large networks of entities and relations address the problem of understandability by embedding the meaning of each entity in the graph itself. As a result, knowledge graphs are intrinsically explainable.
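
Much of the efficiency point above comes from indexing. Once triples are indexed by entity, finding an entity's neighbors touches only that entity's own edges instead of scanning every fact. The `IndexedGraph` class below is a hypothetical in-memory sketch of that idea. Graph databases implement far more elaborate storage and indexing.

```python
from collections import defaultdict

class IndexedGraph:
    """Triples indexed by subject and by object, so a neighbor lookup
    touches only the node's own edges instead of scanning every fact."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # subject -> [(predicate, object)]
        self.in_edges = defaultdict(list)   # object  -> [(predicate, subject)]

    def add(self, subject, predicate, obj):
        self.out_edges[subject].append((predicate, obj))
        self.in_edges[obj].append((predicate, subject))

    def neighbors(self, node):
        """Entities one hop away from `node`, in either direction."""
        outgoing = {o for _, o in self.out_edges.get(node, [])}
        incoming = {s for _, s in self.in_edges.get(node, [])}
        return outgoing | incoming

g = IndexedGraph()
g.add("Captain Marvel", "is the strongest", "Avenger")
g.add("Iron Man", "is an", "Avenger")
print(sorted(g.neighbors("Avenger")))  # ['Captain Marvel', 'Iron Man']
```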

Where to use knowledge graphs

According to Gartner’s Top 10 Data and Analytics Trends for 2021, knowledge graphs are the foundation of modern data and analytics, with capabilities to enhance and improve user collaboration, machine learning models, and explainable AI. Although graph technologies are not new to data and analytics, there has been a shift in the way they are used. A knowledge graph brings together machine learning and graph technologies to give AI the context it needs.

Solving complex problems often requires integrating unstructured and semi-structured data from a variety of sources. That calls for a connected, reusable, and flexible data foundation that reflects the complexity of the real world. Connected data, enriched with meaning, supports multiple interpretations of the same data. This helps answer complex queries and derive insights more efficiently.

Organizations are identifying an increasing number of use cases for knowledge graphs, including:

Fraud detection: Identifying fraudulent transactions is the most prevalent use case, with applications in banking, mobile phone transactions, government benefits, and tax fraud. Knowledge graphs also enhance the detection of fraud, waste, and abuse in insurance claims. Combined with machine learning and reasoning capabilities, knowledge graphs help companies identify fraudulent patterns more effectively by traversing many interconnected entities in a large network in real time.
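
One simplified form of traversal-based fraud detection links accounts that share identifiers, such as phone numbers or devices, and flags unusually large connected groups. The sketch below does this with a breadth-first search. The function name, the identifier format, the threshold of three accounts, and the data are all hypothetical. A real system would combine signals like this with machine learning and domain rules.

```python
from collections import defaultdict, deque

def fraud_rings(links, min_accounts=3):
    """Group accounts connected, directly or through other accounts, by
    shared identifiers, and return groups of at least `min_accounts`."""
    graph = defaultdict(set)
    for account, identifier in links:
        # Bipartite graph: account nodes <-> identifier nodes.
        graph[("acct", account)].add(("id", identifier))
        graph[("id", identifier)].add(("acct", account))

    seen, rings = set(), []
    for start in list(graph):
        if start in seen or start[0] != "acct":
            continue
        # Breadth-first traversal collects one connected component.
        seen.add(start)
        queue, accounts = deque([start]), set()
        while queue:
            node = queue.popleft()
            if node[0] == "acct":
                accounts.add(node[1])
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if len(accounts) >= min_accounts:
            rings.append(accounts)
    return rings

# Hypothetical data: (account, shared identifier) pairs.
links = [
    ("A1", "phone:555-0100"), ("A2", "phone:555-0100"),
    ("A2", "device:d42"), ("A3", "device:d42"),
    ("A4", "phone:555-0199"),
]
print(fraud_rings(links))  # one ring containing A1, A2 and A3
```

A1 and A3 share no identifier directly. They are linked only through A2, which is the kind of indirect connection a graph traversal surfaces.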

Drug discovery: Drug discovery is an extremely complex and costly process. Knowledge graphs have shown considerable promise across a range of tasks, including drug repurposing, drug interactions, and target gene-disease prioritization. Many open-source databases are integrated with published literature to create huge biomedical knowledge graphs (KGs). These KGs have proved very helpful for mining the relations between entities such as genes, drugs, and diseases, and for using those relations in downstream applications.
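
Drug repurposing over a knowledge graph is often framed as a path query: a drug that targets genes associated with a disease becomes a candidate for that disease. The sketch below ranks candidates by the number of distinct supporting genes. The predicate names (`targets`, `associated_with`, `treats`) and all entities are hypothetical placeholders. This simple path count stands in for the more sophisticated models used in practice.

```python
def repurposing_candidates(triples, disease):
    """Rank drugs not yet linked to `disease` by the number of distinct
    genes on drug -targets-> gene -associated_with-> disease paths."""
    disease_genes = {s for s, p, o in triples
                     if p == "associated_with" and o == disease}
    already_treats = {s for s, p, o in triples
                      if p == "treats" and o == disease}
    support = {}  # drug -> genes linking it to the disease
    for s, p, o in triples:
        if p == "targets" and o in disease_genes and s not in already_treats:
            support.setdefault(s, set()).add(o)
    # Most supporting genes first; ties broken alphabetically.
    return sorted(((drug, sorted(genes)) for drug, genes in support.items()),
                  key=lambda item: (-len(item[1]), item[0]))

# Hypothetical graph fragment.
triples = [
    ("DrugA", "targets", "GENE1"), ("DrugA", "targets", "GENE2"),
    ("DrugB", "targets", "GENE2"), ("DrugC", "targets", "GENE3"),
    ("DrugD", "targets", "GENE1"), ("DrugD", "treats", "DiseaseX"),
    ("GENE1", "associated_with", "DiseaseX"),
    ("GENE2", "associated_with", "DiseaseX"),
]
print(repurposing_candidates(triples, "DiseaseX"))
# [('DrugA', ['GENE1', 'GENE2']), ('DrugB', ['GENE2'])]
```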

Semantic search: A knowledge graph stores the meanings of entities, so knowledge graph-powered search is referred to as “semantic search,” or search enriched with meaning. Semantic search improves the accuracy of search results, whether on the internet or within an organization’s internal systems. For semantic search to work, a well-curated knowledge graph must be combined with text analytics and indexing techniques.
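
One way to sketch the core idea of semantic search is query expansion. The query is first resolved to an entity in the graph, and the search then also covers that entity's aliases and related entities. In the sketch below, the `alias` and `related_to` predicates, the helper names, and the documents are assumptions made for illustration. Plain substring matching stands in for the text analytics and indexing a real system would use.

```python
def expand_query(term, triples):
    """Return the query term plus the canonical entity, its aliases,
    and directly related entities, all lower-cased."""
    term = term.lower()
    # Resolve an alias (e.g. a character's real name) to its canonical entity.
    entity = next((s for s, p, o in triples
                   if p == "alias" and o.lower() == term), term)
    terms = {term, entity.lower()}
    for s, p, o in triples:
        if s.lower() == entity.lower() and p in ("alias", "related_to"):
            terms.add(o.lower())
    return terms

def semantic_search(query, documents, triples):
    """Return documents mentioning any term in the expanded query."""
    terms = expand_query(query, triples)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]

triples = [
    ("Captain Marvel", "alias", "Carol Danvers"),
    ("Captain Marvel", "related_to", "Avengers"),
]
documents = [
    "Carol Danvers returns in the sequel.",
    "The Avengers assemble once again.",
    "A story about pirates.",
]
print(semantic_search("Carol Danvers", documents, triples))
```

A plain keyword search for “Carol Danvers” would return only the first document. The graph adds the second one through the related Avengers entity.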

Recommender systems: Recommender systems model users’ preferences to deliver personalized product recommendations. A variety of modeling techniques are used to build them. Despite their considerable merit, these systems face challenges such as data sparsity, cold start, and the explainability of recommendations. Knowledge graph-based recommender systems can help address these challenges to an extent. In this approach, user and item entities are connected through multiple relationships. The relations are used to obtain a list of probable candidates for the target user. The path between the target user and a recommended item then serves as an explanation for that recommendation.
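
The path-as-explanation approach can be sketched in a few lines. The sketch recommends items that share an attribute with something the user liked and reports the connecting path. The predicate names, the explanation format, and the data are hypothetical. Real knowledge graph-based recommenders typically score many paths with learned models rather than keeping the first match.

```python
def recommend(triples, user):
    """Map each recommended item to a path-based explanation."""
    liked = {o for s, p, o in triples if s == user and p == "liked"}
    attributes = {}  # item -> set of attributes
    for s, p, o in triples:
        if p == "has_attribute":
            attributes.setdefault(s, set()).add(o)

    recommendations = {}
    # Walk user -liked-> item -has_attribute-> attr <-has_attribute- candidate.
    for item in sorted(liked):
        for attr in sorted(attributes.get(item, set())):
            for candidate in sorted(attributes):
                if candidate in liked or attr not in attributes[candidate]:
                    continue
                # Keep the first path found as the explanation.
                recommendations.setdefault(
                    candidate,
                    f"{user} liked {item}, which shares '{attr}' with {candidate}",
                )
    return recommendations

triples = [
    ("alice", "liked", "Movie1"),
    ("Movie1", "has_attribute", "sci-fi"),
    ("Movie2", "has_attribute", "sci-fi"),
    ("Movie3", "has_attribute", "romance"),
]
print(recommend(triples, "alice"))
# {'Movie2': "alice liked Movie1, which shares 'sci-fi' with Movie2"}
```

`Movie2` has no user interactions at all. It is reached purely through its `sci-fi` attribute, which hints at how graph relations can ease the cold-start problem.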

Mukta Paliwal is senior data scientist at Persistent Systems. She leads and consults with teams to create and deliver cutting-edge software solutions based on AI/ML in multiple business domains. She has a Ph.D. in Applied Machine Learning.
