In the past few years, AI has crossed the threshold from hype to reality. Today, with unstructured data growing by 23% annually in the average organization, the combination of knowledge graphs and high-performance computing (HPC) is enabling organizations to exploit AI on massive datasets.
Full disclosure: Before I talk about how critical graph computing + HPC is going to be, I should tell you that I’m CEO of a graph computing, AI and analytics company, so I certainly have a vested interest and perspective here. But I’ll also tell you that our company is one of many in this space: Dgraph, Memgraph, TigerGraph, Neo4j, Amazon Neptune, and Microsoft’s Cosmos DB, for example, all use some form of HPC + graph computing. And there are many other graph companies and open-source graph options, including OrientDB, Titan, ArangoDB, Nebula Graph, and JanusGraph. So there’s a bigger movement here, and it’s one you’ll want to know about.
Knowledge graphs organize data from seemingly disparate sources to highlight relationships between entities. Knowledge graphs themselves are not new (Facebook, Amazon, and Google have invested a lot of money over the years in knowledge graphs that can understand user intents and preferences), but coupling them with HPC gives organizations the ability to detect anomalies and other patterns in data at unprecedented scale and speed.
There are two main reasons for this.
First, graphs can be very large: Data sizes of 10-100TB are not uncommon. Organizations today may have graphs with billions of nodes and hundreds of billions of edges. In addition, nodes and edges can have a lot of property data associated with them. Using HPC techniques, a knowledge graph can be sharded across the machines of a large cluster and processed in parallel.
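To make the idea of sharding concrete, here is a minimal sketch in Python. It assumes integer node IDs, assigns each edge to a shard by the source ID modulo the shard count, and uses a thread pool as a stand-in for the machines of a cluster. The function names are illustrative only and are not taken from any particular graph platform.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def shard_edges(edges, num_shards):
    """Assign each edge to a shard based on its source node ID."""
    shards = [[] for _ in range(num_shards)]
    for src, dst in edges:
        shards[src % num_shards].append((src, dst))
    return shards


def local_out_degrees(shard):
    """Work done independently on one shard: count out-edges per source node."""
    return Counter(src for src, _ in shard)


def out_degrees(edges, num_shards=4):
    """Compute out-degrees by processing shards in parallel, then merging."""
    shards = shard_edges(edges, num_shards)
    # Threads stand in for cluster machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(local_out_degrees, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0), (3, 1), (3, 2)]
degrees = out_degrees(edges, num_shards=2)
```

Because every out-edge of a node lands in the same shard, each shard can be counted without talking to the others, and only the small per-shard summaries need to be merged. Real systems use more careful partitioning to balance load and limit cross-machine traffic.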
The second reason HPC techniques are essential for large-scale computing on graphs is the need for fast analytics and inference in many application domains. One of the earliest use cases I encountered was with the Defense Advanced Research Projects Agency (DARPA), which first used knowledge graphs enhanced by HPC for real-time intrusion detection in its computer networks. This application entailed constructing a particular kind of knowledge graph called an interaction graph, which was then analyzed using machine learning algorithms to identify anomalies. Given that cyberattacks can go undetected for months (hackers in the recent SolarWinds breach lurked for at least nine months), the need to pinpoint suspicious patterns immediately is evident.
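As a rough illustration of what analyzing an interaction graph can look like, the sketch below builds a graph from (source, destination) connection events and flags hosts that contact an unusually large number of distinct peers. The simple z-score rule is an assumption chosen for brevity and stands in for the machine learning algorithms mentioned above; it is not the method used in the DARPA work, and the host names are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_interaction_graph(events):
    """Map each host to the set of distinct hosts it initiated contact with."""
    graph = defaultdict(set)
    for src, dst in events:
        graph[src].add(dst)
        graph.setdefault(dst, set())  # make sure passive hosts appear as nodes
    return graph


def unusual_fanout(graph, threshold=2.0):
    """Return hosts whose fan-out is more than `threshold` std devs above the mean."""
    fanout = {host: len(peers) for host, peers in graph.items()}
    mu = mean(fanout.values())
    sigma = pstdev(fanout.values())
    if sigma == 0:
        return []
    return sorted(h for h, n in fanout.items() if (n - mu) / sigma > threshold)


events = [
    ("ws1", "db"), ("ws1", "mail"),
    ("ws2", "db"), ("ws2", "mail"),
    ("ws3", "db"),
]
# Hypothetical host that touches every machine on the network.
events += [("scanner", h) for h in
           ["ws1", "ws2", "ws3", "db", "mail", "dns", "print", "backup"]]

suspects = unusual_fanout(build_interaction_graph(events))
```

On a real network the graph changes continuously and has far more nodes, which is where the parallelism discussed above becomes necessary.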
Today, I’m seeing a number of other fast-growing use cases emerge that are highly relevant and compelling for data scientists, including the following.
Financial services — fraud, risk management and customer 360
Digital payments are gaining more and more traction — more than three-quarters of people in the US use some form of digital payments. However, the amount of fraudulent activity is growing as well. Last year the dollar amount of attempted fraud grew 35%. Many financial institutions still rely on rules-based systems, which fraudsters can bypass relatively easily. Even those institutions that do rely on AI techniques can typically analyze only the data collected in a short period of time due to the large number of transactions happening every day. Current mitigation measures therefore lack a global view of the data and fail to adequately address the growing financial fraud problem.
A high-performance graph computing platform can efficiently ingest data corresponding to billions of transactions through a cluster of machines, and then run a sophisticated pipeline of graph analytics, such as centrality metrics, and graph AI algorithms for tasks like clustering and node classification. These algorithms often use Graph Neural Networks (GNNs) to generate vector space representations for the entities in the graph. Together, they enable the system to identify fraudulent behavior and detect money laundering more robustly. GNN computations are very floating-point intensive and can be sped up by exploiting tensor computation accelerators.
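To show what a centrality metric computes, here is a small, single-machine sketch of PageRank on a toy payment graph. PageRank is used here only as one familiar example of a centrality metric; the account names and payments are invented, and a production pipeline would run a distributed implementation over far larger data.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by accounts with no outgoing payments is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical payments: several accounts funnel money into one account.
payments = [
    ("alice", "mule"), ("bob", "mule"), ("carol", "mule"),
    ("mule", "offshore"), ("dave", "alice"),
]
scores = pagerank(payments)
```

Accounts that receive funds from many others score higher than ordinary accounts, which makes a score like this one possible input feature for a downstream fraud classifier.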
Secondly, HPC and knowledge graphs coupled with graph AI are essential to conduct risk assessment and monitoring, which has become more challenging with the escalating size and complexity of interconnected global financial markets. Risk management systems built on traditional relational databases are inadequately equipped to identify hidden risks across a vast pool of transactions, accounts, and users because they often ignore relationships among entities. In contrast, a graph AI solution learns from the connectivity data and not only identifies risks more accurately but also explains why they are considered risks. It is essential that the solution leverage HPC to reveal the risks in a timely manner before they turn more serious.
Finally, a financial services organization can aggregate various customer touchpoints and integrate them into a consolidated, 360-degree view of the customer journey. By connecting millions of disparate transactions and interactions by end users, including those across different bank branches, financial services institutions can evolve their customer engagement strategies, better identify credit risk, personalize product offerings, and implement retention strategies.
Pharmaceutical industry — accelerating drug discovery and precision medicine
Between 2009 and 2018, U.S. biopharmaceutical companies spent about $1 billion to bring each new drug to market. A significant fraction of that money is wasted on exploring potential treatments in the laboratory that ultimately do not pan out, and it can take 12 years or more to complete the drug discovery and development process. The COVID-19 pandemic in particular has thrust the importance of cost-effective and swift drug discovery into the spotlight.
A high-performance graph computing platform can enable researchers in bioinformatics and cheminformatics to store, query, mine, and develop AI models using heterogeneous data sources to reveal breakthrough insights faster. Timely and actionable insights can not only save money and resources but also save human lives.
Challenges in this data and AI-fueled drug discovery have centered on three main factors — the difficulty of ingesting and integrating complex networks of biological data, the struggle to contextualize relations within this data, and the complications in extracting insights across the sheer volume of data in a scalable way. As in the financial sector, HPC is essential to solving these problems in a reasonable time frame.
The main use cases under active investigation at all major pharmaceutical companies include drug hypothesis generation and precision medicine for cancer treatment. Both use heterogeneous data sources such as bioinformatic and cheminformatic knowledge graphs, along with gene expression, imaging, patient clinical data, and epidemiological information, to train graph AI models. While there are many algorithms to solve these problems, one popular approach is to use Graph Convolutional Networks (GCNs) to embed the nodes in a high-dimensional space, and then use the geometry in that space to solve problems like link prediction and node classification.
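The embed-then-score idea can be sketched in a few lines of plain Python. The example below applies a single graph-convolution step, ReLU(Â·H·W) with the symmetrically normalized adjacency matrix Â = D^(-1/2)(A + I)D^(-1/2), and then scores a candidate link with a dot product of the two node embeddings. The graph, the feature values, and the weights are all made-up placeholders; in practice the weights are learned during training and the computation runs on accelerators rather than nested lists.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph as nested lists."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_hat, features, weights):
    """One graph-convolution step: ReLU(a_hat @ features @ weights)."""
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def link_score(embeddings, u, v):
    """Score a candidate edge by the dot product of the two node embeddings."""
    return sum(p * q for p, q in zip(embeddings[u], embeddings[v]))


# Toy path graph 0 - 1 - 2 - 3 with placeholder features and untrained weights.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = [[0.5, -0.2], [0.1, 0.3]]

a_hat = normalized_adjacency(4, edges)
embeddings = gcn_layer(a_hat, features, weights)
score = link_score(embeddings, 0, 3)  # how plausible is a 0 - 3 link?
```

Each node's embedding mixes its own features with those of its neighbors, so stacking several such layers lets information flow across longer paths in the graph.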
Another important aspect is the explainability of graph AI models. AI models cannot be treated as black boxes in the pharmaceutical industry, as actions based on their output can have dire consequences. Cutting-edge explainability methods such as GNNExplainer and Guided Gradient (GGD) are very compute-intensive and therefore require high-performance graph computing platforms.
The bottom line
Graph technologies are becoming more prevalent, and organizations and industries are learning how to use them effectively. While there are several approaches to using knowledge graphs, pairing them with high-performance computing is transforming this space and equipping data scientists with the tools to take full advantage of corporate data.
Keshav Pingali is CEO and co-founder of Katana Graph, a high-performance graph intelligence company. He holds the W.A. “Tex” Moncrief Chair of Computing at the University of Texas at Austin, is a Fellow of the ACM, IEEE and AAAS, and is a Foreign Member of the Academia Europaea.