Publication
SIGMOD 2020
Conference paper

IBM Db2 Graph: Supporting Synergistic and Retrofittable Graph Queries Inside IBM Db2

Abstract

To meet the challenge of analyzing rapidly growing graph and network data created by modern applications, a large number of graph databases have emerged, such as Neo4j and JanusGraph. They mainly target low-latency graph queries, such as finding the neighbors of a vertex with certain properties, and retrieving the shortest path between two vertices. Although many of the graph databases handle graph-only queries very well, they fall short for real-life applications involving graph analysis. This is because graph queries are not all that one does in an analytics workload of a real-life application. They are often only a part of an integrated heterogeneous analytics pipeline, which may include SQL, machine learning, graph, and other analytics. This means graph queries need to be synergistic with other analytics. Unfortunately, most existing graph databases are standalone and cannot easily integrate with other analytics systems. In addition, much graph data (data about relationships between objects or people) is already prevalent in existing non-graph databases, although it is not explicitly stored as a graph. None of the existing graph databases can retrofit graph queries onto this existing data without transferring or transforming it. In this paper, we propose an in-DBMS graph query approach, IBM Db2 Graph, to support synergistic and retrofittable graph queries inside the IBM Db2 relational database. It is implemented as a layer inside Db2, and thus can support integrated graph and SQL analytics efficiently. Db2 Graph employs a novel graph overlay approach to expose a graph view of the relational data. This approach flexibly retrofits graph queries to existing graph data stored in relational tables, without expensive data transfer or transformation. In addition, it enables efficient execution of graph queries with the help of the Db2 relational engine, through sophisticated compile-time and runtime optimization strategies.
Our experimental study, as well as our experience with real customers using Db2 Graph, showed that Db2 Graph can provide very competitive and sometimes even better performance on graph-only queries, compared to existing graph databases. Moreover, it optimizes the overall performance of complex analytics workloads.
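
The graph overlay idea in the abstract can be illustrated with a small sketch. This is not Db2 Graph's implementation or configuration format; it only shows, under stated assumptions, how a mapping from relational tables to vertices and edges might let a graph query ("out-neighbors of a vertex") run as SQL over data that stays in place. It uses SQLite from the Python standard library as a stand-in for a relational engine, and the `OVERLAY` structure, the table names, and the `neighbors` helper are all hypothetical.

```python
import sqlite3

# Hypothetical overlay description (an assumption for illustration, not
# Db2 Graph's actual format): which table holds vertices, which holds edges,
# and which columns play the roles of id, source, and destination.
OVERLAY = {
    "vertex": {"table": "person", "id": "id", "props": ["name"]},
    "edge": {"table": "knows", "src": "src_id", "dst": "dst_id"},
}


def neighbors(conn, overlay, vertex_id):
    """Answer 'out-neighbors of vertex_id' with one SQL join over the
    existing tables, without copying or transforming the data."""
    v, e = overlay["vertex"], overlay["edge"]
    cols = ", ".join(f"v.{c}" for c in [v["id"]] + v["props"])
    sql = (
        f"SELECT {cols} FROM {e['table']} AS e "
        f"JOIN {v['table']} AS v ON v.{v['id']} = e.{e['dst']} "
        f"WHERE e.{e['src']} = ? ORDER BY v.{v['id']}"
    )
    return conn.execute(sql, (vertex_id,)).fetchall()


def demo():
    # Ordinary relational tables that already encode relationships.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE knows (src_id INTEGER, dst_id INTEGER);
        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO knows VALUES (1, 2), (1, 3), (2, 3);
        """
    )
    return neighbors(conn, OVERLAY, 1)


if __name__ == "__main__":
    print(demo())
```

Because the tables remain ordinary relational tables, the same connection can still serve plain SQL queries alongside the graph-style lookup, which is the kind of combined workload the abstract describes.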