Storing and analyzing historical graph data at scale
Abstract
Work on large-scale graph analytics to date has largely focused on the static properties of graph snapshots. However, a static view of interactions between entities often oversimplifies complex phenomena such as the spread of epidemics, information diffusion, and the formation of online communities. Finding temporal interaction patterns, visualizing the evolution of graph properties, or even simply comparing snapshots across time adds significant value when reasoning over graphs. Yet, for lack of underlying data management support, an analyst today must manually handle the added temporal complexity of large evolving graphs. In this paper, we present a system, called Historical Graph Store, that enables users to store large volumes of historical graph data and to express and run complex temporal graph analytical tasks against that data. It consists of two key components: (1) a Temporal Graph Index (TGI), which compactly stores large volumes of historical graph evolution data in a partitioned and distributed fashion and supports retrieving snapshots of the graph as of any past timepoint, as well as the evolution histories of individual nodes or neighborhoods; and (2) a Temporal Graph Analysis Framework (TAF), for expressing complex temporal analytical tasks and executing them efficiently and scalably using Apache Spark. Our experiments demonstrate that the system provides efficient storage, retrieval, and analytics across a wide variety of queries on large volumes of historical graph data.
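To make the two retrieval primitives concrete (a snapshot "as of" a timepoint, and the evolution history of a single node), the following is a minimal single-machine sketch. It is not TGI: it assumes a hypothetical model in which history is a log of edge additions and deletions, and it uses a generic checkpoint-plus-replay scheme (a full snapshot every `checkpoint_every` events, with later events replayed on top). The names `EdgeEvent`, `CheckpointedEdgeLog`, `snapshot_as_of`, and `node_history` are illustrative only; TGI's partitioned, distributed layout is not reproduced here.

```python
import bisect
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class EdgeEvent:
    # Hypothetical event model: an edge is added or deleted at a timestamp.
    time: int
    src: str
    dst: str
    added: bool  # True: edge appears at `time`; False: edge is deleted at `time`


class CheckpointedEdgeLog:
    """Toy historical edge store: a time-ordered event log plus periodic full snapshots."""

    def __init__(self, events: List[EdgeEvent], checkpoint_every: int = 4):
        if checkpoint_every < 1:
            raise ValueError("checkpoint_every must be >= 1")
        self.k = checkpoint_every
        # sorted() is stable, so events sharing a timestamp keep their input order.
        self.events = sorted(events, key=lambda e: e.time)
        self.times = [e.time for e in self.events]
        # checkpoints[j] = (i, edge set after applying events[:i]) with i = j * k.
        self.checkpoints: List[Tuple[int, FrozenSet[Edge]]] = [(0, frozenset())]
        current: Set[Edge] = set()
        for i, ev in enumerate(self.events, start=1):
            self._apply(current, ev)
            if i % self.k == 0:
                self.checkpoints.append((i, frozenset(current)))

    @staticmethod
    def _apply(edges: Set[Edge], ev: EdgeEvent) -> None:
        if ev.added:
            edges.add((ev.src, ev.dst))
        else:
            edges.discard((ev.src, ev.dst))

    def snapshot_as_of(self, t: int) -> FrozenSet[Edge]:
        # Number of events with time <= t.
        end = bisect.bisect_right(self.times, t)
        # Latest checkpoint covering at most `end` events, then replay the rest.
        start, base = self.checkpoints[end // self.k]
        edges = set(base)
        for ev in self.events[start:end]:
            self._apply(edges, ev)
        return frozenset(edges)

    def node_history(self, node: str, t_from: Optional[int] = None,
                     t_to: Optional[int] = None) -> List[EdgeEvent]:
        # All events touching `node`, optionally restricted to [t_from, t_to].
        return [
            ev for ev in self.events
            if node in (ev.src, ev.dst)
            and (t_from is None or ev.time >= t_from)
            and (t_to is None or ev.time <= t_to)
        ]
```

A smaller `checkpoint_every` shortens the replay needed per snapshot query at the cost of storing more full snapshots; a larger one saves space but makes retrieval slower. Balancing that trade-off between compact storage and fast retrieval is the kind of problem a historical graph index has to address at much larger scale.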