Introducing Acacia-RDF: An X10-based scalable distributed RDF graph database engine
Abstract
Linked data mining has become one of the key questions in HPC graph mining in recent years. However, existing RDF database engines do not scale well and are less reliable in heterogeneous clouds. In this paper we describe the design and implementation of Acacia-RDF, a scalable distributed RDF graph database engine developed in the X10 programming language to address this issue. Acacia-RDF partitions RDF data sets into subgraphs following the vertex-cut paradigm. The partitioned data sets are persisted on secondary storage across X10 places. We developed a scalable SPARQL processor for Acacia-RDF that operates on top of the partitioned RDF data. Furthermore, we demonstrate the implementation of scalable graph algorithms, such as triangle counting, on such partitioned data sets. We present performance results gathered from Acacia with different scales of the LUBM RDF benchmark data sets and compare Acacia's performance against the Neo4j graph database server. In scalability experiments conducted on up to 16 X10 places, we observed that Acacia-RDF scales well with the LUBM data sets. Acacia-RDF reported an elapsed time of approximately 2 seconds on 4 places for running the first and third queries of the LUBM benchmark on the LUBM scale 40 data set. Through this work we introduce the use of the X10 language for scalable RDF graph data management.
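To make the vertex-cut idea concrete, the following is a minimal sketch in Python rather than X10, and it is not Acacia-RDF's actual partitioner. It assumes the simplest possible placement rule, a hash of the whole triple, so that each triple (edge) lands in exactly one partition while a vertex whose edges fall in several partitions is replicated in each of them. The function name `vertex_cut_partition`, the `replicas` map, and the sample `ex:` triples are all hypothetical and chosen only for illustration.

```python
import zlib
from collections import defaultdict


def vertex_cut_partition(triples, num_places):
    """Assign every triple (edge) to exactly one partition.

    Assumption: placement is a plain hash of the triple. This only
    illustrates the vertex-cut paradigm and is not the strategy used
    by Acacia-RDF.
    """
    partitions = [[] for _ in range(num_places)]
    # vertex -> set of partition ids that hold a copy of that vertex
    replicas = defaultdict(set)
    for s, p, o in triples:
        key = "\x00".join((s, p, o)).encode("utf-8")
        # crc32 is deterministic across runs, unlike the built-in hash()
        pid = zlib.crc32(key) % num_places
        partitions[pid].append((s, p, o))
        replicas[s].add(pid)
        replicas[o].add(pid)
    return partitions, replicas


# Hypothetical sample data in the spirit of a university benchmark.
triples = [
    ("ex:alice", "ex:advisor", "ex:bob"),
    ("ex:alice", "ex:takesCourse", "ex:course1"),
    ("ex:bob", "ex:teacherOf", "ex:course1"),
    ("ex:carol", "ex:takesCourse", "ex:course1"),
    ("ex:carol", "ex:advisor", "ex:bob"),
]
partitions, replicas = vertex_cut_partition(triples, 4)
```

In this sketch the edges are never split, so a query engine would only need to reconcile the replicated vertices recorded in `replicas` when combining results from different partitions.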