Acacia-RDF: An X10-based scalable distributed RDF graph database engine
Linked data mining has become one of the key questions in High Performance graph mining in recent years. However, the existing Resource Description Framework (RDF) database engines are not scalable and are less reliable in heterogeneous clouds. In this paper we describe the design and implementation of Acacia-RDF which is a scalable distributed RDF graph database engine developed with X10 programming language to solve this issue. Acacia-RDF partitions the RDF data sets into subgraphs following vertex cut paradigm. The partitioned data sets are persisted on secondary storage across X10 places. We developed a scalable SPARQL (an RDF query language) processor for Acacia-RDF which operates on top of partitioned RDF data. Furthermore, we designed and implemented a replication based fault tolerance mechanism for Acacia-RDF. We present performance results gathered from Acacia with different scales of LUBM (Lehigh University Benchmark) RDF benchmark data sets. We make a comparison of Acacia-RDF's performance against Neo4j graph database server. From the scalability experiments conducted upto 16 X10 places, we observed that Acacia-RDF scales well with LUBM data sets. Acacia-RDF reported less than ten seconds elapsed times on 16 places for running the first and the third queries of the LUBM benchmark on LUBM 160 universities data set with 3.6 million vertices and 28.5 million edges which was 1.7GB in size. Through this work we describe and demonstrate the use of X10 language for development of scalable RDF graph data management systems.