Quark-X: An efficient top-K processing framework for RDF quad stores
There is a growing trend towards enriching RDF content from its classical Subject-Predicate-Object triple form to an annotated representation that can model richer information such as fact provenance, fact confidence, and higher-order relationships. One of the recommended ways to achieve this is to use reification and represent the data as N-Quads (or simply quads), where an additional identifier is associated with the entire RDF statement and can then be used to add further annotations. A typical use of such annotations is to attach quantifiable confidence values to facts. In such settings, it is important to support efficient top-k queries, typically over user-defined ranking functions that combine sentence-level confidence values with other quantifiable values in the database. In this paper, we present Quark-X, an RDF store and SPARQL processing system for reified RDF data represented in the form of quads. We describe the overall architecture of the system, illustrating the modifications that need to be made to a native quad store for it to process top-k queries, and we propose indexing and query processing techniques that make top-k querying efficient. In addition, we present the results of a comprehensive empirical evaluation of our system over the Yago2S and DBpedia datasets. Our performance study shows that the proposed method achieves a speed-up of one to two orders of magnitude over baseline solutions.
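To make the query setting concrete, the following minimal Python sketch shows what a top-k query over confidence-annotated quads computes. It is a naive scan-and-rank illustration only, not the indexing or query processing technique used in Quark-X. The `Quad` type, the `top_k` helper, the `ex:` identifiers, and the confidence values are all hypothetical and chosen purely for illustration.

```python
import heapq
from typing import Callable, Iterable, List, NamedTuple


class Quad(NamedTuple):
    """A reified RDF statement: a triple plus a statement identifier."""
    subject: str
    predicate: str
    obj: str
    stmt_id: str  # identifier for the whole statement, used to attach annotations


# Hypothetical toy data (not taken from Yago2S or DBpedia).
QUADS = [
    Quad("ex:Alice", "ex:livesIn", "ex:Paris", "ex:s1"),
    Quad("ex:Bob", "ex:livesIn", "ex:Berlin", "ex:s2"),
    Quad("ex:Carol", "ex:livesIn", "ex:Rome", "ex:s3"),
    Quad("ex:Alice", "ex:worksAt", "ex:Acme", "ex:s4"),
]

# Hypothetical confidence annotations keyed by statement identifier.
CONFIDENCE = {"ex:s1": 0.9, "ex:s2": 0.6, "ex:s3": 0.75, "ex:s4": 0.8}


def top_k(
    quads: Iterable[Quad],
    predicate: str,
    k: int,
    score: Callable[[Quad], float],
) -> List[Quad]:
    """Return the k quads with the given predicate that score highest.

    This scans every quad and ranks the matches, which is the kind of
    baseline evaluation an efficient top-k system tries to avoid.
    """
    matches = (q for q in quads if q.predicate == predicate)
    return heapq.nlargest(k, matches, key=score)


def by_confidence(q: Quad) -> float:
    """A simple user-defined ranking function: the statement's confidence."""
    return CONFIDENCE.get(q.stmt_id, 0.0)


if __name__ == "__main__":
    for quad in top_k(QUADS, "ex:livesIn", 2, by_confidence):
        print(quad.stmt_id, by_confidence(quad))
```

A ranking function that mixes confidence with other numeric values from the data can be passed as `score` in the same way. The sketch only fixes the semantics of the query; how to answer it without scanning and scoring every matching statement is the problem the paper addresses.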