Efficient implementation of large-scale multi-structural databases

Ronald Fagin; Ph. Kolaitis; R. Kumar; J. Novak; D. Sivakumar; A. Tomkins

VLDB 2005

Conference paper

01 Dec 2005


Abstract

In earlier work, we defined "multi-structural databases," a data model to support efficient analysis of large, complex data sets over multiple numerical and hierarchical dimensions. We defined three types of queries over this data model, each of which required solving an optimization problem. An example is to find the ten most significant nonoverlapping regions of geography crossed with time in which coverage of the Olympics was much stronger in newspapers than online sources. In this paper, we present a general query framework capturing the original three queries as part of a much broader family. We then give efficient algorithms for particular subclasses of this family. Finally, we describe an implementation of these algorithms that operates on a collection of several billion web documents. Using our algorithms in conjunction with random sampling techniques, our system can solve these queries in real time.
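
To make the example query concrete, here is a minimal sketch that picks the k most significant nonoverlapping regions. It makes three simplifying assumptions that the abstract does not: there is only a single hierarchical dimension (the paper considers products such as geography × time), each node already has a precomputed significance score, and "nonoverlapping" means no chosen node is an ancestor of another. The names `top_k_nonoverlapping`, `tree`, and `score` and the toy data are hypothetical. The method is a standard tree knapsack dynamic program, not the authors' algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: node -> list of children in one hierarchy.
Tree = Dict[str, List[str]]


def top_k_nonoverlapping(
    tree: Tree, root: str, score: Dict[str, float], k: int
) -> Tuple[float, List[str]]:
    """Choose at most k nodes, none an ancestor of another, maximizing the
    summed (assumed precomputed) scores. Tree DP in O(n * k^2) time."""

    def solve(v: str) -> List[Tuple[float, List[str]]]:
        # table[j] = (best total, chosen nodes) using at most j nodes below v
        table: List[Tuple[float, List[str]]] = [(0.0, [])] * (k + 1)
        for child in tree.get(v, []):
            child_table = solve(child)
            merged = list(table)  # c = 0: child contributes nothing
            for j in range(k + 1):
                for c in range(1, j + 1):  # c nodes taken from this child
                    total = table[j - c][0] + child_table[c][0]
                    if total > merged[j][0]:
                        merged[j] = (total, table[j - c][1] + child_table[c][1])
            table = merged
        # Alternative: select v itself, which overlaps (so excludes) its subtree.
        for j in range(1, k + 1):
            if score[v] > table[j][0]:
                table[j] = (score[v], [v])
        return table

    return solve(root)[k]


# Toy geography hierarchy with made-up significance scores (illustration only).
tree: Tree = {
    "world": ["europe", "asia"],
    "europe": ["france", "greece"],
    "asia": ["china", "japan"],
}
score: Dict[str, float] = {
    "world": 5.0, "europe": 4.0, "asia": 3.0,
    "france": 1.0, "greece": 6.0, "china": 2.0, "japan": 2.0,
}

if __name__ == "__main__":
    print(top_k_nonoverlapping(tree, "world", score, 2))
```

With k = 2 the sketch returns greece and asia with a total of 9.0. That beats world (5.0) alone and beats france plus greece (7.0), because a parent region and its subregions are never chosen together.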
