Publication
VLDB 2005
Conference paper
Efficient implementation of large-scale multi-structural databases
Abstract
In earlier work, we defined "multi-structural databases," a data model to support efficient analysis of large, complex data sets over multiple numerical and hierarchical dimensions. We defined three types of queries over this data model, each of which required solving an optimization problem. An example is to find the ten most significant nonoverlapping regions of geography crossed with time in which coverage of the Olympics was much stronger in newspapers than in online sources. In this paper, we present a general query framework capturing the original three queries as part of a much broader family. We then give efficient algorithms for particular subclasses of this family. Finally, we describe an implementation of these algorithms that operates on a collection of several billion web documents. Using our algorithms in conjunction with random sampling techniques, our system can solve these queries in real time.