Publication
ICDE 2009
Conference paper

Resolution-Aware query answering for business intelligence

Abstract

Entity uncertainty is an unavoidable problem in modern enterprise databases, resulting from integration of data over multiple sources. In traditional warehousing, the administrator, during an ETL process, manually and laboriously resolves inconsistent data records to discover "true" entities (customers, products, etc.) and identify their "correct" attribute values. At any time point, however, the current entity resolution is merely a best guess, and OLAP query results based on this resolution are inherently imprecise. We propose a new approach that maintains the data in an unresolved state, and dynamically deals with entity uncertainty at query time. We enhance the traditional OLAP model to return not a single query answer, but rather upper and lower bounds on each OLAP aggregate. This approach avoids expensive entity-resolution processing, and serves to identify potential risks when making business decisions based on the results of OLAP queries. By focusing on bounds, rather than probability distributions, we can easily and efficiently process roll-up and group-by aggregation queries over all of the core aggregation functions. Moreover, our approach can be readily implemented in an existing RDBMS using SQL queries, and does not require the user to specify explicit probabilities for alternative entity resolutions. Experiments show that the overhead of our new OLAP functionality is small over a wide range of scenarios. © 2009 IEEE.
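To make the idea of bounded aggregates concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical data model in which each record carries an amount and a set of candidate entities it might resolve to, and that uncertain records resolve independently of one another. The function name `sum_bounds` and the sample entity ids are illustrative only. Under those assumptions, a record with one candidate contributes to both bounds of that entity's SUM, while an uncertain record can only lower the lower bound (if negative) or raise the upper bound (if positive).

```python
from collections import defaultdict


def sum_bounds(records):
    """Return {entity: (lower, upper)} bounds on SUM(amount) per entity.

    `records` is an iterable of (amount, candidates) pairs, where
    `candidates` is the set of entity ids the record might resolve to.
    This representation is an assumption made for illustration.
    """
    lower = defaultdict(float)
    upper = defaultdict(float)
    for amount, candidates in records:
        if len(candidates) == 1:
            # Resolved record: it counts toward both bounds.
            (entity,) = candidates
            lower[entity] += amount
            upper[entity] += amount
        else:
            # Unresolved record: it may or may not belong to each candidate,
            # so it widens the interval instead of shifting it.
            for entity in candidates:
                lower[entity] += min(0.0, amount)
                upper[entity] += max(0.0, amount)
    return {e: (lower[e], upper[e]) for e in set(lower) | set(upper)}


# Hypothetical sample data: one sale could belong to either customer record.
records = [
    (100.0, {"acme"}),
    (50.0, {"acme", "acme_corp"}),
    (30.0, {"acme_corp"}),
]
bounds = sum_bounds(records)
# bounds["acme"] == (100.0, 150.0); bounds["acme_corp"] == (30.0, 80.0)
```

The width of each interval gives a rough sense of how much a decision based on that aggregate depends on unresolved entities, which is the kind of risk signal the abstract describes.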
