Publication
KDD 1997
Conference paper
Anytime Exploratory Data Analysis for Massive Data Sets
Abstract
Exploratory data analysis is inherently an iterative, interactive endeavor. In the context of massive data sets, however, many current data analysis algorithms will not scale appropriately to permit interaction on a human time-scale. In this paper, "anytime data analysis" is proposed as a general framework to enable exploratory data analysis of massive data sets. Anytime data analysis takes into account not only the quality of the model being fit but also the resources (time and memory) used to achieve that fit. The framework is discussed in some detail for interactive multivariate density estimation. Out-of-sample log-likelihood and model combination techniques (such as stacking) are used to greedily explore the data landscape. The method is applied to two significant scientific data sets, where it is shown that it can be better to combine multiple "cheap-to-construct" models than to spend the same time optimizing the parameters of a single more complex model.
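
The idea in the abstract can be illustrated with a minimal sketch; it is not the paper's algorithm. The assumptions here are all illustrative: the data are one-dimensional, each "cheap" model is a single Gaussian fit to a random window of the sorted training data, stacking weights are estimated by EM on a held-out set, and a candidate is kept only if it raises the mean held-out log-likelihood. The names `anytime_density`, `stack_weights`, `budget_s`, `max_candidates`, and `window` are hypothetical.

```python
import math
import random
import time


def fit_gaussian(points):
    """Cheap model: a 1-D Gaussian fit by moments, returned as (mean, variance)."""
    n = len(points)
    mean = sum(points) / n
    var = sum((x - mean) ** 2 for x in points) / n
    return mean, max(var, 1e-6)  # floor the variance to avoid a degenerate fit


def gaussian_pdf(x, model):
    mean, var = model
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def stack_weights(dens, iters=30):
    """EM for mixture weights over fixed models; dens[k][i] is model k at held-out point i."""
    k, n = len(dens), len(dens[0])
    w = [1.0 / k] * k
    for _ in range(iters):
        resp = [0.0] * k
        for i in range(n):
            mix = sum(w[j] * dens[j][i] for j in range(k)) + 1e-300
            for j in range(k):
                resp[j] += w[j] * dens[j][i] / mix
        total = sum(resp)
        if total == 0.0:  # every density underflowed; keep the previous weights
            break
        w = [r / total for r in resp]
    return w


def heldout_loglik(dens, w):
    """Mean log-likelihood of the weighted combination on the held-out points."""
    n = len(dens[0])
    return sum(
        math.log(sum(wj * d[i] for wj, d in zip(w, dens)) + 1e-300) for i in range(n)
    ) / n


def anytime_density(train, heldout, budget_s=0.5, max_candidates=20, window=100, seed=0):
    """Greedily add cheap models until the time budget or candidate cap is reached."""
    rng = random.Random(seed)
    ordered = sorted(train)
    window = min(window, len(ordered))
    deadline = time.monotonic() + budget_s
    models, dens, weights, history = [], [], [], []
    tried = 0
    while tried < max_candidates and time.monotonic() < deadline:
        tried += 1
        start = rng.randrange(len(ordered) - window + 1)
        cand = fit_gaussian(ordered[start:start + window])
        trial = dens + [[gaussian_pdf(x, cand) for x in heldout]]
        w = stack_weights(trial)
        score = heldout_loglik(trial, w)
        # Keep the candidate only if the out-of-sample fit improves.
        if not history or score > history[-1]:
            models.append(cand)
            dens, weights = trial, w
            history.append(score)
    return models, weights, history
```

Because the loop can be interrupted after any candidate, `models` and `weights` always describe a usable density estimate, and `history` records how the held-out score improved as time was spent.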