About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
Data and Knowledge Engineering
Paper
An efficient workload-based data layout scheme for multidimensional data
Abstract
Physical data layout is a crucial factor in the performance of queries and updates in large data warehouses. Data layout enhances and complements other performance features such as materialized views and dynamic caching of aggregated results. Prior work has identified that the multidimensional nature of large data warehouses imposes natural restrictions on the query workload. A method based on a "uniform" query class approach has been proposed for data clustering and shown to be optimal. However, we believe that realistic query workloads will exhibit data access skew. For instance, if time is a dimension in the data model, then more queries are likely to focus on the most recent time interval. The query class approach does not adequately model the possibility of multidimensional data access skew. We propose the affinity graph model for capturing workload characteristics in the presence of access skew and describe an efficient algorithm for physical data layout. Our proposed algorithm considers declustering and load balancing issues which are inherent to the multidisk data layout problem. We demonstrate the validity of this approach experimentally. © 2001 Elsevier Science B.V. All rights reserved.