Publication
SIGMOD 2007
Conference paper

BIwTL: A business information warehouse toolkit and language for warehousing simplification and automation


Abstract

Rapidly leveraging information analytics technologies to mine the mounting volume of structured and unstructured information, derive business insights, and improve decision making is becoming increasingly critical to business success. A key enabler of these analytics technologies is an Information Warehouse Management System (IWMS), which processes different types and forms of information and builds and maintains the information warehouse (IW) effectively. Although traditional multi-dimensional data warehousing techniques, coupled with the well-known ETL (Extract, Transform, Load) processes, may meet some of the requirements of an IWMS, they generally fall short in several major respects: 1. they often lack comprehensive support for both structured and unstructured data processing; 2. they are database-centric and require detailed database and data warehouse knowledge to perform IWMS tasks, and hence are tedious and time-consuming to learn and operate; 3. they are often inflexible and insufficient in coping with a wide variety of ongoing IW maintenance tasks, such as adding new dimensions and handling regular, lengthy data updates that may encounter failures and errors. To address these issues, this paper describes an IWMS called BIwTL (Business Information Warehouse Toolkit and Language), which automates and simplifies IWMS tasks by providing a high-level declarative information warehousing language, GIWL, together with the runtime system components for that language. BIwTL hides system details (e.g., databases, full-text indexers, and data warehouse models) from users by automatically generating appropriate runtime scripts from the GIWL specification and executing them. Moreover, BIwTL supports structured and unstructured information processing by embedding flexible data extraction and transformation capabilities, while ensuring high-performance processing for large datasets.
In addition, this paper systematically studies the core tasks around information warehousing and identifies five key areas. In particular, we describe our technologies in three of these areas: constructing an IW, loading data, and maintaining an IW. We have implemented these technologies in BIwTL 1.0 and validated the system in real-world environments with a number of customers. Our experience suggests that BIwTL is lightweight, simple, efficient, and flexible. Copyright 2007 ACM.
