Abstract
The need to analyze huge amounts of data for various business intelligence applications is well known. However, the rate at which enterprise data is now generated demands periodic migration of older data from the operational data warehouse to magnetic tapes. In this paper, we propose an "Active data archival service" in which the data is seamlessly archived on the cloud while ensuring that the archived data can still be queried without any perceptible change for the end user. This takes the burden of maintaining the archive off the user and shifts it to the archival service. We discuss the architecture of the service, the challenges that arise from the federation of data introduced by archival, and how we handle these issues. Specifically, we investigate how the relational data needs to be transformed so that storing and retrieving the data from the cloud is efficient and seamless to the end user. We present our insights through an experimental study using the TPC-DS benchmark. © 2012 IEEE.