Data prefetching for large tiered storage systems

Giovanni Cherubini; Yusik Kim; Mark A. Lantz; Vinodh Venkatesan

doi:10.1109/ICDM.2017.99

ICDM 2017

Conference paper

15 Dec 2017



Abstract

In multi-tier storage systems with large amounts of data, most of the data is stored on inexpensive, slower tiers such as cloud or tape to reduce cost. Consequently, retrieving data from these slower tiers incurs high latency. It is therefore beneficial to proactively prefetch data from slower tiers to faster tiers by predicting future data accesses. State-of-the-art access prediction methods typically record the access history of individual files, data objects, or data segments. However, in systems with large amounts of infrequently accessed (cold) data, file-level access history is often unavailable for much of the data because it is accessed so rarely. In this paper, we extract information from file metadata to predict file accesses in a storage system. The proposed method relies on the hypothesis that users and applications access the data stored in the system within a given context, and that this context, and therefore the set of files likely to be accessed, can be identified by detecting access patterns in file metadata. As an application, we consider the long-term archive of the LOFAR radio telescope, where access patterns are learned from a rich set of metadata and then used to predict which files astronomers are likely to access next.
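To make the metadata-context idea concrete, here is a minimal sketch that is not the method proposed in the paper. It assumes a hypothetical catalog mapping each file name to a dictionary of metadata attributes (the keys `project`, `target`, and `type` below are invented for illustration, not actual LOFAR archive fields), and treats the metadata (attribute, value) pairs of recently accessed files as a crude stand-in for the access "context". Files that have not yet been accessed are ranked by how strongly their metadata overlaps with that context, so that even cold files with no access history of their own can become prefetch candidates.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical catalog format: file name -> {metadata attribute: value}.
Catalog = Dict[str, Dict[str, str]]


def prefetch_candidates(catalog: Catalog, recent: List[str], k: int = 3) -> List[str]:
    """Rank not-yet-accessed files by metadata overlap with recent accesses.

    Illustrative heuristic only (an assumption, not the paper's model):
    the context is the multiset of (attribute, value) pairs seen in the
    recently accessed files.
    """
    # Count how often each (attribute, value) pair occurs among recent accesses.
    context = Counter(
        (key, value)
        for name in recent
        for key, value in catalog[name].items()
    )
    scores: Dict[str, int] = {}
    for name, attrs in catalog.items():
        if name in recent:
            continue  # already accessed, so no need to prefetch it
        score = sum(context[(key, value)] for key, value in attrs.items())
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken by file name for a deterministic order.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:k]


if __name__ == "__main__":
    catalog = {
        "a.h5": {"project": "P1", "target": "M31", "type": "raw"},
        "b.h5": {"project": "P1", "target": "M31", "type": "calibrated"},
        "c.h5": {"project": "P1", "target": "M87", "type": "raw"},
        "d.h5": {"project": "P2", "target": "M87", "type": "raw"},
    }
    print(prefetch_candidates(catalog, ["a.h5"], k=2))  # ['b.h5', 'c.h5']
```

In the usage example, accessing `a.h5` ranks `b.h5` and `c.h5` highest because each shares two metadata values with it. A learned model, as the abstract describes, would replace this fixed overlap score with patterns mined from the archive's access logs and metadata.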
