Seamlessly integrating disk and tape in a multi-tiered distributed file system
The explosion of data volumes in enterprise environments, combined with limited budgets, has created the need for multi-tiered storage systems. Because the bulk of this data is accessed very infrequently, tape is a natural fit for storing it. In this paper we present a file storage system that seamlessly integrates disk and tape, enabling a bottomless and cost-effective storage architecture that can scale to accommodate Big Data requirements. The proposed system offers access to data through a POSIX filesystem interface under a single global namespace and optimizes the placement of data across the disk and tape tiers. Because it uses a self-contained, standardized and open filesystem format on the removable tape media, the system does not depend on proprietary software or external metadata servers to access the data stored on tape. By internally managing tape-tier resources such as tape drives and cartridges, the system relieves users of the complexities of tape storage. Our implementation, which is based on the GPFS and LTFS filesystems, demonstrates the applicability of the proposed architecture in real-world environments. Our experimental evaluation has shown that this approach is very promising in terms of scalability, performance and manageability. The proposed system has been productized by IBM as LTFS Enterprise Edition.
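The abstract does not specify how the system decides which data to move from disk to tape. As a rough illustration of what "optimizing the placement of data across tiers" can mean, the sketch below assumes a simple age-based policy: files on disk that have not been accessed for a minimum idle period are migrated to tape, coldest first, until a target amount of free disk space is reached. The migrated file keeps its path, reflecting the single global namespace. All names here (`FileEntry`, `select_for_migration`, `migrate`, the idle threshold and free-space target) are hypothetical. This is not the GPFS policy engine or the actual LTFS Enterprise Edition logic.

```python
from dataclasses import dataclass
from typing import List

SECONDS_PER_DAY = 86400


@dataclass
class FileEntry:
    # Hypothetical per-file metadata; real systems track far more.
    path: str
    size_bytes: int
    last_access: float  # seconds since the epoch
    tier: str = "disk"  # "disk" or "tape"


def select_for_migration(files: List[FileEntry], now: float, min_idle_days: int,
                         disk_free_bytes: int, target_free_bytes: int) -> List[FileEntry]:
    """Pick idle disk-resident files, oldest access first, until the
    projected free disk space reaches target_free_bytes (assumed policy)."""
    idle_cutoff = now - min_idle_days * SECONDS_PER_DAY
    candidates = [f for f in files
                  if f.tier == "disk" and f.last_access <= idle_cutoff]
    candidates.sort(key=lambda f: f.last_access)  # coldest files first
    selected: List[FileEntry] = []
    projected_free = disk_free_bytes
    for f in candidates:
        if projected_free >= target_free_bytes:
            break
        selected.append(f)
        projected_free += f.size_bytes
    return selected


def migrate(selected: List[FileEntry]) -> None:
    # Only the tier changes; the path stays the same, so the file remains
    # visible under the same name in the global namespace.
    for f in selected:
        f.tier = "tape"


if __name__ == "__main__":
    now = 1_000_000_000.0
    files = [
        FileEntry("/fs/a.dat", 100, now - 40 * SECONDS_PER_DAY),
        FileEntry("/fs/b.dat", 200, now - 100 * SECONDS_PER_DAY),
        FileEntry("/fs/c.dat", 50, now - 1 * SECONDS_PER_DAY),
    ]
    chosen = select_for_migration(files, now, min_idle_days=30,
                                  disk_free_bytes=0, target_free_bytes=250)
    migrate(chosen)
    print([f.path for f in chosen])
```

Ordering candidates by last access time approximates "least recently used first." A production system would also weigh file size, tape mount costs and recall patterns.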