Investigating hybrid SSD FTL schemes for Hadoop workloads
Abstract
The Flash Translation Layer (FTL) is the core engine of Solid State Disks (SSDs). It is responsible for managing virtual-to-physical address mappings and emulating the functionality of a normal block-level device. SSD performance depends heavily on the design of the FTL. Over the last few years, several FTL schemes have been proposed. Hybrid FTL schemes, such as BAST, FAST, and LAST, have gained popularity because they try to combine the benefits of both page-level and block-level mapping. To provide high performance, FTL designers face several cross-cutting issues: striking the right balance between coarse-grained and fine-grained address mapping, the asymmetric nature of reads and writes, the write amplification property of flash memory, and the wear-out behavior of flash. MapReduce has become a very popular paradigm for performing parallel and distributed computations on large data, and Hadoop, an open-source implementation of MapReduce, has accelerated its adoption. Flash SSDs are increasingly used as a storage solution in Hadoop deployments for faster processing and better energy utilization, yet little work has been done to understand the endurance implications of SSDs for Hadoop-based workloads. In this paper, using a highly flexible and reconfigurable kernel-level simulation infrastructure, we investigate the internal characteristics of various hybrid FTL schemes under a representative set of Hadoop workloads. Our investigation brings out the wear-out behavior of SSDs for Hadoop-based workloads, including wear-leveling details, garbage collection, and translation and block/page mappings, and advocates the need for dynamic tuning of FTL parameters for these workloads.