Quick Access to Compressed Data in Storage Systems
Primary storage systems that compress data in real time use some form of on-disk metadata to provide the virtualization needed to store compressed data. This metadata usually takes the form of B-trees (possibly compressed) stored on disk. For random accesses to compressed data whose metadata is not in cache, this additional metadata layer significantly slows down random reads and writes. Our solution uses much less metadata: it records only an approximate location of compressed data on disk and is small enough to keep in the memory of the storage system. Read operations are expanded to compensate for the imprecise position information in the metadata, and index marks embedded in the data locate the required data within the expanded read. The placement of newly written data is constrained so that the reduced metadata can describe it. The placement uses a piecewise linear scheme that relies on locality in the compressibility of data, an assumption we support with experiments.
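The following toy model is a minimal sketch of the three ideas above: approximate piecewise linear location metadata, an expanded read, and embedded index marks. It is not the authors' implementation. Several details are assumptions made for illustration. The class `ApproxStore`, the tolerance `delta`, and the header layout `HDR` are hypothetical. The greedy placement rule is also an assumption: a write must land within `delta` bytes of the current line's estimate, otherwise a new piece is opened whose slope is the size of the record that broke the line. The abstract does not give the exact rule. For simplicity, the sketch searches for the index mark by pattern match and ignores the small chance that the same bytes occur inside a compressed payload.

```python
import bisect
import struct
import zlib

BLOCK = 4096                        # logical block size (assumed)
MAGIC = b"\xa5\x5a"                 # hypothetical index-mark prefix
HDR = struct.Struct(">2sQIB")       # hypothetical mark: magic, block id, payload length, compressed flag


class ApproxStore:
    """Toy append-only store whose only location metadata is a list of linear pieces."""

    def __init__(self, delta=1024):
        self.disk = bytearray()     # stands in for the physical device
        self.delta = delta          # max allowed |actual offset - estimated offset|
        self.seg_first = []         # first logical block of each linear piece
        self.seg_start = []         # disk offset of that block
        self.seg_slope = []         # estimated bytes per block within the piece
        self.next_blk = 0

    def _estimate(self, blk):
        # Approximate offset: piece start + slope * (distance into the piece).
        k = bisect.bisect_right(self.seg_first, blk) - 1
        return self.seg_start[k] + self.seg_slope[k] * (blk - self.seg_first[k])

    def append(self, data):
        assert len(data) == BLOCK
        blk, pos = self.next_blk, len(self.disk)
        comp = zlib.compress(data)
        flag = 1 if len(comp) < BLOCK else 0        # store raw if compression does not help
        payload = comp if flag else data
        record = HDR.pack(MAGIC, blk, len(payload), flag) + payload
        est = self._estimate(blk) if self.seg_first else None
        if est is None or abs(pos - est) > self.delta:
            # The current line no longer describes this write: open a new piece whose
            # slope is this record's size (betting on locality in compressibility).
            self.seg_first.append(blk)
            self.seg_start.append(pos)
            self.seg_slope.append(len(record))
        self.disk += record
        self.next_blk += 1
        return blk

    def read(self, blk):
        if not 0 <= blk < self.next_blk:
            raise KeyError(blk)
        est = self._estimate(blk)
        # One expanded read covers every offset the metadata allows, plus a full record.
        lo = max(0, est - self.delta)
        window = bytes(self.disk[lo:est + self.delta + HDR.size + BLOCK])
        i = window.find(MAGIC + struct.pack(">Q", blk))  # locate the embedded index mark
        if i < 0:
            raise KeyError(blk)
        _, _, length, flag = HDR.unpack_from(window, i)
        payload = window[i + HDR.size:i + HDR.size + length]
        return zlib.decompress(payload) if flag else payload
```

In this sketch, `delta` trades metadata size against read amplification. Each read fetches at most `2 * delta` bytes beyond one full record. A larger `delta` tolerates more variation in compressed size before a new piece is needed. When compressibility changes slowly along the logical address space, each piece covers many blocks, so the in-memory metadata stays far smaller than a per-block index.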