MCDB: A Monte Carlo Approach to Managing Uncertain Data
Ravi Jampani, Luis Leopoldo Perez, et al.
SIGMOD 2008
Large-scale Machine Learning (ML) algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications. Hence, it is crucial for performance to fit the data into single-node or distributed main memory to enable fast matrix-vector operations. General-purpose compression struggles to achieve both good compression ratios and fast decompression for block-wise uncompressed operations. Therefore, we introduce Compressed Linear Algebra (CLA) for lossless matrix compression. CLA encodes matrices with lightweight, value-based compression techniques and executes linear algebra operations directly on the compressed representations. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show good compression ratios and operations performance close to the uncompressed case, which enables fitting larger datasets into available memory. We thereby obtain significant end-to-end performance improvements.
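To make the idea of running linear algebra directly on a compressed matrix concrete, here is a minimal sketch. It is not the authors' implementation: it assumes one simple value-based scheme (each column stores its distinct nonzero values, each with a list of row offsets), and the names `compress_column`, `compress_matrix`, and `matvec` are hypothetical.

```python
from collections import defaultdict


def compress_column(values):
    """Map each distinct nonzero value to the sorted row indices where it occurs."""
    offsets = defaultdict(list)
    for row, v in enumerate(values):
        if v != 0:  # zeros are implicit and take no space
            offsets[v].append(row)
    return dict(offsets)


def compress_matrix(rows):
    """Compress a dense row-major matrix (list of lists) column by column."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    columns = [
        compress_column([rows[i][j] for i in range(n_rows)])
        for j in range(n_cols)
    ]
    return n_rows, columns


def matvec(compressed, x):
    """Compute y = M @ x without decompressing M."""
    n_rows, columns = compressed
    y = [0.0] * n_rows
    for j, col in enumerate(columns):
        for value, row_ids in col.items():
            # One multiplication per distinct value, then only additions.
            contribution = value * x[j]
            for i in row_ids:
                y[i] += contribution
    return y
```

For the matrix `[[7, 0], [7, 3], [0, 3], [7, 3]]` and the vector `[2, 1]`, `matvec(compress_matrix(...), [2, 1])` returns `[14.0, 17.0, 3.0, 17.0]`, matching the uncompressed product. The sketch only pays off when columns have few distinct values, which is the situation that value-based compression targets.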
Ahmed Elgohary, Matthias Boehm, et al.
SIGMOD Record
Peter J. Haas, Jeffrey F. Naughton, et al.
Journal of Computer and System Sciences
Joseph P. Bigus, M. Campbell, et al.
IBM J. Res. Dev.