Disaggregated RDDs: Extending and Analyzing Apache Spark for Memory Disaggregated Infrastructures

Achilleas Tzenetopoulos; Michele Gazzetti; Dimosthenis Masouros; Christian Pinto; Sotirios Xydis; Dimitrios Soudris

doi:10.1109/IC2E61754.2024.00019

IC2E 2024

Conference paper

24 Sep 2024

Abstract

Apache Spark has become essential for large-scale data processing as the demand for scalable data analytics grows. With memory costs constituting a significant portion of server expenses, the under-utilization and fragmentation of resources pose a substantial challenge for data center operators who rely on economies of scale. Memory disaggregation has emerged as a solution to this challenge, leveraging remote memory pools to reduce resource fragmentation and under-utilization. Yet these advantages are not without cost. Disaggregated memory systems introduce higher latency and lower bandwidth, which can significantly increase job execution time. This calls for careful optimization and management strategies that balance the trade-offs between accessibility and performance. This paper introduces cache-remote, a custom Apache Spark configuration that balances the benefits of memory disaggregation with execution efficiency. Cache-remote uses remote memory for RDD caching (Disaggregated RDDs) and local memory for latency-sensitive computations. Our work includes a comprehensive evaluation of different memory allocation policies and Spark configurations on a hardware setup that supports memory disaggregation. We extend prior work by exploring a range of solutions that cater to varying tolerances for job completion latency, introducing new points on the latency versus memory-usage Pareto front. Notably, our cache-remote approach improves the efficiency of current disaggregated memory allocation strategies. Compared to local-only policies, it reduces local memory utilization by up to 24.8% while incurring an execution-time overhead of only 7%.
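The abstract frames its contribution as new points on the latency versus local-memory Pareto front. As a minimal sketch of what that means, assuming each configuration can be summarized by a pair (normalized execution time, normalized local memory use), the function below keeps only the non-dominated configurations. The `Config` type, its fields, and the policy names and numbers in the demo are hypothetical placeholders for illustration, not the paper's code or measurements.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """One memory-policy/Spark configuration (hypothetical fields for illustration)."""
    name: str
    exec_time: float  # job completion time, e.g. normalized to a local-only policy
    local_mem: float  # local memory used, e.g. as a fraction of a local-only policy


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.exec_time <= b.exec_time and a.local_mem <= b.local_mem
    strictly_better = a.exec_time < b.exec_time or a.local_mem < b.local_mem
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Return the configurations no other one dominates, sorted by execution time."""
    front = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(front, key=lambda c: (c.exec_time, c.local_mem))


if __name__ == "__main__":
    # Hypothetical placeholder points, NOT results from the paper.
    candidates = [
        Config("local-only", 1.00, 1.00),
        Config("policy-A", 1.30, 0.50),
        Config("policy-B", 1.40, 0.60),  # dominated by policy-A on both axes
        Config("policy-C", 1.05, 0.80),
    ]
    for c in pareto_front(candidates):
        print(c.name, c.exec_time, c.local_mem)
```

In these terms, a configuration like cache-remote is valuable if it lands on the front, trading a small execution-time increase for lower local memory use. The abstract reports up to 24.8% less local memory at about 7% overhead relative to local-only policies.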