Efficient shared memory orchestration towards demand driven memory slicing

Qi Zhang; Ling Liu; Calton Pu; Wenqi Cao; Semih Sahin

doi:10.1109/ICDCS.2018.00121

ICDCS 2018

Conference paper

19 Jul 2018

Abstract

Memory is increasingly becoming a bottleneck for big data and latency-sensitive applications in virtualized systems. Memory efficiency is critical for high-performance execution of virtual machines (VMs). Mechanisms proposed for improving memory utilization often rely on an accurate estimation of VM working set size at runtime, which is difficult under changing workloads. This paper explores opportunities for improving memory efficiency and their impact on the performance of VM execution. First, we show that if each VM is initialized with an application-specified lower bound of memory, then by maintaining a shared memory region across VMs in the presence of temporal memory usage variations on the host, VMs under high memory pressure can minimize their performance loss by opportunistically and transparently harvesting idle memory on other VMs. Second, we show that by enabling on-demand VM memory allocation and deallocation in the presence of changing workloads, VM performance degradation due to memory swapping can be reduced effectively, compared to the conventional VM configuration scenario in which every VM is allocated the upper bound of memory requested by its applications. Third, we show that by providing shared memory pipes between co-located VMs, inter-VM communication can be sped up by avoiding the unnecessary overhead of communicating over the network. We develop MemLego, a lightweight shared-memory-based system, to achieve all these benefits without requiring any modification to user applications or OSes. We demonstrate the effectiveness of these opportunities through extensive experiments on unmodified Redis and Memcached. Using MemLego, the throughput of Redis and Memcached improves by up to 4x over the native system without MemLego, and by up to 2 orders of magnitude when the application's working set size does not fit in memory.
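
The first two ideas in the abstract (lower-bound initialization plus on-demand borrowing from a host-wide shared region) can be illustrated with a toy model. This is a minimal sketch and not MemLego's implementation: the class names, the `rebalance` method, and all sizes are hypothetical, and real memory slicing happens in the hypervisor and guest kernel rather than in Python objects.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    lower_bound_mb: int   # memory the VM is initialized with
    demand_mb: int = 0    # current working-set demand (hypothetical input)
    borrowed_mb: int = 0  # memory currently taken from the shared pool


class SharedPool:
    """Toy host-level shared memory region (hypothetical, for illustration)."""

    def __init__(self, capacity_mb: int) -> None:
        self.free_mb = capacity_mb

    def rebalance(self, vm: VM) -> int:
        """Grow or shrink the VM's borrowed slice to follow its demand.

        Returns the amount (MB) that still does not fit and would be swapped.
        """
        needed = max(0, vm.demand_mb - vm.lower_bound_mb)
        if needed > vm.borrowed_mb:
            # Under pressure: harvest as much idle shared memory as is free.
            grant = min(needed - vm.borrowed_mb, self.free_mb)
            self.free_mb -= grant
            vm.borrowed_mb += grant
        else:
            # Demand dropped: return the surplus so other VMs can use it.
            release = vm.borrowed_mb - needed
            self.free_mb += release
            vm.borrowed_mb -= release
        return needed - vm.borrowed_mb


pool = SharedPool(capacity_mb=1024)
vm_a = VM("a", lower_bound_mb=512)
vm_b = VM("b", lower_bound_mb=512)

vm_a.demand_mb = 1200
swap_a = pool.rebalance(vm_a)        # a borrows 688 MB, nothing swapped

vm_b.demand_mb = 1000
swap_b = pool.rebalance(vm_b)        # only 336 MB left, so 152 MB would swap

vm_a.demand_mb = 400
pool.rebalance(vm_a)                 # a releases its 688 MB back to the pool
swap_b_after = pool.rebalance(vm_b)  # b can now cover its full demand
```

In this model a VM swaps only when the shared region is exhausted, and the shortfall disappears as soon as another VM's demand falls, which is the temporal-variation effect the abstract describes.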
