Publication
Big Data 2018
Conference paper
Mira: Sharing Resources for Distributed Analytics at Small Timescales
Abstract
Modern distributed analytics stacks consist of application frameworks that enable processing of large amounts of data, and a resource manager that allows applications to share computational resources. The initial use case for these systems was running batch jobs with long lifetimes (e.g., a few hours), but since their inception, new use cases have emerged in which users increasingly rely on them to gain insight interactively, or even online. Efficiently sharing resources under these additional use cases requires operating at smaller timescales (minutes or even seconds) than the existing systems were designed for and are capable of. In this paper, we present Mira, a system for optimized elastic execution of short-running and interactive data-analytics applications with low-latency execution startup, fast resource management, and efficient resource utilization on shared clusters. We analyze the resource sharing overheads in a commonly used distributed processing stack (Spark+YARN) and reveal opportunities to accelerate applications in shared environments. Our experiments show that Mira is able to reduce resource-sharing-related overheads by more than 400× and reduce application runtime by up to 4.2×.