Primula: A practical shuffle/sort operator for serverless computing

Marc Sanchez-Artigas; Germán T. Eizaguirre; Gil Vernik; Lachlan Stuart; Pedro Garcia-Lopez

doi:10.1145/3429357.3430522

Middleware 2020

Conference paper

07 Dec 2020

Abstract

Serverless computing has recently gained much attention as a feasible alternative to always-on IaaS for data processing. However, existing serverless frameworks are not (yet) usable enough to reach a large number of users. To wit, they still require developers to specify the number of serverless functions for a simple sort job. We report our experience in designing Primula, a serverless sort operator that shields users from the complexities of resource provisioning, skewed data, and stragglers, yielding the most accessible sort primitive to date. Our evaluation on the IBM Cloud platform demonstrates the usability of Primula without abandoning performance (e.g., 3x faster than a serverless Spark backend and 62% slower than a hybrid serverless/IaaS solution).
