Outsourcing Data Processing Jobs with Lithops

Josep Sampé; Marc Sanchez-Artigas; Gil Vernik; Ido Yehekzel; Pedro Garcia-Lopez

doi:10.1109/TCC.2021.3129000

IEEE TCC

Paper

01 Jan 2021

Abstract

Unexpectedly, the rise of serverless computing has collaterally started the "democratization" of massive-scale data parallelism. This new trend, heralded by PyWren, aims to enable untrained users to execute single-machine code in the cloud at massive scale through platforms like AWS Lambda. Driven by this vision, this article presents Lithops, which carries forward the pioneering work of PyWren to better exploit the innate parallelism of à la MapReduce tasks atop several Functions-as-a-Service platforms. Instead of waiting for a cluster to be up and running in the cloud, Lithops makes it easy to spawn hundreds or thousands of cloud functions and execute a large job within a few seconds of starting. With Lithops, for instance, users can painlessly perform exploratory data analysis from within a Jupyter notebook, while the Lithops engine takes care of launching the parallel cloud functions, loading dependencies, automatically partitioning the data, etc. In this article, we describe the design and innovative features of Lithops and evaluate it using several representative applications, including sentiment analysis, Monte Carlo simulations, and hyperparameter tuning. These applications demonstrate Lithops' ability to scale single-machine code computations to thousands of cores and, very importantly, to do so without the need to boot a cold cluster or keep a warm cluster for occasional tasks.
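As a rough illustration of the map-style fan-out the abstract describes, the following minimal sketch runs a Monte Carlo estimate of pi as a set of independent tasks whose results are then combined. It is not the authors' implementation: it swaps in a local thread pool from the Python standard library for the cloud functions, and the names `count_hits`, `estimate_pi`, and `executor_factory` are hypothetical, chosen only for this example.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(args):
    """Count random points in the unit square that land inside the quarter circle."""
    seed, samples = args
    rng = random.Random(seed)  # one generator per task keeps tasks independent
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks=8, samples_per_task=50_000,
                executor_factory=ThreadPoolExecutor):
    """Fan out independent tasks (map), then combine their counts (reduce)."""
    # Assumption: a local thread pool stands in for remote cloud functions.
    tasks = [(seed, samples_per_task) for seed in range(num_tasks)]
    with executor_factory() as pool:
        hits = list(pool.map(count_hits, tasks))
    return 4.0 * sum(hits) / (num_tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi())
```

The single-machine function `count_hits` knows nothing about where it runs. In a serverless setting, the same function would be shipped to many cloud functions and only the executor would change, which is the separation of concerns the abstract attributes to the Lithops engine.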

Conference paper