DSServe - Data Science using Serverless

Dhaval Patel; Shuxin Lin; Jayant Kalagnanam

doi:10.1109/BigData55660.2022.10020441

Big Data 2022

Conference paper

17 Dec 2022

Abstract

AI applications use various data science tools, such as Jupyter notebooks, to prescribe a series of steps, commonly referred to as a workflow, for building AI solutions. The steps in a workflow can be as simple as loading data from remote storage, visualizing the data for better understanding, or conducting a data quality study, or as complex as generating features for modeling, discovering the best model, etc. Clearly, different steps of the data science workflow have varying compute resource requirements. Moreover, the execution of steps in a workflow is ad hoc and subjective. With the wider availability of serverless technologies, in this paper we demonstrate a generalized framework that can provide on-demand scale-out capability for the data science workflow. In particular, we selected the most common AI operation, namely automatic model selection, as an example to demonstrate the benefits of serverless computing. We conducted detailed experiments using IBM Code Engine technology to validate the benefits of our proposed approach.
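Automatic model selection scales out well because each candidate model can be fit and scored independently of the others. The sketch below shows that fan-out/fan-in pattern. It assumes a toy dataset and three hypothetical candidate "models" (`fit_constant_mean`, `fit_constant_median`, `fit_linear`). It also uses a local `ThreadPoolExecutor` as a stand-in for serverless function invocations. It is an illustration of the general idea, not the DSServe framework or its IBM Code Engine integration.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, median

# Hypothetical toy data: y is roughly 2x + 1 with small noise.
TRAIN = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 7.0), (4.0, 8.8)]
TEST = [(5.0, 11.1), (6.0, 12.9)]


def fit_constant_mean(train):
    # Baseline: always predict the mean training target.
    c = mean(y for _, y in train)
    return lambda x: c


def fit_constant_median(train):
    # Baseline: always predict the median training target.
    c = median(y for _, y in train)
    return lambda x: c


def fit_linear(train):
    # Ordinary least-squares line y = slope * x + intercept.
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


# Hypothetical search space of candidate model builders.
CANDIDATES = {
    "mean": fit_constant_mean,
    "median": fit_constant_median,
    "linear": fit_linear,
}


def evaluate(name):
    """Fit one candidate and return (name, held-out MSE).

    This is the independent unit of work; in a serverless setting each
    call would map to one function invocation (an assumption here).
    """
    model = CANDIDATES[name](TRAIN)
    mse = mean((model(x) - y) ** 2 for x, y in TEST)
    return name, mse


def select_model(max_workers=3):
    # Fan out: candidates do not depend on each other, so run them concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = dict(pool.map(evaluate, CANDIDATES))
    # Fan in: keep the candidate with the lowest held-out error.
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = select_model()
    print(best, scores)
```

Because `evaluate` only reads shared, immutable inputs and returns a small result, swapping the thread pool for remote workers would change where the work runs but not the selection logic.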