Big Data 2020
Conference paper

Smart-ML: A System for Machine Learning Model Exploration using Pipeline Graph

In this paper, we describe an overarching ML system with a simple programming interface that leverages existing AI and ML frameworks to make model exploration easier. The proposed system introduces a new programming construct, the pipeline graph: a directed acyclic graph consisting of multiple machine learning operations provided by different ML repositories. End users use the pipeline graph as a common interface for modeling different ML tasks, such as classification, regression, and time-series prediction, while enabling efficient execution in different environments (Spark, Celery, and Cloud). We further annotate the pipeline graph with a hyper-parameter grid and an option to try out a wide range of optimization strategies (e.g., Random, Bayesian, Bandit, and AutoLearn). Given a large pre-defined pipeline graph along with its hyper-parameters, we provide a general-purpose, scalable, and efficient pipeline-graph exploration technique that delivers automated solutions to a variety of ML tasks. We compare our automated approach to several state-of-the-art automated AI systems and find that it achieves performance comparable to the best results, while often producing simpler pipelines built from off-the-shelf components. Our evaluation suite consists of experiments on more than 60 classification and regression datasets.
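To make the idea of a pipeline graph annotated with a hyper-parameter grid more concrete, the following is a minimal sketch in plain Python. It is not the Smart-ML interface: the names GRAPH, PARAM_GRID, enumerate_paths, expand_configs, and random_search, the node names, and the toy one-dimensional models are all assumptions made for illustration. The sketch enumerates every path through a small DAG, expands each path with its grid, and uses random sampling (one of the strategies the abstract names) to pick the best configuration.

```python
import itertools
import random

# Hypothetical pipeline graph (a DAG): node -> successor nodes.
# Stage 1 holds feature transforms, stage 2 holds estimators.
GRAPH = {
    "start": ["identity", "square"],
    "identity": ["mean", "ridge"],
    "square": ["mean", "ridge"],
    "mean": [],
    "ridge": [],
}

# Hypothetical hyper-parameter grid annotating the graph nodes.
PARAM_GRID = {"ridge": {"alpha": [0.0, 0.1, 1.0]}}

# Toy data with y = x^2, so the "square" transform should win.
XS = [1.0, 2.0, 3.0, 4.0]
YS = [x * x for x in XS]


def enumerate_paths(graph, node="start"):
    """Yield every path from `node` to a leaf, excluding `node` itself."""
    if not graph[node]:
        yield []
        return
    for nxt in graph[node]:
        for rest in enumerate_paths(graph, nxt):
            yield [nxt] + rest


def expand_configs(path, grid):
    """Yield (path, params) for every grid combination along one path."""
    keys = [(n, p) for n in path for p in sorted(grid.get(n, {}))]
    values = [grid[n][p] for n, p in keys]
    for combo in itertools.product(*values):
        yield path, dict(zip(keys, combo))


def fit_predict(path, params, xs, ys):
    """Fit the two-step pipeline on (xs, ys) and predict on xs."""
    transform, model = path
    feats = [x * x for x in xs] if transform == "square" else list(xs)
    if model == "mean":
        mean_y = sum(ys) / len(ys)
        return [mean_y] * len(ys)
    # 1-D ridge regression without intercept: w = sum(f*y) / (sum(f*f) + alpha)
    alpha = params[("ridge", "alpha")]
    w = sum(f * y for f, y in zip(feats, ys)) / (sum(f * f for f in feats) + alpha)
    return [w * f for f in feats]


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def random_search(graph, grid, xs, ys, n_trials, seed=0):
    """Sample up to n_trials configurations and return the lowest-error one.

    For brevity the error is measured on the training data; a real
    evaluation would use a held-out split or cross-validation.
    """
    rng = random.Random(seed)
    candidates = [
        cfg for path in enumerate_paths(graph) for cfg in expand_configs(path, grid)
    ]
    sampled = rng.sample(candidates, min(n_trials, len(candidates)))
    return min(sampled, key=lambda c: mse(fit_predict(c[0], c[1], xs, ys), ys))


best_path, best_params = random_search(GRAPH, PARAM_GRID, XS, YS, n_trials=8)
print(best_path, best_params)
```

In this toy setup the graph has four paths and eight configurations in total, so eight trials cover the whole search space. With a larger graph the number of trials would be smaller than the number of candidates, which is where the choice of optimization strategy starts to matter.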