Scalable Distributed Computing Systems for Incremental Machine Learning in Big Data Applications

Dhaval Salwala; Seshu Tirupathi; Brian Quanz; Wesley Gifford; Stuart Siegel

INFORMS 2022

Talk

16 Oct 2022

Abstract

Terabytes of raw data generated daily by IoT sensors are indispensable for investigating time-series problems such as short-term forecasting of a target variable and failure prediction. Pure batch learning algorithms can be challenging to apply to such high-frequency, high-volume data, because concept drift requires frequent retraining of the deployed models, leading to significant downtime. Incremental models, or coupled batch-incremental models, are therefore gaining importance for these problems. In this talk, we present a distributed computing system that scales incremental learning to big data and performs an efficient parameter search to dynamically generate the most efficient incremental modelling pipelines with every stream of new incoming data. We then present synthetic and real-world use cases.
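The idea of re-selecting an incremental pipeline with every new batch of data can be illustrated with a small, single-process sketch. This is not the system presented in the talk: the names `OnlineLinearModel`, `make_stream`, and `run`, the synthetic stream with one abrupt concept drift, and the use of the learning rate as the only searched parameter are all assumptions made for illustration. Each candidate is scored on a new batch before it learns from it (test-then-train), and the candidate with the lowest error on that batch is the one that would serve the next batch.

```python
import random


class OnlineLinearModel:
    """Hypothetical 1-D linear regressor updated one sample at a time with SGD."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.weight = 0.0
        self.bias = 0.0

    def predict(self, x):
        return self.weight * x + self.bias

    def learn_one(self, x, y):
        # Gradient step on the squared error of a single sample.
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


def make_stream(n_batches=40, batch_size=50, drift_at=20, seed=0):
    """Yield synthetic batches; the true slope flips at `drift_at` (concept drift)."""
    rng = random.Random(seed)
    for batch_index in range(n_batches):
        slope = 2.0 if batch_index < drift_at else -1.0
        batch = []
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            y = slope * x + rng.gauss(0.0, 0.05)
            batch.append((x, y))
        yield batch


def run(learning_rates=(0.001, 0.05, 0.5)):
    """Keep one incremental model per candidate setting; pick the best per batch."""
    candidates = {lr: OnlineLinearModel(lr) for lr in learning_rates}
    history = []  # (best learning rate, its mean squared error) for each batch
    for batch in make_stream():
        errors = {}
        for lr, model in candidates.items():
            squared_error = 0.0
            for x, y in batch:
                # Test-then-train: score the prediction first, then update.
                squared_error += (model.predict(x) - y) ** 2
                model.learn_one(x, y)
            errors[lr] = squared_error / len(batch)
        best = min(errors, key=errors.get)
        history.append((best, errors[best]))
    return candidates, history


if __name__ == "__main__":
    _, history = run()
    for batch_index, (best_lr, mse) in enumerate(history):
        print(f"batch {batch_index:2d}: best learning rate={best_lr}, mse={mse:.4f}")
```

No model is retrained from scratch when the drift occurs; every candidate keeps updating in place, and only the selection changes. In a distributed setting the candidates could be evaluated in parallel across workers, but this sketch does not attempt that.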

Workshop