Machine Learning Platform for Extreme Scale Computing on Compressed IoT Data
Abstract
As sensor costs fall, high-volume and high-velocity data are increasingly being generated and analyzed, especially in IoT domains such as energy and smart homes. Consequently, the number of applications that require accurate short-term forecasts and predictions is also steadily increasing. In this paper, we provide an overview of a novel end-to-end platform that provides efficient ingestion, compression, transfer, query processing, and machine learning-based analytics for high-frequency and high-volume IoT time series. The performance of the platform is evaluated using a real-world dataset from renewable energy source (RES) installations. The results show the importance of high-frequency analytics and the surprisingly positive impact of error-bounded lossy compression on machine learning in the form of AutoML. For example, when detecting yaw misalignments in wind turbines, an improvement of 9% in accuracy was observed for AutoML models trained on lossy compressed data compared to the current industry standard of 10-minute aggregated data. These small-scale experiments thus show the potential of the platform, and larger pilots are planned.
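To make the contrast between error-bounded lossy compression and fixed-window aggregation concrete, the following minimal Python sketch may help. It is an illustration only and not the platform's compression method: it assumes a simple greedy piecewise-constant scheme with an absolute error bound, and all names (`aggregate_fixed_windows`, `compress_error_bounded`, `decompress`, `max_abs_error`, `epsilon`) and the sample values are hypothetical. The point it shows is that aggregation can smooth away a short spike entirely, whereas an error-bounded scheme keeps every reconstructed value within `epsilon` of the original.

```python
from statistics import fmean


def aggregate_fixed_windows(values, window):
    """Replace every fixed-size window by its mean (as in 10-minute averages)."""
    out = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        out.extend([fmean(chunk)] * len(chunk))
    return out


def compress_error_bounded(values, epsilon):
    """Greedy piecewise-constant compression (illustrative assumption).

    A segment grows while the spread of its values is at most 2 * epsilon,
    so the segment midpoint is within epsilon of every value it represents.
    Returns a list of (representative_value, run_length) pairs.
    """
    segments = []
    lo = hi = values[0]
    count = 1
    for v in values[1:]:
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo <= 2 * epsilon:
            lo, hi, count = new_lo, new_hi, count + 1
        else:
            segments.append(((lo + hi) / 2, count))
            lo = hi = v
            count = 1
    segments.append(((lo + hi) / 2, count))
    return segments


def decompress(segments):
    """Expand (value, run_length) pairs back into a full-length series."""
    return [value for value, count in segments for _ in range(count)]


def max_abs_error(original, reconstructed):
    """Largest pointwise deviation between two equally long series."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


if __name__ == "__main__":
    # Hypothetical high-frequency readings containing one short spike.
    signal = [10.0, 10.2, 10.1, 10.3, 25.0, 10.2, 10.1, 10.0]
    epsilon = 0.5  # hypothetical absolute error bound

    segments = compress_error_bounded(signal, epsilon)
    print("segments stored:", len(segments), "of", len(signal), "points")
    print("error, compressed:", max_abs_error(signal, decompress(segments)))
    print("error, aggregated:",
          max_abs_error(signal, aggregate_fixed_windows(signal, len(signal))))
```

In this toy series the spike survives compression as its own segment, while a single window average spreads it across all points. This is one plausible reason why models trained on error-bounded compressed data could outperform models trained on aggregates, although the sketch itself does not demonstrate that result.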