A Verifiable Imputation Analysis for Univariate Time Series and Enabling Package

Nianjun Zhou; Dhaval Patel; Arun Iyengar; Shrey Shrivastava; Anuradha Bhamidipaty

doi:10.1109/BigData50022.2020.9377909

Big Data 2020

Conference paper

10 Dec 2020



Abstract

This paper proposes a verifiable imputation process and an enabling tool for univariate time series. Common ad-hoc and case-specific imputation methods are not enough to ensure high-quality, effective imputation. We adopt verification logic similar to that of supervised learning: we use artificially introduced missing values (artificial missing sampling) as the test set to estimate the performance of a set of imputers, and we use these estimates to select the best imputer. To ensure that the selection is correct, we analyze how various factors affect estimation accuracy. These factors are the missing rate, the size and patterns of the artificial missing data, the selected imputers, and the noise level. We propose a two-step verifiable imputation process that integrates all of these steps. With this process, we can always leverage the most suitable imputer to achieve high-quality imputation without tedious and error-prone data cleaning efforts. We implement the tool as a Python package that includes many imputers, each with its own capabilities, and an API. We automate imputation through a standard process that returns the imputed results and a detailed rationale for the selection, along with quality metrics.
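The selection idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' package or its API: the three candidate imputers (mean, forward fill, linear interpolation), RMSE as the quality metric, uniform random masking of single observed points as the artificial missing pattern, and the names `select_imputer` and `verifiable_impute` are all hypothetical choices made for this example. The first function scores each imputer on artificially masked points whose true values are known; the second applies the winner to the real gaps, loosely mirroring a two-step process.

```python
import numpy as np


def impute_mean(x):
    # Replace every NaN with the mean of the observed values.
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out


def impute_ffill(x):
    # Carry the last observed value forward; leading NaNs take the first observed value.
    obs = ~np.isnan(x)
    idx = np.where(obs, np.arange(len(x)), 0)
    idx = np.maximum.accumulate(idx)  # index of the last observed point at or before i
    out = x[idx]
    out[np.isnan(out)] = x[obs][0]
    return out


def impute_linear(x):
    # Linear interpolation between observed neighbours; np.interp holds the end values flat.
    out = x.copy()
    obs = ~np.isnan(x)
    t = np.arange(len(x))
    out[~obs] = np.interp(t[~obs], t[obs], x[obs])
    return out


# Hypothetical candidate pool for illustration; the paper's package provides its own imputers.
IMPUTERS = {"mean": impute_mean, "ffill": impute_ffill, "linear": impute_linear}


def select_imputer(x, mask_rate=0.1, n_trials=5, seed=0):
    """Step 1: score each imputer by RMSE on artificially masked observed points."""
    rng = np.random.default_rng(seed)
    observed = np.flatnonzero(~np.isnan(x))
    n_mask = max(1, int(mask_rate * len(observed)))
    scores = {name: [] for name in IMPUTERS}
    for _ in range(n_trials):
        held_out = rng.choice(observed, size=n_mask, replace=False)
        x_test = x.copy()
        x_test[held_out] = np.nan  # artificial gaps whose true values are known
        for name, imputer in IMPUTERS.items():
            err = imputer(x_test)[held_out] - x[held_out]
            scores[name].append(np.sqrt(np.mean(err ** 2)))
    mean_scores = {name: float(np.mean(s)) for name, s in scores.items()}
    best = min(mean_scores, key=mean_scores.get)
    return best, mean_scores


def verifiable_impute(x, **kwargs):
    """Step 2: fill the real gaps with the selected imputer and report the rationale."""
    best, scores = select_imputer(x, **kwargs)
    return IMPUTERS[best](x), best, scores


# Demo on a noisy sine wave with a few genuinely missing points.
t = np.arange(200)
series = np.sin(t / 10.0) + 0.05 * np.random.default_rng(1).normal(size=200)
series[[20, 21, 22, 90, 150]] = np.nan
imputed, best, scores = verifiable_impute(series)
print(best, scores)
```

Masking only points that were actually observed is what makes the estimate checkable: the held-out ground truth is known, just as a test set's labels are known in supervised learning. Averaging over several masking trials reduces the variance of the estimated scores, and the returned score dictionary plays the role of the selection rationale. The factors the paper analyzes (missing rate, size and pattern of the artificial missing data, candidate imputers, noise level) correspond here to `mask_rate`, the masking scheme, `IMPUTERS`, and the noise in the input series.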
