Publication
Big Data 2020
Conference paper
A Verifiable Imputation Analysis for Univariate Time Series and Enabling Package
Abstract
This paper proposes a verifiable imputation process and an enabling tool for univariate time series. Common ad hoc, case-specific imputation is not enough to ensure high-quality, effective imputation. We adopt verification logic similar to that of supervised learning: we use artificially sampled missing values as the test set to estimate the performance of a set of imputers, then use the estimated performance to select the best imputer. To ensure that the selection is correct, we analyze the impact of various factors on estimation accuracy: the missing rate, the size and patterns of the artificial missing data, the selected imputers, and the noise level. We propose a two-step verifiable imputation process that integrates all of these steps. With this process, we can always leverage the most suitable imputer to achieve high-quality imputation without tedious and error-prone data cleaning efforts. We implement the tool as a Python package that provides many imputers, each with unique capabilities, and an API. We automate the imputation through a standard process, which returns imputed results and a detailed rationale for the selection along with quality metrics.
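The selection idea in the abstract can be illustrated with a minimal sketch: hide a random sample of the observed values, let each candidate imputer fill them in, score each one against the hidden truth, and keep the imputer with the lowest error. This is not the paper's package or its API. The function names (`impute_mean`, `impute_locf`, `impute_linear`, `select_imputer`), the three simple imputers, the RMSE metric, the uniformly random masking pattern, and the default parameters are all assumptions chosen for illustration.

```python
import numpy as np

def impute_mean(x):
    # Fill every gap with the mean of the observed values.
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out

def impute_locf(x):
    # Last observation carried forward; leading gaps take the first observed value.
    observed = ~np.isnan(x)
    idx = np.where(observed, np.arange(len(x)), 0)
    np.maximum.accumulate(idx, out=idx)
    out = x[idx]
    first = np.flatnonzero(observed)[0]
    out[:first] = x[first]
    return out

def impute_linear(x):
    # Linear interpolation between neighbouring observed values.
    out = x.copy()
    missing = np.isnan(out)
    idx = np.arange(len(out))
    out[missing] = np.interp(idx[missing], idx[~missing], out[~missing])
    return out

def select_imputer(x, imputers, mask_fraction=0.1, n_trials=20, seed=0):
    # Hypothetical helper: estimate each imputer's RMSE on artificially
    # hidden observed values, then return the best name and all scores.
    rng = np.random.default_rng(seed)
    observed_idx = np.flatnonzero(~np.isnan(x))
    n_mask = max(1, int(mask_fraction * len(observed_idx)))
    scores = {name: [] for name in imputers}
    for _ in range(n_trials):
        held_out = rng.choice(observed_idx, size=n_mask, replace=False)
        trial = x.copy()
        trial[held_out] = np.nan  # artificial missing values act as the test set
        for name, fn in imputers.items():
            err = fn(trial)[held_out] - x[held_out]
            scores[name].append(np.sqrt(np.mean(err ** 2)))
    mean_rmse = {name: float(np.mean(v)) for name, v in scores.items()}
    best = min(mean_rmse, key=mean_rmse.get)
    return best, mean_rmse

# Synthetic example data (an assumption): a noisy sine wave with 30 real gaps.
rng = np.random.default_rng(42)
t = np.arange(200)
series = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.05, size=t.size)
series[rng.choice(t.size, size=30, replace=False)] = np.nan

imputers = {"mean": impute_mean, "locf": impute_locf, "linear": impute_linear}
best, mean_rmse = select_imputer(series, imputers)
imputed = imputers[best](series)  # apply the selected imputer to the real gaps
print(best, mean_rmse)
```

The per-imputer scores returned alongside the winner play the role of the "rationale" and "quality metrics" the abstract mentions. How trustworthy those estimates are depends on the factors the paper analyzes, such as the missing rate and how closely the artificial masking pattern resembles the real gaps.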