Pre-processing censored survival data using inverse covariance matrix based calibration

Bhanukiran Vinzamuri; Yan Li; C. Reddy

doi:10.1109/TKDE.2017.2719028

IEEE TKDE

Paper

01 Oct 2017

Pre-processing censored survival data using inverse covariance matrix based calibration

Download paper

Abstract

Censoring is a common phenomenon that arises in many longitudinal studies where an event of interest could not be recorded within the given time frame. Censoring causes missing time-to-event labels, and this effect is compounded when dealing with datasets which have high amounts of censored instances. In addition, dependent censoring in the data, where censoring is dependent on the covariates in the data leads to bias in standard survival estimators. This motivates us to develop an approach for pre-processing censored data which calibrates the right censored (RC) times in an attempt to reduce the bias in the survival estimators. This calibration is done using an imputation method which estimates the sparse inverse covariance matrix over the dataset in an iterative convergence framework. During estimation, we apply row and column-based regularization to account for both row and column-wise correlations between different instances while imputing them. This is followed by comparing these imputed censored times with the original RC times to obtain the final calibrated RC times. These calibrated RC times can now be used in the survival dataset in place of the original RC times for more effective prediction. One of the major benefits of our calibration approach is that it is a pre-processing method for censored data which can be used in conjunction with any survival prediction algorithm and improve its performance. We evaluate the goodness of our approach using a wide array of survival prediction algorithms which are applied over crowdfunding data, electronic health records (EHRs), and synthetic censored datasets. Experimental results indicate that our calibration method improves the AUC values of survival prediction algorithms, compared to applying them directly on the original survival data.

Conference paper