Bootstrap-based Feature Selection to Balance Model Discrimination and Predictor Significance: A Study of Stroke Prediction in Atrial Fibrillation
Abstract
Atrial fibrillation (AF) is a common cardiac arrhythmia that increases the risk and severity of ischemic stroke. Predicting ischemic stroke in AF patients in real clinical practice requires a risk prediction model that achieves both good model discrimination (e.g., AUC) and statistical significance of its predictors. In this paper, we propose a new bootstrap-based wrapper (Boots-wrapper) method of feature selection and apply it to Chinese Atrial Fibrillation Registry data to develop 1-year stroke prediction models in AF. The proposed method heuristically searches for a subset of features that maximizes the discrimination of the prediction model while minimizing the penalty for non-significant features. To make feature selection robust, we perform bootstrap sampling to obtain more reliable estimates of the variation and significance statistics. The experimental results show that Boots-wrapper can balance model discrimination and statistical significance of features when developing AF stroke prediction models.
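To make the idea concrete, the following is a minimal illustrative sketch of a bootstrap-based wrapper search, not the implementation evaluated in this paper. Several choices are assumptions made only for illustration: the objective (mean out-of-bag AUC minus a fixed `penalty` per non-significant feature), the significance test (a z-statistic formed from the bootstrap mean and standard deviation of each coefficient, compared against `z_crit`), the greedy forward search, and all function and parameter names (`fit_logistic`, `bootstrap_score`, `boots_wrapper_sketch`).

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Fit logistic regression by Newton's method; returns coefficients, intercept first."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -30, 30)))
        w = p * (1.0 - p)
        # A tiny ridge term keeps the Hessian invertible (an assumption of this sketch).
        hessian = Xb.T @ (Xb * w[:, None]) + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hessian, Xb.T @ (y - p) - ridge * beta)
    return beta

def auc(y, score):
    """Rank-based (Mann-Whitney) AUC; assumes continuous scores, so ties are not averaged."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_score(X, y, features, n_boot=50, penalty=0.02, z_crit=1.96, rng=None):
    """Return (mean out-of-bag AUC minus penalty * number of non-significant features, that number)."""
    rng = np.random.default_rng(rng)
    n = len(y)
    coefs, aucs = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)               # bootstrap sample, drawn with replacement
        oob = np.setdiff1d(np.arange(n), idx)     # out-of-bag rows, used for evaluation
        if len(np.unique(y[idx])) < 2 or len(np.unique(y[oob])) < 2:
            continue                              # skip replicates missing one of the classes
        beta = fit_logistic(X[idx][:, features], y[idx])
        coefs.append(beta[1:])
        aucs.append(auc(y[oob], X[oob][:, features] @ beta[1:]))
    coefs = np.array(coefs)
    # Bootstrap z-statistic per feature: |mean coefficient| / bootstrap standard deviation.
    z = np.abs(coefs.mean(axis=0)) / (coefs.std(axis=0, ddof=1) + 1e-12)
    n_nonsig = int((z < z_crit).sum())
    return float(np.mean(aucs)) - penalty * n_nonsig, n_nonsig

def boots_wrapper_sketch(X, y, **kwargs):
    """Greedy forward search: add the feature that most improves the penalized score."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        scored = [(bootstrap_score(X, y, selected + [j], **kwargs)[0], j) for j in remaining]
        score, j = max(scored)
        if score <= best:                         # stop when no candidate improves the objective
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best
```

Calling `boots_wrapper_sketch(X, y, n_boot=30, rng=0)` on a NumPy feature matrix `X` and a binary outcome vector `y` returns the selected column indices and the penalized score. Passing an integer `rng` reuses the same bootstrap samples for every candidate subset, which makes comparisons between subsets less noisy; a larger `penalty` trades discrimination for a model whose predictors are all significant.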