Automated feature enhancement for predictive modeling using external knowledge
Abstract
Supervised machine learning is the task of learning a function that maps features to a target. The quality of that function, or model, depends directly on the features provided to the learning algorithm. In particular, adding new predictive features is a crucial means of improving model quality. This is usually done by domain specialists or data scientists, and it is hard and time-consuming: the expert must identify data sources for new features, join them with the existing data, and then select the features that are actually relevant to the prediction. We present KAFE (Knowledge Aided Feature Engineering), an interactive predictive modeling system that automatically uses structured knowledge available on the web to add features and improve the accuracy of predictive models. In this proposal, we describe its key techniques, including feature inference and selection and relevant data indexing, and demonstrate its use through an interactive Jupyter notebook.
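To make the join-then-select workflow concrete, the following is a minimal sketch and not KAFE's implementation. It assumes a toy base table keyed by a hypothetical `city` column, a hypothetical external lookup table, and a simple absolute Pearson correlation filter standing in for feature selection; the abstract does not specify which selection method KAFE uses. All names and values are invented for illustration.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def join_external(rows, external, key):
    """Left-join each base row with the external attributes found under its key."""
    return [{**row, **external.get(row[key], {})} for row in rows]


def select_features(rows, candidates, target, threshold=0.5):
    """Keep candidates whose |correlation| with the target meets the threshold.

    This filter is an assumption made for the sketch, not KAFE's method.
    """
    selected = []
    for name in candidates:
        pairs = [(r[name], r[target]) for r in rows if name in r]
        if len(pairs) < 2:
            continue  # too few joined values to estimate a correlation
        xs, ys = zip(*pairs)
        if abs(pearson(xs, ys)) >= threshold:
            selected.append(name)
    return selected


# Hypothetical base data: one existing key column and the prediction target.
rows = [
    {"city": "A", "price": 10.0},
    {"city": "B", "price": 20.0},
    {"city": "C", "price": 30.0},
    {"city": "D", "price": 40.0},
]

# Hypothetical external knowledge, indexed by the same key.
external = {
    "A": {"population": 1.0, "elevation": 5.0},
    "B": {"population": 2.1, "elevation": 1.0},
    "C": {"population": 2.9, "elevation": 4.0},
    "D": {"population": 4.2, "elevation": 2.0},
}

joined = join_external(rows, external, key="city")
kept = select_features(joined, ["population", "elevation"], target="price")
print(kept)
```

In this toy data, `population` tracks `price` closely while `elevation` does not, so only the former passes the filter. A real system would typically judge candidate features by their effect on held-out model accuracy rather than by a single correlation threshold.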