Cognito: Automated Feature Engineering for Supervised Learning
Abstract
Feature engineering involves constructing novel features from given data with the goal of improving predictive learning performance. Feature engineering is predominantly a human-intensive and time-consuming step that is central to the data science workflow. In this paper, we present a novel system called 'Cognito' that performs automatic feature engineering on a given dataset for supervised learning. The system explores various feature construction choices in a hierarchical and non-exhaustive manner, while progressively maximizing the accuracy of the model through a greedy exploration strategy. Additionally, the system allows users to specify domain- or data-specific choices to prioritize the exploration. Cognito is capable of handling large datasets through sampling and built-in parallelism, and integrates well with a state-of-the-art model selection strategy. We present the design and operation of Cognito, along with experimental results on eight real datasets to demonstrate its efficacy.
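To make the idea of hierarchical, non-exhaustive, greedy exploration concrete, the following is a minimal sketch and not Cognito's actual implementation. The transform set, the function names (`greedy_feature_search`, `abs_pearson`), and the scoring function are all assumptions introduced for illustration; in particular, the absolute Pearson correlation with the target stands in for the model accuracy that the system is described as maximizing.

```python
import math

# Hypothetical transform set; the abstract does not enumerate Cognito's transforms.
TRANSFORMS = {
    "square": lambda x: x * x,
    "log1p_abs": lambda x: math.log1p(abs(x)),
    "sqrt_abs": lambda x: math.sqrt(abs(x)),
}


def abs_pearson(xs, ys):
    """Stand-in score (assumption): |Pearson correlation| of a feature with the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no signal
    return abs(cov) / math.sqrt(vx * vy)


def greedy_feature_search(features, target, max_depth=3):
    """Greedily add transformed features while the score keeps improving.

    features: dict mapping feature name -> list of floats.
    Returns (augmented feature dict, best score found).
    """
    features = dict(features)  # do not mutate the caller's dict
    best = max(abs_pearson(col, target) for col in features.values())
    frontier = list(features)  # features eligible for expansion at this level
    for _ in range(max_depth):
        candidates = []
        for name in frontier:
            for tname, fn in TRANSFORMS.items():
                new_name = f"{tname}({name})"
                if new_name in features:
                    continue
                col = [fn(v) for v in features[name]]
                candidates.append((abs_pearson(col, target), new_name, col))
        if not candidates:
            break
        score, new_name, col = max(candidates, key=lambda c: c[0])
        if score <= best:
            break  # greedy stop: no candidate improves the score
        features[new_name] = col
        best = score
        frontier = [new_name]  # expand only the winner (non-exhaustive)
    return features, best


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [v * v for v in x]
    feats, best = greedy_feature_search({"x": x}, y)
    print(sorted(feats), best)
```

Expanding only the best candidate at each level keeps the search far smaller than enumerating every composition of transforms, at the cost of possibly missing a better feature deeper in an unexplored branch. A fuller version would replace the correlation proxy with cross-validated model accuracy.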