Combinatorial feature selection problems
Abstract
Motivated by frequently recurring themes in information retrieval and related disciplines, we define a genre of problems called combinatorial feature selection problems. Given a set S of multidimensional objects, the goal is to select a subset K of relevant dimensions (or features) such that some desired property Π holds for the set S restricted to K. Depending on Π, the goal could be to either maximize or minimize the size of the subset K. Several well-studied feature selection problems can be cast in this form. We study the problems in this class derived from several natural and interesting properties Π, including variants of the classical p-center problem as well as problems akin to determining the VC-dimension of a set system. Our main contribution is a theoretical framework for studying combinatorial feature selection, providing (in most cases essentially tight) approximation algorithms and hardness results for several instances of these problems.