Ensemble feature selection with discriminative and representative properties for malware detection
Abstract
Malware data are typically represented with extremely high-dimensional features, which imposes an excessive computational burden on detection methods. For both effectiveness and efficiency, feature selection is an indispensable part of malware detection. In this paper, we propose an ensemble feature selection method that integrates discriminative and representative properties for malware detection. The most discriminative features are selected from the labeled data, and the most representative features are selected from the unlabeled data. The former are the features most distinctive with respect to the classes, and the latter are the features that best represent the data. A comprehensive metric integrating both properties is subsequently obtained and used to retain the most informative features.
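To make the idea of combining the two properties concrete, the following is a minimal illustrative sketch and not the method proposed in this paper. The abstract does not specify the individual criteria or how they are integrated, so every choice here is an assumption: a Fisher-style ratio stands in for the discriminative score on labeled data, per-feature variance stands in for the representative score on unlabeled data, and the two are min-max normalized and mixed with a hypothetical weight `alpha`. All function names are hypothetical, and only the Python standard library is used.

```python
from statistics import fmean, pvariance


def columns(rows):
    """Transpose a list of samples into a list of feature columns."""
    return [list(col) for col in zip(*rows)]


def fisher_scores(rows, labels):
    """Assumed discriminative criterion: between-class scatter over
    within-class scatter, computed per feature on labeled data."""
    scores = []
    for col in columns(rows):
        overall = fmean(col)
        between = within = 0.0
        for c in set(labels):
            vals = [v for v, lab in zip(col, labels) if lab == c]
            between += len(vals) * (fmean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        # Small constant avoids division by zero for constant features.
        scores.append(between / (within + 1e-12))
    return scores


def variance_scores(rows):
    """Assumed representative criterion: per-feature variance on unlabeled data."""
    return [pvariance(col) for col in columns(rows)]


def min_max(scores):
    """Rescale scores to [0, 1] so the two criteria are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble_select(labeled_rows, labels, unlabeled_rows, k, alpha=0.5):
    """Return indices of the k features with the highest combined score.

    alpha is a hypothetical mixing weight between the two criteria.
    """
    d = min_max(fisher_scores(labeled_rows, labels))
    r = min_max(variance_scores(unlabeled_rows))
    combined = [alpha * di + (1 - alpha) * ri for di, ri in zip(d, r)]
    ranked = sorted(range(len(combined)), key=lambda j: combined[j], reverse=True)
    return ranked[:k]


# Toy data: feature 0 separates the classes, feature 1 is weak noise,
# feature 2 is constant.
X_labeled = [[0, 0.1, 5], [0, 0.2, 5], [1, 0.1, 5], [1, 0.2, 5]]
y = [0, 0, 1, 1]
X_unlabeled = [[0, 0.1, 5], [1, 0.3, 5], [0, 0.2, 5], [1, 0.1, 5]]
selected = ensemble_select(X_labeled, y, X_unlabeled, k=2)
```

On the toy data, `selected` ranks the class-separating feature first and the constant feature last. In practice the two scoring functions would be replaced by whichever discriminative and representative criteria the method actually defines.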