Synthesizing near-optimal malware specifications from suspicious behaviors
Abstract
Fueled by an emerging underground economy, malware authors are exploiting vulnerabilities at an alarming rate. To make matters worse, obfuscation tools are commonly available, and much of the malware is open source, leading to a huge number of variants. Behavior-based detection techniques are a promising solution to this growing problem. However, these detectors require precise specifications of malicious behavior that do not result in an excessive number of false alarms. In this paper, we present an automatic technique for extracting optimally discriminative specifications, which uniquely identify a class of programs. Such a discriminative specification can be used by a behavior-based malware detector. Our technique, based on graph mining and concept analysis, scales to large classes of programs due to probabilistic sampling of the specification space. Our implementation, called HOLMES, can synthesize discriminative specifications that accurately distinguish between programs, sustaining an 86% detection rate on new, unknown malware, with 0 false positives, in contrast with 55% for commercial signature-based antivirus (AV) and 62-64% for behavior-based AV (commercial or research). © 2010 IEEE.