Dynamic Possible Source Count Analysis for Data Leakage Prevention
Abstract
Dynamic Taint Analysis (DTA) is a widely studied technique that can effectively detect various attacks and information leakage. In the context of detecting information leakage, taint is a flag added to data to indicate whether secret data can be inferred from it. DTA tracks the flow of tainted data in a language runtime environment and reports a secret data leak when tainted data is transmitted externally. We found that existing DTA methods can produce false negatives and false positives in complex data flows because of the binary nature of taint. Since taint is binary, meaning the secret data is either inferable (=1) or non-inferable (=0), it cannot represent intermediate states in which the secret data is only partially inferable; these states are quantized to 0 or 1. As a result of this quantization, existing methods cannot distinguish, in complex data flows, between outputs that are practically secure and those that pose a real security threat. To address this problem, we introduce the concept of Possible Source Count (PSC) and propose Dynamic Possible Source Count Analysis (DPCA), which tracks PSC instead of taint. PSC is a metric that indicates how many secrets can be identified by observing the data. DPCA tracks and computes the PSC of each data item using dynamic symbolic execution. By evaluating the PSC of data that reaches the sink point, DPCA can effectively distinguish between data that is practically secure and data that poses a security threat.
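The contrast between a binary taint flag and a count-based metric can be illustrated with a minimal Python sketch. It is not the DPCA implementation. It assumes, for illustration only, that the secret is a single byte, and it reads PSC as the number of candidate secret values consistent with an observed output, which is one possible interpretation of the definition above. Brute-force enumeration stands in for dynamic symbolic execution, and all names (SECRET_DOMAIN, binary_taint, possible_source_count, and the three example flows) are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

# Assumption: the secret is one byte, so candidates can be enumerated.
SECRET_DOMAIN = range(256)


def binary_taint(derived_from_secret: bool) -> int:
    """Classic taint flag: 1 if the value was derived from the secret, else 0."""
    return 1 if derived_from_secret else 0


def possible_source_count(flow: Callable[[int], int], observed: int,
                          domain: Iterable[int] = SECRET_DOMAIN) -> int:
    """Count the candidate secrets that would produce the observed output.

    Enumeration replaces the symbolic reasoning a real analysis would use.
    """
    return sum(1 for candidate in domain if flow(candidate) == observed)


# Three hypothetical data flows from the secret to a sink.
def parity(s: int) -> int:
    return s % 2            # reveals one bit of the secret


def shifted(s: int) -> int:
    return (s + 1) % 256    # invertible, so the secret is fully recoverable


def zeroed(s: int) -> int:
    return s * 0            # derived from the secret but reveals nothing


secret = 42
flows: Dict[str, Callable[[int], int]] = {
    "parity": parity, "shifted": shifted, "zeroed": zeroed,
}

# For each flow: (binary taint, number of secrets consistent with the output).
report: Dict[str, Tuple[int, int]] = {
    name: (binary_taint(True), possible_source_count(flow, flow(secret)))
    for name, flow in flows.items()
}

for name, (taint, count) in report.items():
    print(f"{name}: taint={taint}, consistent secrets={count}")
```

In this sketch all three outputs carry taint 1, yet the number of consistent candidate secrets differs sharply among them. That difference is the kind of intermediate state a binary flag quantizes away.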