KDD 2011
Conference paper
Revisiting sequential pattern hiding to enhance utility
Abstract
Sequence datasets are encountered in a plethora of applications, ranging from web usage analysis to healthcare studies and ubiquitous computing. Disseminating such datasets offers remarkable opportunities for discovering interesting knowledge patterns, but may lead to serious privacy violations if sensitive patterns, such as business secrets, are disclosed. In this work, we consider how to sanitize data to prevent the disclosure of sensitive patterns during sequential pattern mining, while ensuring that the nonsensitive patterns can still be discovered. First, we re-define the problem of sequential pattern hiding to capture the information loss incurred by sanitization in terms of both events' modification (distortion) and lost nonsensitive knowledge patterns (side-effects). Second, we model sequences as graphs and propose two algorithms that solve the problem by operating on these graphs. The first algorithm attempts to sanitize data with minimal distortion, whereas the second focuses on reducing the side-effects. Extensive experiments show that our algorithms outperform the existing solution in terms of data distortion and side-effects, and are more efficient. Copyright 2011 ACM.
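To make the problem setting concrete, the sketch below illustrates sequential pattern hiding in its simplest form: events are greedily deleted from supporting sequences until each sensitive pattern's support falls below a disclosure threshold, while distortion (deleted events) and side-effects (nonsensitive patterns that become unminable) are tracked. This is a minimal illustrative example, not the graph-based algorithms proposed in the paper; all function names, thresholds, and the toy dataset are assumptions made for illustration.

```python
def is_subsequence(pattern, sequence):
    """True if `pattern` occurs as a (not necessarily contiguous) subsequence of `sequence`."""
    it = iter(sequence)
    return all(event in it for event in pattern)

def support(pattern, dataset):
    """Number of sequences in `dataset` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in dataset)

def sanitize(dataset, sensitive, nonsensitive, disclosure_threshold, mining_threshold):
    """Greedy event-deletion sanitization (illustrative sketch only)."""
    dataset = [list(seq) for seq in dataset]  # work on a copy of the data
    distortion = 0
    for pattern in sensitive:
        # Delete one event at a time from supporting sequences until the
        # sensitive pattern can no longer be mined at the disclosure threshold.
        while support(pattern, dataset) >= disclosure_threshold:
            for seq in dataset:
                if is_subsequence(pattern, seq):
                    # Remove the first event in this sequence that belongs to the pattern.
                    seq.remove(next(e for e in seq if e in pattern))
                    distortion += 1
                    break
    # Side-effects: nonsensitive patterns lost because their support dropped too low.
    side_effects = [p for p in nonsensitive if support(p, dataset) < mining_threshold]
    return dataset, distortion, side_effects

# Toy usage: hide the pattern ('a', 'c') while trying to keep ('b', 'd') minable.
data = [['a', 'b', 'c', 'd'], ['a', 'c', 'd'], ['b', 'd'], ['a', 'b', 'd']]
clean, dist, lost = sanitize(data,
                             sensitive=[('a', 'c')],
                             nonsensitive=[('b', 'd')],
                             disclosure_threshold=1,
                             mining_threshold=2)
print(clean, dist, lost)
```

The trade-off the abstract refers to is visible even in this toy setting: each deletion adds to the distortion count, and an unlucky choice of which event to delete can push a nonsensitive pattern below the mining threshold, producing a side-effect. The paper's two graph-based algorithms target exactly these two quantities, minimizing distortion and side-effects respectively.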