Publication
ICDM 2002
Conference paper
InfoMiner+: Mining partial periodic patterns with gap penalties
Abstract
In this paper, we focus on mining periodic patterns while allowing some degree of imperfection, in the form of random replacement of events in an otherwise perfect periodic pattern. Information gain was previously proposed to identify patterns whose events have vastly different occurrence frequencies and to adjust for deviation from a pattern. However, it imposes no penalty for gaps between pattern occurrences. In many applications, e.g., bioinformatics, it is important to identify subsequences in which a pattern repeats perfectly (or nearly perfectly). As a solution, we extend the information gain measure to include a penalty for gaps between pattern occurrences. We call this measure generalized information gain. Furthermore, we want to find a subsequence S′ such that, for a pattern P, the generalized information gain of P in S′ is high. This is particularly useful for locating repeats in DNA sequences. We develop an effective mining algorithm, InfoMiner+, that simultaneously mines significant patterns and their associated subsequences. © 2002 IEEE.
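The abstract does not give the formula for generalized information gain, so the following is only an illustrative sketch of the idea, not the InfoMiner+ algorithm or the paper's exact definition. It assumes that an event's information is the negative logarithm of its frequency (with the number of distinct events as the base), that a matching segment contributes the pattern's information, that a non-matching segment subtracts the information of the pattern events it missed, and that the first occurrence is not counted as gain. Segments are assumed to be aligned at offset 0, and the best subsequence S′ is found with a plain maximum-sum scan (Kadane's algorithm), which is a stand-in technique chosen here. All function names are hypothetical.

```python
import math
from collections import Counter

WILDCARD = "*"  # "don't care" position in a pattern (notation assumed here)


def event_information(sequence):
    """Assumed measure: I(e) = -log_|E| Prob(e), so rare events carry more."""
    counts = Counter(sequence)
    base = len(counts)
    if base < 2:
        return {e: 0.0 for e in counts}
    total = len(sequence)
    return {e: -math.log(c / total, base) for e, c in counts.items()}


def segment_scores(sequence, pattern, info):
    """Score each period-length segment: +I(P) on a match, minus the
    information of the missed pattern events otherwise (assumed penalty)."""
    period = len(pattern)
    pattern_info = sum(info[e] for e in pattern if e != WILDCARD)
    scores = []
    for start in range(0, len(sequence) - period + 1, period):
        segment = sequence[start:start + period]
        missed = [p for p, s in zip(pattern, segment)
                  if p != WILDCARD and p != s]
        scores.append(-sum(info[p] for p in missed) if missed else pattern_info)
    return scores, pattern_info


def generalized_gain(sequence, pattern, info):
    """Gain over the whole sequence; the first occurrence is not counted."""
    scores, pattern_info = segment_scores(sequence, pattern, info)
    return sum(scores) - pattern_info


def best_subsequence(sequence, pattern, info):
    """Return (gain, start, end) of the contiguous run of segments with the
    highest gain, so that sequence[start:end] plays the role of S'."""
    scores, pattern_info = segment_scores(sequence, pattern, info)
    if not scores:
        raise ValueError("sequence is shorter than the pattern period")
    best_total, best_first, best_last = float("-inf"), 0, 0
    running, run_first = 0.0, 0
    for i, score in enumerate(scores):
        if running <= 0:          # restart the run at this segment
            running, run_first = score, i
        else:                     # extend the current run
            running += score
        if running > best_total:
            best_total, best_first, best_last = running, run_first, i
    period = len(pattern)
    return best_total - pattern_info, best_first * period, (best_last + 1) * period
```

For example, with a toy sequence `"bbbbabababbb"`, the pattern `("a", "*")`, and both events given one unit of information, the scan picks out the stretch `"ababab"`, while the gain over the whole sequence is lower because the surrounding gaps are penalized.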