Publication
ICDM 2002
Conference paper

InfoMiner+: Mining partial periodic patterns with gap penalties

Abstract

In this paper, we focus on mining periodic patterns while allowing some degree of imperfection, in the form of random replacement of events in a perfect periodic pattern. Information gain was previously proposed as a measure for identifying patterns whose events have vastly different occurrence frequencies and for adjusting for deviations from a pattern. However, it imposes no penalty when there are gaps between pattern occurrences. In many applications, e.g., bioinformatics, it is important to identify subsequences in which a pattern repeats perfectly (or nearly perfectly). As a solution, we extend the information gain measure to include a penalty for gaps between pattern occurrences. We call this measure generalized information gain. Furthermore, we want to find a subsequence S′ such that, for a pattern P, the generalized information gain of P in S′ is high. This is particularly useful for locating repeats in DNA sequences. We develop an effective mining algorithm, InfoMiner+, that simultaneously mines significant patterns and their associated subsequences. © 2002 IEEE.
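As a rough illustration of the idea of a gap-penalized gain and of searching for a high-scoring subsequence S′, the following Python sketch may help. It is not the InfoMiner+ algorithm, and the abstract does not give the exact formulas. The sketch assumes that the information of an event is the negative logarithm (base = alphabet size) of its occurrence frequency, that each matching period-length segment adds the pattern's information, and that each non-matching segment subtracts the same amount as a gap penalty. The function names, the `*` wildcard, and the use of a Kadane-style maximum-subarray scan are all hypothetical choices made for this example.

```python
import math

WILDCARD = "*"  # hypothetical "don't care" symbol for this sketch


def pattern_information(pattern, freq, alphabet_size):
    """Assumed definition: sum of -log_{alphabet_size}(freq[e]) over fixed events."""
    return sum(-math.log(freq[e], alphabet_size) for e in pattern if e != WILDCARD)


def segment_matches(pattern, segment):
    """True if the segment agrees with the pattern at every non-wildcard position."""
    return len(segment) == len(pattern) and all(
        p == WILDCARD or p == s for p, s in zip(pattern, segment)
    )


def best_subsequence(pattern, sequence, freq, alphabet_size):
    """Return (score, start, end) of the best-scoring run of whole segments.

    Assumed scoring: +info for a matching segment, -info for a gap segment.
    The scan is a Kadane-style maximum-subarray pass, not InfoMiner+ itself.
    """
    period = len(pattern)
    info = pattern_information(pattern, freq, alphabet_size)
    best = (0.0, 0, 0)
    running, run_start = 0.0, 0
    for start in range(0, len(sequence) - period + 1, period):
        segment = sequence[start:start + period]
        running += info if segment_matches(pattern, segment) else -info
        if running <= 0:
            # Accumulated gap penalties outweigh the gain: restart after this segment.
            running, run_start = 0.0, start + period
        elif running > best[0]:
            best = (running, run_start, start + period)
    return best


# Usage with made-up uniform frequencies over a four-letter alphabet.
freq = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
score, start, end = best_subsequence("a*", "abacadccccab", freq, 4)
print(score, "abacadccccab"[start:end])
```

Under these assumptions, a short gap is bridged when the surrounding occurrences gain more than the gap costs, while a long gap splits the sequence into separate candidate subsequences.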
