Publication
ICDE 2003
Conference paper
Sequence data mining techniques and applications
Abstract
Many interesting real-life mining applications rely on modeling data as sequences of discrete multi-attribute records. Mining models for network intrusion detection view data as sequences of TCP/IP packets. Text information extraction systems model the input text as a sequence of words and delimiters. Customer data mining applications profile the buying habits of customers as sequences of items purchased. In computational biology, DNA, RNA and protein data are all best modeled as sequences. Classifying, clustering and characterizing such sequence data presents interesting issues in feature engineering, discretization and pattern discovery. In this seminar we will review techniques, ranging from item set counting and MDL-based discretization to Markov modeling, for performing various supervised and unsupervised pattern discovery tasks on sequences. We will present case studies from network intrusion detection and DNA sequence mining to illustrate these techniques.
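As a rough illustration of the Markov modeling mentioned in the abstract, the sketch below trains one first-order Markov chain per class and assigns a new sequence to the class whose chain gives it the highest log-likelihood. It is not taken from the seminar itself: the function names (`train_markov`, `log_likelihood`, `classify`), the add-alpha smoothing, and the toy DNA strings are all assumptions made for the example.

```python
import math

def train_markov(sequences, alphabet, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing.

    Hypothetical helper for illustration; alpha=1.0 is an assumed default.
    """
    counts = {a: {b: 0 for b in alphabet} for a in alphabet}
    for seq in sequences:
        # Count each adjacent pair (previous symbol, current symbol).
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    model = {}
    for a in alphabet:
        total = sum(counts[a].values()) + alpha * len(alphabet)
        model[a] = {b: (counts[a][b] + alpha) / total for b in alphabet}
    return model

def log_likelihood(model, seq):
    """Sum of log transition probabilities (the first symbol is not scored)."""
    return sum(math.log(model[p][c]) for p, c in zip(seq, seq[1:]))

def classify(models, seq):
    """Return the label whose Markov chain scores the sequence highest."""
    return max(models, key=lambda label: log_likelihood(models[label], seq))

# Toy training data, invented for this example only.
ALPHABET = "ACGT"
models = {
    "gc_rich": train_markov(["GCGCGGCC", "CCGGCGCG"], ALPHABET),
    "at_rich": train_markov(["ATATTAAT", "TTAATATA"], ALPHABET),
}

print(classify(models, "GCGGCC"))  # prints "gc_rich"
```

Smoothing keeps unseen transitions from receiving zero probability, which would otherwise make the log-likelihood undefined for any sequence containing them. The same scheme applies to other discrete alphabets, such as packet types or purchased items, once the records have been discretized into symbols.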