Publication
ICDE 2003
Conference paper

Sequence data mining techniques and applications

Abstract

Many interesting real-life mining applications rely on modeling data as sequences of discrete multi-attribute records. Mining models for network intrusion detection view data as sequences of TCP/IP packets. Text information extraction systems model the input text as a sequence of words and delimiters. Customer data mining applications profile the buying habits of customers as sequences of items purchased. In computational biology, DNA, RNA and protein data are all best modeled as sequences. Classifying, clustering and characterizing such sequence data present interesting issues in feature engineering, discretization and pattern discovery. In this seminar we will review techniques, including item set counting, MDL-based discretization and Markov modeling, for performing various supervised and unsupervised pattern discovery tasks on sequences. We will present case studies from network intrusion detection and DNA sequence mining to illustrate these techniques.
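
To make the Markov modeling idea concrete, the following is a minimal sketch, not taken from the seminar itself, of a first-order Markov chain classifier over DNA-like strings. It assumes one model per class, add-alpha smoothing, and classification by highest log-likelihood; the function names (`train_markov`, `log_likelihood`, `classify`), the class labels and the toy training strings are all hypothetical and chosen only for illustration.

```python
import math

def train_markov(sequences, alphabet, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    # Start every transition count at alpha so unseen transitions keep nonzero probability.
    counts = {a: {b: alpha for b in alphabet} for a in alphabet}
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    # Normalize each row of counts into a probability distribution.
    model = {}
    for a in alphabet:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in alphabet}
    return model

def log_likelihood(model, seq):
    """Sum of log transition probabilities (the first symbol's prior is ignored)."""
    return sum(math.log(model[p][c]) for p, c in zip(seq, seq[1:]))

def classify(models, seq):
    """Return the label whose model assigns the sequence the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(models[label], seq))

# Toy training data, invented for this sketch.
ALPHABET = "ACGT"
models = {
    "gc_rich": train_markov(["GCGCGGCC", "CCGGCGCG"], ALPHABET),
    "at_rich": train_markov(["ATATTAAT", "TTAATATA"], ALPHABET),
}

print(classify(models, "GCGGCC"))
print(classify(models, "ATTATA"))
```

The same scoring scheme could in principle be applied to other discrete sequences mentioned above, such as discretized packet attributes, by swapping the alphabet; a higher-order or variable-length model would be a natural refinement.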
