Publication
ICEBE 2015
Conference paper
Relative Patterns Discovery toward Big Data Analytics
Abstract
Recently, enterprises and governments have invested aggressively in big data analytics because results drawn from millions of people are taken to be representative of popular opinion. Despite bringing new opportunities, big data poses challenges such as an extremely large number of observations (e.g., millions of transactions), high dimensionality (e.g., thousands of items), and the need for immediate response. At this scale, conventional association analysis struggles to extract pattern information. Specifically, the computational cost of frequent itemset mining grows exponentially with the number of items, and the problem has been proven to be NP-complete. Although many studies prune patterns to reduce this complexity, pruning is likely to distort the shape of the data and yield inaccurate results. In this paper, we introduce relative patterns discovery (named RPD), which explores the patterns shared by each pair of observations. To show that RPD is a pragmatic solution for big data analytics, we design a scalable outlier detection method (named SOD) based on the concept of RPD. In particular, SOD can score anomalies without enumerating all the relative patterns. Empirical investigations, conducted on various real-world datasets, demonstrate that SOD performs well even with a large number of observations and high dimensionality.
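The abstract does not define relative patterns or the SOD score, so the following is only a minimal sketch under stated assumptions. It assumes the relative pattern of two observations is the set of items they share, and it uses a hypothetical toy score (one minus the mean Jaccard overlap with every other observation) to illustrate the idea of scoring outliers from pairwise overlap. This score is not SOD: it compares every pair of observations, which is quadratic in their number. All function names are illustrative.

```python
def candidate_itemsets(n_items):
    """Number of non-empty itemsets over n_items distinct items (2^n - 1)."""
    return 2 ** n_items - 1


def relative_pattern(a, b):
    """Items shared by two observations.

    Assumption: this is one plausible reading of a 'relative pattern';
    the abstract does not give a formal definition.
    """
    return frozenset(a) & frozenset(b)


def overlap_scores(transactions):
    """Toy anomaly score per observation (hypothetical, not the paper's SOD).

    Score = 1 - mean Jaccard overlap with every other observation,
    so observations that share little with the rest score close to 1.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    if n < 2:
        return [0.0] * n
    scores = []
    for i, a in enumerate(sets):
        total = 0.0
        for j, b in enumerate(sets):
            if i == j:
                continue
            union = a | b
            if union:  # two empty observations contribute zero overlap
                total += len(relative_pattern(a, b)) / len(union)
        scores.append(1.0 - total / (n - 1))
    return scores


transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "milk", "butter"},
    {"wrench", "nails"},
]
scores = overlap_scores(transactions)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Here `candidate_itemsets(20)` already exceeds one million, which illustrates why enumerating itemsets becomes impractical with thousands of items, whereas each relative pattern is obtained with a single set intersection.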