Relative Patterns Discovery toward Big Data Analytics
Enterprises and governments have recently invested aggressively in big data analytics because it can represent popular opinion drawn from millions of people. Despite the new opportunities it brings, big data poses challenges such as an extremely large number of observations (e.g., millions of transactions), high dimensionality (e.g., thousands of items), and the demand for immediate response. At this scale, conventional association analysis struggles to extract pattern information. Specifically, the computational complexity of frequent itemset mining grows exponentially with the number of items, and the problem has been proven to be NP-complete. Although many studies use a pattern-pruning strategy to reduce this complexity, pruning may distort the shape of the data and yield inaccurate results. In this paper, we introduce relative patterns discovery (RPD), which explores the patterns shared by each pair of observations. To show that RPD is a pragmatic solution for big data analytics, we design a scalable outlier detection method (SOD) based on the concept of RPD. In particular, SOD can score anomalies without enumerating all the relative patterns. Empirical investigations conducted on various real-world datasets demonstrate that SOD performs well even with a large number of observations and high dimensionality.
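To make the notion of a pattern shared by two observations concrete, the following minimal sketch treats each observation as a set of items and takes the relative pattern of a pair to be their intersection. This reading, the function names `relative_pattern` and `overlap_scores`, and the Jaccard-based score are assumptions made for illustration only. The sketch is not SOD: it compares every pair naively, whereas SOD is designed to score anomalies without enumerating all the relative patterns.

```python
from itertools import combinations


def relative_pattern(a, b):
    """Items shared by two observations (assumed here to be a set intersection)."""
    return frozenset(a) & frozenset(b)


def overlap_scores(transactions):
    """Hypothetical anomaly score: 1 minus the mean Jaccard similarity
    between an observation and every other observation.

    An observation that shares few items with the rest scores close to 1.
    This is a naive O(n^2) illustration, not the SOD method.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    if n < 2:
        return [0.0] * n
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        union = sets[i] | sets[j]
        # Two empty observations are treated as identical.
        sim = len(sets[i] & sets[j]) / len(union) if union else 1.0
        totals[i] += sim
        totals[j] += sim
    return [1.0 - total / (n - 1) for total in totals]


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"x", "y"}]
    print(relative_pattern(data[0], data[1]))
    print(overlap_scores(data))
```

In this toy input the last observation shares no item with any other, so it receives the highest score. The quadratic number of pair comparisons is the cost that a scalable method would need to avoid.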