Outlier detection in sparse data with factorization machines

Mengxiao Zhu; Charu Aggarwal; Shuai Ma; Hui Zhang; Jinpeng Huai

doi:10.1145/3132847.3132987

CIKM 2017

Conference paper

06 Nov 2017

Abstract

In sparse data, a large fraction of the entries are zero. Examples of sparse data include short text snippets (such as tweets on Twitter) and feature representations of categorical data sets whose attributes take a large number of distinct values. Traditional outlier detection methods typically fail on such data because meaningful distances are difficult to compute. To address this, it is important to exploit the latent relations between these values. Factorization machines are a natural methodology for this purpose, and they suit the massive-domain setting well because they are designed for sparse data sets. In this study, we propose an outlier detection approach for sparse data based on factorization machines. Factorization machines are also efficient, because their complexity is linear in the number of non-zero values. In fact, this efficiency allows them to be extended even to traditional numerical data settings through appropriate feature engineering. An extensive experimental study shows that our approach is both effective and efficient on sparse categorical, short text, and numerical data.
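The claim that factorization machines run in time linear in the number of non-zero values comes from a standard algebraic identity for the degree-2 model. The identity rewrites the pairwise term as sum over i<j of <v_i, v_j> x_i x_j = 1/2 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]. Under that form, only non-zero features contribute.

The sketch below is a minimal illustration of that model score and nothing more. It is not the authors' outlier detection procedure, which the abstract does not specify. The function names `fm_score_sparse` and `fm_score_dense`, along with the random parameters, are hypothetical. The sketch checks the O(k * nnz) form against a naive O(k * n^2) double loop.

```python
import numpy as np

def fm_score_sparse(indices, values, w0, w, V):
    """Degree-2 factorization machine score computed from non-zeros only.

    Hypothetical helper: indices/values hold one sparse row's non-zeros,
    w0 is a global bias, w the linear weights, V the n-by-k latent factors.
    Cost is O(k * nnz) rather than O(k * n^2).
    """
    idx = np.asarray(indices, dtype=int)
    x = np.asarray(values, dtype=float)
    linear = w0 + np.dot(w[idx], x)
    Vx = V[idx] * x[:, None]            # v_i * x_i, non-zero rows only
    sum_sq = Vx.sum(axis=0) ** 2        # (sum_i v_{i,f} x_i)^2 for each factor f
    sq_sum = (Vx ** 2).sum(axis=0)      # sum_i v_{i,f}^2 x_i^2 for each factor f
    return linear + 0.5 * np.sum(sum_sq - sq_sum)

def fm_score_dense(x, w0, w, V):
    """Naive reference: explicit sum over all feature pairs i < j."""
    n = len(x)
    total = w0 + np.dot(w, x)
    for i in range(n):
        for j in range(i + 1, n):
            total += np.dot(V[i], V[j]) * x[i] * x[j]
    return total

# Illustrative random parameters (assumed, not learned from data).
rng = np.random.default_rng(0)
n, k = 200, 8
w0 = 0.1
w = rng.normal(size=n)
V = rng.normal(scale=0.1, size=(n, k))

indices = [3, 42, 177]
values = [1.0, 1.0, 2.0]
x_dense = np.zeros(n)
x_dense[indices] = values

sparse_score = fm_score_sparse(indices, values, w0, w, V)
dense_score = fm_score_dense(x_dense, w0, w, V)
print(np.isclose(sparse_score, dense_score))  # True
```

The two scores agree because zero-valued features contribute nothing to either form. The sparse version therefore skips them entirely, which is what makes the model practical for the massive-domain inputs the abstract describes.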