Believe it today or tomorrow? Detecting untrustworthy information from dynamic multi-source data
A vast ocean of data is collected every day, and numerous applications call for the extraction of actionable insights from it. One important task is to detect untrustworthy information, because such information usually indicates critical, unusual, or suspicious activities. In this paper, we study the problem of detecting untrustworthy information from the novel perspective of correlating and comparing multiple sources that describe the same set of items. Unlike existing work, we recognize the importance of the time dimension in modeling the commonalities among multiple sources. We represent dynamic multi-source data as tensors and develop a joint non-negative tensor factorization approach to capture the common patterns across sources. We then compare each source's input with the common patterns and treat inconsistencies as an indicator of untrustworthiness. An incremental factorization approach is developed to improve computational efficiency on dynamically arriving data. We also propose a method to handle data sparseness. Experiments are conducted on hotel rating, network traffic flow, and weather forecast data collected from multiple sources. Results demonstrate the advantages of the proposed approach in detecting inconsistent and untrustworthy information.
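To make the factorize-then-compare idea concrete, the following is a minimal illustrative sketch, not the implementation evaluated in the paper. It assumes each source is a non-negative items x attributes x time tensor, fits one set of CP factors shared by all sources using standard multiplicative updates, and scores each (source, item) pair by its residual against the shared reconstruction. The function names `joint_ncp` and `inconsistency_scores`, the rank, the iteration count, and the synthetic data are all hypothetical choices made for illustration; the incremental update and the sparseness handling mentioned above are omitted.

```python
import numpy as np

def joint_ncp(X, rank, n_iter=300, seed=0, eps=1e-9):
    """Fit non-negative CP factors shared by all sources.

    X has shape (sources, items, attributes, time) and must be non-negative.
    Minimizes sum_s ||X[s] - [[A, B, C]]||^2 with multiplicative updates.
    """
    S, I, J, T = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, rank)) + 0.1 for n in (I, J, T))
    Xsum = X.sum(axis=0)  # the joint objective only needs the sum over sources
    for _ in range(n_iter):
        # Each update is: factor *= MTTKRP(Xsum) / (S * factor @ Gram product)
        A *= np.einsum('ijk,jr,kr->ir', Xsum, B, C) / (S * A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', Xsum, A, C) / (S * B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', Xsum, A, B) / (S * C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

def inconsistency_scores(X, A, B, C):
    """Frobenius-norm residual of every (source, item) slice vs. the common pattern."""
    common = np.einsum('ir,jr,kr->ijk', A, B, C)
    resid = X - common[None]
    return np.sqrt((resid ** 2).sum(axis=(2, 3)))  # shape (sources, items)

# Synthetic example (assumed data): 4 sources, 6 items, 5 attributes, 8 time steps.
rng = np.random.default_rng(1)
truth = np.einsum('ir,jr,kr->ijk',
                  rng.random((6, 2)), rng.random((5, 2)), rng.random((8, 2)))
X = truth[None] + 0.01 * rng.random((4, 6, 5, 8))  # small non-negative noise
X[2, 3] += 5.0  # source 2 reports inflated values for item 3

A, B, C = joint_ncp(X, rank=3)
scores = inconsistency_scores(X, A, B, C)
print(np.round(scores, 2))
```

Because every source shares the same factors in this simplified objective, the fit depends only on the sum of the source tensors; a formulation with source-specific factors or weights would behave differently and is closer to what a full joint factorization would need.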