Probabilistic-mismatch anomaly detection: Do one's medications match with the diagnoses
Abstract
Anomaly detection in healthcare data such as patient records is not a trivial task. Anomalies in these datasets are often caused by mismatches between different types of features, e.g., medications that do not match the diagnoses. Existing anomaly detection methods do not perform well at detecting such mismatches between multiple types of features, especially when the feature space is high-dimensional and sparse. This paper introduces a novel anomaly detection paradigm, Probabilistic-Mismatch Anomaly Detection (PMAD), which detects mismatches between features by modeling a normal instance with a common latent probability distribution that governs the generation of all types of features. Under this paradigm, the goal of anomaly detection is to find instances whose different types of features follow dissimilar latent distributions. We further propose Topical PMAD, based on an extended Latent Dirichlet Allocation (LDA) model, which is able to capture the latent relationships between features in a high-dimensional space. Experiments on both synthetic data and real-world patient records show that Topical PMAD can effectively detect anomalies with mismatched features, and that it is highly robust to high-dimensional data and to inaccurate model selection. The anomalies detected in a real-world patient record dataset suggest that the approach is promising for practical application.
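To make the mismatch idea concrete, the following is a minimal sketch and not the Topical PMAD model itself. It assumes that per-topic feature distributions for diagnoses and medications are already available as fixed tables (the hypothetical `diag_topics` and `med_topics`), infers a topic mixture separately from each feature type with a simple fold-in EM step, and scores an instance by the Jensen-Shannon divergence between the two mixtures. The function names, the toy numbers, and the choice of divergence are all illustrative assumptions.

```python
import numpy as np

def infer_topic_mixture(counts, topic_word, n_iter=50):
    """Fold-in EM estimate of a topic mixture for one bag of features.

    counts:     length-V vector of feature counts (assumed non-empty)
    topic_word: K x V matrix, each row a distribution over features
    """
    k = topic_word.shape[0]
    theta = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each topic for each feature (K x V)
        resp = theta[:, None] * topic_word
        resp /= resp.sum(axis=0, keepdims=True) + 1e-12
        # M-step: weight responsibilities by the observed counts
        theta = resp @ counts
        theta /= theta.sum()
    return theta

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mismatch_score(diag_counts, med_counts, diag_topics, med_topics):
    """Higher score: the two feature types point to different latent mixtures."""
    theta_d = infer_topic_mixture(diag_counts, diag_topics)
    theta_m = infer_topic_mixture(med_counts, med_topics)
    return js_divergence(theta_d, theta_m)

# Toy, made-up topic tables: 2 latent topics, 4 diagnosis codes, 4 medications.
diag_topics = np.array([[0.45, 0.45, 0.05, 0.05],
                        [0.05, 0.05, 0.45, 0.45]])
med_topics = np.array([[0.45, 0.45, 0.05, 0.05],
                       [0.05, 0.05, 0.45, 0.45]])

# Diagnoses and medications both drawn from topic 0.
matched = mismatch_score(np.array([2.0, 1.0, 0.0, 0.0]),
                         np.array([1.0, 2.0, 0.0, 0.0]),
                         diag_topics, med_topics)

# Diagnoses from topic 0, medications from topic 1.
mismatched = mismatch_score(np.array([2.0, 1.0, 0.0, 0.0]),
                            np.array([0.0, 0.0, 2.0, 1.0]),
                            diag_topics, med_topics)
```

In this toy setting the record whose medications come from a different topic than its diagnoses receives the larger score. A full model in the spirit of the abstract would learn the topic tables jointly from data rather than fixing them by hand.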