Planning-based reasoning for automated large-scale data analysis
In this paper, we apply planning-based reasoning to orchestrate the data analysis process automatically, with a focus on two applications: early detection of health complications in critical care, and detection of anomalous behaviors of network hosts in enterprise networks. Our system uses expert knowledge and AI planning to reason about possibly incomplete, noisy, or inconsistent observations, derived from data by deploying an open set of analytics, in order to generate plausible and consistent hypotheses about the state of the world. These hypotheses trigger relevant actions, which deploy additional analytics or adapt existing ones, producing new observations for further reasoning. Planning-based reasoning is enabled by knowledge models, obtained from domain experts, that describe entities in the world, their states, and their relationships to observations. To address the associated knowledge engineering challenges, we propose a modeling language named LTS++ and build an integrated development environment (IDE). We also develop a process that supports and guides domain experts with no planning expertise in defining and constructing models. We use this modeling process to capture knowledge for the two applications and to collect user feedback. Furthermore, we conduct an empirical evaluation demonstrating the feasibility of our approach and the benefits of planning-based reasoning in these applications at large real-world scales. Specifically, in the network monitoring scenario, we show that the system can dynamically deploy and manage analytics to detect anomalies and malicious behaviors effectively, with lead times of over 15 minutes, in an enterprise network with over 2 million hosts (entities).
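To make the hypothesis-generation step more concrete, here is a minimal Python sketch under stated assumptions. It does not use LTS++ or an AI planner; it swaps in a plain uniform-cost search over a hypothetical toy state-transition model. Each state can explain certain observations, each transition has a cost, any observation may be discarded as noise at a fixed penalty (tolerating inconsistent data), and states may be passed through without being observed (tolerating incomplete data). The names `LTSModel` and `best_hypothesis`, the example host states, the observation labels, and all cost values are illustrative assumptions, not the paper's model.

```python
import heapq
from dataclasses import dataclass


@dataclass
class LTSModel:
    """Hypothetical toy state-transition model; NOT the LTS++ language."""
    initial: str
    transitions: dict  # state -> list of (next_state, cost)
    explains: dict     # state -> set of observations that state can explain
    noise_cost: float = 5.0  # assumed penalty for treating an observation as noise


def best_hypothesis(model, observations):
    """Return (cost, trajectory) for the cheapest explanation, or None.

    Uniform-cost search over (state, number of observations consumed).
    Each observation is either explained by the current state (free) or
    discarded as noise (noise_cost). Transitions consume no observation,
    so intermediate states may go unobserved.
    """
    frontier = [(0.0, model.initial, 0, (model.initial,))]
    settled = {}  # (state, i) -> cost at which it was first expanded
    while frontier:
        cost, state, i, path = heapq.heappop(frontier)
        if i == len(observations):
            return cost, list(path)  # every observation is accounted for
        if settled.get((state, i), float("inf")) <= cost:
            continue  # already expanded more cheaply
        settled[(state, i)] = cost
        obs = observations[i]
        if obs in model.explains.get(state, set()):
            heapq.heappush(frontier, (cost, state, i + 1, path))
        heapq.heappush(frontier, (cost + model.noise_cost, state, i + 1, path))
        for nxt, step_cost in model.transitions.get(state, []):
            heapq.heappush(frontier, (cost + step_cost, nxt, i, path + (nxt,)))
    return None


# Hypothetical host-behavior model for the network monitoring scenario.
host = LTSModel(
    initial="normal",
    transitions={
        "normal": [("scanning", 1)],
        "scanning": [("infected", 1)],
        "infected": [("exfiltrating", 1)],
    },
    explains={
        "normal": {"low_traffic"},
        "scanning": {"port_scan"},
        "infected": {"beacon"},
        "exfiltrating": {"large_upload"},
    },
)

obs = ["low_traffic", "port_scan", "dns_spike", "large_upload"]
print(best_hypothesis(host, obs))
# (8.0, ['normal', 'scanning', 'infected', 'exfiltrating'])
```

Here no state explains "dns_spike", so the cheapest hypothesis discards it as noise (cost 5) and uses three transitions (cost 3). A fuller system along the lines described above would presumably rank several hypotheses rather than one and map likely states to actions such as deploying or adapting analytics; that step is not sketched here.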