Cohort comparison of event sequences with balanced integration of visual analytics and statistics
Finding the differences and similarities between two datasets is a common analytics task. With temporal event sequence data, this task is complex because of the many ways single events and event sequences can differ between the two datasets (or cohorts) of records: the structure of the event sequences (e.g., event order, co-occurring events, or event frequencies), the attributes of events and records (e.g., patient gender), or metrics about the timestamps themselves (e.g., event duration). In exploratory analyses, running statistical tests to cover all cases is time-consuming and determining which results are significant becomes cumbersome. Current analytics tools for comparing groups of event sequences emphasize a purely statistical or purely visual approach for comparison. This paper presents a taxonomy of metrics for comparing cohorts of temporal event sequences, showing that the problem-space is bounded. We also present a visual analytics tool, CoCo (for "Cohort Comparison"), which implements balanced integration of automated statistics with an intelligent user interface to guide users to significant, distinguishing features between the cohorts. Lastly, we describe two early case studies: the first with a research team studying medical team performance in the emergency department and the second with pharmacy researchers.