Big Data 2014
Conference paper

TRISTAN: Real-time analytics on massive time series using sparse dictionary compression

View publication


Large-scale critical infrastructures such as transportation, energy, or water distribution networks are increasingly equipped with smart sensor technologies. Low-latency analytics on the resulting times series would open the door to many exciting opportunities to improve our grasp on complex urban systems. However, sensor-generated time series often turn out to be noisy, non-uniformly sampled, and misaligned in practice, making them ill-suited for traditional data processing. In this paper, we introduce TRISTAN (massive TRIckletS Time series ANalysis), a new data management system for efficient storage and real-time processing of fine-grained time series data. TRISTAN relies on a dedicated, compressed sparse representation of the time series using a dictionary. In contrast to previous approaches, TRISTAN is able to execute most analytics queries on the compressed data directly, and supports efficient and approximate query answering based on the most significant atoms of the dictionary only. We present the overall architecture of our system and discuss its performance on several smarter city datasets, showing that TRISTAN can achieve up to 20:1 compression ratios and 250x speedup compared to a state-of-the-art system.