Feature guided in-situ indices generation and data placement on distributed deep memory hierarchies
In-situ analytics have been increasingly adopted by leadership scientific applications to gain fast insights into the massive output data of simulations. Current practice buffers the output data in DRAM for analytics processing, which constrains analytics to the DRAM capacity left unused by the simulation. The rapid growth of data size requires alternative approaches to accommodating data-rich analytics, such as using solid-state disks (SSDs) to increase effective memory capacity. For this purpose, this paper explores software solutions for exploiting the deep memory hierarchies expected on future high-end machines. Leveraging the fact that many analytics are sensitive to data features (regions of interest) hidden in the data being processed, the approach incorporates knowledge of the data features into in-situ data management. It uses adaptive index creation and refinement to reduce the overhead of index management. In addition, it uses data features to predict data skew and to improve load balance by controlling data distribution and placement on distributed staging servers. The experimental results show that such feature-guided optimizations achieve substantial improvements over state-of-the-art approaches for managing output data in-situ.
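As an illustration of the feature-guided placement idea summarized above, the following minimal sketch assigns data blocks to staging servers according to a predicted per-block analytics cost. It is not the paper's algorithm: the greedy heaviest-first heuristic, the function names `place_blocks` and `server_loads`, and the cost values are all assumptions chosen for illustration. The costs stand in for whatever a feature detector might predict for blocks that contain regions of interest.

```python
import heapq

def place_blocks(block_costs, num_servers):
    """Greedy placement: heaviest block first, onto the least-loaded server.

    block_costs maps a block id to a predicted analytics cost
    (hypothetical values, assumed to come from detected data features).
    """
    # Min-heap of (current_load, server_id) so the lightest server pops first.
    heap = [(0.0, server) for server in range(num_servers)]
    heapq.heapify(heap)
    placement = {}
    ordered = sorted(block_costs.items(), key=lambda kv: kv[1], reverse=True)
    for block_id, cost in ordered:
        load, server = heapq.heappop(heap)
        placement[block_id] = server
        heapq.heappush(heap, (load + cost, server))
    return placement

def server_loads(block_costs, placement, num_servers):
    """Sum the predicted cost placed on each server."""
    loads = [0.0] * num_servers
    for block_id, server in placement.items():
        loads[server] += block_costs[block_id]
    return loads

# Hypothetical skewed workload: blocks b0 and b3 contain large features.
block_costs = {"b0": 8.0, "b1": 1.0, "b2": 1.0, "b3": 7.0, "b4": 2.0, "b5": 1.0}
placement = place_blocks(block_costs, num_servers=2)
loads = server_loads(block_costs, placement, num_servers=2)
print(placement)
print(loads)
```

A feature-oblivious placement that splits the blocks evenly by count (b0 to b2 on one server, b3 to b5 on the other) would also yield loads of 10 and 10 here only by coincidence. With skewed costs in general, balancing by predicted cost rather than by block count is what keeps the heaviest regions of interest from piling up on one server.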