Publication
WORKS 2016
Conference paper
Integrating domain-data steering with code-profiling tools to debug data-intensive workflows
Abstract
Computer simulations may be composed of scientific programs chained in a coherent flow and executed in High Performance Computing environments. These executions may present anomalies associated with the data that flows in parallel among programs. Several parallel code-profiling tools already support performance analysis, such as Tuning and Analysis Utilities (TAU), or provide fine-grained performance statistics, such as the System Activity Report (SAR). However, these tools do not associate their results with the corresponding dataflows. Such an association is fundamental to tracing an error back to its data origins. In this paper, we propose coupling a workflow data monitoring approach to parallel code-profiling tools for workflow executions. The goal is to profile and debug parallel workflow executions by querying a database that integrates performance, resource consumption, provenance, and domain data from simulation programs at runtime. We have implemented our data monitoring approach as a software component coupled to the TAU and SAR code-profiling tools. Using the astronomy Montage workflow as a motivating example, we show how querying the resulting integrated database enables domain-aware runtime steering of performance anomalies. We observe that the overhead introduced by our approach is negligible.
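As a rough illustration of the kind of query such an integrated database might support, the sketch below joins performance, resource consumption, and domain data to trace a slow task back to its input. It is not the paper's implementation: the table names, columns, threshold, and all row values are hypothetical, and an in-memory SQLite database stands in for whatever store the authors used. Only the Montage program names (mProjectPP, mDiffFit) come from the real workflow.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical integrated schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE task (task_id INTEGER PRIMARY KEY, activity TEXT, elapsed_s REAL);
        CREATE TABLE resource_usage (task_id INTEGER, cpu_pct REAL);
        CREATE TABLE domain_data (task_id INTEGER, input_file TEXT);
        """
    )
    # Made-up execution times per task (provenance plus profiler-style timing).
    conn.executemany(
        "INSERT INTO task VALUES (?, ?, ?)",
        [
            (1, "mProjectPP", 10.0),
            (2, "mProjectPP", 11.0),
            (3, "mProjectPP", 45.0),
            (4, "mDiffFit", 2.0),
            (5, "mDiffFit", 2.5),
        ],
    )
    # Made-up CPU utilization samples (SAR-style statistics).
    conn.executemany(
        "INSERT INTO resource_usage VALUES (?, ?)",
        [(1, 92.0), (2, 90.0), (3, 35.0), (4, 88.0), (5, 85.0)],
    )
    # Made-up domain data: the input file each task consumed.
    conn.executemany(
        "INSERT INTO domain_data VALUES (?, ?)",
        [
            (1, "region_001.fits"),
            (2, "region_002.fits"),
            (3, "region_003.fits"),
            (4, "diff_001.fits"),
            (5, "diff_002.fits"),
        ],
    )
    return conn


def slow_tasks(conn, factor=1.5):
    """Return tasks slower than `factor` times the mean of their activity,
    together with their CPU usage and the domain data they processed."""
    query = """
        SELECT t.task_id, t.activity, t.elapsed_s, r.cpu_pct, d.input_file
        FROM task t
        JOIN resource_usage r ON r.task_id = t.task_id
        JOIN domain_data d ON d.task_id = t.task_id
        WHERE t.elapsed_s > ? * (
            SELECT AVG(t2.elapsed_s) FROM task t2 WHERE t2.activity = t.activity
        )
        ORDER BY t.elapsed_s DESC
    """
    return conn.execute(query, (factor,)).fetchall()


if __name__ == "__main__":
    for row in slow_tasks(build_demo_db()):
        print(row)
```

Comparing each task against the mean of its own activity, rather than a global mean, keeps naturally short activities from masking a slow task in a longer one. A real deployment would presumably populate these tables at runtime from the profilers and the workflow engine instead of from literals.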