Publication
SC 2021
Conference paper
Generalizable Coordination of Large Multiscale Workflows: Challenges and Learnings at Scale
Abstract
The advancement of machine learning techniques and the availability of heterogeneous computing are propelling the demand for large multiscale simulations that can automatically and autonomously couple diverse components to solve complex problems at multiple scales. Nevertheless, the current capabilities are limited to coupling two scales. In the first-ever demonstration of using three resolution scales, we present a scalable and generalizable framework as we expand MuMMI, an award-winning workflow, beyond its original design. We discuss the challenges and learnings in executing a massive simulation campaign that utilized over 600,000 node-hours on Summit, achieving more than 98% GPU occupancy for over 83% of the time. We enable orders of magnitude scaling, including coordinating 24,000 jobs, and managing several TBs of new data per day and over a billion files in total. Finally, we describe the generalizability of our framework and discuss how the presented framework may be used for new applications.