Publication
SC 2011
Conference paper

Poster: Scalable infrastructure to support supercomputer resiliency-aware applications and load balancing

Abstract

High-performance computing systems display increasing complexity and component counts. This trend exposes weaknesses in the underlying clustering infrastructure needed for continuous availability, maximal utilization, and efficient administration of such systems. To mitigate the problem, we present a highly scalable clustering infrastructure, based on peer-to-peer technologies, for supporting resiliency-aware applications as well as efficient monitoring and load balancing. Supported services include Membership, Publish-subscribe messaging, Convergecast, Attribute replication and a DHT. We present a preliminary evaluation taken from an IBM BlueGene/P, demonstrating scalability up to ~256K nodes.

Date

01 Dec 2011
