WSC 2017
Conference paper

Using quality of service lanes to control the impact of RAID traffic within a burst buffer

The next generation of leadership supercomputer systems will require a medium-term layer of storage. The basis for this stratum of storage will be a Storage I/O Node (SION). For increased reliability, a redundancy algorithm will be implemented on top of groups of SIONs. Beyond the overhead of the redundancy mechanism itself, a major cost of a RAID strategy is the increased network congestion that rebuild operations can cause. To better understand the impact of RAID rebuild traffic, we have developed a simulation model of the SIONs. After validating the model, we use it to investigate the impact of several configuration parameters, including the redundancy mechanism and the physical arrangement of hardware. We also use the model to analyze Quality of Service lanes as a means of limiting the impact of RAID traffic. We conclude with a series of recommendations for configuring a resilient and high-performing I/O subsystem.