Efficient fault-tolerant mesh and hypercube architectures
Abstract
The authors present an efficient method for tolerating faults in d-dimensional mesh and hypercube architectures. The approach consists of adding spare processors and communication links so that the resulting architecture can be reconfigured to form the desired mesh or hypercube in the presence of faults. The cost of the fault-tolerant architecture is optimized by adding exactly k spare processors and minimizing the number of links per processor. The results are surprisingly efficient. For example, when the desired architecture is a d-dimensional mesh and k=1, the fault-tolerant architecture has the same maximum degree as the desired architecture and has only one spare processor. Efficient layouts are presented for fault-tolerant two- and three-dimensional meshes, and it is shown that multiplexers and buses can be used to reduce the degree of the fault-tolerant architectures.