Welcome to the new version of CaltechAUTHORS. Login is currently restricted to library staff. If you notice any issues, please email coda@library.caltech.edu
Published September 1993 | Published
Journal Article Open

Fault-tolerant meshes and hypercubes with minimal numbers of spares

Abstract

Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two- and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single faulty processor or communication link can seriously affect the performance of these machines. This paper presents several techniques for tolerating faults in d-dimensional mesh and hypercube architectures. Our approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. We optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and k = 1, we present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2d) and has only one spare processor. We also present efficient layouts for fault-tolerant two- and three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, we give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.

Additional Information

© Copyright 1993 IEEE. Reprinted with permission. Manuscript received July 2, 1991; revised November 12, 1991, and August 28, 1992. This work is based on "Fault-Tolerant Meshes with Minimal Numbers of Spares," by J. Bruck, R. Cypher, and C. T. Ho, which appeared in the Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing, Dallas, TX, Dec. 2-5, 1991, pp. 288-295, and "Efficient Fault-Tolerant Mesh and Hypercube Architectures," which appeared in the Proceedings of the 22nd Annual International Symposium on Fault-Tolerant Computing, Boston, MA, July 8-10, 1992, pp. 162-169. ©1991, 1992 IEEE. The authors would like to thank the anonymous referees for their helpful comments and suggestions

Attached Files

Published - BRUieeetc93a.pdf

Files

BRUieeetc93a.pdf
Files (1.8 MB)
Name Size Download all
md5:5288bd359a3aa2fe29fce30664ea18a5
1.8 MB Preview Download

Additional details

Created:
August 22, 2023
Modified:
October 17, 2023