Fault-tolerant meshes and hypercubes with minimal numbers of spares

Creators: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien

Style

An error occurred while generating the citation.

Abstract

Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two- and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single faulty processor or communication link can seriously affect the performance of these machines. This paper presents several techniques for tolerating faults in d-dimensional mesh and hypercube architectures. Our approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. We optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and k = 1, we present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2d) and has only one spare processor. We also present efficient layouts for fault-tolerant two- and three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, we give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.

Additional Information

© Copyright 1993 IEEE. Reprinted with permission. Manuscript received July 2, 1991; revised November 12, 1991, and August 28, 1992. This work is based on "Fault-Tolerant Meshes with Minimal Numbers of Spares," by J. Bruck, R. Cypher, and C. T. Ho, which appeared in the Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing, Dallas, TX, Dec. 2-5, 1991, pp. 288-295, and "Efficient Fault-Tolerant Mesh and Hypercube Architectures," which appeared in the Proceedings of the 22nd Annual International Symposium on Fault-Tolerant Computing, Boston, MA, July 8-10, 1992, pp. 162-169. ©1991, 1992 IEEE. The authors would like to thank the anonymous referees for their helpful comments and suggestions

Attached Files

Published - BRUieeetc93a.pdf

Files

BRUieeetc93a.pdf

Files (1.8 MB)

Name	Size	Download all
BRUieeetc93a.pdf md5:5288bd359a3aa2fe29fce30664ea18a5	1.8 MB	Preview Download

Additional details

	All versions	This version
Views	0	0
Downloads	14	14
Data volume	26.4 MB	26.4 MB