Published April 1995 | Published
Book Section - Chapter Open

PCODE: an efficient and reliable collective communication protocol for unreliable broadcast domain

Abstract

Existing programming environments for clusters are typically built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part. For example, a broadcast that is implemented using a TCP/IP protocol (which is a point-to-point protocol) over a LAN is obviously inefficient as it is not utilizing the fact that the LAN is a broadcast medium. We have observed that the main difference between a distributed computing paradigm and a message passing parallel computing paradigm is that, in a distributed environment the activity of every processor is independent while in a parallel environment the collection of the user-communication layers in the processors can be modeled as a single global program. We have formalized the requirements by defining the notion of a correct global program. This notion provides a precise specification of the interface between the transport layer and the user-communication layer. We have developed PCODE, a new communication protocol that is driven by a global program and proved its correctness. We have implemented the PCODE protocol on a collection of IBM RS/6000 workstations and on a collection of Silicon Graphics Indigo workstations, both communicating via UDP broadcast. The experimental results we obtained indicate that the performance advantage of PCODE over the current point-to-point approach (TCP) can be as high as an order of magnitude on a cluster of 16 workstations.
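The abstract's point about broadcast over a point-to-point layer can be illustrated by counting transmissions. The sketch below is not PCODE and does not reproduce the paper's measurements. It assumes an idealized, loss-free network with no acknowledgements or retransmissions, and the function names are hypothetical, chosen for illustration only.

```python
def point_to_point_sends(n_nodes: int) -> int:
    """Transmissions needed when one sender unicasts the same message
    to every other node (assumption: no loss, no acknowledgements)."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    return n_nodes - 1


def broadcast_medium_sends(n_nodes: int) -> int:
    """Transmissions needed on a shared broadcast medium, where a single
    frame reaches every node (assumption: no loss, no retransmission)."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    return 1 if n_nodes > 1 else 0


if __name__ == "__main__":
    # Compare the two idealized counts for a few cluster sizes.
    for n in (2, 4, 8, 16):
        print(n, point_to_point_sends(n), broadcast_medium_sends(n))
```

Under these assumptions the unicast count grows linearly with the number of workstations while the broadcast count stays constant. A real protocol over an unreliable broadcast medium must also detect and recover lost packets, which this count deliberately ignores.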

Additional Information

© Copyright 1996 IEEE. Reprinted with permission. Meeting Date: 04/25/1995 - 04/28/1995. Supported in part by the NSF Young Investigator Award CCR-9457811, by a grant from the IBM Almaden Research Center, San Jose, California, and by a grant from the AT&T Foundation. We would especially like to thank Dalia Malki for her invaluable advice, coding ideas, and troubleshooting. Thanks to Yair Amir for his coding ideas and advice on IPC and to Jim Wiley for useful help and advice on AIX.

Attached Files

Published - BRUipps95.pdf

Files

BRUipps95.pdf (1.1 MB)
md5:4bcd1c6a5c5670472ed2e81ab754b46b

Additional details

Created: August 22, 2023
Modified: October 17, 2023