Welcome to the new version of CaltechAUTHORS. Login is currently restricted to library staff. If you notice any issues, please email coda@library.caltech.edu
Published October 2009 | Published
Book Section - Chapter Open

Hadoop distributed file system for the Grid

Abstract

Data distribution, storage and access are essential to CPU-intensive and data-intensive high performance Grid computing. A newly emerged file system, Hadoop distributed file system (HDFS), is deployed and tested within the Open Science Grid (OSG) middleware stack. Efforts have been taken to integrate HDFS with other Grid tools to build a complete service framework for the Storage Element (SE). Scalability tests show that sustained high inter-DataNode data transfer can be achieved for the cluster fully loaded with data-processing jobs. The WAN transfer to HDFS supported by BeStMan and tuned GridFTP servers shows large scalability and robustness of the system. The hadoop client can be deployed at interactive machines to support remote data access. The ability to automatically replicate precious data is especially important for computing sites, which is demonstrated at the Large Hadron Collider (LHC) computing centers. The simplicity of operations of HDFS-based SE significantly reduces the cost of ownership of Petabyte scale data storage over alternative solutions.

Additional Information

© 2009 IEEE.

Attached Files

Published - 05402426.pdf

Files

05402426.pdf
Files (622.2 kB)
Name Size Download all
md5:f55f0ab7e3ca10f0af58376829905f9b
622.2 kB Preview Download

Additional details

Created:
August 19, 2023
Modified:
October 25, 2023