Published December 2013
| public
Journal Article
Joint Optimization of Overlapping Phases in MapReduce
- Creators
- Lin, Minghong
- Zhang, Li
- Wierman, Adam
- Tan, Jian
Chicago
Abstract
MapReduce is a scalable parallel computing framework for big data processing. It exhibits multiple processing phases, and thus an efficient job scheduling mechanism is crucial for ensuring efficient resource utilization. This work studies the scheduling challenge that results from the overlapping of the "map" and "shuffle" phases in MapReduce. We propose a new, general model for this scheduling problem. Further, we prove that scheduling to minimize average response time in this model is strongly NP-hard in the offline case and that no online algorithm can be constant-competitive in the online case. However, we provide two online algorithms that match the performance of the offline optimal when given a slightly faster service rate.
Additional Information
Copyright is held by author/owner(s). This work was supported by NSF grants CNS 0846025, DoE grant DE-EE0002890 and IBM Research.Additional details
- Eprint ID
- 47683
- Resolver ID
- CaltechAUTHORS:20140730-162927910
- NSF
- CNS 0846025
- Department of Energy (DOE)
- DE-EE0002890
- IBM Research
- Created
-
2014-07-31Created from EPrint's datestamp field
- Updated
-
2021-11-10Created from EPrint's last_modified field