IEEE - Institute of Electrical and Electronics Engineers, Inc. - Reliability Speedup: An Effective Metric for Parallel Application with Checkpointing

2009 International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)

Author(s): Zhiyuan Wang
Publisher: IEEE - Institute of Electrical and Electronics Engineers, Inc.
Publication Date: 1 December 2009
Conference Location: Higashi Hiroshima, Japan
Conference Date: 8 December 2009
Page(s): 247 - 254
ISBN (Paper): 978-0-7695-3914-0
DOI: 10.1109/PDCAT.2009.19

With parallel computing system scaling up, the system reliability drastically decreases, so parallel applications running on such system must tolerate hardware failures. Checkpointing is widely...
