Design and Implementation for Checkpointing of Distributed Resources Using Process-Level Virtualization

2016 IEEE International Conference on Cluster Computing (CLUSTER)

Author(s): Kapil Arya ; Rohan Garg ; Artem Y. Polyakov ; Gene Cooperman
Publisher: IEEE - Institute of Electrical and Electronics Engineers, Inc.
Publication Date: 1 September 2016
Conference Location: Taipei, Taiwan
Conference Date: 12 September 2016
Page(s): 402 - 412
ISBN (Electronic): 978-1-5090-3653-0
ISSN (Electronic): 2168-9253
DOI: 10.1109/CLUSTER.2016.55

System-level checkpoint-restart is a critical technology for long-running jobs in high-performance computing. Yet, only two approaches to checkpointing MPI applications continue to survive in wide...