IEEE - Institute of Electrical and Electronics Engineers, Inc. - Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources

Proceedings. 20th International Parallel and Distributed Processing Symposium

Author(s): Zizhong Chen ; J. Dongarra
Publisher: IEEE - Institute of Electrical and Electronics Engineers, Inc.
Publication Date: 1 January 2006
Conference Location: Rhodes Island, Greece
Conference Date: 25 April 2006
ISBN (Paper): 1-4244-0054-6
DOI: 10.1109/IPDPS.2006.1639333

As the size of today's high performance computers increases from hundreds, to thousands, and even tens of thousands of processors, node failures in these computers are becoming frequent events....