Fault-Tolerance Techniques for High-Performance Computing (Paperback, Softcover reprint of the original 1st ed. 2015)


This timely text presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance (ABFT). Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.

R3,976




Product Details

General

Imprint

Springer International Publishing AG

Country of origin

Switzerland

Series

Computer Communications and Networks

Release date

October 2016

Availability

Expected to ship within 10 - 15 working days

First published

2015

Editors

,

Dimensions

235 x 155 x 18mm (L x W x T)

Format

Paperback

Pages

320

Edition

Softcover reprint of the original 1st ed. 2015

ISBN-13

978-3-319-35560-3

Barcode

9783319355603

LSN

3-319-35560-0


