Basic HTML version of Foils prepared August 4 1996

Foil 25 The Workings of Typical Cluster Management Software - 5

From MetaComputing -- the Informal Supercomputer Tutorial for CRPC Annual Meeting at Argonne -- May 13 1996, by Mark Baker


Fault Tolerance
The master scheduler is also responsible for ensuring that jobs complete successfully.
It does this by monitoring each job until it finishes successfully.
If a job fails because of a problem other than an application runtime error, the scheduler reschedules it to run again.
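The rescheduling rule can be illustrated with a minimal sketch. It assumes a hypothetical `Job` record whose `run` callback stands in for dispatching the job to a node and monitoring it until it exits, reporting one of three hypothetical outcomes. The retry cap `max_attempts` is also an assumption; the slide does not say whether the scheduler limits retries. The key point is that only non-application failures (node crash, lost connection, and so on) cause a requeue, while an application runtime error is reported back and not retried.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List


class Outcome(Enum):
    SUCCESS = auto()
    APP_ERROR = auto()       # runtime error inside the user's application
    SYSTEM_FAILURE = auto()  # hypothetical: node crash, lost network, etc.


@dataclass
class Job:
    name: str
    # Hypothetical stand-in for "dispatch and monitor until exit":
    # receives the attempt number and returns how the run ended.
    run: Callable[[int], Outcome]
    attempts: int = 0
    status: str = "queued"


def supervise(jobs: List[Job], max_attempts: int = 3) -> None:
    """Run every job, requeueing only those that fail for non-application reasons."""
    queue = list(jobs)
    while queue:
        job = queue.pop(0)
        job.attempts += 1
        outcome = job.run(job.attempts)
        if outcome is Outcome.SUCCESS:
            job.status = "done"
        elif outcome is Outcome.APP_ERROR:
            # The user's program is at fault; rerunning it would fail again.
            job.status = "failed (application error)"
        elif job.attempts < max_attempts:
            # System-level failure: put the job back on the queue.
            job.status = "requeued"
            queue.append(job)
        else:
            job.status = "failed (retries exhausted)"


if __name__ == "__main__":
    def flaky(attempt: int) -> Outcome:
        # Fails once due to a (simulated) node crash, then succeeds.
        return Outcome.SYSTEM_FAILURE if attempt == 1 else Outcome.SUCCESS

    def buggy(attempt: int) -> Outcome:
        # Always hits an application runtime error.
        return Outcome.APP_ERROR

    jobs = [Job("sim", flaky), Job("analysis", buggy)]
    supervise(jobs)
    for j in jobs:
        print(j.name, j.status, j.attempts)
```

Here `sim` is rerun after its simulated node failure and finishes on the second attempt, while `analysis` is reported as an application error after a single attempt. A production scheduler would poll remote nodes asynchronously rather than calling each job in turn, but the decision of whether to requeue follows the same rule.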



Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

Page produced by wwwfoil on Sun Apr 11 1999