Basic HTML version of Foils prepared August 4 1996

Foil 25 The Workings of Typical Cluster Management Software - 5

From MetaComputing -- the Informal Supercomputer Tutorial for CRPC Annual Meeting at Argonne -- May 13 1996, by Mark Baker


1 Fault Tolerance
2 The master scheduler is also responsible for ensuring that jobs complete successfully.
3 It does this by monitoring each job until it finishes successfully.
4 If a job fails for any reason other than an application runtime error, the scheduler reschedules it to run again.
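
The monitor-and-reschedule behaviour described above might be sketched as follows. This is only an illustrative sketch, not the implementation of any particular cluster management package: the names Outcome, Job and supervise are hypothetical, and the max_attempts limit is an assumption added so the loop always terminates (the foil does not say whether real schedulers cap retries).

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class Outcome(Enum):
    """Hypothetical classification of how a job attempt ended."""
    SUCCESS = auto()
    APP_ERROR = auto()       # application runtime error: not rescheduled
    SYSTEM_FAILURE = auto()  # e.g. a node crash: eligible for rescheduling


@dataclass
class Job:
    name: str
    # Stand-in for launching the job; receives the 1-based attempt number.
    run: Callable[[int], Outcome]
    attempts: int = 0


def supervise(job: Job, max_attempts: int = 3) -> Outcome:
    """Monitor a job, rescheduling it only after non-application failures.

    max_attempts is an assumed safety limit, not something the foil specifies.
    """
    outcome = Outcome.SYSTEM_FAILURE
    while job.attempts < max_attempts:
        job.attempts += 1
        outcome = job.run(job.attempts)
        if outcome is not Outcome.SYSTEM_FAILURE:
            # Success or an application error: stop, do not reschedule.
            break
    return outcome


if __name__ == "__main__":
    # A job whose first attempt is lost to a system failure, then succeeds.
    flaky = Job("flaky", lambda n: Outcome.SUCCESS if n >= 2 else Outcome.SYSTEM_FAILURE)
    print(flaky.name, supervise(flaky).name, "after", flaky.attempts, "attempts")

    # A job with a bug in the application itself is reported, not rerun.
    buggy = Job("buggy", lambda n: Outcome.APP_ERROR)
    print(buggy.name, supervise(buggy).name, "after", buggy.attempts, "attempts")
```

Separating APP_ERROR from SYSTEM_FAILURE mirrors the distinction on the foil: rerunning a job that failed because of its own code would simply fail again, whereas a job lost to a machine or network problem has a reasonable chance of succeeding elsewhere.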



Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu


Page produced by wwwfoil on Sun Apr 11 1999