Basic HTML version of Foils prepared May 7 1996

Foil 28 The Workings of Typical Cluster Management Software - 5

From MetaComputing -- MRA Meeting Part II: The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996, by Mark Baker, Geoffrey Fox


Fault Tolerance
The master scheduler is also responsible for ensuring that jobs
complete successfully.
It does this by monitoring each job until it finishes successfully.
If a job fails for any reason other than an application runtime
error, the master scheduler reschedules the job to run again.
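
The monitor-and-reschedule behaviour described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of any particular cluster management package: the names Outcome, Job and supervise are hypothetical, each job is modelled as a plain function that reports how its attempt ended, and the max_attempts cap is an added assumption (the foil does not say whether rescheduling is bounded).

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Iterable


class Outcome(Enum):
    SUCCESS = auto()
    APP_ERROR = auto()       # application runtime error: not rescheduled
    SYSTEM_FAILURE = auto()  # any other failure (e.g. a node going down)


@dataclass
class Job:
    name: str
    # Hypothetical stand-in for running the job: receives the 1-based
    # attempt number and returns how that attempt ended.
    run: Callable[[int], Outcome]
    attempts: int = 0


def supervise(jobs: Iterable[Job], max_attempts: int = 3) -> Dict[str, Outcome]:
    """Run every job, rescheduling those that fail for non-application reasons.

    max_attempts is an assumed safety cap, not something stated in the foil.
    """
    queue = deque(jobs)
    results: Dict[str, Outcome] = {}
    while queue:
        job = queue.popleft()
        job.attempts += 1
        outcome = job.run(job.attempts)
        if outcome is Outcome.SYSTEM_FAILURE and job.attempts < max_attempts:
            queue.append(job)  # reschedule the job to run again
            continue
        # Success, application error, or retry cap reached: record and stop.
        results[job.name] = outcome
    return results


# Usage: "flaky" loses its node on the first attempt and succeeds on the
# second; "buggy" fails inside the application and is never rescheduled.
flaky = Job("flaky", lambda n: Outcome.SUCCESS if n >= 2 else Outcome.SYSTEM_FAILURE)
buggy = Job("buggy", lambda n: Outcome.APP_ERROR)
results = supervise([flaky, buggy])
```

The key design point is the distinction between the two failure kinds: rerunning a job whose own code crashed would most likely fail the same way, so only failures outside the application are worth rescheduling.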



Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu


Page produced by wwwfoil on Sun Apr 11 1999