The Workings of Typical Cluster Management Software - 5

Fault Tolerance

The master scheduler is also responsible for ensuring that jobs complete successfully.

It does this by monitoring jobs until they successfully finish.

If a job fails because of a problem other than an application runtime error, the master scheduler reschedules the job to run again.
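
The monitor-and-reschedule behavior described above might look roughly like the following minimal sketch. It is not taken from any particular cluster manager: the names `Outcome`, `Job`, and `run_until_done` are hypothetical, and the `max_attempts` retry limit is an assumption added so the loop always terminates.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    SUCCESS = auto()
    APP_ERROR = auto()       # the application itself failed; rerunning would not help
    SYSTEM_FAILURE = auto()  # assumed category for non-application problems, e.g. a node crash


@dataclass
class Job:
    name: str
    attempts: int = 0


def run_until_done(job, execute, max_attempts=3):
    """Monitor a job and reschedule it after failures that are not application errors.

    `execute` is a caller-supplied function that runs the job once and
    returns an Outcome. `max_attempts` is an assumed safety limit.
    """
    while job.attempts < max_attempts:
        job.attempts += 1
        outcome = execute(job)
        if outcome is not Outcome.SYSTEM_FAILURE:
            # Success, or an application runtime error: do not reschedule.
            return outcome
    # Every allowed attempt ended in a system failure.
    return Outcome.SYSTEM_FAILURE


# Usage: the first run hits a simulated system failure, the rerun succeeds.
simulated = iter([Outcome.SYSTEM_FAILURE, Outcome.SUCCESS])
job = Job("demo")
result = run_until_done(job, lambda j: next(simulated))
```

In this sketch an application runtime error is returned to the caller immediately, since running the same failing program again would presumably fail the same way.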
