The Workings of Typical Cluster Management Software - 5
Fault Tolerance
The master scheduler is also responsible for ensuring that jobs
complete successfully.
It does this by monitoring each job until it finishes.
If a job fails for any reason other than an application runtime
error, the scheduler reschedules the job to run again.
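
The monitor-and-reschedule behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code of any particular cluster manager: the names Outcome, Job, and run_with_fault_tolerance are hypothetical, and the max_attempts retry limit is an assumption added to keep the loop bounded (the text does not say whether a limit exists).

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class Outcome(Enum):
    """How one attempt at running a job ended (hypothetical categories)."""
    SUCCESS = auto()
    APP_ERROR = auto()       # application runtime error: not rescheduled
    SYSTEM_FAILURE = auto()  # any other failure: rescheduled


@dataclass
class Job:
    name: str
    attempts: int = 0


def run_with_fault_tolerance(
    job: Job,
    execute: Callable[[Job], Outcome],
    max_attempts: int = 3,  # assumed limit, not taken from the text
) -> Outcome:
    """Run a job, rescheduling it while it fails for non-application reasons."""
    while job.attempts < max_attempts:
        job.attempts += 1
        outcome = execute(job)  # stands in for dispatching and monitoring
        if outcome is not Outcome.SYSTEM_FAILURE:
            # Success and application errors are both final:
            # rerunning a job with an application bug would just fail again.
            return outcome
    # Every allowed attempt ended in a system failure.
    return Outcome.SYSTEM_FAILURE
```

Here execute is a placeholder for whatever mechanism actually dispatches the job and reports how it ended. The key design point is the distinction between the two kinds of failure: only failures outside the application trigger a reschedule.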