The Workings of Typical Cluster Management Software - 5
Fault Tolerance
- The master scheduler is also responsible for ensuring that jobs complete successfully.
- It does this by monitoring each job until it finishes successfully.
- If a job fails for a reason other than an application runtime error, the master scheduler reschedules the job to run again.
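
The monitor-and-reschedule behavior described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of any particular cluster manager: the names `Outcome`, `Job`, and `supervise` are hypothetical, job execution is simulated with a scripted list of outcomes, and the `max_attempts` retry cap is an added assumption that the text above does not mention.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    SUCCESS = auto()
    APP_ERROR = auto()       # the application itself failed at runtime
    SYSTEM_FAILURE = auto()  # e.g. a node crash; not the application's fault


@dataclass
class Job:
    name: str
    outcomes: list     # scripted result of each attempt (stand-in for real execution)
    attempts: int = 0  # how many times the job has been run so far

    def run(self):
        # Return the scripted outcome for this attempt and count the attempt.
        outcome = self.outcomes[self.attempts]
        self.attempts += 1
        return outcome


def supervise(job, max_attempts=3):
    """Monitor a job until it finishes, rescheduling only on system failures.

    max_attempts is an assumed safety cap so the loop always terminates.
    """
    while job.attempts < max_attempts:
        outcome = job.run()
        if outcome is Outcome.SUCCESS:
            return "completed"
        if outcome is Outcome.APP_ERROR:
            # Rerunning would hit the same application bug, so do not reschedule.
            return "failed"
        # SYSTEM_FAILURE: loop around, which reschedules the job.
    return "gave up"


if __name__ == "__main__":
    job = Job("render-frames", [Outcome.SYSTEM_FAILURE, Outcome.SUCCESS])
    print(supervise(job), "after", job.attempts, "attempts")
```

The key design point is the distinction between failure causes: a system-level failure is worth retrying because another attempt (possibly on another node) may succeed, whereas an application runtime error would most likely recur on every attempt.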