Optimum Checkpointing for Long-running Programs
Checkpoints are widely used to improve the performance of computer systems and programs in the presence of failures, and significantly reduce the cost of restarting a program each time that it fails. Application level checkpointing has been proposed for programs which may execute on platforms which are prone to failures, and also to reduce the execution time of programs which are prone to internal failures. Thus we propose a mathematical model to estimate the average execution time of a program that operates in the presence of dependability failures, without and with application level checkpointing, and use it to estimate the optimum interval in number of instructions executed between successive checkpoints. Specific emphasis is given on programs with loops, whereas the results are illustrated through simulation.