Published August 28, 2019 | Version 1.0
Conference paper Open

Optimum Checkpointing for Long-running Programs

  • 1. Imperial College London, London, United Kingdom
  • 2. Institute of Theoretical & Applied Informatics, Polish Academy of Sciences, Gliwice, Poland

Description

Checkpoints are widely used to improve the performance of computer systems and programs in the presence of failures, and they significantly reduce the cost of restarting a program each time it fails. Application-level checkpointing has been proposed for programs that may execute on failure-prone platforms, and also to reduce the execution time of programs that are prone to internal failures. We therefore propose a mathematical model to estimate the average execution time of a program that operates in the presence of dependability failures, with and without application-level checkpointing, and use it to estimate the optimum interval, measured in number of instructions executed, between successive checkpoints. Particular emphasis is placed on programs with loops, and the results are illustrated through simulation.
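
The trade-off described above can be illustrated with a minimal sketch. It is not the model from the paper, which is not reproduced in this record. It assumes a simplified textbook setting: each instruction fails independently with probability p, a failure rolls execution back to the most recent checkpoint at no extra cost, and writing a checkpoint takes ckpt_cost instruction-times during which failures can also occur. All function and parameter names are hypothetical.

```python
import math


def expected_segment_time(n, p):
    """Expected instruction-times needed to complete n consecutive
    failure-free instructions when each one fails with probability p
    and a failure restarts the segment (assumed model, not the paper's).
    Closed form: ((1 - p)**(-n) - 1) / p."""
    if p == 0:
        return float(n)
    # expm1/log1p keep the result accurate for very small p.
    return math.expm1(-n * math.log1p(-p)) / p


def expected_total_time(total, interval, p, ckpt_cost):
    """Expected time for `total` instructions with a checkpoint after
    every `interval` instructions. For simplicity, the final (possibly
    shorter) segment also ends with a checkpoint."""
    full, rest = divmod(total, interval)
    time = full * expected_segment_time(interval + ckpt_cost, p)
    if rest:
        time += expected_segment_time(rest + ckpt_cost, p)
    return time


def best_interval(total, p, ckpt_cost):
    """Brute-force search for the interval with the lowest expected time."""
    return min(
        range(1, total + 1),
        key=lambda k: expected_total_time(total, k, p, ckpt_cost),
    )


if __name__ == "__main__":
    total, p, ckpt_cost = 10_000, 1e-3, 10  # illustrative values only
    k = best_interval(total, p, ckpt_cost)
    print("no checkpoints:", expected_segment_time(total, p))
    print("best interval:", k)
    print("with checkpoints:", expected_total_time(total, k, p, ckpt_cost))
```

Under these assumptions, very short intervals pay the checkpoint cost too often and very long intervals lose too much work per failure, so the expected time has a minimum at an intermediate interval.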

Files

CEISEE_2019____Checkpointing_Paper.pdf

Files (669.9 kB)

md5:694bbec0ae23ec117062135057f5feca

Additional details

Funding

European Commission
SDK4ED – Software Development toolKit for Energy optimization and technical Debt elimination 780572