Conference paper Open Access

Optimum Checkpoints for Time and Energy

Erol Gelenbe; Pawel Boryzsko; Miltiadis Siavvas; Joanna Domanska

We study programs which operate in the presence of possible failures and which must be restarted from the beginning after each failure. In such systems, checkpoints are introduced to reduce the large cost of program restarts when failures occur. Here we suggest that checkpoints should be introduced in a manner which assures effective reliability while reducing the computational overhead as much as possible and also saving energy. We compute the total average program execution time in the presence of checkpoints, which limit re-execution after a failure to the work done since the most recent checkpoint. We also study the total energy consumption of the program under the same conditions, and formulate an optimization problem to minimize a weighted sum of average computation time and energy. This approach is placed in the context of Application Level Checkpointing and Restart (ALCR). We then focus on checkpoints placed at the beginning of a loop, and derive the optimum placement of checkpoints to minimize a weighted combination of the program's execution time and energy consumption. Numerical results are presented to illustrate the analysis. Finally, we describe a software tool with a graphical interface that has been designed to assist a system designer in choosing the optimum checkpoint for a given program as a function of different failure rates and other parameters.
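The kind of trade-off described in the abstract can be illustrated with a small sketch. It is not the model derived in the paper: it assumes failures arrive as a Poisson process with rate lam, that a failure forces re-execution from the most recent checkpoint with no extra recovery delay, and that energy is a constant power draw multiplied by elapsed time plus a fixed energy charge per checkpoint. All function and parameter names (expected_time, weighted_cost, best_interval, t_iter, t_ckpt, e_ckpt, alpha, beta) are hypothetical and chosen only for this illustration.

```python
import math


def expected_time(n_iters, k, t_iter, t_ckpt, lam):
    """Expected run time of a loop with a checkpoint every k iterations.

    Assumption: exponential failures with rate lam and restart from the
    last checkpoint. A segment of length s then takes (e^(lam*s) - 1) / lam
    on average.
    """
    full, rem = divmod(n_iters, k)

    def segment(m):
        s = m * t_iter + t_ckpt          # work plus the checkpoint write
        return math.expm1(lam * s) / lam

    total = full * segment(k)
    if rem:                              # shorter final segment, if any
        total += segment(rem)
    return total


def weighted_cost(n_iters, k, t_iter, t_ckpt, lam, power, e_ckpt, alpha, beta):
    """Weighted sum alpha * time + beta * energy (illustrative energy model)."""
    time = expected_time(n_iters, k, t_iter, t_ckpt, lam)
    n_segments = math.ceil(n_iters / k)
    # Assumed energy model: constant power over time plus a fixed charge
    # per checkpoint segment (failed checkpoint attempts are ignored).
    energy = power * time + e_ckpt * n_segments
    return alpha * time + beta * energy


def best_interval(n_iters, t_iter, t_ckpt, lam, power, e_ckpt, alpha, beta):
    """Exhaustive search for the checkpoint interval with the lowest cost."""
    return min(
        range(1, n_iters + 1),
        key=lambda k: weighted_cost(
            n_iters, k, t_iter, t_ckpt, lam, power, e_ckpt, alpha, beta
        ),
    )


if __name__ == "__main__":
    for lam in (0.001, 0.01, 0.1):
        k = best_interval(1000, 1.0, 2.0, lam, 1.0, 5.0, 1.0, 1.0)
        print(f"failure rate {lam}: checkpoint every {k} iterations")
```

Under these assumptions, a higher failure rate pushes the best interval toward more frequent checkpoints, while a larger per-checkpoint energy charge or a larger weight beta pushes it the other way.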
