Journal article Open Access
Erol Gelenbe; Miltiadis Siavvas
Long-running software may operate on hardware platforms with limited energy resources such as batteries or photovoltaic, or on high-performance platforms that consume a large amount of energy. Since such systems may be subject to hardware failures, checkpointing is often used to assure the reliability of the application. Since checkpointing introduces additional computation time and energy consumption, we study how checkpoint intervals need to be selected so as to minimize a cost function that includes the execution time and the energy. Expressions for both the program’s energy consumption and execution time are derived as a function of the failure probability per instruction. A first principle based analysis yields the checkpoint interval that minimizes a linear combination of the average energy consumption and execution time of the program, in terms of the classical “Lambert function”. The sensitivity of the checkpoint to the importance attributed to energy consumption is also derived. The results are illustrated with numerical examples regarding programs of various lengths and showing the relation between the checkpoint interval that minimizes energy consumption and execution time, and the one that minimizes a weighted sum of the two. In addition, our results are applied to a popular software benchmark, and posted on a publicly accessible web site, together with the optimization software that we have developed.