Published January 27, 2018 | Version v1
Journal article Open

Exploring System Availability During Software-Based Self-Testing of Multi-core CPUs

  • 1. KIOS Research and Innovation Center of Excellence, University of Cyprus
  • 2. Electrical and Computer Engineering, University of Cyprus

Description

As technology scales, the increased vulnerability of modern systems to unreliable components becomes a major problem in the era of multi-/many-core architectures. Several on-line testing techniques have recently been proposed to detect errors caused by wear-out/aging-related defects that can appear during the lifetime of a system. In this work, we first investigate the relation between system test latency and test-time overhead in multi-/many-core systems with a shared Last-Level Cache (LLC) for periodic Software-Based Self-Testing (SBST), under different test scheduling policies. Second, we propose a new methodology that aims to reduce the extra test-related overhead incurred as the system scales up (i.e., as the number of on-chip cores increases). The investigated scheduling policies primarily vary the number of cores concurrently under test in the overall system test session. Our extensive, workload-driven dynamic exploration reveals an inverse relationship between the two test measures: as the number of cores concurrently under test increases, system test latency decreases, but at the cost of significantly increased test time, which sacrifices system availability for the actual workloads. Under given system test latency constraints, which dictate the recovery time in the event of error detection, our exploration framework identifies the scheduling policy under which the overall test-time overhead is minimized and, hence, system availability is maximized. To evaluate the proposed techniques, multi-/many-core systems consisting of 16 and 64 cores are explored in a full-system, execution-driven simulation framework running multi-threaded PARSEC workloads.
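The selection step described in the abstract (choose the scheduling policy with the lowest test-time overhead among those that meet a system test latency constraint) can be illustrated with a minimal Python sketch. This is not the authors' exploration framework: the `PolicyResult` type, the `select_policy` function, and every number in `HYPOTHETICAL_RESULTS` are assumptions made up for illustration, chosen only to mimic the inverse relationship the abstract reports.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PolicyResult:
    """Outcome of one scheduling policy (hypothetical record layout)."""
    cores_under_test: int        # cores tested concurrently under this policy
    system_test_latency: float   # time to test the whole system (arbitrary units)
    test_time_overhead: float    # total time spent testing (arbitrary units)


def select_policy(results: Iterable[PolicyResult],
                  max_latency: float) -> Optional[PolicyResult]:
    """Return the feasible policy with the lowest test-time overhead.

    A policy is feasible when its system test latency does not exceed
    max_latency. Returns None when no policy meets the constraint.
    """
    feasible = [r for r in results if r.system_test_latency <= max_latency]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r.test_time_overhead)


# Made-up values, NOT results from the paper: latency falls and overhead
# rises as more cores are tested concurrently.
HYPOTHETICAL_RESULTS = [
    PolicyResult(cores_under_test=1, system_test_latency=160.0, test_time_overhead=1.0),
    PolicyResult(cores_under_test=2, system_test_latency=80.0, test_time_overhead=1.4),
    PolicyResult(cores_under_test=4, system_test_latency=40.0, test_time_overhead=2.1),
    PolicyResult(cores_under_test=8, system_test_latency=20.0, test_time_overhead=3.5),
    PolicyResult(cores_under_test=16, system_test_latency=10.0, test_time_overhead=6.0),
]

best = select_policy(HYPOTHETICAL_RESULTS, max_latency=50.0)
print(best)
```

With these made-up values and a latency bound of 50, three policies are feasible (4, 8, and 16 cores concurrently under test), and the sketch picks the 4-core policy because it has the lowest overhead among them. In the setting the abstract describes, the per-policy measurements would come from workload-driven simulation rather than from a fixed table.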

Notes

The final publication is available at Springer via http://dx.doi.org/10.1007/s10836-018-5706-0

Files

mskitsas_jetta2018.pdf (355.1 kB)
md5:4acd9faa30540431aba96b327a444797

Additional details

Funding

KIOS CoE – KIOS Research and Innovation Centre of Excellence 739551
European Commission