Published May 20, 2024 | Version v1
Conference proceeding Open

Silent Data Corruptions in Computing Systems: Early Predictions and Large-Scale Measurements

  • 1. National and Kapodistrian University of Athens
  • 2. Meta (United States)


Silent Data Corruptions (SDCs) due to defects in computing chips (CPUs, GPUs, AI accelerators) are a critical threat to the quality of large-scale computing in different application domains: cloud computing, high-performance computing, and edge computing. Recent public reports by cloud hyperscalers have emphasized that, apart from the usual suspects for SDCs (memory, storage, network), the processing elements of all types, which are the heart of the computations, generate an unexpectedly high rate of SDCs, which can cause erroneous calculations and severe information loss. We report, in a consolidated form, recent efforts to correlate early microarchitecture-level simulation-based predictions about the likelihood, rates, severity, and root causes of SDCs with large-scale in-field studies in cloud data centers. Early microarchitecture-level prediction of SDC characteristics (susceptible units, workloads, instructions) can shed light on the cryptic problem of SDCs. The findings of a diligent pre-silicon analysis can support a better understanding of SDCs and can thus drive effective protection decisions, at either the hardware or the software level, at deployment stages.




Additional details


Vitamin-V – Virtual Environment and Tool-boxing for Trustworthy Development of RISC-V based Cloud Services (grant no. 101093062)
European Commission
NEUROPULS – NEUROmorphic energy-efficient secure accelerators based on Phase change materials aUgmented siLicon photonicS (grant no. 101070238)
European Commission