Published March 28, 2022 | Version v1
Project deliverable Open

D4.2 – Report on algorithms for exascale robustness (fault tolerance and large-scale communications) in QMC flagship codes

Creators

  • 1. Max Planck Gesellschaft
  • 1. CNRS

Description

We expect exascale machines to enable quantum Monte Carlo (QMC) applications on larger systems than can be treated today, that is, systems with more electrons and/or larger Configuration Interaction (CI) expansions. In this Work Package (WP), we investigate ways to overcome the new difficulties that will arise when running exascale simulations.

Exascale machines will often be used to run simulations that cannot run on smaller machines. The computed data will therefore be particularly valuable to users and must not be lost by accident during a simulation. In addition, an exascale machine is such a complex combination of hardware and software that system failures cannot reasonably be neglected when designing software for it. The first section of this document discusses different strategies for making simulations robust to system failures.

Files

TREX-D4.2-Report on algorithms for exascale robustness in QMC flagship codes.pdf

Additional details

Funding

European Commission
TREX – Targeting Real chemical accuracy at the EXascale (grant agreement No. 952165)