Published July 30, 2012 | Version v1
Working paper Open

Redundancy and Reliability for an HPC Data Centre

Creators

  • 1. National High Performance Computing Center of Turkey, Uydu, Yolu, Maslak, 34469, İstanbul, Turkey

Contributors

  • 1. CSCS - Swiss National Supercomputing Centre, Lugano, Switzerland

Description

Defining a level of redundancy is a strategic question when planning a new data centre, as it directly
impacts the entire design of the building as well as the construction and operational costs. It also affects
how future extension plans can be integrated into the design. Redundancy is also a key strategic issue when
upgrading or retrofitting an existing facility.
Redundancy is a central strategic question for any business that relies on data centres for its operation. In
traditional data-centre-reliant industries such as Internet Service Providers (ISPs), banks, insurers, and
credit card services, redundancy is of paramount importance, as a loss of availability has an immediate and
sometimes drastic impact on, for example, revenue or legal due diligence. For this reason, the industry has
established a number of clear standards and guidelines that address redundancy and reliability.
Both topics are just as important for HPC centres, but not always in the same way: some of the trade-offs
may differ substantially, which makes it difficult for an HPC centre to rely fully on the existing
standards used by the traditional data centre industry.
This white paper discusses the key factors to take into account when selecting a level of
redundancy and reliability for an HPC centre, providing managers with a set of topics to
consider when designing a new HPC centre or upgrading an existing one. These factors all have an impact
on the design and cost of construction as well as on the future operational costs of the centre.

Files

HPC-Centre-Redundancy-Reliability-WhitePaper.pdf

Files (452.1 kB)

Name: HPC-Centre-Redundancy-Reliability-WhitePaper.pdf
Size: 452.1 kB
md5:582d1313a19a1ed4951e438e5381c28d
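
The MD5 value listed for the file can be used to check that a downloaded copy is intact. The following is a minimal sketch, assuming the PDF has been saved under its listed name in the current working directory; the helper names `md5_of_file` and `matches_expected` are hypothetical and not part of any Zenodo tooling.

```python
import hashlib
from pathlib import Path

# Checksum and file name as listed on this record.
EXPECTED_MD5 = "582d1313a19a1ed4951e438e5381c28d"
FILENAME = "HPC-Centre-Redundancy-Reliability-WhitePaper.pdf"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumption: the PDF was saved to the current working directory.
    local_copy = Path(FILENAME)
    if local_copy.exists():
        print("checksum OK" if matches_expected(local_copy) else "checksum MISMATCH")
    else:
        print(f"{FILENAME} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy. MD5 is adequate for detecting accidental corruption but is not a safeguard against deliberate tampering.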

Additional details

Funding

PRACE-1IP – PRACE - First Implementation Phase Project 261557
European Commission