D6.2: Final Report on the PRACE Operational Services
Description
The objective of this deliverable is to present the activities carried out to operate and coordinate the common PRACE Operational Services, as foreseen by Task 6.1 of WP6 in the PRACE-4IP project, with a special focus on the last reporting period (May 2016 - April 2017).
Operating the PRACE distributed HPC infrastructure involves coordinating a set of services that integrate the Tier-0 systems and a number of national Tier-1 systems providing services for Tier-0 into a “single” pan-European HPC infrastructure.
This work continues the work done by Task 6.1 in the previous PRACE-IP projects, ensuring continuity of the PRACE Operational Services for the HPC ecosystem. In turn, the activity presented here will continue and progress further in Task 6.1 of the PRACE-5IP project, which has just started.
Seven Tier-0 systems were operational in the second year of the PRACE-4IP project:
- JUQUEEN at GCS@FZJ;
- CURIE at GENCI@CEA;
- Hazel Hen at GCS@HLRS;
- SuperMUC at GCS@LRZ;
- SuperMUC Phase 2 at GCS@LRZ;
- MARCONI (BDW & KNL) at CINECA;
- MareNostrum 3 at BSC.
Furthermore, operational support has been provided to 28 national Tier-1 systems that provide services for Tier-0 (e.g. used by SMEs for the SHAPE activity, as a stepping stone towards Tier-0 systems, or to prototype and assess new operational services). These Tier-1 systems are spread across 16 countries, ensuring a wide distribution of the European HPC ecosystem.
The version of the PRACE Service Catalogue approved by the PRACE Board of Directors (BoD) in March 2015 was revised during the second year of PRACE-4IP. As part of the process towards establishing PRACE Quality of Service and quality control, work on PRACE operational Key Performance Indicators (KPIs) has been carried out. During this second period of the project, the measurement and evaluation of the KPIs started for some services.
Following the procedures for incident and change management, the complete set of PRACE common services as defined in the Service Catalogue (Networking, Data, Compute, AAA and Security, User, Monitoring and Generic) has been operated and monitored on a daily basis to ensure the continuity and integrity of the services.
The PRACE network (based on a 10 Gb/s star topology designed more than 10 years ago) has been upgraded in recent months to a new MD-VPN (Multi-Domain Virtual Private Network) architecture, which allows much more flexibility in configuration, faster setup and lower yearly connectivity costs.
The Security Forum, coordinated by Task 6.1, is responsible for all security-related activities. Through periodic teleconferences, it continuously monitors the whole HPC infrastructure and works to prevent incidents that could expose vulnerabilities in the PRACE RI.
The activity of PRACE-4IP Task 6.1 will continue smoothly and progress in PRACE-5IP WP6, where the coordinated operation of the common PRACE operational services will involve an additional Tier-0 site (CSCS Zurich), which joins the Hosting Members in the PRACE 2 framework.
Files
4IP-D6.2.pdf (1.0 MB), md5:e3670978388794f57f24c3e1d320af3a