Ease Access to Climate Simulations for Researchers: IS-ENES Climate4Impact

Easier access to climate data is very important for the climate change impact research communities. Several aspects matter to those users, such as extensive guidance, transparent access to datasets, and on-demand processing capabilities (notably for data reduction). To meet these needs, the climate4impact (http://climate4impact.eu/) web portal and services have been developed in the European Union funded IS-ENES projects, targeting climate change impact modellers, impact and adaptation consultants, as well as other experts using climate change data. It provides users with harmonized access to climate model data through tailored services. One of the main objectives of climate4impact is to provide standardized web services and tools that are reusable in other portals. These services include web processing services, web coverage services and web mapping services. Tailored portals can target specific communities and/or countries/regions while making use of those services. Recently, it became obvious that, to fulfill users' needs regarding on-demand data processing and calculations, the climate4impact platform had to be able to use existing research and e-infrastructures in order to offer scalable and flexible services. This is especially true in the current context of a large increase in the data volumes of climate science datasets. To easily accommodate heterogeneous systems, a containerized and modular approach is envisioned. Finally, in the context of data processing delegation, a robust approach for metadata, provenance and lineage is required.


I. INTRODUCTION
Proper access to climate data is taking on a new dimension for climate researchers, as data volumes are increasing very fast. The current Coupled Model Intercomparison Project Phase 6 (CMIP6 [1]) contains a very large number of experiments, climate models (with increased spatial and temporal resolutions), greenhouse gas scenarios, etc. The CMIP is a collaborative framework designed to improve knowledge of climate change. It was first organized in 1995 by the Working Group on Coupled Modelling (WGCM) of the World Climate Research Programme (WCRP). It is developed in phases to foster climate model improvements but also to support national and international assessments of climate change.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No 777413 and No 824084.
The current trend towards much larger data volumes makes it difficult to process and analyze the data needed for research and applications. This is especially true for end users and researchers who have limited computing and bandwidth resources. The whole climate data archive is expected to reach an estimated 30 PB within the next few years and potentially up to 2000 PB later in the decade. On-demand data processing solutions located as close as possible to the data storage are badly needed, and they are emerging thanks to newly developed standards, provenance and infrastructures.
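To make the bandwidth constraint concrete, the following minimal sketch estimates how long a download would take at a few link speeds. The 10 TB subset size and the bandwidth values are illustrative assumptions, not figures from this paper, and the calculation ignores protocol overhead and server-side limits.

```python
def transfer_days(volume_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Days needed to move volume_bytes over a link that sustains the given rate."""
    seconds = volume_bytes * 8 / bandwidth_bits_per_s
    return seconds / 86_400  # seconds per day


TB = 1e12  # decimal terabyte, in bytes

# Hypothetical subset of the archive that a single study might need.
subset_bytes = 10 * TB

# Hypothetical sustained bandwidths available to an end user.
links = [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]

for label, bits_per_s in links:
    print(f"{label}: {transfer_days(subset_bytes, bits_per_s):.2f} days")
```

Under these assumptions, the slowest link needs more than a week for a single subset, which is the kind of delay that motivates processing close to the data rather than downloading first.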
In Europe, several initiatives are taking place to support scientific on-demand data analytics at the European scale. However, they use heterogeneous systems and often incompatible authentication and semantics. Nevertheless, they offer a huge potential for interoperability; one example is the DARE e-science platform (http://project-dare.eu), designed for efficient and traceable development of complex experiments and domain-specific services on the Cloud. Also, the IS-ENES (https://is.enes.org) consortium has developed a platform to ease access to climate data for the climate impact community (C4I: https://climate4impact.eu). The platform is based on existing standards such as Web Processing Service (WPS). It is important to enable generic access and cross-domain interoperability, as well as to provide compliance and integration with the future European Open Science Cloud (EOSC) platform (https://www.eosc-portal.eu).
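As an illustration of what a standards-based request can look like, the sketch below builds a WPS 1.0.0 Execute request in key-value-pair (KVP) form using only the Python standard library. The endpoint URL, the process identifier and the input names are hypothetical placeholders; they are not taken from the actual C4I services.

```python
from urllib.parse import urlencode


def wps_execute_url(endpoint: str, identifier: str, inputs: dict[str, str]) -> str:
    """Build a WPS 1.0.0 Execute request URL using KVP encoding."""
    # WPS KVP encoding joins the inputs as "name=value" pairs separated by ";".
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode(
        {
            "service": "WPS",
            "version": "1.0.0",
            "request": "Execute",
            "identifier": identifier,
            "DataInputs": data_inputs,  # urlencode percent-encodes "=" and ";"
        }
    )
    return f"{endpoint}?{query}"


# Hypothetical endpoint, process name and inputs, for illustration only.
url = wps_execute_url(
    "https://example.org/wps",
    "time_mean",
    {"variable": "tas", "start": "2041", "end": "2070"},
)
print(url)
```

Because the request is a plain URL following a published standard, any portal or script can issue it, which is the reusability that a WPS-based platform aims for.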

II. MOTIVATIONS: CURRENT SITUATION
The IS-ENES C4I platform has been under development since 2009 ([2], [3]), currently within the IS-ENES3 H2020 European project. It has evolved from a web portal presenting impact-specific national Use Cases with documentation and guidance to a platform of standardized re-usable services and building blocks. The evolution of the C4I platform has been user driven, based on user consultations throughout the years as well as suggestions and feedback from users and scientific conferences ([4]). It became obvious that the proper target users of the platform are scientific researchers, whether from the climate domain or from other scientific domains using climate simulations.
Currently, in the climate research community, users are just beginning to move away from a download-locally-then-analyze type of workflow. This is critical, as such a workflow is no longer possible with the data volumes currently needed for proper scientific research. Some national repositories have been set up with a subset of the most commonly used datasets, but even this approach is difficult to sustain and is not scalable.

III. HETEROGENEOUS SYSTEMS
One of the most pressing needs is for remote data processing, i.e. processing as near to the data storage as possible. Processing systems need not be co-located with the data servers; they can also be intermediate systems and platforms with much higher bandwidth and capacity than users have available at their location. Several solutions are available, such as those offered by EUDAT (https://www.eudat.eu), the future DARE Platform ([5]), the European Open Science Cloud (EOSC), commercial clouds (Amazon, etc.), national infrastructures, etc. An integration approach is only possible with proper standardization and sufficient metadata, lineage and provenance. This is very important when delegating calculations and processing. Enabling users to transparently use the remote computing resources available to them is a challenge, but it is also a strong requirement.
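To illustrate why provenance matters when processing is delegated, the following minimal sketch records what was computed, on which inputs, with which parameters and at which site, and derives a reproducible fingerprint from that record. The class, its field names and the example values are hypothetical; this is not the metadata model used by C4I or by any of the infrastructures named above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical minimal description of one delegated processing job."""

    process: str             # name of the operation that was run
    inputs: tuple[str, ...]  # identifiers of the input datasets
    parameters: dict         # parameters passed to the operation
    site: str                # infrastructure that executed the job

    def fingerprint(self) -> str:
        """SHA-256 digest of the record, stable under parameter key ordering."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder values, for illustration only.
record = ProvenanceRecord(
    process="time_mean",
    inputs=("dataset-A", "dataset-B"),
    parameters={"start": 2041, "end": 2070},
    site="remote-site-1",
)
print(record.fingerprint())
```

Two jobs with the same fingerprint were described identically, so a result returned by a remote system can be matched to the request that produced it, whichever site ran the calculation.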

IV. FUTURE STEPS
It has been shown that climate science researchers need support in getting access to data. This is becoming critical given the large increase in the data volumes of the datasets. The IS-ENES C4I Platform is helping those users, but the current architecture and implementation are not scalable. Currently the C4I architecture is being completely rethought and redesigned, not only in the organization of the underlying architecture, but also in how users interact with the data. One idea is to move away from a POSIX file-based approach to a more database-like (data-centric) approach.
The current development plan and new architecture sketch of C4I will be shown: the ways it will be redesigned with respect to user interaction, as well as its new underlying architecture. Major improvements in how users interact with data, such as an improved search interface and more scalable access to processing with transparent access to external resources, will be presented. In the redesigned version, C4I users will use front-end wizards (along with guidance) to trigger the execution of data processing, seamlessly and transparently. Crucial aspects related to metadata, lineage and provenance will also be discussed, especially the importance of sufficient and relevant metadata in the current context (workflow composition, dataset locations, mapping with different systems, etc.). Further ahead, C4I will also transparently propagate the authentication and authorization of the user where needed, such as credentials to access data and/or computing resources, and eventually storage as well. Security aspects, which are critical for the success of this kind of architecture, will need to be assessed. However, they will not be discussed here.