nuts-STeauRY dataset: hydrochemical and catchment characteristics dataset for large sample studies of Carbon, Nitrogen, Phosphorus and Silicon in French watercourses
Antoine Casquin, Marie Silvestre, Vincent Thieu
10.5281/zenodo.10830852
v0.1, 18th March 2024
Brief overview of data:
· Carbon and nutrients data for 5470 continental French catchments
· Modelled discharge for 5128 of the 5470 catchments
· Geopackages with catchment delineations and outlets
· DEM conditioned to delimit additional catchments
· Land-use and climatic data for 5470 continental French catchments
Citation of this work
A data paper is currently being submitted with details of methods and results. Once published, it will be the preferred source to cite. The data paper will be linked to the new version of the dataset, which will be updated at doi.org/10.5281/zenodo.10830852. If you use this dataset in your research or report, you must cite it.
Motivations
Data was collected and curated for the nuts-STeauRY project (http://nuts-steaury.cnrs.fr), which deployed a national generic land to sea modelling chain.
Data was primarily used (see related works):
- To calibrate concentrations of dissolved organic carbon and dissolved silica in headwaters
- To validate spatially and temporally the modelling chain (DOC, NO3-, NH4+, TP, SRP, DSi)
Hydrochemical large sample datasets have numerous other uses: trend computation, elucidation of transfer mechanisms, machine learning, retrospective studies, etc.
The objective here is to provide a curated large sample dataset of carbon and nutrient concentrations, along with modelled discharges, catchment characteristics and delineations, for continental France. It aims to ease large sample studies over France and/or Europe. Although part of the data gathered here is obtainable from public sources, the catchment delineations, their characteristics and the modelled hydrology were not yet publicly available. Moreover, units were unified and outliers were detected and removed in the carbon and nutrient data.
Data sources & processing
Sampling points were snapped to the CCM database v2.1 (http://data.europa.eu/89h/fe1878e8-7541-4c66-8453-afdae7469221) (Vogt et al., 2007), and catchments were delineated using a 100 m resolution Digital Elevation Model (DEM) conditioned by the hydrographic network and the elementary catchment delineations of CCM v2.1. More than 6000 catchments were delineated and screened manually to check consistency: 5470 were retained.
Nutrient data was collected mainly through the Naiades portal (https://naiades.eaufrance.fr/), a database collecting water quality data produced by different water-related actors across France. Nutrient data was also collected directly from regional water agencies (https://www.eau-seine-normandie.fr/, https://eau-grandsudouest.fr/, https://www.eaurmc.fr/, https://www.eau-artois-picardie.fr/, https://www.eau-rhin-meuse.fr/ and https://agence.eau-loire-bretagne.fr/home.html), and pre-processed using a database management system relying on PostgreSQL with the PostGIS extension (Thieu & Silvestre, 2015). A three-pass strategy was used to curate raw carbon and nutrient data: (1) removal of “obvious outliers”; (2) detection of baseline changes, with correction where possible (or removal of the data otherwise); (3) removal of outliers using a quantile-based approach by element and time series.
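The third pass can be illustrated with a minimal sketch. The quantiles and rules actually used to build the dataset are not given here, so the thresholds (`lower_q`, `upper_q`), the function names and the station code below are assumptions chosen only for illustration. Grouping by station and parameter is one possible reading of "by element and time series".

```python
from collections import defaultdict


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def remove_quantile_outliers(records, lower_q=0.01, upper_q=0.99):
    """Keep records whose value lies within the [lower_q, upper_q] quantile
    range of their own (sta_code, var) series.

    The thresholds are hypothetical, not those used to build the dataset.
    """
    series = defaultdict(list)
    for rec in records:
        series[(rec["sta_code"], rec["var"])].append(rec["value"])

    bounds = {}
    for key, values in series.items():
        values.sort()
        bounds[key] = (quantile(values, lower_q), quantile(values, upper_q))

    kept = []
    for rec in records:
        low, high = bounds[(rec["sta_code"], rec["var"])]
        if low <= rec["value"] <= high:
            kept.append(rec)
    return kept


# Synthetic NO3 series with one implausible value (station code is made up).
records = [{"sta_code": "X001", "var": "NO3", "value": float(v)} for v in range(1, 101)]
records.append({"sta_code": "X001", "var": "NO3", "value": 10000.0})
cleaned = remove_quantile_outliers(records)
```

A symmetric quantile trim also discards the lowest values of each series; whether that is desirable depends on the parameter and on the detection limits of the analyses.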
Hydrological time series were interpolated through hydrograph transfer (de Lavenne et al., 2023) from 1664 discharge time series completed with the GR4J model (Pelletier & Andréassian, 2020; Pelletier, 2021).
Land cover data was extracted from Corine Land Cover dataset for years 2000, 2006, 2012, and 2018 (EEA, 2020). Raw CLC typology contains 44 classes. Results of percent cover per year per class were computed for each catchment. An aggregated typology of 8 classes is also proposed.
Climatological data was extracted from daily reconstructions over Europe at 5 arcmin for temperature and 1 arcmin for precipitation (Thiemig et al., 2022). Catchment means of minimum and maximum daily temperature and of precipitation were computed for the 1990-2019 period.
Nuts-STeauRY dataset
Carbon and nutrients time series
Time series of carbon and nutrients within the 1962-2019 period at 5470 stations: Dissolved Organic Carbon (DOC), Total Organic Carbon (TOC), Nitrates (NO3-), Nitrites (NO2-), Ammonium (NH4+), Soluble Reactive Phosphorus (SRP), Total Phosphorus (TP) and Dissolved Silica (DSi).
|var | n_unique_station| n_total_meas| mean_duration_y| mean_frequency_y|
|:---|----------------:|------------:|---------------:|----------------:|
|DOC | 4 992| 658 147| 14.3| 9.0|
|DSi | 3 299| 333 866| 12.9| 8.3|
|NH4 | 5 318| 907 343| 19.3| 8.7|
|NO2 | 5 264| 891 886| 19.2| 8.6|
|NO3 | 5 465| 939 279| 19.0| 9.0|
|SRP | 5 361| 910 107| 19.1| 8.7|
|TOC | 935| 111 993| 13.6| 9.6|
|TP | 5 199| 802 841| 17.1| 8.8|
Note that some SRP and DSi measurements were declared as performed on raw water. A thorough analysis of the time series shows no evidence of a difference in baselines. For more accuracy, it is nevertheless advised to filter out those analyses using the “fraction” attribute of each measurement.
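As an illustration of this filtering, the sketch below drops SRP and DSi rows flagged as raw water. It assumes that CNPSi.csv is comma-separated with a header row matching the attribute names given in the file descriptions; the sample rows and the station code are invented.

```python
import csv
import io

# Invented sample in the assumed layout of CNPSi.csv (comma-separated, header row).
sample = io.StringIO(
    "sta_code,var,fraction,date,value\n"
    "X001,SRP,water_filtrated,2010-05-03,0.04\n"
    "X001,SRP,water_raw,2010-06-07,0.05\n"
    "X001,DSi,water_raw,2010-06-07,3.1\n"
    "X001,TP,water_raw,2010-06-07,0.09\n"
)

DISSOLVED_VARS = {"SRP", "DSi"}


def drop_raw_water_dissolved(rows):
    """Discard SRP and DSi rows declared as measured on raw water."""
    return [
        row for row in rows
        if not (row["var"] in DISSOLVED_VARS and row["fraction"] == "water_raw")
    ]


rows = list(csv.DictReader(sample))
filtered = drop_raw_water_dissolved(rows)
```

With the real file, the in-memory sample would be replaced by `open("CNPSi.csv", newline="")`, adjusting the delimiter if it differs from the assumption made here.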
Discharge modelled daily time series
Naturalized discharge modelled through hydrograph transfer, and measured discharges interpolated when available, for the 1980-2019 period.
A daily discharge was computed for 5128 catchments. For small catchments (< 1000 km2, n = 4530), hydrograph transfer was used, while for large catchments (> 1000 km2), measured/completed discharges were interpolated directly. Direct interpolation was only possible for 598 catchments > 1000 km2. The criterion retained for direct interpolation is 0.8*area_discharge_station < area_quality < 1.2*area_discharge_station when discharge and quality stations were nested.
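The selection rule described above can be sketched as follows. This is only an illustration of the stated thresholds: the function names are hypothetical, the nesting of stations is taken as an input rather than computed, and a catchment of exactly 1000 km2 is treated as large, which the text does not specify.

```python
SMALL_CATCHMENT_MAX_KM2 = 1000.0


def areas_are_comparable(area_quality_km2, area_discharge_station_km2):
    """Stated criterion: 0.8 * A_discharge < A_quality < 1.2 * A_discharge."""
    return (
        0.8 * area_discharge_station_km2
        < area_quality_km2
        < 1.2 * area_discharge_station_km2
    )


def discharge_method(area_quality_km2, nested_station_area_km2=None):
    """Return the method suggested by the text, or None if none applies.

    nested_station_area_km2 is the area of a discharge station nested with the
    quality station, or None when there is none (hypothetical input).
    """
    if area_quality_km2 < SMALL_CATCHMENT_MAX_KM2:
        return "hydrograph_transfer"
    if nested_station_area_km2 is not None and areas_are_comparable(
        area_quality_km2, nested_station_area_km2
    ):
        return "direct_interpolation"
    return None
```

For example, `discharge_method(1100.0, 1000.0)` returns `"direct_interpolation"`, whereas `discharge_method(1300.0, 1000.0)` returns `None`, which corresponds to large catchments left without a modelled discharge.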
The uncertainty of the hydrological time series varies widely depending on: quality of data source, distance from pseudo-gauged outlets, land cover of the catchments, natural spatial and temporal variability of discharge, size of the catchment (de Lavenne et al., 2016). We advise cautious use of these modelled discharges, as uncertainties could not be computed.
Catchments, outlets and conditioned DEM
5470 catchments and outlets are delivered as geopackages (EPSG: 3035).
The DEM, conditioned by CCM 2.1, is also delivered as a GeoTIFF (EPSG: 3035) so that new catchments consistent with the dataset can be delineated over the area.
Catchments characteristics and climate
Refer to Data sources & processing and File descriptions.
File and attributes descriptions:
The key “sta_code” is present across all files. For time varying records, “date” can be a secondary key.
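A minimal sketch of such a join is given below, using only the Python standard library. The rows and station codes are invented, only a subset of columns is shown, and a comma-separated layout with a header row is assumed.

```python
import csv
import io

# Invented rows in the assumed layout of two files that share "sta_code".
stats_csv = io.StringIO(
    "sta_code,var,n_meas\n"
    "X001,NO3,240\n"
    "X002,NO3,96\n"
)
climate_csv = io.StringIO(
    "sta_code,pr,tmin,tmax\n"
    "X001,812.5,6.1,15.3\n"
    "X002,1040.0,4.8,13.9\n"
)

# Index the one-row-per-station table by its key.
climate_by_station = {row["sta_code"]: row for row in csv.DictReader(climate_csv)}

# Attach climate attributes to each parameter/station row (left join).
joined = []
for row in csv.DictReader(stats_csv):
    climate = climate_by_station.get(row["sta_code"], {})
    joined.append({**row, **{k: v for k, v in climate.items() if k != "sta_code"}})
```

For time-varying files, the same pattern would apply with the pair ("sta_code", "date") as the dictionary key.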
Description of CNPSi.csv data attributes
Each line is one measurement for a given parameter and station
· sta_code: Code of the station in the Sandre reference system (public French "dataverse" for water data)
· sta_name: Name of the station in the Sandre reference system (public French "dataverse" for water data)
· var: Abbreviation of parameter name
· fraction: "water_filtrated" or "water_raw"
· date: date of sampling
· hour: hour of sampling
· value: analytical result (concentration)
· provider: provider of the data
· producer: producer of the data
· from_db: "Naiades2022" (https://naiades.eaufrance.fr/france-entiere#/ dump from 2022) or "DoNuts" (Thieu, V., Silvestre, M., 2015. DoNuts: un système d’information sur les observations environnementales. Présentation Séminaire UMR Métis)
· n_meas: number of observations for a given parameter / station
· unit: unit of concentration
· element: "C" "N" "P" or "Si"
· year: year of observation
· month: month of observation
· day: day of observation
· julian_day: day of the year of the observation (1-366)
· decade: decade of observation (one of "1961-1970", "1971-1980", "1981-1990", "1991-2000", "2001-2010", "2011-2020")
Description of CNPSi_stats.csv data attributes
Each line is a parameter / station pair
· sta_code: Code of the station in the Sandre reference system (public French "dataverse" for water data)
· sta_name: Name of the station in the Sandre reference system (public French "dataverse" for water data)
· var: Abbreviation of parameter name
· n_meas: number of observations for a given parameter / station
· start_year: year of first observation for a given parameter / station
· end_year: year of last observation for a given parameter / station
· duration_y_tot: total duration of observation in years for a given parameter / station
· duration_y_meas: duration of observation in years for a given parameter / station, counting only years with at least 1 measurement
· mean_nmeas_per_y_tot: mean number of observations per year considering total duration
· mean_nmeas_per_y_meas: mean number of observations per year considering years with measurements
· is_fully_continuous: TRUE if at least one measurement per year for a given parameter / station
· start_cont_seq: year in which starts the longest continuous sequence for a given parameter / station
· end_cont_seq: year in which ends the longest continuous sequence for a given parameter / station
· duration_y_cont_seq: duration in years for the longest continuous sequence for a given parameter / station
· nmeas_cont_seq: number of measurements for the longest continuous sequence for a given parameter / station
· mean_nmeas_per_y_cont_seq: mean number of observations per year for the longest continuous sequence for a given parameter / station
· mean: mean value (concentration) for a given parameter / station
· median: median value (concentration) for a given parameter / station
· sd: standard deviation (concentration) for a given parameter / station
· cv: coefficient of variation (concentration) for a given parameter / station
· c05, c25, c50, c75, c95: 5th, 25th, 50th, 75th and 95th percentiles for a given parameter / station
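The continuous-sequence attributes (start_cont_seq, end_cont_seq, duration_y_cont_seq) can be reproduced along the following lines. This is a sketch under assumptions: the duration is taken to count both the first and last year, and ties are resolved in favour of the earliest sequence; neither convention is stated in the description above.

```python
def longest_continuous_sequence(years):
    """Return (start_year, end_year, duration_in_years) of the longest run of
    consecutive years, counting both end years. Ties keep the earliest run."""
    ordered = sorted(set(years))
    if not ordered:
        return None
    best_start = best_end = run_start = ordered[0]
    previous = ordered[0]
    for year in ordered[1:]:
        if year != previous + 1:
            run_start = year  # gap found: a new run starts here
        if year - run_start > best_end - best_start:
            best_start, best_end = run_start, year
        previous = year
    return best_start, best_end, best_end - best_start + 1
```

The input would be the set of years with at least one measurement for a given parameter / station, for instance the distinct values of the "year" attribute in CNPSi.csv.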
Description of catchments.gpkg and outlets.gpkg data attributes
Each line is a catchment or an outlet (sampling point)
File is a .gpkg (EPSG = 3035)
· sta_code: Code of the station in the Sandre reference system (public French "dataverse" for water data)
· sta_name: Name of the station in the Sandre reference system (public French "dataverse" for water data)
· watercourse: Name of the water course (from spatial join on IGN BD Topo)
· mun_name: Name of the municipality of the outlet (from spatial join on IGN BD Admin Express)
· ccm_wso_id: Sea outlet id from CCM v2.1 database
· ccm_wso1_id: Elementary catchment id from CCM v2.1 database
· ccm_strahler: Strahler order of the catchment from CCM v2.1 database
· area_km2: Computed area in km2 of the catchment
Description of daily discharges data attributes
Each line corresponds to a daily modelled discharge at a quality station from 1980 to 2019
· sta_code: Code of the station in the Sandre reference system (public French "dataverse" for water data)
· date: Date in format yyyy-mm-dd
· flow_mm: Discharge expressed in mm.d-1
· flow_m3s: Discharge expressed in m3.s-1
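The two discharge columns are related through the catchment area. The sketch below shows the unit conversion, assuming that flow_mm is normalised by the catchment area given as area_km2 in catchments.gpkg (this is not stated explicitly here); the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def mm_per_day_to_m3_per_s(flow_mm, area_km2):
    """Convert a specific discharge (mm/d) to a volumetric discharge (m3/s).

    1 mm over 1 km2 is 1e-3 m * 1e6 m2 = 1000 m3.
    """
    return flow_mm * area_km2 * 1000.0 / SECONDS_PER_DAY


def m3_per_s_to_mm_per_day(flow_m3s, area_km2):
    """Inverse conversion, from m3/s to mm/d."""
    return flow_m3s * SECONDS_PER_DAY / (area_km2 * 1000.0)
```

Working in mm.d-1 makes catchments of different sizes directly comparable, while m3.s-1 is the unit needed to compute loads from concentrations.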
Description of climate data attributes
Each line in the pr_tmin_tmax_1990-2019_lt_mean.csv corresponds to a mean value within a catchment for the 1990-2019 period.
· sta_code: Code of the station in the Sandre reference system (public French "dataverse" for water data)
· period: 1990-2019
· source: EMO-1 (pr) & EMO-5 (tmin, tmax)
· pr: mean yearly precipitation (mm)
· tmin: mean daily minimal temperature (°C)
· tmax: mean daily maximal temperature (°C)
Description of land cover data attributes
Each line in clc_8class.csv and clc_44class.csv corresponds to a Corine Land Cover (CLC) class for a year (1990, 2000, 2006, 2012, or 2018) and a catchment. The raw CLC typology describes 44 classes, which were aggregated into 8 classes (see clc_44class_to_8class.csv).
· clc_44class.csv
o sta_code: Code of the station in the Sandre reference system (public French "dataverse" for water data)
o year: Year as stated in CLC product
o clc_name: Description of land cover class in CLC product
o clc_code: Code for land cover class in CLC product
o percent_cover: Percent cover by CLC class in the catchment (0-100)
· clc_8class.csv
o sta_code: Code of the station in the Sandre reference system (public French "dataverse" for water data)
o year: Year as stated in CLC product
o label_clc_8class: Description of land cover class in CLC product aggregated in 8 classes (see clc_44class_to_8class.csv)
o code_clc_8class: Code for land cover class in CLC product aggregated in 8 classes (see clc_44class_to_8class.csv)
o percent_cover: Percent cover by aggregated CLC class in the catchment (0-100)
· clc_44class_to_8class.csv
o code_clc: Code for land cover class in CLC product (44 classes)
o code_clc_8class: Code for land cover class in aggregated CLC product (8 classes)
o label_clc_8class: Description of land cover class in CLC product aggregated in 8 classes (see clc_44class_to_8class.csv)
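The relation between the two land cover files can be sketched as a sum of percent_cover over the 44-class rows that map to the same aggregated class. The rows, the station code and the 8-class codes below are invented for illustration; the real mapping is the one in clc_44class_to_8class.csv.

```python
from collections import defaultdict

# Invented excerpt of clc_44class_to_8class.csv; the 8-class codes are made up.
mapping = {"211": "A", "311": "F", "312": "F"}

# Invented rows in the layout of clc_44class.csv.
rows_44 = [
    {"sta_code": "X001", "year": 2018, "clc_code": "211", "percent_cover": 60.0},
    {"sta_code": "X001", "year": 2018, "clc_code": "311", "percent_cover": 25.0},
    {"sta_code": "X001", "year": 2018, "clc_code": "312", "percent_cover": 15.0},
]


def aggregate_to_8_classes(rows, code_map):
    """Sum percent_cover per (sta_code, year, code_clc_8class)."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["sta_code"], row["year"], code_map[row["clc_code"]])
        totals[key] += row["percent_cover"]
    return dict(totals)


cover_8 = aggregate_to_8_classes(rows_44, mapping)
```

The same function can be used with a custom mapping when a typology other than the proposed 8 classes is needed.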
Acknowledgement
This publication has been prepared using European Union's Copernicus Land Monitoring Service information; https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac
The authors thank Vasken Andréassian for communicating the discharge data and discharge station data, and Alban de Lavenne for his help in using the transfR package, both from INRAE UR HYCAR.
References
de Lavenne, A., Skøien, J. O., Cudennec, C., Curie, F., & Moatar, F. (2016). Transferring measured discharge time series: Large-scale comparison of Top-kriging to geomorphology-based inverse modeling. Water Resources Research, 52(7), 5555–5576. https://doi.org/10.1002/2016WR018716
de Lavenne, A., Loree, T., Squividant, H., & Cudennec, C. (2023). The transfR toolbox for transferring observed streamflow series to ungauged basins based on their hydrogeomorphology. Environmental Modelling & Software, 159, 105562. https://doi.org/10.1016/j.envsoft.2022.105562
EEA. (2020). Corine Land Cover édition 2018. CLC 2018. https://www.eea.europa.eu/data-and-maps/data/copernicus-land-monitoring-service-corine
Pelletier, A., & Andréassian, V. (2020). Hydrograph separation: An impartial parametrisation for an imperfect method. Hydrology and Earth System Sciences, 24(3), 1171–1187. https://doi.org/10.5194/hess-24-1171-2020
Pelletier, A. (2021). Complétion d'hydrogrammes avec le modèle GR4J - Note méthodologique. INRAE, UR HYCAR.
Thiemig, V., Gomes, G. N., Skøien, J. O., Ziese, M., Rauthe-Schöch, A., Rustemeier, E., Rehfeldt, K., Walawender, J. P., Kolbe, C., Pichon, D., Schweim, C., & Salamon, P. (2022). EMO-5: A high-resolution multi-variable gridded meteorological dataset for Europe. Earth System Science Data, 14, 3249–3272. https://doi.org/10.5194/essd-14-3249-2022
Thieu, V., & Silvestre, M. (2015). DoNuts : un système d'information sur les observations environnementales. Présentation Séminaire UMR Métis.
Vogt, J., A. de Jager, E. Rimaviciute, W. Mehl, S. Foisneau, K. Bódis, J. Dusart, M.L. Paracchini, P. Haastrup, & C. Bamps. (2007). A pan-European river and catchment database. (European Commission. Joint Research Centre. Institute for Environment and Sustainability.). Publications Office. https://data.europa.eu/doi/10.2788/35907
Files (5.0 GB)
_FileDescription.txt
| md5 | Size |
|---|---|
| 0f3bd5c63229913b506dcdabca72cbc8 | 5.4 kB |
| 3edab53f77a1fb0ec510dd143fe16fb6 | 419.6 kB |
| dcacc7ce4d83993734116ed6cc3cb1ca | 111.6 MB |
| ed42a6b8f7febbabc8a2905eb9635afa | 50.4 MB |
| 2d19ee6838424d4c400ce524f54a5d90 | 994 Bytes |
| 3ec1f4e560139b78b2b39a73945899d9 | 7.8 MB |
| 61fe0f3ce18686934c6d58834414ca31 | 1.2 GB |
| 9e44f92d38c58e42b4403d2578cfcf07 | 9.6 MB |
| ea19632518f72f10385a22f5b0364fe1 | 3.0 GB |
| 2929e09e9152bc96de149fbaf013a602 | 551.1 MB |
| 566798a5d01dbcac8b55f24e529b7b2d | 1.1 MB |
| 2637951c27f087c842f1e5f665765e1e | 388.0 kB |
Additional details
Related works
- Is part of
- Project deliverable: https://hal.science/hal-04342281 (URL)
- Project deliverable: https://hal.science/hal-04342295 (URL)
- Is required by
- Conference proceeding: https://hal.science/hal-04470186v1 (URL)
- Conference proceeding: 10.13140/RG.2.2.32526.23364 (DOI)
Funding
- Office Français de la Biodiversité
Dates
- Available: 2024-03-18