There is a newer version of the record available.

Published August 4, 2025 | Version v2
Dataset Open

GEMS-GER: A Machine Learning Benchmark Dataset of Long-Term Groundwater Levels in Germany with Meteorological Forcings and Site-Specific Environmental Features

Authors/Creators

  • 1. Karlsruhe Institute of Technology (KIT), Institute of Applied Geosciences
  • 2. Federal Institute for Geosciences and Natural Resources

Description

This repository provides the dataset accompanying the GEMS-GER (Groundwater Levels, Environment, Meteorology, Site Properties – Germany) benchmark for machine learning-based groundwater modeling. The dataset includes long-term groundwater level time series, meteorological and hydrological forcing data, site-specific environmental properties, and benchmark model evaluation results. All data originate from official public sources and have been harmonized across the 16 German federal states (Bundesländer).

Contents of this repository:

  • Groundwater level time series (GEMS-GER_data/dynamic/MW_*.csv):

    Weekly aggregated groundwater levels (GWL) from 3,207 monitoring wells (1991–2022). Each well file also includes:

    • Daily temperature (mean, min, max)

    • Precipitation and humidity (HYRAS/DWD)

    • Real, potential, and reference evapotranspiration

    • Soil moisture and soil temperature (5 m)

    • Snow water equivalent, snowmelt, and runoff (ERA5-Land)

    • GWL_flag indicating observed vs. imputed values

  • Site-specific static descriptors (GEMS-GER_data/static/static_features.csv):

    • Hydrogeology and soil type

    • Land use and climate classification

    • Elevation and derived topographic parameters (e.g. slope, TWI)

  • Benchmark model performance (GEMS-GER_data/model_performance/):

    • Median NSE, RMSE, R², and Bias across 10 runs for each of the 3,207 wells:

    • model_performance_singlewell_model.csv – Single-well models (trained individually)
    • model_performance_global_model.csv – Global model (trained jointly on all wells)

  • Pre-generated time series plots (GEMS-GER_data_figures/DYN_Feat_MW_*.pdf):

    • Visualizations of groundwater levels and selected forcing variables for all wells

    • Provided separately to reduce the size of the main dataset download
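
The performance files report the median NSE, RMSE, R², and Bias over 10 runs per well. As a rough illustration of how such a summary can be computed, the sketch below uses common textbook definitions of these metrics. The exact definitions used for the benchmark are not stated here, so treating R² as the squared Pearson correlation and Bias as the mean of simulated minus observed are assumptions, and all function names are hypothetical.

```python
import math
from statistics import mean, median


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over the spread of obs around its mean."""
    obs_mean = mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - obs_mean) ** 2 for o in obs)


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def r2(obs, sim):
    """Squared Pearson correlation (assumed definition of R²)."""
    obs_mean, sim_mean = mean(obs), mean(sim)
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))
    var_obs = sum((o - obs_mean) ** 2 for o in obs)
    var_sim = sum((s - sim_mean) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def bias(obs, sim):
    """Mean error, simulated minus observed (sign convention is an assumption)."""
    return mean([s - o for o, s in zip(obs, sim)])


def median_scores(obs, runs):
    """Median of each metric across an ensemble of simulated series, one per run."""
    metrics = {"NSE": nse, "RMSE": rmse, "R2": r2, "Bias": bias}
    return {name: median([fn(obs, sim) for sim in runs]) for name, fn in metrics.items()}
```

Here `obs` is one well's observed series and `runs` is a list of simulated series of the same length, for example the 10 ensemble members of a single model.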

Directory structure:

GEMS-GER_data/
├── dynamic/                             # 3,207 individual CSV files, one per well
│   ├── MW_1.csv
│   ├── MW_2.csv
│   └── ...
├── static/
│   └── static_features.csv              # Site-specific static descriptors (e.g. geology, land use, climate)
└── model_performance/
    ├── model_performance_singlewell_model.csv   # Median NSE, RMSE, R², Bias from 10-run ensemble (single-well models)
    └── model_performance_global_model.csv       # Median NSE, RMSE, R², Bias from 10-run ensemble (global model)

GEMS-GER_data_figures/
├── DYN_Feat_MW_1.pdf
├── DYN_Feat_MW_2.pdf
└── ...
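
Given this layout, the per-well files can be discovered and inspected with the standard library alone. The following is a minimal sketch, not the project's own loader: it assumes the files are comma-separated with a header row, that every well ID in a file name is an integer (as in MW_1.csv), and that the flag column is literally named GWL_flag. The function names are hypothetical. Because the encoding of the flag values is not documented here, the sketch only counts how often each value occurs rather than interpreting it.

```python
import csv
import re
from collections import Counter
from pathlib import Path


def well_id(path):
    """Extract the integer ID from a file name such as MW_42.csv (integer IDs are an assumption)."""
    match = re.fullmatch(r"MW_(\d+)\.csv", Path(path).name)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1))


def list_well_files(data_root):
    """Return all dynamic/MW_*.csv files under data_root, sorted numerically by well ID."""
    return sorted(Path(data_root, "dynamic").glob("MW_*.csv"), key=well_id)


def flag_counts(csv_path, flag_column="GWL_flag"):
    """Count how often each flag value occurs in one well file (comma delimiter assumed)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return Counter(row[flag_column] for row in csv.DictReader(handle))
```

Sorting by the numeric ID keeps MW_2.csv ahead of MW_10.csv, which a plain alphabetical sort would not.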

The dataset is intended for research and benchmarking in hydrogeology, data-driven groundwater modeling, and environmental machine learning. It forms the basis of the GEMS-GER benchmark, as described in the associated preprint. All data originate from public sources and have been harmonized across administrative and institutional boundaries to enable consistent large-scale analysis.

 

All associated code, documentation, and update announcements are maintained in the project's GitHub repository to ensure transparency, traceability, and reproducibility:
https://github.com/KITHydrogeology/GEMS-GER

Files

GEMS-GER_data.zip

Files (1.1 GB)

Checksum Size
md5:f31b8b9389d70be99ab938cb984e68d9 288.0 MB
md5:10ec69c857d72a904a4f371d551e6bff 844.4 MB
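
The MD5 checksums listed above can be used to confirm that a download completed without corruption. A minimal sketch using Python's standard hashlib module follows; the helper names are hypothetical, and the file is read in chunks so that archives of several hundred megabytes do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or as plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum(local_path, "md5:f31b8b9389d70be99ab938cb984e68d9")` returns True only if the local file is byte-identical to the published one. The listing above does not show which file name belongs to which checksum, so that pairing has to be taken from the record itself.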

Additional details

Additional titles

Subtitle (English)
Groundwater Levels, Environment, Meteorology, Site Properties

Software

Repository URL
https://github.com/KITHydrogeology/GEMS-GER
Programming language
Python