GEMS-GER: A Machine Learning Benchmark Dataset of Long-Term Groundwater Levels in Germany with Meteorological Forcings and Site-Specific Environmental Features
Authors/Creators
- 1. Karlsruhe Institute of Technology (KIT), Institute of Applied Geosciences
Contributors
Contact person:
- 1. Karlsruhe Institute of Technology (KIT), Institute of Applied Geosciences
- 2. Federal Institute for Geosciences and Natural Resources
Description
This repository provides the dataset accompanying the GEMS-GER (Groundwater Levels, Environment, Meteorology, Site Properties – Germany) benchmark for machine learning-based groundwater modeling. The dataset includes long-term groundwater level time series, meteorological and hydrological forcing data, site-specific environmental properties, and benchmark model evaluation results. All data originate from official public sources and have been harmonized across the 16 German federal states (Bundesländer).
Contents of this repository:
- Groundwater level time series (GEMS-GER_data/dynamic/MW_*.csv): Weekly aggregated groundwater levels (GWL) from 3,207 monitoring wells (1991–2022), including:
  - Daily temperature (mean, min, max)
  - Precipitation and humidity (HYRAS/DWD)
  - Real, potential, and reference evapotranspiration
  - Soil moisture and soil temperature (5 m)
  - Snow water equivalent, snowmelt, and runoff (ERA5-Land)
  - GWL_flag indicating observed vs. imputed values
- Site-specific static descriptors (GEMS-GER_data/static/static_features.csv):
  - Hydrogeology and soil type
  - Land use and climate classification
  - Elevation and derived topographic parameters (e.g. slope, TWI)
- Benchmark model performance (GEMS-GER_data/model_performance/):
  - Median NSE, RMSE, R², and Bias across 10 runs for each of the 3,207 wells:
    - model_performance_singlewell_model.csv: Single-well models (trained individually)
    - model_performance_global_model.csv: Global model (trained jointly on all wells)
- Pre-generated time series plots (GEMS-GER_data_figures/DYN_Feat_MW_*.pdf):
  - Visualizations of groundwater levels and selected forcing variables for all wells
  - Provided separately to reduce the size of the main dataset download
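As a minimal sketch of how one of the per-well files might be inspected, the following Python snippet tallies the values found in the GWL_flag column. It assumes the files are comma-separated with a header row and that the column is literally named GWL_flag. The flag encoding is not stated here, so the code only counts whichever values it finds. The helper name count_flags, the other column names, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_flags(lines, flag_column="GWL_flag"):
    """Count how often each value occurs in the flag column.

    `lines` is any iterable of CSV lines (an open text file works).
    Assumes a comma-separated file with a header row.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or flag_column not in reader.fieldnames:
        raise KeyError(f"column {flag_column!r} not found")
    return Counter(row[flag_column] for row in reader)


# Hypothetical sample: the real column names and flag values may differ.
sample = io.StringIO(
    "date,GWL,GWL_flag\n"
    "1991-01-06,101.2,0\n"
    "1991-01-13,101.1,0\n"
    "1991-01-20,101.0,1\n"
)
flag_counts = count_flags(sample)
print(dict(flag_counts))
```

For a real file, open it with `open(path, newline="")` and pass the file object to the function.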
Directory structure:
GEMS-GER_data/
│── dynamic/ # 3,207 individual CSV files, one per well
│ ├── MW_1.csv
│ ├── MW_2.csv
│ └── ...
│
│── static/
│ └── static_features.csv # Site-specific static descriptors (e.g. geology, land use, climate)
│
│── model_performance/
│ ├── model_performance_singlewell_model.csv # Median NSE, RMSE, R², Bias from 10-run ensemble (single-well models)
│ └── model_performance_global_model.csv # Median NSE, RMSE, R², Bias from 10-run ensemble (global model)
GEMS-GER_data_figures/
│── DYN_Feat_MW_1.pdf
│── DYN_Feat_MW_2.pdf
│── ...
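The naming scheme above suggests that each well file MW_<id>.csv has a matching figure DYN_Feat_MW_<id>.pdf. The following is a minimal sketch for pairing them, assuming both archives are extracted side by side and that the numeric suffix is the well identifier without leading zeros. The function name pair_wells_with_figures is hypothetical, and the demonstration runs on a throwaway directory tree rather than the real download.

```python
import re
import tempfile
from pathlib import Path

WELL_FILE = re.compile(r"^MW_(\d+)\.csv$")


def pair_wells_with_figures(data_root, figures_root):
    """Map well id -> (csv path, figure path or None).

    Assumes <data_root>/dynamic/MW_<id>.csv and
    <figures_root>/DYN_Feat_MW_<id>.pdf, as in the listing above.
    """
    pairs = {}
    for csv_path in Path(data_root, "dynamic").glob("MW_*.csv"):
        match = WELL_FILE.match(csv_path.name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        well_id = int(match.group(1))
        figure = Path(figures_root, f"DYN_Feat_MW_{well_id}.pdf")
        pairs[well_id] = (csv_path, figure if figure.exists() else None)
    # Sort numerically so that well 10 comes after well 2.
    return dict(sorted(pairs.items()))


# Demonstration on empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    data_root = Path(tmp, "GEMS-GER_data")
    figures_root = Path(tmp, "GEMS-GER_data_figures")
    (data_root / "dynamic").mkdir(parents=True)
    figures_root.mkdir()
    for well_id in (1, 2, 10):
        (data_root / "dynamic" / f"MW_{well_id}.csv").touch()
    (figures_root / "DYN_Feat_MW_1.pdf").touch()

    pairs = pair_wells_with_figures(data_root, figures_root)
    well_ids = list(pairs)
    has_figure = {wid: fig is not None for wid, (_, fig) in pairs.items()}
    print(well_ids, has_figure)
```

Returning None for a missing figure keeps the mapping usable when only the main data archive has been downloaded.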
The dataset is intended for research and benchmarking in hydrogeology, data-driven groundwater modeling, and environmental machine learning. It forms the basis of the GEMS-GER benchmark, as described in the associated preprint. All data originate from public sources and have been harmonized across administrative and institutional boundaries to enable consistent large-scale analysis.
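The benchmark results are reported as median NSE, RMSE, R², and Bias across 10 runs per well. The sketch below shows how such medians could be computed from an observed series and several simulated runs. The formulas are common textbook forms and are assumptions here, not confirmed definitions from the benchmark: R² is taken as the squared Pearson correlation and Bias as the mean of simulated minus observed. The toy data use three runs instead of ten.

```python
import math
import statistics


def _mean(values):
    return sum(values) / len(values)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / squared spread of obs."""
    mean_obs = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def r_squared(obs, sim):
    """Squared Pearson correlation (one common convention for R²)."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def bias(obs, sim):
    """Mean error, simulated minus observed (sign convention assumed)."""
    return _mean([s - o for o, s in zip(obs, sim)])


# Toy data: one observed series and three simulated runs.
obs = [1.0, 2.0, 3.0, 4.0]
runs = [
    [1.0, 2.0, 3.0, 4.0],  # perfect run
    [1.5, 2.5, 3.5, 4.5],  # constant offset of +0.5
    [2.0, 3.0, 4.0, 5.0],  # constant offset of +1.0
]

median_nse = statistics.median(nse(obs, sim) for sim in runs)
median_rmse = statistics.median(rmse(obs, sim) for sim in runs)
median_r2 = statistics.median(r_squared(obs, sim) for sim in runs)
median_bias = statistics.median(bias(obs, sim) for sim in runs)
print(median_nse, median_rmse, median_r2, median_bias)
```

A constant offset lowers NSE and raises RMSE and Bias while leaving the correlation-based R² unchanged, which is one reason to report these metrics together.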
All associated code, documentation, and update announcements are maintained in the project's GitHub repository to ensure transparency, traceability, and reproducibility:
https://github.com/KITHydrogeology/GEMS-GER
Files
GEMS-GER_data.zip
Total size: 1.1 GB
| Checksum | Size |
|---|---|
| md5:f31b8b9389d70be99ab938cb984e68d9 | 288.0 MB |
| md5:10ec69c857d72a904a4f371d551e6bff | 844.4 MB |
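The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard hashlib module. The helper name md5_matches is hypothetical, and the demonstration hashes a small temporary file rather than one of the real archives.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_matches(path, expected, chunk_size=1 << 20):
    """Return True if the file's MD5 digest equals `expected`.

    `expected` may carry the "md5:" prefix used in the file listing.
    The file is read in chunks so large archives need little memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    wanted = expected.lower().split(":", 1)[-1]
    return digest.hexdigest() == wanted


# Demonstration on a small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp, "sample.bin")
    sample.write_bytes(b"example payload")
    expected = "md5:" + hashlib.md5(b"example payload").hexdigest()
    ok = md5_matches(sample, expected)
    mismatch = md5_matches(sample, "md5:" + "0" * 32)
    print(ok, mismatch)
```

For a downloaded archive, pass its path together with one of the listed checksums. The listing here does not state which checksum belongs to which file, so confirm the pairing on the record page.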
Additional details
Additional titles
- Subtitle (English): Groundwater Levels, Environment, Meteorology, Site Properties
Software
- Repository URL: https://github.com/KITHydrogeology/GEMS-GER
- Programming language: Python