Published April 9, 2026 | Version v1.0.0
Dataset Open

Filter Variant Optimization Metrics and Station-Level Phenological Observations for Winter Wheat in Germany between 1993 and 2024

  • 1. Julius Kühn-Institut

Description

This dataset provides two complementary outputs of the PhenoPhaseR processing pipeline (Möller and Gerstmann, 2026) for winter wheat (Triticum aestivum L., DWD Plant ID 202), covering all observed phenological phases (sowing to harvest) for the entire territory of Germany for the period 1993–2024.

PhenoPhaseR is an open-source R workflow that couples the PHASE model (Gerstmann et al., 2016) - a growing-degree-day (GDD) approach with adaptive outlier filtering - to derive spatially consistent, quality-controlled point observations of phenological phase-start dates from the DWD volunteer observer network. The dataset components provided here correspond to processing Steps 5–6 (filter variant optimisation) of the seven-step pipeline.

Technical info (English)

Component 1a - Full Optimisation Landscape (OPT_ALL_202.csv)

A semicolon-delimited table (decimal separator: comma) containing the complete candidate matrix of all 45 filter variants evaluated per year–phase combination for all seven observed phenological phases of winter wheat. The file contains 10,080 rows (7 phases × 32 years × 45 variants, where the 45 variants combine nine quantile thresholds q ∈ {0.30, 0.35, …, 0.70} with five filter strengths f_std ∈ {1.0, 1.5, 2.0, 2.5, 3.0}). Variants for which the Pearson correlation r falls below the minimum threshold (r_min = 0.50) receive an OPT value of -Inf and are excluded from variant selection; the OPT_normalized score is still reported for completeness.
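The row counts quoted for the two tables can be checked by enumerating the candidate grid. The following is a minimal sketch; the variable names are illustrative and not part of PhenoPhaseR.

```python
from itertools import product

# Nine quantile thresholds 0.30, 0.35, ..., 0.70 and five filter strengths.
quantiles = [round(0.30 + 0.05 * i, 2) for i in range(9)]
filter_strengths = [1.0, 1.5, 2.0, 2.5, 3.0]

# Every (q, f_std) pair is one candidate filter variant.
variants = list(product(quantiles, filter_strengths))

n_phases = 7
n_years = len(range(1993, 2025))  # 1993 to 2024 inclusive

rows_all = n_phases * n_years * len(variants)  # rows in OPT_ALL_202.csv
rows_max = n_phases * n_years                  # rows in OPT_MAX_202.csv

print(len(variants), rows_all, rows_max)  # 45 10080 224
```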

This file allows users to reconstruct the full optimisation landscape - for example to visualise how MAE, SN, and OPT vary jointly across all q–f_std combinations for a given year - and to apply alternative selection criteria beyond those implemented in Step 6.

The column structure is identical to OPT_MAX_202.csv (see Component 1b).
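Because the tables use a semicolon delimiter, a decimal comma, and -Inf for excluded variants, a default CSV reader will not parse the numbers directly. The sketch below shows one way to read them with the Python standard library. The two sample rows are invented for illustration, and the header order and the literal spelling "-Inf" in the file are assumptions.

```python
import csv
import io

# Hypothetical two-row excerpt; header order and all values are made up.
SAMPLE = """YEAR;Q;RMSE;MAE;SN;COR;STD;PLANT;PHASE;sn_exponent;OPT;OPT_normalized
1993;0,35;6,2;4,8;950;0,81;2,0;202;18;1,0;769,5;0,42
1993;0,70;9,9;8,1;990;0,44;3,0;202;18;1,0;-Inf;0,21
"""

INT_COLUMNS = {"YEAR", "SN", "PLANT", "PHASE"}

def parse_row(row):
    """Convert one raw CSV row (all strings) to ints and floats."""
    parsed = {}
    for name, text in row.items():
        if name in INT_COLUMNS:
            parsed[name] = int(text)
        else:
            # Decimal comma -> decimal point; float() also accepts "-Inf".
            parsed[name] = float(text.replace(",", "."))
    return parsed

def read_opt_table(handle):
    """Read a semicolon-delimited optimisation table from an open text handle."""
    return [parse_row(r) for r in csv.DictReader(handle, delimiter=";")]

rows = read_opt_table(io.StringIO(SAMPLE))

# Variants excluded from selection carry OPT = -inf.
valid = [r for r in rows if r["OPT"] != float("-inf")]
print(len(rows), len(valid))
```

For the published file, the same function could be called on a handle opened with open("OPT_ALL_202.csv", newline=""); the file encoding is not stated in this record and may need to be set explicitly.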

Component 1b - Selected Optimisation Metrics (OPT_MAX_202.csv)

A semicolon-delimited table (decimal separator: comma) containing the year-specific best-performing filter variant selected by the adaptive optimality score OPT = SN^α(y) × r for all seven observed phenological phases of winter wheat. The file covers 224 year–phase combinations (7 phases × 32 years, 1993–2024). Each row represents the single optimal filter variant selected per year–phase combination from the 45-row candidate matrix.

Column descriptions:

- YEAR - Reference year of the phenological phase
- Q - Optimal GDD quantile threshold q* (minimises MAE)
- RMSE - Root mean squared error between predicted and observed DOY (days)
- MAE - Mean absolute error between predicted and observed DOY (days)
- SN - Sample size: number of stations retained after outlier filtering
- COR - Pearson correlation r between predicted and observed DOY
- STD - Selected outlier-filter strength f_std
- PLANT - DWD crop type identifier (202 = winter wheat)
- PHASE - DWD phase identifier
- sn_exponent - Year-specific adaptive OPT exponent α(y)
- OPT - Raw optimality score (SN^α(y) × r)
- OPT_normalized - OPT score normalised to [0, 1] across all years

Phenological phases included (DWD phase ID/phase name/BBCH equivalent):

- Phase 10 - Sowing (BBCH 00)
- Phase 12 - Emergence (BBCH 09)
- Phase 15 - Beginning of shooting (BBCH 30)
- Phase 18 - Heading (BBCH 51)
- Phase 19 - Milk ripening (BBCH 73/75)
- Phase 21 - Yellow ripening (BBCH 87)
- Phase 24 - Harvest (BBCH 99)

Component 2 - Filtered Station Observations (SHP_202-{PHASE}.zip)

Seven ZIP archives (one per phenological phase) each containing 32 annual ESRI Shapefiles (one per year, 1993–2024) of the quality-controlled and outlier-filtered point observations for winter wheat. These are the station datasets forwarded to Step 7 (spatial interpolation) after filter variant selection. Each year's shapefile consists of four standard component files (.shp, .dbf, .shx, .prj), named according to the convention
DOY_202-{PHASE}_{YEAR}.*
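The naming convention can be expanded into the full list of expected component files, which is useful for checking an extracted archive for completeness. This is a minimal sketch; the helper names are hypothetical, and whether the files sit at the archive root or in a subfolder is not stated in this record.

```python
PLANT_ID = 202
PHASES = [10, 12, 15, 18, 19, 21, 24]
YEARS = range(1993, 2025)  # 1993 to 2024 inclusive
EXTENSIONS = [".shp", ".dbf", ".shx", ".prj"]

def archive_name(phase):
    """ZIP archive name for one phenological phase."""
    return f"SHP_{PLANT_ID}-{phase}.zip"

def shapefile_members(phase, year):
    """The four component files of one annual shapefile."""
    stem = f"DOY_{PLANT_ID}-{phase}_{year}"
    return [stem + ext for ext in EXTENSIONS]

# Expected file names per archive: 32 years x 4 components = 128 files.
expected = {
    archive_name(p): [m for y in YEARS for m in shapefile_members(p, y)]
    for p in PHASES
}

print(len(expected), len(expected["SHP_202-18.zip"]))  # 7 128
```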

Shapefile attributes:

- GRID_ID - DWD 1 km weather grid cell identifier
- STATION - DWD station identifier
- YEAR - Reference year
- PLANT - DWD crop type identifier (202 = winter wheat)
- PHASE - DWD phase identifier
- DATE - Observed phase-start date (calendar date)
- DOY_START - Observed phase-start DOY; stored as a negative integer for phases of winter crops observed in the preceding autumn (days counted back from 1 January of the following year)
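The signed day-of-year convention can be illustrated with a small conversion function. This is a sketch under an assumed convention that the record does not confirm: 1 January of the reference year is DOY 1, and 31 December of the preceding year is DOY -1. The function name signed_doy is hypothetical.

```python
from datetime import date

def signed_doy(observed, reference_year):
    """Day of year relative to 1 January of reference_year.

    Assumed convention: dates in the reference year count upwards from 1,
    dates before it count back from 1 January as negative integers.
    """
    jan1 = date(reference_year, 1, 1)
    offset = (observed - jan1).days
    return offset + 1 if offset >= 0 else offset

# Sowing on 1 October 2023 for reference year 2024: 92 days before 1 January.
print(signed_doy(date(2023, 10, 1), 2024))  # -92
# Harvest on 25 July 2024 falls inside the (leap) reference year.
print(signed_doy(date(2024, 7, 25), 2024))  # 207
```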

Coordinate reference system: Gauss-Krüger Zone 3 (EPSG:31467, Transverse Mercator, Bessel 1841 ellipsoid, central meridian 9° E, false easting 3 500 000 m).

Geometry type: Point

Input Data

The observation data underlying both components were downloaded automatically from the DWD Climate Data Center (CDC) via the download_dwd_phenology() function of PhenoPhaseR and spatially coupled to the DWD 1 km weather grid via couple_phenology_stations(). Daily mean air temperatures required for GDD computation were extracted from the DWD gridded temperature product (1 km resolution) via load_gridded_temperature(). All input data are archived at:

DWD and JKI (2026). Input data collection for the PhenoPhaseR model.
Zenodo. https://doi.org/10.5281/zenodo.18772094

Methods

Filter variant optimisation is performed by Steps 5–6 of the PhenoPhaseR pipeline.

Step 5 - Critical DOY determination (critical_doy_determination()): For each combination of year, phase, and filter strength f_std, a photoperiod-weighted GDD accumulation (Gerstmann et al., 2016) is computed at each station. The q-th empirical quantile of the station-level GDD totals at the observed phenological date defines the threshold τ_q. The predicted critical DOY is the earliest day on which the station's running GDD sum exceeds τ_q. Stations whose absolute prediction error exceeds f_std × σ_DOY (where σ_DOY is the standard deviation of observed DOYs across all stations) are flagged as outliers and excluded. Accuracy metrics (RMSE, MAE, Pearson r, SN) are computed for each of the 45 candidate variants (q ∈ {0.30, 0.35, …, 0.70}, f_std ∈ {1.0, 1.5, 2.0, 2.5, 3.0}).
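The logic of Step 5 for a single (q, f_std) variant can be sketched as follows. This is not the PhenoPhaseR implementation: it uses plain daily GDD increments instead of the photoperiod-weighted accumulation, a linearly interpolated empirical quantile, the sample standard deviation for σ_DOY, and a strict "greater than" test for exceedance. All of these choices, the function names, and the toy station data are assumptions made for illustration.

```python
import math

def running_sum(values):
    """Cumulative sum of daily GDD increments."""
    total, sums = 0.0, []
    for value in values:
        total += value
        sums.append(total)
    return sums

def quantile(values, q):
    """Empirical quantile with linear interpolation (assumed definition)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

def pearson(x, y):
    """Pearson correlation; NaN if either series has zero variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else math.nan

def evaluate_variant(daily_gdd, observed, q, f_std):
    """daily_gdd: {station: [GDD for DOY 1, 2, ...]}, observed: {station: DOY}."""
    cumulative = {s: running_sum(v) for s, v in daily_gdd.items()}
    # GDD total reached at each station on its observed phase-start day.
    tau = quantile([cumulative[s][observed[s] - 1] for s in cumulative], q)
    # Predicted DOY: earliest day whose running GDD sum exceeds tau.
    predicted = {}
    for s, sums in cumulative.items():
        day = next((i + 1 for i, g in enumerate(sums) if g > tau), None)
        if day is not None:
            predicted[s] = day
    # Standard deviation of observed DOYs across all stations.
    doys = list(observed.values())
    mean_doy = sum(doys) / len(doys)
    sigma = math.sqrt(sum((d - mean_doy) ** 2 for d in doys) / (len(doys) - 1))
    # Keep stations whose absolute error does not exceed f_std * sigma.
    kept = [s for s in predicted if abs(predicted[s] - observed[s]) <= f_std * sigma]
    if not kept:
        raise ValueError("no stations left after outlier filtering")
    errors = [predicted[s] - observed[s] for s in kept]
    return {
        "tau": tau,
        "SN": len(kept),
        "MAE": sum(abs(e) for e in errors) / len(errors),
        "RMSE": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "COR": pearson([predicted[s] for s in kept], [observed[s] for s in kept]),
        "kept": kept,
    }

# Toy data: constant daily GDD per station; station E reports a very late date.
rates = {"A": 10.0, "B": 9.0, "C": 11.0, "D": 10.0, "E": 10.0}
daily_gdd = {s: [rate] * 200 for s, rate in rates.items()}
observed = {"A": 100, "B": 110, "C": 92, "D": 105, "E": 160}

strict = evaluate_variant(daily_gdd, observed, q=0.50, f_std=1.0)
loose = evaluate_variant(daily_gdd, observed, q=0.50, f_std=3.0)
print(strict["SN"], strict["MAE"], loose["SN"])  # 4 2.25 5
```

In the toy data the stricter filter (f_std = 1.0) drops station E and lowers MAE at the cost of a smaller SN, which is the trade-off that the optimality score in Step 6 is designed to balance.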

Step 6 - Adaptive filter variant selection (filter_variant_selector()): For each year, the optimal variant is identified by maximising the composite optimality score OPT = SN^α(y) × r, where α(y) is a year-specific exponent that interpolates linearly from α_start = 1.0 (year with the highest station count) to α_end = 2.0 (year with the lowest station count), thereby increasingly penalising sample size loss in data-sparse years. Variants with r < 0.50 are excluded. The selected shapefile from Step 5 is forwarded to Step 7 (spatial interpolation).
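The selection rule of Step 6 can be sketched in a few lines. The record states only that α(y) interpolates linearly between the years with the highest and lowest station counts; interpolating in the station count itself (rather than, for example, in rank) is an assumption here, as are the function names and the candidate values.

```python
import math

ALPHA_START, ALPHA_END, R_MIN = 1.0, 2.0, 0.50

def sn_exponent(n_year, n_min, n_max):
    """Year-specific exponent, linear in station count (assumed variable)."""
    if n_max == n_min:
        return ALPHA_START
    share = (n_max - n_year) / (n_max - n_min)
    return ALPHA_START + (ALPHA_END - ALPHA_START) * share

def opt_score(sn, r, alpha):
    """OPT = SN^alpha * r, or -inf when r is below the minimum threshold."""
    return -math.inf if r < R_MIN else sn ** alpha * r

def select_variant(candidates, alpha):
    """Candidate with the highest OPT (first one if all scores are -inf)."""
    return max(candidates, key=lambda c: opt_score(c["SN"], c["COR"], alpha))

# Hypothetical candidates for one year and phase.
candidates = [
    {"Q": 0.40, "STD": 1.0, "SN": 300, "COR": 0.90},
    {"Q": 0.50, "STD": 2.0, "SN": 330, "COR": 0.80},
    {"Q": 0.60, "STD": 3.0, "SN": 450, "COR": 0.45},  # r < 0.50, excluded
]

# Data-rich year (alpha = 1.0): 300 * 0.90 = 270 beats 330 * 0.80 = 264.
print(select_variant(candidates, 1.0)["Q"])  # 0.4
# Data-sparse year (alpha = 2.0): 330^2 * 0.80 = 87120 beats 300^2 * 0.90 = 81000.
print(select_variant(candidates, 2.0)["Q"])  # 0.5
```

The example shows the intended effect of the exponent: with α = 2.0 the variant that retains more stations wins even though its correlation is lower.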

The complete workflow is described in:

Möller, M. and Gerstmann, H. (2026). PhenoPhaseR: Reproducible processing workflow for interpolating phenological DWD observations. Zenodo. https://doi.org/10.5281/zenodo.18743008

Intended Use and DFFP Context

The full optimisation landscape (OPT_ALL_202.csv) enables detailed inspection of how MAE, SN, and OPT respond jointly across all 45 q–f_std candidates for each year and phase. This is particularly useful for diagnosing data-sparse years, comparing the sensitivity of the optimality score to the choice of quantile threshold, and developing alternative variant-selection strategies.
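As an illustration of an alternative variant-selection strategy, the sketch below picks, per year–phase combination, the variant with the lowest MAE among those that pass the correlation threshold and retain a minimum share of the largest SN in that group. This criterion and its parameters (min_sn_share in particular) are hypothetical and not part of PhenoPhaseR; the rows are assumed to be dictionaries keyed by the column names described above.

```python
from collections import defaultdict

def select_min_mae(rows, r_min=0.50, min_sn_share=0.8):
    """Lowest-MAE variant per (YEAR, PHASE) among eligible candidates."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["YEAR"], row["PHASE"])].append(row)
    chosen = {}
    for key, candidates in groups.items():
        sn_max = max(c["SN"] for c in candidates)
        eligible = [
            c for c in candidates
            if c["COR"] >= r_min and c["SN"] >= min_sn_share * sn_max
        ]
        if eligible:  # groups without any eligible variant are skipped
            chosen[key] = min(eligible, key=lambda c: c["MAE"])
    return chosen

# Invented rows for one year and phase.
rows = [
    {"YEAR": 2020, "PHASE": 18, "Q": 0.40, "STD": 1.0, "SN": 200, "COR": 0.92, "MAE": 2.9},
    {"YEAR": 2020, "PHASE": 18, "Q": 0.50, "STD": 2.0, "SN": 380, "COR": 0.85, "MAE": 3.6},
    {"YEAR": 2020, "PHASE": 18, "Q": 0.60, "STD": 3.0, "SN": 400, "COR": 0.70, "MAE": 5.1},
]

best = select_min_mae(rows)
print(best[(2020, 18)]["Q"])  # 0.5
```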

The selected metrics (OPT_MAX_202.csv) provide a longitudinal quality indicator of the phenological data basis for each year–phase combination. From a Data-Fitness-For-Purpose (DFFP) perspective (Säurich et al., 2026), declining SN values and increasing MAE over the 1993-2024 period reflect the progressive attrition of the DWD volunteer observer network and allow downstream users to assess whether individual years meet their application-specific quality thresholds before using the derived spatial products.
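A simple screening of year–phase combinations against application-specific thresholds might look as follows. The threshold values and the function name are placeholders chosen for illustration; suitable limits depend on the downstream application and are not prescribed by this dataset.

```python
def fit_for_purpose(rows, max_mae, min_sn):
    """Flag each (YEAR, PHASE) as usable if MAE and SN meet the thresholds."""
    return {
        (r["YEAR"], r["PHASE"]): (r["MAE"] <= max_mae and r["SN"] >= min_sn)
        for r in rows
    }

# Invented excerpt in the shape of OPT_MAX_202.csv rows.
selected = [
    {"YEAR": 1995, "PHASE": 18, "MAE": 3.1, "SN": 1100},
    {"YEAR": 2023, "PHASE": 18, "MAE": 4.7, "SN": 310},
]

flags = fit_for_purpose(selected, max_mae=4.0, min_sn=500)
print(flags)  # {(1995, 18): True, (2023, 18): False}
```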

The filtered shapefiles are the direct input to spatial interpolation (Step 7) and can alternatively serve as standalone point datasets for applications that operate at station level, such as process-based crop model calibration (Heiß et al., 2026) or regional phenological trend analysis.

Temporal and Spatial Coverage

- Temporal extent: 1993–2024 (32 years)
- Spatial extent: Federal Republic of Germany
- Bounding box: 5.87° E – 15.04° E, 47.27° N – 55.06° N (approx.)
- Spatial reference: EPSG:31467 (Gauss-Krüger Zone 3)
- Observation source: DWD volunteer phenological observer network (CDC)

File List

Tabular optimisation metrics:

- OPT_ALL_202.csv - CSV (semicolon delimited, comma decimal) - 10,080 records - Full optimisation landscape: all 45 q–f_std candidates per year–phase combination, all 7 phases, 1993–2024
- OPT_MAX_202.csv - CSV (semicolon delimited, comma decimal) - 224 records - Selected best-performing filter variant per year–phase combination, all 7 phases, 1993–2024

Filtered station observation shapefiles (one ZIP per phase, 32 annual Shapefiles each, file naming: DOY_202-{PHASE}_{YEAR}.*):

- SHP_202-10.zip - Phase 10 - Sowing - BBCH 00 - 32 Shapefiles
- SHP_202-12.zip - Phase 12 - Emergence - BBCH 09 - 32 Shapefiles
- SHP_202-15.zip - Phase 15 - Beginning of shooting - BBCH 30 - 32 Shapefiles
- SHP_202-18.zip - Phase 18 - Heading - BBCH 51 - 32 Shapefiles
- SHP_202-19.zip - Phase 19 - Milk ripening - BBCH 73/75 - 32 Shapefiles
- SHP_202-21.zip - Phase 21 - Yellow ripening - BBCH 87 - 32 Shapefiles
- SHP_202-24.zip - Phase 24 - Harvest - BBCH 99 - 32 Shapefiles

Related Publications and Software

- Input data: DWD and JKI (2026). Input data collection for the PhenoPhaseR model. Zenodo. https://doi.org/10.5281/zenodo.18772094
- PHASE model: Gerstmann, H., Doktor, D., Gläßer, C., and Möller, M. (2016). PHASE: A geostatistical model for the Kriging-based spatial prediction of crop phenology. Computers and Electronics in Agriculture, 127, 726–738. https://doi.org/10.1016/j.compag.2016.07.032
- Observation network: Kaspar, F., Zimmermann, K., and Polte-Rudolf, C. (2014). An overview of the phenological observation network and the phenological database of Germany's national meteorological service. Advances in Science and Research, 11, 93–99. https://doi.org/10.5194/asr-11-93-2014
- DFFP framework: Säurich, J., Schwieder, M., Preidl, S., Beyer, F., and Möller, M. (2026). Are remote sensing-based crop type classifications suitable for calculating a landscape heterogeneity metric? A data-fitness-for-purpose assessment. Ecological Informatics, 95, 103660. https://doi.org/10.1016/j.ecoinf.2026.103660
- Downstream application: Heiß, I., Katte, A.-S., Koop, S., and Vogeler, I. (2026). Transparent and Reproducible Crop Model Calibration Using Exclusively Public Data: Improving Phenology and Yield Predictions in APSIMx. Environmental Modelling and Software. https://doi.org/10.1016/j.envsoft.2026.106968

Licence

This dataset is published under the Creative Commons Attribution 4.0 International licence (CC BY 4.0). You are free to share and adapt the material for any purpose, provided appropriate credit is given, a link to the licence is provided, and any changes are indicated.

Licence text: https://creativecommons.org/licenses/by/4.0/

Citation

Möller, M. (2026). Filter Variant Optimization Metrics and Station-Level Phenological Observations for Winter Wheat in Germany between 1993 and 2024. Zenodo. https://doi.org/10.5281/zenodo.19483112

Files

Files (530.2 MB)

- md5:91af1fe0679cfe46b7122cc1cca99099 - 1.2 MB
- md5:3915c2035dc31e11927e628ea0d79ef6 - 28.3 kB
- md5:e079705bdf2f69b8889a2202d97dd969 - 12.5 kB
- md5:2f1e202e46d98faf2e552bbeb26aef17 - 21.1 MB
- md5:b91fbaccfd2fe4402d30576d48a9cd7e - 52.4 MB
- md5:94315ccb677c6ad1b90c34154817a650 - 77.9 MB
- md5:1bd19a9bd0165c2c2c39b1d9c215727e - 95.2 MB
- md5:87f8b9c4df1f12a35d9072486d53a9c4 - 80.6 MB
- md5:fe04e9c546f718d4b58eac92d325f818 - 94.6 MB
- md5:1f2b85e7062dbfa6b2363c7a15e6f70e - 107.0 MB

Additional details

Related works

Is derived from
Dataset: 10.5281/zenodo.18772094 (DOI)
Is supplement to
Computational notebook: 10.5281/zenodo.18743008 (DOI)

Funding

Deutsche Forschungsgemeinschaft
FAIR data infrastructure for agrosystems research (FAIRe Dateninfrastruktur für die Agrosystemforschung) 501899475