Published March 31, 2025 | Version v11
Dataset Open

EuroCropsML

  • 1. Technical University of Munich
  • 2. dida Datenschmiede GmbH

Description

EuroCropsML* is a ready-to-use ML dataset combining EuroCrops reference data with Sentinel-2 reflectance data from 2021. It contains data from Latvia, Portugal, and Estonia and is intended for benchmarking few-shot crop type classification. We used Eurostat's GISCO dataset to map the EuroCrops parcels to their NUTS1-3 region.

The provided data comes in two stages:

  1. raw_data.zip (stage 1): One dataframe per country containing an annual time series of observations for each parcel, as well as separate files for the parcels' geometries and classes (EC_hcat_c = 10-digit HCAT code indicating the hierarchy of the crop).
  2. preprocess.zip (stage 2): Ready-to-use .npz-files. Each data point is saved in an .npz-file along with its metadata (parcel's centroid in [lon, lat]; observation dates). In addition, we performed some cloud removal steps. Each .npz-file is saved with the following naming convention: <NUTS3region>_<parcelID>_<EC_hcat_c>.npz
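
The naming convention above makes it possible to recover region, parcel ID, and class from a file name alone. The following is a minimal sketch, not part of the dataset tooling: the helper name `parse_parcel_filename`, the `ParcelFile` type, and the example file name are hypothetical, and the code assumes that neither the parcel ID nor the HCAT code contains an underscore.

```python
from pathlib import Path
from typing import NamedTuple


class ParcelFile(NamedTuple):
    """Fields encoded in '<NUTS3region>_<parcelID>_<EC_hcat_c>.npz' (hypothetical helper type)."""
    nuts3_region: str
    parcel_id: str
    hcat_code: str


def parse_parcel_filename(path: str) -> ParcelFile:
    """Split a stage-2 file name into region, parcel ID, and HCAT code."""
    stem = Path(path).stem  # drop directories and the '.npz' suffix
    # Split from the right so only the last two underscores act as separators
    # (assumption: parcel ID and HCAT code contain no underscores).
    region, parcel_id, hcat_code = stem.rsplit("_", 2)
    return ParcelFile(region, parcel_id, hcat_code)


# The file name below is made up for illustration only.
example = parse_parcel_filename("preprocess/EE004_12345678_3301010301.npz")
print(example.nuts3_region, example.parcel_id, example.hcat_code)
```

To see which arrays a given file actually stores, `numpy.load(path).files` lists the keys without assuming their names.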

Furthermore, split.zip contains .json-files that split the files from preprocess.zip into a pre-training/meta-learning (train and validation) and fine-tuning (train, validation, and test) dataset. In total, we provide four use cases:

  • latvia_vs_estonia: pre-training on Latvia (103 distinct classes) and fine-tuning on Estonia (127 distinct classes, of which 46 have not been seen during pre-training)
  • latvia_portugal_vs_estonia: pre-training on Latvia and Portugal (142 distinct classes), fine-tuning on Estonia (127 distinct classes, of which 34 have not been seen during pre-training)
  • overlap_latvia_vs_estonia: pre-training on overlapping classes between Latvia and Estonia (81 distinct classes) and fine-tuning on Estonia (127 distinct classes, of which 46 have not been seen during pre-training)
  • overlap_latvia_portugal_vs_estonia: pre-training on overlapping classes between Latvia and Estonia as well as Portugal and Estonia (93 distinct classes in total), fine-tuning on Estonia (127 distinct classes, of which 34 have not been seen during pre-training)
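
The class counts above can be cross-checked with simple set arithmetic: the fine-tuning classes that were seen during pre-training are the 127 Estonian classes minus the unseen ones. The sketch below is illustrative only; the function name `unseen_classes` and the toy class labels are hypothetical, and only the counts are taken from the list above.

```python
def unseen_classes(pretrain_classes, finetune_classes):
    """Return the fine-tuning classes that never occur during pre-training."""
    return set(finetune_classes) - set(pretrain_classes)


# Toy labels, not real HCAT codes.
print(unseen_classes({"A", "B", "C"}, {"B", "C", "D"}))  # {'D'}

# Counts quoted in the list above.
FINETUNE_CLASSES = 127
UNSEEN = {"latvia_vs_estonia": 46, "latvia_portugal_vs_estonia": 34}

# Classes shared between pre-training and fine-tuning.
seen_counts = {name: FINETUNE_CLASSES - n for name, n in UNSEEN.items()}
for name, count in seen_counts.items():
    print(name, count)
# latvia_vs_estonia 81
# latvia_portugal_vs_estonia 93
```

The results, 81 and 93, agree with the class counts given for the two overlap_* use cases.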

For all cases, the fine-tuning split stays consistent and is as follows:

  • train: 1-, 5-, 10-, 20-, 100-, 200-, 500-shot (for few-shot classification and benchmarking) and all samples
  • validation: 1000 samples
  • test: all samples
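
The k-shot training subsets are already provided in split.zip, so nothing has to be sampled by the user. For readers unfamiliar with the term, the sketch below shows how a k-shot subset can be drawn in general. It assumes the common convention of at most k samples per class and is not necessarily the procedure used to create the published splits; `sample_k_shot` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def sample_k_shot(labelled_items, k, seed=0):
    """Draw up to k items per class from (item, label) pairs.

    Assumption: 'k-shot' means at most k samples per class. Classes with
    fewer than k items contribute all of their items.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    by_class = defaultdict(list)
    for item, label in labelled_items:
        by_class[label].append(item)

    subset = []
    for label in sorted(by_class):  # sorted for a deterministic class order
        items = by_class[label]
        for item in rng.sample(items, min(k, len(items))):
            subset.append((item, label))
    return subset


# Toy data: three parcels of class 'a', one of class 'b'.
toy = [("p1", "a"), ("p2", "a"), ("p3", "a"), ("p4", "b")]
print(len(sample_k_shot(toy, k=2)))  # 3
```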

 

Changelog

  • Version 11:
    • Added new splits: overlap_latvia_vs_estonia and overlap_latvia_portugal_vs_estonia.
    • Rectified file Portugal.parquet that had been inadvertently corrupted in version 10.
  • Version 10: 
    • Added back in missing raw_data files 
    • Raw data geometries <Country>_geometries.geojson and labels <Country>_labels.geojson are now sourced from the shapefiles and, hence, contain all parcels, even if no Sentinel-2 data is available.
  • Version 9: 
    • Replacement of some parcel IDs for Latvia and Estonia. These parcels were duplicates within the original source shapefile. The replacement therefore does not affect the data itself; only the parcel IDs change. The following IDs are affected:
      Country  Old parcel ID  New parcel ID
      Estonia  20548567       22172347
      Estonia  21313556       22111331
      Latvia   12786929       13203478
      Latvia   12297424       12804307
      Latvia   12297361       12804296
      Latvia   12297423       12803325
      Latvia   12297421       12803323
      Latvia   12297422       12803324
      In the split files, for Latvia only ID 13203478 is affected. The remaining IDs are not part of the splits since they belong to the meadow class, which is downsampled for the pre-training splits.
    • This version is missing the <Country>.parquet files and cannot be used to run pre-processing.
  • Version 8: Adjustment of Portugal fine-tuning split such that it matches the Latvia fine-tuning split
  • Version 7: Added new few-shot fine-tuning splits: 200 and 500
  • Version 6: Added new (few-shot) fine-tuning splits: 20, 100, and all samples
  • Version 4: The EuroCrops shapefiles sometimes contain a couple of parcels that lie outside the national borders. We now map them to the closest NUTS region within the country. Please rely on this version or newer.
  • Version 3: Some parcels have been clipped incorrectly. 
  • Version 2: Removed data points that contain only cloudy observations (in preprocess.zip).
  • Version 1: Initial publication
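
Code that still refers to parcel IDs from before version 9 can translate them with a lookup table built from the replacements listed in the Version 9 entry. This is a minimal sketch; the names `PARCEL_ID_REPLACEMENTS` and `update_parcel_id` are hypothetical, and only the ID pairs are taken from the changelog.

```python
# Old -> new parcel IDs, copied from the Version 9 changelog entry.
PARCEL_ID_REPLACEMENTS = {
    "Estonia": {
        20548567: 22172347,
        21313556: 22111331,
    },
    "Latvia": {
        12786929: 13203478,
        12297424: 12804307,
        12297361: 12804296,
        12297423: 12803325,
        12297421: 12803323,
        12297422: 12803324,
    },
}


def update_parcel_id(country: str, parcel_id: int) -> int:
    """Return the post-version-9 ID, or the input unchanged if it was not replaced."""
    return PARCEL_ID_REPLACEMENTS.get(country, {}).get(parcel_id, parcel_id)


print(update_parcel_id("Latvia", 12786929))  # 13203478
print(update_parcel_id("Latvia", 1))         # 1 (not in the table, left as-is)
```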

* Contains Copernicus Sentinel data (2024), processed on EOLab

 

Country-specific data sources for EuroCrops reference data

Estonia:

INSPIRE GEOPORTAL

If the link does not work, search for Estonia --> Geospatial Aid Application Estonia Agricultural parcels on the INSPIRE platform.

Latvia:

Lauku atbalsta dienests Updated Source

Portugal:

Download via WFS https://www.ifap.pt/isip/ows/isip.data/wfs or over the IFAP website.
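
A WFS endpoint is usually explored by first requesting its capabilities document, which lists the available layers. The sketch below only builds that request URL using the standard OGC key-value parameters `service` and `request`. It makes no network call, it has not been verified against the IFAP server, and the layer names needed for an actual download are not given here. The helper name `wfs_capabilities_url` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as given in the text above.
IFAP_WFS = "https://www.ifap.pt/isip/ows/isip.data/wfs"


def wfs_capabilities_url(endpoint: str) -> str:
    """Build a WFS GetCapabilities URL from a bare endpoint (no existing query string)."""
    query = urlencode({"service": "WFS", "request": "GetCapabilities"})
    return f"{endpoint}?{query}"


print(wfs_capabilities_url(IFAP_WFS))
# https://www.ifap.pt/isip/ows/isip.data/wfs?service=WFS&request=GetCapabilities
```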

 

 

 

Files

preprocess.zip

Files (4.8 GB)

Checksum                              Size
md5:66d04b70399bdcd522ecd8ab0ea275ca  1.5 GB
md5:63aabc161f299969cf764aa45d2c8c9a  3.3 GB
md5:81ac1dd8c0632ec5f4dc51a766e955db  20.7 MB
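
The md5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are hypothetical. The listing here does not say which checksum belongs to which file, so take that pairing from the record page.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Chunked reading keeps memory use low for multi-GB archives.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or plain '<hex>'."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.lower()
```

A call might look like `matches_checksum("path/to/archive.zip", "md5:<hex from the record page>")`, where the path is a placeholder for wherever the archive was saved.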

Additional details

Funding

Federal Ministry for Economic Affairs and Climate Action