EuroCropsML
- 1. Technical University of Munich
- 2. dida Datenschmiede GmbH
Description
EuroCropsML* is a ready-to-use ML dataset combining EuroCrops reference data with Sentinel-2 reflectance data from 2021. It contains data from Latvia, Portugal, and Estonia and is intended for benchmarking few-shot crop type classification. We used Eurostat's GISCO dataset to map the EuroCrops parcels to their NUTS regions at levels 1 to 3.
The provided data comes in two stages:
- raw_data.zip (stage 1): One dataframe per country containing an annual time series of observations for each parcel, as well as separate files for the parcels' geometries and classes (EC_hcat_c: the 10-digit HCAT code that encodes the crop's position in the HCAT hierarchy).
- preprocess.zip (stage 2): Ready-to-use .npz-files. Each data point is saved in an .npz-file along with its metadata (the parcel's centroid in [lon, lat]; observation dates). In addition, we applied several cloud removal steps. Each .npz-file is named according to the following convention: <NUTS3region>_<parcelID>_<EC_hcat_c>.npz
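Because region, parcel ID, and class label are all encoded in the file name, a split or label table can be built without opening any array. The sketch below parses a name that follows the convention above. It assumes the parcel ID and the HCAT code contain no underscores; the region code and numbers in the usage example are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class SampleKey(NamedTuple):
    """Fields encoded in a preprocessed file name."""
    nuts3: str
    parcel_id: str
    ec_hcat_c: str


def parse_npz_name(path: str) -> SampleKey:
    """Split '<NUTS3region>_<parcelID>_<EC_hcat_c>.npz' into its three fields."""
    p = Path(path)
    if p.suffix != ".npz":
        raise ValueError(f"not an .npz file: {path}")
    # Split from the right so the last two fields are isolated; any extra
    # underscores would end up in the region field.
    nuts3, parcel_id, hcat = p.stem.rsplit("_", 2)
    if not (len(hcat) == 10 and hcat.isdigit()):
        raise ValueError(f"expected a 10-digit HCAT code, got {hcat!r}")
    return SampleKey(nuts3, parcel_id, hcat)


# Hypothetical file name, for illustration only.
print(parse_npz_name("LV003_12345678_3301010100.npz"))
```

The array keys stored inside each file are not listed in this description, so one way to discover them is to open a file with `numpy.load` and inspect the `files` attribute of the returned archive.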
Furthermore, split.zip contains .json-files that split the files from preprocess.zip into a pre-training/meta-learning dataset (train and validation) and a fine-tuning dataset (train, validation, and test). In total, we provide four use cases:
- latvia_vs_estonia: pre-training on Latvia (103 distinct classes) and fine-tuning on Estonia (127 distinct classes, of which 46 have not been seen during pre-training)
- latvia_portugal_vs_estonia: pre-training on Latvia and Portugal (142 distinct classes), fine-tuning on Estonia (127 distinct classes, of which 34 have not been seen during pre-training)
- overlap_latvia_vs_estonia: pre-training on overlapping classes between Latvia and Estonia (81 distinct classes) and fine-tuning on Estonia (127 distinct classes, of which 46 have not been seen during pre-training)
- overlap_latvia_portugal_vs_estonia: pre-training on overlapping classes between Latvia and Estonia as well as Portugal and Estonia (93 distinct classes in total), fine-tuning on Estonia (127 distinct classes, of which 34 have not been seen during pre-training)
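The class counts above are mutually consistent: every Estonian class is either seen or unseen during pre-training, so the overlap cases contain exactly the seen Estonian classes (127 − 46 = 81 for Latvia, 127 − 34 = 93 for Latvia and Portugal), and restricting pre-training to the overlap leaves the number of unseen classes unchanged. The sketch below shows the seen/unseen split as set operations on class codes; the toy codes in the example are hypothetical, while the counts in the check are taken from the list above.

```python
from __future__ import annotations


def class_overlap(pretrain: set[str], finetune: set[str]) -> tuple[set[str], set[str]]:
    """Return (seen, unseen) fine-tuning classes relative to the pre-training classes."""
    seen = finetune & pretrain    # classes available in both stages
    unseen = finetune - pretrain  # classes only encountered during fine-tuning
    return seen, unseen


# Toy example with hypothetical class codes.
seen, unseen = class_overlap({"a", "b", "c"}, {"b", "c", "d"})
print(sorted(seen), sorted(unseen))

# Consistency check on the counts reported for the use cases.
ESTONIA_CLASSES = 127
UNSEEN = {"latvia": 46, "latvia_portugal": 34}
OVERLAP = {"latvia": 81, "latvia_portugal": 93}
for case in UNSEEN:
    assert ESTONIA_CLASSES - UNSEEN[case] == OVERLAP[case]
```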
For all cases, the fine-tuning split stays consistent and is as follows:
- train: 1-, 5-, 10-, 20-, 100-, 200-, 500-shot (for few-shot classification and benchmarking) and all samples
- validation: 1000 samples
- test: all samples
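A k-shot training set contains at most k samples per class. The sketch below illustrates that idea for (file name, label) pairs with a fixed random seed. It is an assumption about the general technique, not necessarily how the published splits were drawn, so for benchmarking the provided split files should be used to keep results comparable.

```python
import random
from collections import defaultdict


def k_shot_subset(samples, k, seed=0):
    """Pick up to k samples per class from (file_name, label) pairs.

    Classes with fewer than k samples contribute all of their samples.
    """
    by_class = defaultdict(list)
    for name, label in samples:
        by_class[label].append(name)
    rng = random.Random(seed)
    subset = []
    for label in sorted(by_class):
        # Sort before shuffling so the result depends only on the seed,
        # not on the input order.
        names = sorted(by_class[label])
        rng.shuffle(names)
        subset.extend((name, label) for name in names[:k])
    return subset


samples = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("b1", "B")]
print(k_shot_subset(samples, k=2, seed=1))
```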
Changelog
- Version 11:
  - Added new splits: overlap_latvia_vs_estonia and overlap_latvia_portugal_vs_estonia.
  - Fixed the file Portugal.parquet, which had been inadvertently corrupted in version 10.
- Version 10:
  - Added back the missing raw_data files.
  - The raw data geometries <Country>_geometries.geojson and labels <Country>_labels.geojson are now sourced from the shapefiles and therefore contain all parcels, even those for which no Sentinel-2 data is available.
- Version 9:
  - Replacement of some parcel IDs for Latvia and Estonia. These parcels were duplicates within the original source shapefile, so the replacement does not affect the data itself; only the parcel IDs change. The following IDs are affected (country: old parcel ID → new parcel ID):
    - Estonia: 20548567 → 22172347
    - Estonia: 21313556 → 22111331
    - Latvia: 12786929 → 13203478
    - Latvia: 12297424 → 12804307
    - Latvia: 12297361 → 12804296
    - Latvia: 12297423 → 12803325
    - Latvia: 12297421 → 12803323
    - Latvia: 12297422 → 12803324
    In the split files, for Latvia, only ID 13203478 is affected. The remaining IDs are not part of the splits since they belong to the meadow class, which is downsampled for the pre-training splits.
  - This version is missing the <Country>.parquet files and cannot be used to run pre-processing.
- Version 8: Adjusted the Portugal fine-tuning split so that it matches the Latvia fine-tuning split.
- Version 7: Added new few-shot fine-tuning splits: 200 and 500
- Version 6: Added new (few-shot) fine-tuning splits: 20, 100, and all samples
- Version 4: The EuroCrops shapefiles sometimes contain a couple of parcels that lie outside the national borders. We now map them to the closest NUTS region within the country. Please rely on this version or newer.
- Version 3: Corrected parcels that had been clipped incorrectly.
- Version 2: Removed data points that contain only cloudy observations (in preprocess.zip).
- Version 1: Initial publication
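Results or file lists produced with a version older than 9 can be aligned with the current IDs by rewriting the parcel ID field in the file name. The sketch below applies the Version 9 old-to-new table from this changelog to names that follow the <NUTS3region>_<parcelID>_<EC_hcat_c>.npz convention; the region codes and HCAT codes in the examples are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Old -> new parcel IDs replaced in Version 9 (copied from the changelog).
V9_ID_MAP = {
    "20548567": "22172347",  # Estonia
    "21313556": "22111331",  # Estonia
    "12786929": "13203478",  # Latvia
    "12297424": "12804307",  # Latvia
    "12297361": "12804296",  # Latvia
    "12297423": "12803325",  # Latvia
    "12297421": "12803323",  # Latvia
    "12297422": "12803324",  # Latvia
}


def remap_npz_name(name: str, id_map: dict[str, str] = V9_ID_MAP) -> str:
    """Replace the parcel ID in '<NUTS3region>_<parcelID>_<EC_hcat_c>.npz' if it was remapped."""
    p = Path(name)
    nuts3, parcel_id, hcat = p.stem.rsplit("_", 2)
    new_id = id_map.get(parcel_id, parcel_id)  # unaffected IDs pass through unchanged
    return str(p.with_name(f"{nuts3}_{new_id}_{hcat}{p.suffix}"))


print(remap_npz_name("LV003_12786929_3301010100.npz"))
```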
* Contains Copernicus Sentinel data (2024), processed on EOLab
Country-specific data sources for EuroCrops reference data
Estonia:
If the link does not work, search for Estonia --> Geospatial Aid Application Estonia Agricultural parcels on the INSPIRE platform.
Latvia:
Lauku atbalsta dienests Updated Source
Portugal:
Download via WFS https://www.ifap.pt/isip/ows/isip.data/wfs or via the IFAP website.
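A WFS endpoint can be queried with standard OGC key-value-pair requests; a GetCapabilities request returns the available feature type (layer) names. The sketch below only builds such a URL for the IFAP endpoint named above. The protocol version 2.0.0 is an assumption, since the versions this server supports are not stated here.

```python
from urllib.parse import urlencode

IFAP_WFS = "https://www.ifap.pt/isip/ows/isip.data/wfs"


def wfs_get_capabilities_url(base: str = IFAP_WFS, version: str = "2.0.0") -> str:
    """Build an OGC WFS GetCapabilities request URL (key-value-pair encoding)."""
    params = {"service": "WFS", "version": version, "request": "GetCapabilities"}
    return f"{base}?{urlencode(params)}"


print(wfs_get_capabilities_url())
```

The XML returned by this request (for example via `urllib.request.urlopen`) lists the feature type names, which are then passed to a GetFeature request as `typeNames` in WFS 2.0.0, or as `typeName` when falling back to WFS 1.1.0.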
Funding
- Federal Ministry for Economic Affairs and Climate Action