EuroCropsML
- 1. Technical University of Munich
- 2. dida Datenschmiede GmbH
Description
EuroCropsML* is a ready-to-use ML dataset combining EuroCrops reference data with Sentinel-2 reflectance data from 2021. It contains data from Latvia, Portugal, and Estonia and is intended for benchmarking few-shot crop type classification. We used Eurostat's GISCO dataset to map the EuroCrops parcels to their NUTS1-3 regions.
The provided data comes in two stages:
- raw_data.zip (stage 1): One dataframe per country containing an annual time series of observations for each parcel, as well as separate files for the parcels' geometries and classes (EC_hcat_c = 10-digit HCAT code encoding the crop's position in the HCAT hierarchy).
- preprocess.zip (stage 2): Ready-to-use .npz-files. Each data point is saved in its own .npz-file along with its metadata. In addition, we performed some cloud removal steps. Each .npz-file is named according to the following convention: <NUTS3region>_<parcelID>_<EC_hcat_c>.npz
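As a minimal sketch, assuming only the naming convention stated above, a file name from preprocess.zip can be split back into its NUTS3 region, parcel ID, and HCAT code. The example file name and the helper names `parse_npz_name` and `load_npz` are hypothetical. The array keys stored inside each .npz-file are not documented here, so `load_npz` does not assume any and simply returns whatever arrays the file contains.

```python
from pathlib import Path
from typing import NamedTuple

import numpy as np


class ParcelFile(NamedTuple):
    nuts3: str      # NUTS3 region code
    parcel_id: str  # parcel ID as it appears in the file name
    hcat: str       # 10-digit EC_hcat_c crop code


def parse_npz_name(path: str) -> ParcelFile:
    """Split '<NUTS3region>_<parcelID>_<EC_hcat_c>.npz' into its three parts."""
    stem = Path(path).stem  # drop the directory and the '.npz' suffix
    # Split from the right so only the last two underscores act as separators.
    nuts3, parcel_id, hcat = stem.rsplit("_", 2)
    return ParcelFile(nuts3, parcel_id, hcat)


def load_npz(path: str) -> dict[str, np.ndarray]:
    """Load every array in an .npz-file into a plain dict (no key names assumed).

    allow_pickle=True is an assumption, in case metadata is stored as object arrays;
    only use it on files from a trusted source.
    """
    with np.load(path, allow_pickle=True) as data:
        return {key: data[key] for key in data.files}


if __name__ == "__main__":
    # Hypothetical file name following the documented convention
    print(parse_npz_name("preprocess/EE001_12345678_3301010100.npz"))
```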
Furthermore, split.zip contains .json-files that split the files from preprocess.zip into a pre-training/meta-learning (train and validation) and fine-tuning (train, validation, and test) dataset. In total, we provide two use cases:
- latvia_portugal_vs_estonia: pre-training on Latvia and Portugal (142 distinct classes), fine-tuning on Estonia (127 distinct classes, of which 34 have not been seen during pre-training)
- latvia_vs_estonia: pre-training on Latvia (103 distinct classes) and fine-tuning on Estonia (127 distinct classes, of which 46 have not been seen during pre-training)
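The number of unseen classes is the size of a set difference: fine-tuning classes minus pre-training classes (for latvia_portugal_vs_estonia, 127 - 34 = 93 Estonian classes also occur during pre-training). A minimal sketch, assuming each sample's class is read from the EC_hcat_c part of its preprocess.zip file name; the function names and the sample file names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def hcat_of(path: str) -> str:
    """Return the EC_hcat_c code from '<NUTS3region>_<parcelID>_<EC_hcat_c>.npz'."""
    return Path(path).stem.rsplit("_", 2)[-1]


def unseen_classes(pretrain_files: Iterable[str], finetune_files: Iterable[str]) -> set[str]:
    """Classes present in the fine-tuning data but never seen during pre-training."""
    seen = {hcat_of(f) for f in pretrain_files}
    target = {hcat_of(f) for f in finetune_files}
    return target - seen


if __name__ == "__main__":
    # Hypothetical file names
    pretrain = ["LV003_1_3301010100.npz", "PT11A_2_3301020100.npz"]
    finetune = ["EE001_3_3301010100.npz", "EE004_4_3302000000.npz"]
    print(unseen_classes(pretrain, finetune))  # {'3302000000'}
```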
For both use cases, the fine-tuning split is as follows:
- train: 1-, 5-, 10-, 20-, 100-, 200-, 500-shot (for few-shot classification and benchmarking) and all samples
- validation: 1000 samples
- test: all samples
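A k-shot training split holds k labelled samples per class. The sketch below illustrates one common way to draw such a subset: group file names by their EC_hcat_c code and sample with a fixed seed. It is an illustration of k-shot sampling under these assumptions, not the procedure used to create split.zip; for benchmarking, the provided split files should be used as-is. The function name is hypothetical, and keeping all samples of classes with fewer than k samples is an assumption of this sketch.

```python
import random
from collections import defaultdict
from pathlib import Path


def k_shot_subset(files: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw up to k files per class, where the class is the EC_hcat_c part of the name."""
    by_class: dict[str, list[str]] = defaultdict(list)
    for f in files:
        hcat = Path(f).stem.rsplit("_", 2)[-1]
        by_class[hcat].append(f)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset: list[str] = []
    for hcat in sorted(by_class):  # sorted for a deterministic iteration order
        members = sorted(by_class[hcat])
        # Classes with fewer than k samples keep all of them (assumption).
        subset.extend(rng.sample(members, min(k, len(members))))
    return subset


if __name__ == "__main__":
    # Hypothetical file names: 5 files for each of two classes
    files = [f"EE001_{i}_{'3301010100' if i % 2 else '3302000000'}.npz" for i in range(10)]
    print(len(k_shot_subset(files, k=1)))   # one file per class -> 2
    print(len(k_shot_subset(files, k=10)))  # only 5 files per class -> 10
```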
Changelog
- Version 9:
- Raw data geometries <Country>_geometries.geojson and labels <Country>_labels.geojson are now sourced from the shapefiles and, hence, contain all parcels, even those for which no Sentinel-2 data is available.
- Some parcel IDs for Latvia and Estonia have been replaced because they were duplicated within the original source shapefiles. Only the parcel IDs change (in preprocess.zip and split.zip); the data itself is unaffected. The following IDs are affected:
  country   old parcel ID   new parcel ID
  Estonia   20548567        22172347
  Estonia   21313556        22111331
  Latvia    12786929        13203478
  Latvia    12297424        12804307
  Latvia    12297361        12804296
  Latvia    12297423        12803325
  Latvia    12297421        12803323
  Latvia    12297422        12803324
- Version 8: Adjusted the Portugal fine-tuning split to match the Latvia fine-tuning split
- Version 7: Added new few-shot fine-tuning splits: 200 and 500
- Version 6: Added new (few-shot) fine-tuning splits: 20, 100, and all samples
- Version 4: The EuroCrops shapefiles sometimes contain a couple of parcels that lie outside the national borders. We now map them to the closest NUTS region within the country. Please rely on this version or newer.
- Version 3: Fixed some parcels that had been clipped incorrectly.
- Version 2: Removed data points that contain only cloudy observations (in preprocess.zip).
- Version 1: Initial publication
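Users holding preprocessed files from a version before 9 can translate the affected parcel IDs with the replacement table from the Version 9 entry. A minimal sketch, assuming the IDs appear as the middle part of the preprocess.zip file names; the helper name `update_parcel_id` and the NUTS3/HCAT parts of the example file name are hypothetical.

```python
from pathlib import Path

# Parcel-ID replacements from the Version 9 changelog entry (old ID -> new ID)
ID_REPLACEMENTS: dict[str, str] = {
    # Estonia
    "20548567": "22172347",
    "21313556": "22111331",
    # Latvia
    "12786929": "13203478",
    "12297424": "12804307",
    "12297361": "12804296",
    "12297423": "12803325",
    "12297421": "12803323",
    "12297422": "12803324",
}


def update_parcel_id(path: str) -> str:
    """Return the file name with a replaced parcel ID; unaffected names are unchanged."""
    p = Path(path)
    nuts3, parcel_id, hcat = p.stem.rsplit("_", 2)
    new_id = ID_REPLACEMENTS.get(parcel_id, parcel_id)
    return str(p.with_name(f"{nuts3}_{new_id}_{hcat}{p.suffix}"))


if __name__ == "__main__":
    print(update_parcel_id("EE001_20548567_3301010100.npz"))  # EE001_22172347_3301010100.npz
```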
* Contains Copernicus Sentinel data (2024), processed on EOLab
Country-specific data sources for EuroCrops reference data
Estonia:
If the link does not work, search for "Estonia > Geospatial Aid Application Estonia Agricultural parcels" on the INSPIRE platform.
Latvia:
Lauku atbalsta dienests (Rural Support Service), updated source
Portugal:
Download via WFS (https://www.ifap.pt/isip/ows/isip.data/wfs) or from the IFAP website.
Additional details
Funding
- Federal Ministry for Economic Affairs and Climate Action