Published April 25, 2023 | Version v1
Dataset Open

Crowds & Machines Next Level: Mediterranean wheat classification labels from gamified crowd-sourcing

  • 1. 52impact B.V.
  • 2. Blackshore B.V.

Description

Machine learning algorithms, and deep learning algorithms in particular, need large training and validation datasets, which are often unavailable. Collecting such datasets on the ground is costly and time-consuming. Within the European Space Agency-funded project ‘Crowds & Machines Next Level’ (by Blackshore B.V., 52impact B.V. and The Hague Centre for Strategic Studies), we aimed to address this problem by generating labelled data efficiently with an innovative gamified crowdsourcing method.

The objective of the project ‘Crowds & Machines Next Level’ was to generate labelled data for training and validating machine learning algorithms that classify wheat. We make those labelled datasets freely available as open data to organisations that use machine learning in their activities, mainly companies and knowledge institutes. As part of the project, we developed example scripts (Jupyter notebooks) that help organisations use the crowdsourced data in their own machine learning systems.

BlackShore has developed the online platform Cerberus to enable large-scale generation of labelled datasets. The platform is deployed at twenty locations around the Mediterranean Sea to generate labelled datasets of wheat and other land cover classes (see table). These locations span a diversity of climate regions, harvest cultures and crop calendars, which poses a challenge for training machine learning algorithms. Gamers click on hexagons plotted on top of very high resolution (VHR) satellite imagery captured during the 2021 harvest period, and by combining three different hexagon grids those clicks are converted into triangles. Each triangle has a number of clicks (by different users) per land cover category, which provides a measure of the label's reliability.
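As an illustration of how per-category click counts could be turned into a single label with a reliability score, here is a minimal sketch. It is not the project's own method: the function name `summarise_clicks`, the `min_clicks` threshold and the agreement-fraction score are assumptions made for this example, and the category names are placeholders.

```python
def summarise_clicks(clicks_per_category, min_clicks=3):
    """Return (majority_label, agreement) for one triangle, or None.

    clicks_per_category: dict mapping a land cover category to the number
    of clicks that triangle received for it (hypothetical structure).
    min_clicks: assumed threshold below which a triangle is ignored.
    """
    total = sum(clicks_per_category.values())
    if total < min_clicks:
        return None  # too few clicks to trust any label
    # Category with the most clicks; ties resolve to the first one seen.
    label, top = max(clicks_per_category.items(), key=lambda kv: kv[1])
    # Share of all clicks that agree with the majority label.
    return label, top / total


result = summarise_clicks({"Wheat": 6, "Cattle": 2})
print(result)
```

Triangles with a low agreement fraction could then be filtered out before training, at the cost of fewer labelled samples.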

52impact developed example tutorials to use the data to train pixel-based (Random Forest) and segmentation-based (U-Net) machine learning models, using Sentinel-2 imagery (provided in the data folder), which can be forked here: https://bitbucket.org/52impact/crowds-machines.
 

Overview of locations
ID | location_id | Country | Region | Shape | Harvest period | VHR image date | S-2 pre-harvest | S-2 harvest | S-2 post-harvest
01 | portugalAlentejo | Portugal | Alentejo | 01_Portugal_Alentejo_SELECTION | 10 Jul - 1 Aug | 07/07/2021 | 14/05/2021 | 13/07/2021 | 22/08/2022
02 | spainAndalusia | Spain | Andalusia | 02_Spain_Andalusia_SELECTION | 10 Jul - 1 Aug | 02/07/2021 | 16/05/2021 | 15/07/2021 | 03/09/2021
03 | spainAragon | Spain | Aragon | 03_Spain_Aragon_SELECTION | 10 Jul - 1 Aug | 26/10/2021 | 20/05/2021 | 19/07/2021 | 05/09/2021
04 | franceAude | France | Aude | 04_France_Aude_SELECTION | 1 Jul - 1 Oct | 22/09/2021 | 12/05/2021 | 10/08/2021 | 18/11/2021
05 | franceCamargue | France | Camargue | 05_France_Camargue_SELECTION | 1 Jul - 1 Oct | 07/10/2021 | 12/05/2021 | 10/08/2021 | 18/11/2021
06 | franceProvence | France | Provence | 06_France_Provence_SELECTION | 1 Jul - 1 Oct | 26/10/2021 | 19/05/2021 | 17/08/2021 | 20/11/2021
07_08 | italyMarche | Italy | Marche (East and West) | 07_08_Italy_Marche_SELECTION | 1 Jul - 1 Sept | 09/08/2021 | 26/05/2021 | 25/07/2021 | 20/11/2021
09 | italySardinia | Italy | Sardinia | 09_Italy_Sardinia_SELECTION | 1 Jul - 1 Sept | 31/08/2021 | 26/05/2021 | 22/07/2021 | 10/10/2021
10 | italySicily | Italy | Sicily | 10_Italy_Sicily_SELECTION | 1 Jul - 1 Sept | 19/09/2021 | 22/05/2021 | 26/07/2021 | 10/10/2021
11 | italyPugliaNorth | Italy | Puglia (North) | 11_Italy_PugliaNorth_SELECTION | 1 Jul - 1 Sept | 06/10/2021 | 11/06/2021 | 31/07/2021 | 04/10/2021
12 | italyPuglia | Italy | Puglia | 12_Italy_Puglia_SELECTION | 1 Jul - 1 Sept | 19/08/2021 | 03/06/2021 | 02/08/2021 | 21/10/2021
13 | greeceWest | Greece | West | 13_Greece_West_SELECTION | 1 Sept - 1 Nov | 02/09/2021 | 27/07/2021 | 05/10/2021 | 14/12/2021
14 | greeceThessaly | Greece | Thessaly | 14_Greece_Thessaly_SELECTION | 1 Sept - 1 Nov | 14/07/2021 | 27/07/2021 | 25/09/2021 | 19/12/2021
15 | greeceMacedoniaCentral | Greece | Macedonia (Central) | 15_Greece_MacedoniaCentral_SELECTION | 1 Jun - 1 Aug | 22/07/2021 | 13/05/2021 | 22/07/2021 | 15/09/2021
16 | greeceMacedoniaEast | Greece | Macedonia (East) | 16_Greece_MacedoniaEast_SELECTION | 1 Jun - 1 Aug | 05/08/2021 | 25/05/2021 | 29/07/2021 | 27/10/2021
17 | greeceRhodes | Greece | Rhodes | 17_Greece_Rhodes_SELECTION | 15 May - 1 Jul | 09/05/2021 | 25/03/2021 | 24/05/2021 | 22/08/2021
18 | cyprusLarnaca | Cyprus | Larnaca | 18_Cyprus_Larnaca_SELECTION | 15 May - 1 Jul | 05/06/2021 | 19/03/2021 | 07/06/2021 | 21/08/2021
19 | turkeyCyprus | Cyprus (T) | Famagusta | 19_Turkey_Cyprus_SELECTION | 15 May - 1 Jul | 05/06/2021 | 29/03/2021 | 17/06/2021 | 26/08/2021
20 | egyptBehera | Egypt | Behera | 20_Egypt_Behera_SELECTION | 1 Apr - 1 Jul | 06/03/2021 | 26/01/2021 | 07/03/2021 | 19/08/2021
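The dates in the table appear to be written day-first (DD/MM/YYYY), as values such as 26/10/2021 suggest. The following minimal sketch, using only the Python standard library, parses them under that assumption and computes the time span covered by the three Sentinel-2 acquisitions of one location. The helper names `parse_date` and `s2_span_days` are hypothetical and not part of the project's notebooks.

```python
from datetime import date, datetime


def parse_date(text: str) -> date:
    """Parse a table date, assuming the day-first DD/MM/YYYY format."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def s2_span_days(pre_harvest: str, post_harvest: str) -> int:
    """Number of days between the pre- and post-harvest S-2 images."""
    return (parse_date(post_harvest) - parse_date(pre_harvest)).days


# Values copied from the row for ID 02 (Andalusia, Spain).
pre, harvest, post = "16/05/2021", "15/07/2021", "03/09/2021"
print(s2_span_days(pre, post))  # 110
```

Parsing with an explicit format string avoids the month-first misreading that locale-dependent parsers can introduce for dates such as 07/07/2021 or 05/06/2021.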

The following data is provided:

  • Triangulated_data.zip: contains one geopackage (gpkg) file per region and per category, each holding triangular polygons with the number of clicks per polygon. The filename of each polygon file depends on the location and category. For example, the file that contains the triangles corresponding to Cattle in Alentejo, Portugal, is called: 01_Portugal_Alentejo_Cattle.gpkg
  • Data.zip: all data necessary to run the Jupyter notebooks, i.e., location data, cropped Sentinel-2 satellite imagery (for training location IDs 01, 02, 12 and 15, and validation locations near IDs 02 and 15) and also the triangulated polygons.
  • Models.zip: pre-trained random forest and U-Net models based on the data; the same models can be generated with the Jupyter notebooks.
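The naming convention for the triangle files can be sketched as follows. This assumes that the filename prefix equals the Shape name from the locations table with its `_SELECTION` suffix removed, which matches the single example given (01_Portugal_Alentejo_Cattle.gpkg). The function name `triangle_filename` is hypothetical, and the exact spelling of categories other than Cattle is not confirmed by this description.

```python
def triangle_filename(shape_name: str, category: str) -> str:
    """Build the expected gpkg filename for one location and category.

    shape_name: value of the Shape column, e.g. '01_Portugal_Alentejo_SELECTION'.
    category: land cover category as spelled in the filenames (assumed).
    """
    suffix = "_SELECTION"
    if not shape_name.endswith(suffix):
        raise ValueError(f"unexpected shape name: {shape_name!r}")
    prefix = shape_name[: -len(suffix)]  # e.g. '01_Portugal_Alentejo'
    return f"{prefix}_{category}.gpkg"


print(triangle_filename("01_Portugal_Alentejo_SELECTION", "Cattle"))
# 01_Portugal_Alentejo_Cattle.gpkg
```

The resulting path could then be opened with any GeoPackage-aware reader after unpacking Triangulated_data.zip.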
     

Files

Files (2.3 GB)

Size | Checksum
95.1 MB | md5:a41f2f90acc4c620d4218c20dcec58f5
2.2 GB | md5:6fe308c8a83f58519a02aeac38f17b09
7.3 MB | md5:8a0a8b9355b99963a63e5d57fca46184