Crowds & Machines Next level: Mediterranean wheat classification labels from gamified crowd-sourcing
- 1. 52impact B.V.
- 2. Blackshore B.V.
Description
Machine learning (and especially deep learning) algorithms need large training and validation datasets, which are often unavailable. Creating on-ground datasets is costly and time-consuming. Within the European Space Agency funded project ‘Crowds & Machines – Next Level’ (by Blackshore B.V., 52impact B.V. and The Hague Centre for Strategic Studies), we aimed to solve this issue by generating labelled data efficiently with an innovative gamified crowdsourcing method.
The objective of the project ‘Crowds & Machines Next Level’ was to generate labelled data for the training and validation of machine learning algorithms that classify wheat. We make those labelled datasets freely available as open data to organisations that use machine learning for their activities, mainly companies and knowledge institutes. As part of the project we developed example scripts (Jupyter notebooks) that help organisations use the crowdsourced data in their own machine learning systems.
BlackShore has developed the online platform Cerberus to enable large-scale generation of labelled datasets. For this project it was deployed at twenty locations around the Mediterranean Sea to generate labelled datasets of wheat and other land cover classes (see table). These locations span a diversity of climate regions, harvest cultures and crop calendars, which poses a challenge for training machine learning algorithms. Gamers click on hexagons plotted on top of very high resolution satellite imagery (captured during the harvest period in 2021), and by combining 3 different hexagon grids those clicks are converted into triangles. Each triangle has a number of clicks (by different users) per land cover category, which gives an indication of how reliable the label is.
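One way to use the per-category click counts is to reduce them to a majority label and an agreement score per triangle. The sketch below is illustrative only: it assumes the counts for one triangle have already been collected into a dictionary, and the function name, the category "Water" and the example counts are hypothetical rather than taken from the dataset.

```python
from typing import Dict, Optional, Tuple


def summarise_triangle(clicks: Dict[str, int]) -> Tuple[Optional[str], float]:
    """Return (majority category, share of clicks for that category).

    `clicks` maps a land cover category to the number of user clicks on one
    triangle. This layout is an assumption made for illustration; the
    published files store the counts per region and per category.
    """
    total = sum(clicks.values())
    if total == 0:
        # No clicks at all: no label and no agreement.
        return None, 0.0
    # On a tie, max() keeps the first category encountered.
    label = max(clicks, key=clicks.get)
    return label, clicks[label] / total


# Hypothetical counts for a single triangle.
example = {"Wheat": 7, "Cattle": 1, "Water": 2}
label, agreement = summarise_triangle(example)
print(label, agreement)  # Wheat 0.7
```

A threshold on the agreement score, or on the absolute number of clicks, could then be used to keep only the more reliable triangles for training. Suitable threshold values are not specified here and would need to be chosen by the user.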
52impact developed example tutorials to use the data to train pixel-based (Random Forest) and segmentation-based (U-Net) machine learning models, using Sentinel-2 imagery (provided in the data folder), which can be forked here: https://bitbucket.org/52impact/crowds-machines.
ID | location_id | Country | Region | Shape | Harvest period | VHR image date | S-2 pre-harvest | S-2 harvest | S-2 post-harvest |
---|---|---|---|---|---|---|---|---|---|
01 | portugalAlentejo | Portugal | Alentejo | 01_Portugal_Alentejo_SELECTION | 10 Jul - 1 Aug | 07/07/2021 | 14/05/2021 | 13/07/2021 | 22/08/2022 |
02 | spainAndalusia | Spain | Andalusia | 02_Spain_Andalusia_SELECTION | 10 Jul - 1 Aug | 02/07/2021 | 16/05/2021 | 15/07/2021 | 03/09/2021 |
03 | spainAragon | Spain | Aragon | 03_Spain_Aragon_SELECTION | 10 Jul - 1 Aug | 26/10/2021 | 20/05/2021 | 19/07/2021 | 05/09/2021 |
04 | franceAude | France | Aude | 04_France_Aude_SELECTION | 1 Jul - 1 Oct | 22/09/2021 | 12/05/2021 | 10/08/2021 | 18/11/2021 |
05 | franceCamargue | France | Camargue | 05_France_Camargue_SELECTION | 1 Jul - 1 Oct | 07/10/2021 | 12/05/2021 | 10/08/2021 | 18/11/2021 |
06 | franceProvence | France | Provence | 06_France_Provence_SELECTION | 1 Jul - 1 Oct | 26/10/2021 | 19/05/2021 | 17/08/2021 | 20/11/2021 |
07_08 | italyMarche | Italy | Marche (East and West) | 07_08_Italy_Marche_SELECTION | 1 Jul - 1 Sept | 09/08/2021 | 26/05/2021 | 25/07/2021 | 20/11/2021 |
09 | italySardinia | Italy | Sardinia | 09_Italy_Sardinia_SELECTION | 1 Jul - 1 Sept | 31/08/2021 | 26/05/2021 | 22/07/2021 | 10/10/2021 |
10 | italySicily | Italy | Sicily | 10_Italy_Sicily_SELECTION | 1 Jul - 1 Sept | 19/09/2021 | 22/05/2021 | 26/07/2021 | 10/10/2021 |
11 | italyPugliaNorth | Italy | Puglia (North) | 11_Italy_PugliaNorth_SELECTION | 1 Jul - 1 Sept | 06/10/2021 | 11/06/2021 | 31/07/2021 | 04/10/2021 |
12 | italyPuglia | Italy | Puglia | 12_Italy_Puglia_SELECTION | 1 Jul - 1 Sept | 19/08/2021 | 03/06/2021 | 02/08/2021 | 21/10/2021 |
13 | greeceWest | Greece | West | 13_Greece_West_SELECTION | 1 Sept - 1 Nov | 02/09/2021 | 27/07/2021 | 05/10/2021 | 14/12/2021 |
14 | greeceThessaly | Greece | Thessaly | 14_Greece_Thessaly_SELECTION | 1 Sept - 1 Nov | 14/07/2021 | 27/07/2021 | 25/09/2021 | 19/12/2021 |
15 | greeceMacedoniaCentral | Greece | Macedonia (Central) | 15_Greece_MacedoniaCentral_SELECTION | 1 Jun - 1 Aug | 22/07/2021 | 13/05/2021 | 22/07/2021 | 15/09/2021 |
16 | greeceMacedoniaEast | Greece | Macedonia (East) | 16_Greece_MacedoniaEast_SELECTION | 1 Jun - 1 Aug | 05/08/2021 | 25/05/2021 | 29/07/2021 | 27/10/2021 |
17 | greeceRhodes | Greece | Rhodes | 17_Greece_Rhodes_SELECTION | 15 May - 1 Jul | 09/05/2021 | 25/03/2021 | 24/05/2021 | 22/08/2021 |
18 | cyprusLarnaca | Cyprus | Larnaca | 18_Cyprus_Larnaca_SELECTION | 15 May - 1 Jul | 05/06/2021 | 19/03/2021 | 07/06/2021 | 21/08/2021 |
19 | turkeyCyprus | Cyprus (T) | Famagusta | 19_Turkey_Cyprus_SELECTION | 15 May - 1 Jul | 05/06/2021 | 29/03/2021 | 17/06/2021 | 26/08/2021 |
20 | egyptBehera | Egypt | Behera | 20_Egypt_Behera_SELECTION | 1 Apr - 1 Jul | 06/03/2021 | 26/01/2021 | 07/03/2021 | 19/08/2021 |
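The dates in the table are written day-first (DD/MM/YYYY), as values such as 26/10/2021 show. When loading the table into a script it may help to parse them explicitly, so that they are not misread as month-first. A minimal sketch, in which the helper name and the dictionary keys are hypothetical:

```python
from datetime import date, datetime


def parse_table_date(text: str) -> date:
    """Parse a day-first date string such as '15/07/2021'."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").date()


# Sentinel-2 dates for row 02 (Andalusia), copied from the table.
row = {
    "pre_harvest": "16/05/2021",
    "harvest": "15/07/2021",
    "post_harvest": "03/09/2021",
}
dates = {name: parse_table_date(value) for name, value in row.items()}

# The three acquisitions should be in chronological order.
assert dates["pre_harvest"] < dates["harvest"] < dates["post_harvest"]
print(dates["harvest"].isoformat())  # 2021-07-15
```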
The following data is provided:
- Triangulated_data.zip: contains per region and per category a geopackage (gpkg) file containing triangular polygons with the number of clicks per polygon. The filename of the polygon files depends on the location and category. For example, a file that contains the triangles corresponding to Cattle in Alentejo, Portugal, is called: 01_Portugal_Alentejo_Cattle.gpkg
- Data.zip: all data necessary to run the Jupyter notebooks, i.e., location data, cropped Sentinel-2 satellite imagery (for training location IDs 01, 02, 12 and 15, and validation locations near IDs 02 and 15) and also the triangulated polygons.
- Models.zip: pre-trained Random Forest and U-Net models based on the data. The same models can also be regenerated by running the Jupyter notebooks.
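The file naming described for Triangulated_data.zip can be sketched as follows. The example assumes that the file prefix equals the table's Shape value without its `_SELECTION` suffix, which matches the Alentejo example but is not confirmed for every region, and that the archive has been extracted to a folder named `Triangulated_data`. Both the folder name and the helper function are hypothetical.

```python
from pathlib import Path


def triangle_file(shape_name: str, category: str,
                  root: str = "Triangulated_data") -> Path:
    """Build the expected path of a triangle file for one region and category.

    `shape_name` is the Shape column of the location table,
    e.g. '01_Portugal_Alentejo_SELECTION'. `root` is an assumed
    extraction folder, not a documented path.
    """
    suffix = "_SELECTION"
    if not shape_name.endswith(suffix):
        raise ValueError(f"unexpected shape name: {shape_name!r}")
    prefix = shape_name[: -len(suffix)]
    return Path(root) / f"{prefix}_{category}.gpkg"


path = triangle_file("01_Portugal_Alentejo_SELECTION", "Cattle")
print(path.name)  # 01_Portugal_Alentejo_Cattle.gpkg
```

A path built this way can be opened with any GeoPackage reader, for example `geopandas.read_file(path)`. The attribute name that holds the click count should be checked in the file itself.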