Pl@ntNet-CrowdSWE-v2: Pl@ntNet collaborative learning with South-Western-Europe dataset
Authors/Creators
- Lefort, Tanguy (Data curator) 1, 2, 3, 4, 5
- AFFOUARD, Antoine (Data curator) 2, 6
- Charlier, Benjamin (Project member) 4, 1, 5
- Lombardo, Jean-Christophe (Data curator) 2, 3
- Chouet, Mathias (Data curator) 7, 8
- BOTELLA, Christophe (Data curator) 2, 9
- Goëau, Hervé (Project member) 7, 8
- Salmon, Joseph (Project manager) 4, 1, 5, 10
- BONNET, Pierre-Antoine (Project manager) 7, 8
- joly, alexis (Project manager) 2, 3
1. Institut Montpelliérain Alexander Grothendieck
2. National Institute for Research in Computer and Control Sciences
3. Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
4. Université de Montpellier
5. Centre National de la Recherche Scientifique
6. Centre de Coopération Internationale en Recherche Agronomique pour le Développement
7. French Agricultural Research Centre for International Development
8. UMR Botanique et Modélisation de l'Architecture des Plantes et des végétations
9. IROKO: Sciences environnementales guidées par les données
10. Institut Universitaire de France
Description
This repository contains the Pl@ntNet South Western Europe (SWE) crowdsourced dataset (V2), including species identification and user votes for observations made between 2017 and 2023 in the SWE flora.
In total, the dataset contains 5,561,512 plant observations labeled by 765,981 users between January 2017 and October 2023. The users have proposed 9,132 species, while the AI system has provided (possibly low) probabilities covering 57,660 species in total. In addition, 98 experts were selected to obtain ground truth values for 21,656 observations.
| Statistic | Value |
|---|---|
| Total observations | 5,561,512 |
| Total users | 765,981 |
| Total species (mentioned by AI or humans) | 57,660 |
| Human proposed species | 9,132 |
| Expert-validated observations | 21,656 |
The main difference between the current version, Pl@ntNet-CrowdSWE-v2, and the original Pl@ntNet-CrowdSWE dataset is that multi-image observations were removed.
Directory Structure
Pl@ntNet-CrowdSWE-v2/
├── votes/
│   ├── ai_votes.json
│   ├── ground_truth.json
│   ├── human_votes.json
│   └── PN_valid_votes.json
├── ai_scores/
│   ├── ai_scores.json
│   └── ai_scores_all.json
└── converters/
    ├── all_valid_id.json
    ├── authors.json
    ├── reverse_unified_classes.json
    └── unified_classes.json
votes
The votes folder contains several types of votes. Each task (identified by an `obsID`) corresponds to a plant picture for which a species is provided (identified by a class label from 0 to 57,659). The vote files are as follows:
`human_votes.json`: The crowdsourced votes in this file cover over 5 million tasks, with votes from 765,981 users. The data is structured as follows:
{
"obsID": {
"userID1": "vote",
"userID2": "vote",
...
},
...
}
`ground_truth.json`: A partial ground truth created by 98 experts. Each `obsID` is associated with a class label if an expert voted for a species, or `-1` otherwise.
`ai_votes.json`: AI-generated votes (as of January 2025), where each key is also an `obsID` and the value is the predicted class.
`PN_valid_votes.json`: The validated human labels obtained from the Pl@ntNet label aggregation strategy (extracted in August 2025). They are human labels aggregated and consolidated using an iterative algorithm. To run the Pl@ntNet label aggregation strategy (available in the peerannot library), use the files in the `aggregation` folder.
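As a quick sanity check on the vote files, the sketch below computes a plain majority vote per observation and scores it against the expert labels. This is a simple baseline, not the Pl@ntNet aggregation strategy. The helper names (`load_json`, `majority_vote`, `expert_accuracy`), the toy IDs and labels, and the commented file paths are assumptions made for illustration. Labels are cast with `int()` because the source does not state whether they are stored as strings or integers.

```python
import json
from collections import Counter
from pathlib import Path


def load_json(path):
    """Read one of the dataset's JSON files into a Python dict."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def majority_vote(human_votes):
    """Map each obsID to its most frequent label (ties go to the first vote seen)."""
    return {
        obs_id: Counter(votes.values()).most_common(1)[0][0]
        for obs_id, votes in human_votes.items()
        if votes
    }


def expert_accuracy(predictions, ground_truth):
    """Accuracy over observations that have an expert label (value other than -1)."""
    scored = [
        int(predictions[obs_id]) == int(label)
        for obs_id, label in ground_truth.items()
        if int(label) != -1 and obs_id in predictions
    ]
    return sum(scored) / len(scored) if scored else float("nan")


# Toy data mimicking the documented structure (IDs and labels are made up).
toy_votes = {
    "1001": {"u1": "12", "u2": "12", "u3": "7"},
    "1002": {"u4": "3"},
    "1003": {"u1": "5", "u5": "8"},
}
toy_truth = {"1001": 12, "1002": -1, "1003": 8}

toy_majority = majority_vote(toy_votes)
toy_accuracy = expert_accuracy(toy_majority, toy_truth)
print(toy_majority, toy_accuracy)

# With the real files (paths assumed relative to the unzipped archive):
# human_votes = load_json("Pl@ntNet-CrowdSWE-v2/votes/human_votes.json")
# ground_truth = load_json("Pl@ntNet-CrowdSWE-v2/votes/ground_truth.json")
```

Observations without an expert label (`-1`) are skipped, so the accuracy only reflects the expert-validated subset.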
ai_scores
`ai_scores_all.json`: Softmax scores from the AI model (threshold: 0.001).
`ai_scores.json`: Top-1 softmax scores from the AI model, that is, the softmax score associated with each vote in `ai_votes.json`.
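One possible use of the top-1 scores is to keep only the AI votes the model is reasonably confident about. The sketch below assumes `ai_scores.json` maps each `obsID` to a single number and `ai_votes.json` maps each `obsID` to a class label. The function name and the cutoff `min_score=0.5` are hypothetical choices, not values taken from the dataset or the paper.

```python
def confident_ai_votes(ai_votes, ai_scores, min_score=0.5):
    """Keep AI votes whose top-1 softmax score is at least min_score.

    Observations with no recorded score are treated as score 0.0 and dropped.
    """
    return {
        obs_id: int(label)
        for obs_id, label in ai_votes.items()
        if float(ai_scores.get(obs_id, 0.0)) >= min_score
    }


# Toy data mimicking the documented structure (IDs, labels and scores are made up).
toy_ai_votes = {"1001": 12, "1002": 3, "1003": 8, "1004": 41}
toy_ai_scores = {"1001": 0.93, "1002": 0.12, "1003": 0.5}

toy_confident = confident_ai_votes(toy_ai_votes, toy_ai_scores)
print(toy_confident)
```

Raising `min_score` trades coverage for reliability, so the cutoff is best chosen by checking agreement with the expert labels.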
converters
The converters folder provides essential files for data processing:
`all_valid_id.json`: Contains valid observation IDs (the last part of the URL: `https://identify.plantnet.org/fr/k-world-flora/observations/<id>`).
`authors.json`: Identifies the author of each task (`obsID`). If the author did not propose a species, the value is set to `-1`.
`unified_classes.json`: Maps species names to unified class labels (e.g., `{"Quercus ilex L.": "1234", "Pinus halepensis Mill.": "5678", ...}`). This dictionary converts botanical names to numeric identifiers from 0 to 57,659.
`reverse_unified_classes.json`: The inverse mapping that converts class labels back to species names (e.g., `{"1234": "Quercus ilex L.", "5678": "Pinus halepensis Mill.", ...}`). Use this to translate numeric predictions into readable species names.
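The converters can be used as sketched below to turn a numeric label into a species name and to rebuild the observation URL from an `obsID`. The function names and the observation ID `1001` are hypothetical. The two species entries reuse the illustrative values quoted above, not real class labels.

```python
OBSERVATION_URL = "https://identify.plantnet.org/fr/k-world-flora/observations/{obs_id}"


def label_to_name(label, reverse_classes, default="<unknown>"):
    """Translate a class label (int or str) into a species name."""
    return reverse_classes.get(str(label), default)


def invert_classes(unified_classes):
    """Rebuild the label -> name mapping from the name -> label mapping."""
    return {str(label): name for name, label in unified_classes.items()}


def observation_url(obs_id):
    """Build the Pl@ntNet web URL for an observation ID."""
    return OBSERVATION_URL.format(obs_id=obs_id)


# Example entries copied from the description of unified_classes.json.
toy_unified = {"Quercus ilex L.": "1234", "Pinus halepensis Mill.": "5678"}
toy_reverse = invert_classes(toy_unified)

print(label_to_name(1234, toy_reverse))
print(observation_url("1001"))
```

Casting the label with `str()` lets the lookup work whether predictions are stored as integers or as strings.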
To run the Pl@ntNet label aggregation strategy
To run the Pl@ntNet label aggregation strategy described in the associated journal paper (https://doi.org/10.1111/2041-210X.14486) and available in the peerannot library, several other pieces of information are needed.
- We need to know, for each task, which user was the author (if they proposed an initial species determination). This information is stored in the `authors.json` file, where each key is the `obsID` and the value is the `userID` of the author. If the author did not propose any species, this identification is set to `-1`.
- To run the label aggregation strategies that take the AI vote into account, use the `ai_votes.json` file. Each species is associated with a number, including species newly introduced by the AI.
- Finally, for strategies that take the prediction score into account, we release the `ai_scores.json` file, where each key is the `obsID` and each value is the probability given for the predicted class (i.e., the top-1 answer). For more exhaustive score outputs, consider the `ai_scores_all.json` file.
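The three inputs listed above can be assembled in plain Python before being handed to an aggregation routine. The sketch below is not the peerannot API. It only shows one way to look up each author's own vote and to add the AI as an extra voter. The function names, the `"AI"` voter ID and the toy values are hypothetical, and the code assumes author IDs match the user keys in `human_votes.json`.

```python
def author_votes(human_votes, authors):
    """Return {obsID: label proposed by the author}.

    Observations whose author is -1 (no species proposed) are skipped.
    """
    result = {}
    for obs_id, votes in human_votes.items():
        author = str(authors.get(obs_id, -1))
        if author != "-1" and author in votes:
            result[obs_id] = votes[author]
    return result


def with_ai_voter(human_votes, ai_votes, ai_user_id="AI"):
    """Copy the human votes and add the AI prediction as one more voter."""
    merged = {}
    for obs_id, votes in human_votes.items():
        merged[obs_id] = dict(votes)  # copy so the input is left untouched
        if obs_id in ai_votes:
            merged[obs_id][ai_user_id] = str(ai_votes[obs_id])
    return merged


# Toy data mimicking the documented structure (IDs and labels are made up).
toy_human = {"1001": {"u1": "12", "u2": "7"}, "1002": {"u4": "3"}}
toy_authors = {"1001": "u1", "1002": -1}
toy_ai = {"1001": 12, "1002": 9}

toy_author_votes = author_votes(toy_human, toy_authors)
toy_merged = with_ai_voter(toy_human, toy_ai)
print(toy_author_votes)
print(toy_merged)
```

Treating the AI as a regular voter is only one option. A score-aware strategy could additionally weight that vote with the values from `ai_scores.json`.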
Files
| Name | Size |
|---|---|
| Pl@ntNet-CrowdSWE-v2.zip (md5:b517abd3263a9352ee5579425945fd91) | 677.6 MB |
Additional details
Related works
- Is described by
- Journal article: 10.1111/2041-210X.14486 (DOI)
Funding
- Agence Nationale de la Recherche
- Pl@ntAgroEco 22-PEAE0009
- Agence Nationale de la Recherche
- IA CaMeLOt ANR-20-CHIA-0001-01
- Grand Équipement National de Calcul Intensif (France)
- A0151011389
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement
- GUARDEN 101060693
- European Union
- MAMBO (Horizon EU) 101060639
Dates
- Updated
- 2025-11-24
Software
- Repository URL
- https://peerannot.github.io/
- Programming language
- Python