Pl@ntNet-CrowdSWE-v2: Pl@ntNet collaborative learning with South-Western-Europe dataset

Lefort, Tanguy; Affouard, Antoine; Charlier, Benjamin; Lombardo, Jean-Christophe; Chouet, Mathias; Botella, Christophe; Goëau, Hervé; Salmon, Joseph; Bonnet, Pierre-Antoine; Joly, Alexis

doi:10.5281/zenodo.17913995

Published December 12, 2025 | Version v3

Resource type: Dataset (open access)


1. Institut Montpelliérain Alexander Grothendieck
2. National Institute for Research in Computer and Control Sciences
3. Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
4. Université de Montpellier
5. Centre National de la Recherche Scientifique
6. Centre de Coopération Internationale en Recherche Agronomique pour le Développement
7. French Agricultural Research Centre for International Development
8. UMR Botanique et Modélisation de l'Architecture des Plantes et des végétations
9. IROKO: Sciences environnementales guidées par les données
10. Institut Universitaire de France

This repository contains the Pl@ntNet South Western Europe (SWE) crowdsourced dataset (V2), including species identifications and user votes for observations of the SWE flora made between 2017 and 2023.

In total, the dataset contains 5,561,512 plant observations labeled by 765,981 users between January 2017 and October 2023. The users have proposed 9,132 species, while the AI system has provided (possibly low) probabilities covering 57,660 species in total. In addition, 98 experts were selected to obtain ground truth values for 21,656 observations.

Statistic	Value
Total observations	5,561,512
Total users	765,981
Total species (mentioned by AI or humans)	57,660
Human proposed species	9,132
Expert-validated observations	21,656

The main difference between the current version, Pl@ntNet-CrowdSWE-v2, and the original Pl@ntNet-CrowdSWE dataset is that multi-image observations were removed.

Directory Structure

Pl@ntNet-CrowdSWE-v2/
├── votes/
│   ├── ai_votes.json
│   ├── ground_truth.json
│   ├── human_votes.json
│   └── PN_valid_votes.json
├── ai_scores/
│   ├── ai_scores.json
│   └── ai_scores_all.json
└── converters/
    ├── all_valid_id.json
    ├── authors.json
    ├── reverse_unified_classes.json
    └── unified_classes.json

`votes`

The votes folder contains several types of votes. Each task (identified by its obsID) corresponds to a plant picture for which a species is provided, identified by a class label from 0 to 57,659. The four files are as follows:

human_votes.json: The crowdsourced votes. This file covers over 5 million tasks with votes from 765,881 users. The data is structured as follows:

{
  "obsID": {
    "userID1": "vote",
    "userID2": "vote",
    ...
  },
  ...
}

ground_truth.json: A partial ground truth created by 98 experts. Each obsID is associated with a class label if an expert voted for a species, or -1 otherwise.
ai_votes.json: AI-generated votes (as of January 2025), where each key is also an obsID and the value is the predicted class.
PN_valid_votes.json: The validated human labels produced by the Pl@ntNet label aggregation strategy (extracted in August 2025), i.e., human labels aggregated and consolidated with an iterative algorithm. To run the Pl@ntNet label aggregation strategy yourself (it is available in the peerannot library), use the files described in the section "To run the Pl@ntNet label aggregation strategy" below.
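As a minimal sketch of how the vote files fit together, the Python below computes a naive majority vote per observation from human_votes.json and scores it against the expert labels in ground_truth.json, skipping observations whose expert label is -1. It assumes class labels are stored as integers or integer strings. `load_json`, `majority_vote` and `accuracy_on_ground_truth` are hypothetical helpers, and plain majority voting is only a baseline, not the Pl@ntNet aggregation strategy.

```python
import json
from collections import Counter


def load_json(path):
    """Load one of the dataset's JSON files into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def majority_vote(human_votes):
    """Return {obsID: most voted class label} from human_votes-style data.

    Ties go to the label encountered first, because
    Counter.most_common keeps first-encountered order for equal counts.
    Observations without any vote are skipped.
    """
    labels = {}
    for obs_id, votes in human_votes.items():
        if votes:
            labels[obs_id] = Counter(votes.values()).most_common(1)[0][0]
    return labels


def accuracy_on_ground_truth(predicted, ground_truth):
    """Share of expert-labelled observations (label != -1) predicted correctly."""
    scored = [
        (obs_id, gt)
        for obs_id, gt in ground_truth.items()
        if int(gt) != -1 and obs_id in predicted
    ]
    if not scored:
        return float("nan")
    correct = sum(int(predicted[obs_id]) == int(gt) for obs_id, gt in scored)
    return correct / len(scored)
```

With the archive unpacked as in the directory tree, usage would look like `accuracy_on_ground_truth(majority_vote(load_json("Pl@ntNet-CrowdSWE-v2/votes/human_votes.json")), load_json("Pl@ntNet-CrowdSWE-v2/votes/ground_truth.json"))`. Note that human_votes.json is loaded fully into memory.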

`ai_scores`

ai_scores_all.json: Softmax scores from the AI model (threshold: 0.001).
ai_scores.json: Top-1 softmax scores from the AI model. This is the softmax score associated with the votes in ai_votes.json.
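Because ai_votes.json and ai_scores.json share the same obsID keys, they can be joined directly. The sketch below keeps only top-1 predictions whose score reaches a cut-off, and ranks the scores of a single observation from ai_scores_all.json. The 0.5 cut-off is arbitrary, and the layout assumed for ai_scores_all.json ({obsID: {class label: score}}) is an assumption not stated in this description; check the file before relying on it.

```python
def confident_ai_labels(ai_votes, ai_scores, min_score=0.5):
    """Keep AI top-1 predictions whose softmax score is >= min_score.

    ai_votes:  {obsID: predicted class label}  (votes/ai_votes.json)
    ai_scores: {obsID: top-1 softmax score}    (ai_scores/ai_scores.json)
    min_score=0.5 is an illustrative cut-off, not a value from the dataset.
    """
    return {
        obs_id: label
        for obs_id, label in ai_votes.items()
        if obs_id in ai_scores and float(ai_scores[obs_id]) >= min_score
    }


def top_k(scores_for_obs, k=3):
    """Return the k highest (class label, score) pairs of one observation.

    ASSUMPTION: ai_scores_all.json maps obsID -> {class label: score}.
    """
    ranked = sorted(
        scores_for_obs.items(), key=lambda kv: float(kv[1]), reverse=True
    )
    return ranked[:k]
```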

`converters`

The converters folder provides essential files for data processing:

all_valid_id.json: Contains valid observation IDs (the last part of the URL: https://identify.plantnet.org/fr/k-world-flora/observations/<id>).
authors.json: Identifies the author of each task (obsID). If the author did not propose a species, the value is set to -1.
unified_classes.json: Maps species names to unified class labels (e.g., {"Quercus ilex L.": "1234", "Pinus halepensis Mill.": "5678", ...}). This dictionary converts botanical names to numeric identifiers from 0 to 57,659.
reverse_unified_classes.json: The inverse mapping that converts class labels back to species names (e.g., {"1234": "Quercus ilex L.", "5678": "Pinus halepensis Mill.", ...}). Use this to translate numeric predictions into readable species names.
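The converters are plain lookup tables. A minimal sketch of their use follows: JSON object keys are always strings, so numeric labels must be converted with `str()` before looking them up in reverse_unified_classes.json, and an obsID from all_valid_id.json can be turned into its observation page using the URL pattern given above. The helper names are hypothetical.

```python
# Observation page pattern given in the description of all_valid_id.json
OBS_URL = "https://identify.plantnet.org/fr/k-world-flora/observations/{}"


def label_to_name(label, reverse_classes):
    """Translate a class label (int or str) into a species name.

    Keys of reverse_unified_classes.json are strings, hence str(label).
    Returns None for labels that are not in the mapping.
    """
    return reverse_classes.get(str(label))


def name_to_label(name, unified_classes):
    """Translate a species name into an integer class label."""
    return int(unified_classes[name])


def observation_url(obs_id):
    """Build the Pl@ntNet web page URL of one observation."""
    return OBS_URL.format(obs_id)
```

For example, with the mapping shown above, `label_to_name(1234, reverse_classes)` would give "Quercus ilex L.".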

To run the Pl@ntNet label aggregation strategy

To run the Pl@ntNet label aggregation strategy described in the associated journal paper (https://doi.org/10.1111/2041-210X.14486) and available in the peerannot library, several other pieces of information are needed.

For each task, we need to know which user was the author (if they proposed an initial species determination). This information is stored in converters/authors.json, where each key is an obsID and the value is the userID of the author. If the author did not propose any species, the value is set to -1.
To run the label aggregation strategies that take the AI vote into account, use ai_votes.json. Each species is associated with a class label, including species newly introduced by the AI.
Finally, for strategies that take the prediction score into account, we release ai_scores.json, where each key is an obsID and each value is the probability given to the predicted class (i.e., the top-1 answer). For more exhaustive score outputs, use ai_scores_all.json.
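Before feeding an aggregation tool, it can help to gather these inputs per task. The helper below is hypothetical: its record layout is only illustrative and is not the input format expected by peerannot, whose own documentation describes the files it reads. It treats an author value of -1 as "no species proposed" and only covers tasks present in human_votes.json.

```python
def build_task_records(human_votes, authors, ai_votes, ai_scores):
    """Gather, per obsID, the human votes, author, AI vote and AI score.

    human_votes: {obsID: {userID: class label}}  (votes/human_votes.json)
    authors:     {obsID: userID or -1}           (converters/authors.json)
    ai_votes:    {obsID: class label}            (votes/ai_votes.json)
    ai_scores:   {obsID: top-1 score}            (ai_scores/ai_scores.json)
    """
    records = {}
    for obs_id, votes in human_votes.items():
        author = authors.get(obs_id, -1)
        records[obs_id] = {
            "votes": dict(votes),
            # -1 means the author did not propose a species
            "author": None if str(author) == "-1" else author,
            "ai_vote": ai_votes.get(obs_id),
            "ai_score": ai_scores.get(obs_id),
        }
    return records
```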

Files

Name	Size	MD5 checksum
Pl@ntNet-CrowdSWE-v2.zip	677.6 MB	b517abd3263a9352ee5579425945fd91
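The published MD5 checksum can be used to check that the download is complete. A minimal sketch with Python's standard hashlib module, reading the archive in chunks so it never has to fit in memory at once (the helper names are hypothetical):

```python
import hashlib

# Checksum published with this record for Pl@ntNet-CrowdSWE-v2.zip
EXPECTED_MD5 = "b517abd3263a9352ee5579425945fd91"


def md5_of_chunks(chunks):
    """MD5 hex digest of an iterable of bytes chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 of a file, read in 1 MiB chunks."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def archive_is_intact(path, expected=EXPECTED_MD5):
    """True if the downloaded archive matches the published checksum."""
    return md5_of_file(path) == expected
```

Usage: `archive_is_intact("Pl@ntNet-CrowdSWE-v2.zip")` should return True for an intact download.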

Additional details

Related works

Is described by: Journal article 10.1111/2041-210X.14486 (DOI)

Funding

Agence Nationale de la Recherche: Pl@ntAgroEco 22-PEAE0009
Agence Nationale de la Recherche: IA CaMeLOt ANR-20-CHIA-0001-01
Grand Équipement National de Calcul Intensif (France): A0151011389
Centre de Coopération Internationale en Recherche Agronomique pour le Développement: GUARDEN 101060693
European Union: MAMBO (Horizon EU) 101060639

Dates

Updated: 2025-11-24

Software

Repository URL: https://peerannot.github.io/
Programming language: Python

