Published December 12, 2025 | Version v3
Dataset Open

Pl@ntNet-CrowdSWE-v2: Pl@ntNet collaborative learning with South-Western-Europe dataset

  • 1. Institut Montpelliérain Alexander Grothendieck
  • 2. National Institute for Research in Computer and Control Sciences
  • 3. Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
  • 4. Université de Montpellier
  • 5. Centre National de la Recherche Scientifique
  • 6. Centre de Coopération Internationale en Recherche Agronomique pour le Développement
  • 7. French Agricultural Research Centre for International Development
  • 8. UMR Botanique et Modélisation de l'Architecture des Plantes et des végétations
  • 9. IROKO: Sciences environnementales guidées par les données
  • 10. Institut Universitaire de France

Description

This repository contains the Pl@ntNet South Western Europe (SWE) crowdsourced dataset (V2), including species identifications and user votes for observations of the SWE flora made between 2017 and 2023.

In total, the dataset contains 5,561,512 plant observations labeled by 765,981 users between January 2017 and October 2023. Users proposed 9,132 species, while the AI system provided (possibly low) probabilities covering 57,660 species in total. In addition, 98 experts were selected to provide ground-truth labels for 21,656 observations.

Statistic                                    Value
Total observations                           5,561,512
Total users                                  765,981
Total species (mentioned by AI or humans)    57,660
Human-proposed species                       9,132
Expert-validated observations                21,656

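Some of the statistics above can be recomputed from the released files. The sketch below is a minimal example, assuming the archive is extracted to a folder named Pl@ntNet-CrowdSWE-v2/ with the layout shown under Directory Structure, and that human_votes.json and ground_truth.json follow the structures described in the votes section. The names load_json and summarize are illustrative helpers, not part of the dataset or of peerannot. Counts derived from these two files alone may differ from the table; for instance, the species total in the table also covers species mentioned only by the AI.

```python
import json
from pathlib import Path


def load_json(path):
    """Read one JSON file from the extracted archive."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(root):
    """Recompute basic counts from human_votes.json and ground_truth.json."""
    root = Path(root)
    human = load_json(root / "votes" / "human_votes.json")
    truth = load_json(root / "votes" / "ground_truth.json")

    users, species = set(), set()
    for votes in human.values():  # votes: {userID: class label}
        users.update(votes.keys())
        species.update(str(label) for label in votes.values())

    # ground_truth.json stores -1 when no expert voted for a species
    expert = sum(1 for label in truth.values() if int(label) != -1)
    return {
        "observations": len(human),
        "users": len(users),
        "human_proposed_species": len(species),
        "expert_validated": expert,
    }


# Example (assumes the archive was extracted in the working directory):
# print(summarize("Pl@ntNet-CrowdSWE-v2"))
```
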
The main difference between the current version (Pl@ntNet-CrowdSWE-v2) and the original Pl@ntNet-CrowdSWE dataset is that multi-image observations were removed.

Directory Structure

Pl@ntNet-CrowdSWE-v2/
├── votes/
│   ├── ai_votes.json
│   ├── ground_truth.json
│   ├── human_votes.json
│   └── PN_valid_votes.json
├── ai_scores/
│   ├── ai_scores.json
│   └── ai_scores_all.json
└── converters/
    ├── all_valid_id.json
    ├── authors.json
    ├── reverse_unified_classes.json
    └── unified_classes.json

votes

The votes folder contains several types of votes. Each task (identified by its obsID) corresponds to a plant picture for which species are proposed, each species being identified by a class label from 0 to 57,659. The files are as follows:

  • human_votes.json: The crowdsourced votes, covering over 5 million tasks with votes from 765,981 users. The data is structured as follows:
{
  "obsID": {
    "userID1": "vote",
    "userID2": "vote",
    ...
  },
  ...
}
  • ground_truth.json: A partial ground truth created by 98 experts. Each obsID is associated with a class label if an expert voted for a species, or -1 otherwise.
  • ai_votes.json: AI-generated votes (as of January 2025), where each key is also an obsID and the value is the predicted class.
  • PN_valid_votes.json: The validated labels produced by the Pl@ntNet label aggregation strategy (extracted in August 2025). These are human labels, aggregated and consolidated with an iterative algorithm. To run the Pl@ntNet label aggregation strategy yourself (it is available in the peerannot library), use the files described under "To run the Pl@ntNet label aggregation strategy" below.
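
As a point of comparison with PN_valid_votes.json, the following sketch computes a plain majority vote per task from human_votes.json and measures its agreement with the expert labels in ground_truth.json, skipping entries set to -1. This is a simple baseline, not the Pl@ntNet label aggregation strategy. Labels are compared as strings because the files may store them either as strings or as integers (an assumption); the function names are illustrative.

```python
import json
from collections import Counter


def load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def majority_vote(task_votes):
    """Most frequent label among one task's votes; ties go to the label seen first."""
    counts = Counter(str(label) for label in task_votes.values())
    return counts.most_common(1)[0][0]


def accuracy_against_experts(human_votes, ground_truth):
    """Share of expert-labelled tasks where the crowd majority matches the expert."""
    correct = total = 0
    for obs_id, expert_label in ground_truth.items():
        expert_label = str(expert_label)
        votes = human_votes.get(obs_id)
        if expert_label == "-1" or not votes:
            continue  # no expert label, or no crowd vote for this task
        total += 1
        correct += majority_vote(votes) == expert_label
    return correct / total if total else float("nan")


# Example:
# human = load_json("Pl@ntNet-CrowdSWE-v2/votes/human_votes.json")
# truth = load_json("Pl@ntNet-CrowdSWE-v2/votes/ground_truth.json")
# print(accuracy_against_experts(human, truth))
```
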

ai_scores

  • ai_scores_all.json: Softmax scores from the AI model (threshold: 0.001).
  • ai_scores.json: Top-1 softmax scores from the AI model, i.e., the softmax score associated with each vote in ai_votes.json.
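
Because ai_scores.json gives the score of the class predicted in ai_votes.json, the two files can be joined on obsID. The sketch below keeps only AI predictions whose top-1 score reaches a cut-off. It assumes ai_scores.json maps each obsID to a single probability, as described in the aggregation section of this page; the min_score default of 0.5 is a hypothetical choice for illustration, not a value used by Pl@ntNet, and ai_scores_all.json is not used here.

```python
import json


def load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def confident_ai_votes(ai_votes, ai_scores, min_score=0.5):
    """Keep AI votes whose top-1 softmax score is at least min_score.

    min_score=0.5 is a hypothetical cut-off for illustration only.
    Returns {obsID: (class label as str, score)}.
    """
    kept = {}
    for obs_id, label in ai_votes.items():
        score = ai_scores.get(obs_id)
        if score is not None and float(score) >= min_score:
            kept[obs_id] = (str(label), float(score))
    return kept


# Example:
# votes = load_json("Pl@ntNet-CrowdSWE-v2/votes/ai_votes.json")
# scores = load_json("Pl@ntNet-CrowdSWE-v2/ai_scores/ai_scores.json")
# print(len(confident_ai_votes(votes, scores)))
```
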

converters

The converters folder provides essential files for data processing:

  • all_valid_id.json: Contains valid observation IDs (the last part of the URL: https://identify.plantnet.org/fr/k-world-flora/observations/<id>).
  • authors.json: Identifies the author of each task (obsID). If the author did not propose a species, the value is set to -1.
  • unified_classes.json: Maps species names to unified class labels (e.g., {"Quercus ilex L.": "1234", "Pinus halepensis Mill.": "5678", ...}). This dictionary converts botanical names to numeric identifiers from 0 to 57,659.
  • reverse_unified_classes.json: The inverse mapping that converts class labels back to species names (e.g., {"1234": "Quercus ilex L.", "5678": "Pinus halepensis Mill.", ...}). Use this to translate numeric predictions into readable species names.
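
A minimal sketch of how the converters might be used, assuming unified_classes.json and reverse_unified_classes.json are plain dictionaries as in the examples above. JSON object keys are always strings, so class labels are converted with str() before lookup. The helper names are hypothetical. Building a URL follows the description of all_valid_id.json; whether those IDs coincide with the obsID keys used in the votes files is an assumption to check.

```python
import json

# URL pattern quoted in the description of all_valid_id.json
OBSERVATION_URL = "https://identify.plantnet.org/fr/k-world-flora/observations/{}"


def load_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def label_to_name(label, reverse_classes):
    """Class label (int or str) -> species name, or None if the label is unknown."""
    return reverse_classes.get(str(label))  # JSON keys are always strings


def name_to_label(name, unified_classes):
    """Botanical name -> integer class label, or None if the name is unknown."""
    label = unified_classes.get(name)
    return None if label is None else int(label)


def observation_url(observation_id):
    """Build the Pl@ntNet web page URL for one ID taken from all_valid_id.json."""
    return OBSERVATION_URL.format(observation_id)


# Example:
# reverse = load_json("Pl@ntNet-CrowdSWE-v2/converters/reverse_unified_classes.json")
# ai_votes = load_json("Pl@ntNet-CrowdSWE-v2/votes/ai_votes.json")
# obs_id, label = next(iter(ai_votes.items()))
# print(obs_id, label_to_name(label, reverse))
```
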

To run the Pl@ntNet label aggregation strategy

To run the Pl@ntNet label aggregation strategy described in the associated journal paper (https://doi.org/10.1111/2041-210X.14486) and available in the peerannot library, several other pieces of information are needed.

  • For each task, we need to know which user was the author (if they proposed an initial species determination). This information is stored in authors.json, where each key is an obsID and the value is the userID of the author. If the author did not propose any species, the value is set to -1.
  • To run the label aggregation strategies that take the AI vote into account, use ai_votes.json. Each species is identified by a class label, including species newly introduced by the AI.
  • Finally, for strategies that take the prediction score into account, we release ai_scores.json, where each key is an obsID and each value is the probability given to the predicted class (i.e., the top-1 answer). For more exhaustive score outputs, use ai_scores_all.json.
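
The inputs listed above can be gathered into one record per task before running an aggregation strategy. The sketch below only prepares data: it does not implement the Pl@ntNet label aggregation strategy (peerannot provides that), and the TaskRecord class and build_tasks function are hypothetical names. It assumes authors.json, ai_votes.json and ai_scores.json are dictionaries keyed by obsID, as described above, and maps an author value of -1 (or a missing entry) to None.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaskRecord:
    """Hypothetical per-task container; field names are illustrative."""
    obs_id: str
    human_votes: Dict[str, str] = field(default_factory=dict)  # userID -> label
    author: Optional[str] = None  # author's userID, None when authors.json has -1
    ai_vote: Optional[str] = None  # predicted label from ai_votes.json
    ai_score: Optional[float] = None  # top-1 probability from ai_scores.json


def build_tasks(human_votes, authors, ai_votes, ai_scores):
    """Join the per-obsID dictionaries into one TaskRecord per task in human_votes."""
    tasks = {}
    for obs_id, votes in human_votes.items():
        author = authors.get(obs_id, -1)  # missing entry treated like -1
        ai_vote = ai_votes.get(obs_id)
        score = ai_scores.get(obs_id)
        tasks[obs_id] = TaskRecord(
            obs_id=obs_id,
            human_votes={user: str(label) for user, label in votes.items()},
            author=None if str(author) == "-1" else str(author),
            ai_vote=None if ai_vote is None else str(ai_vote),
            ai_score=None if score is None else float(score),
        )
    return tasks
```
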

Files

Pl@ntNet-CrowdSWE-v2.zip (677.6 MB, md5:b517abd3263a9352ee5579425945fd91)

Additional details

Related works

Is described by: Journal article, 10.1111/2041-210X.14486 (DOI)

Funding

  • Agence Nationale de la Recherche: Pl@ntAgroEco 22-PEAE0009
  • Agence Nationale de la Recherche: IA CaMeLOt ANR-20-CHIA-0001-01
  • Grand Équipement National de Calcul Intensif (France): A0151011389
  • Centre de Coopération Internationale en Recherche Agronomique pour le Développement: GUARDEN 101060693
  • European Union: MAMBO (Horizon EU) 101060639

Dates

Updated: 2025-11-24

Software

Repository URL: https://peerannot.github.io/
Programming language: Python