Pl@ntNet-CrowdSWE: Pl@ntNet collaborative learning with South-Western-Europe dataset

Lefort, Tanguy; Affouard, Antoine; Chartier, Benjamin; Lombardo, Jean-Christophe; Chouet, Mathias; Goëau, Hervé; Salmon, Joseph; Bonnet, Pierre-Antoine; Joly, Alexis

doi:10.5281/zenodo.10782465

Published March 5, 2024 | Version v1

Dataset Open


1. Institut Montpelliérain Alexander Grothendieck
2. National Institute for Research in Computer and Control Sciences
3. Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
4. Université de Montpellier
5. Centre National de la Recherche Scientifique
6. Centre de Coopération Internationale en Recherche Agronomique pour le Développement
7. French Agricultural Research Centre for International Development
8. UMR Botanique et Modélisation de l'Architecture des Plantes et des végétations
9. Institut Universitaire de France


This repository contains the files for the Pl@ntNet South Western Europe (SWE) crowdsourced dataset.
It contains all species identifications and user votes for observations made between 2017 and 2023 in the SWE flora.

In total, 6 699 593 plant observations were labeled by 823 251 users between January 2017 and October 2023. In addition, 98 experts were selected to obtain ground truth values for 26 811 observations.

The structure of the dataset is described below, and a `readme.md` file is available in the record.

Directory structure in short

Pl@ntNet SWE dataset
├── answers
│   ├── answers.json
│   └── ground_truth.txt
├── converters
│   ├── tasks.json
│   └── classes.json
└── aggregation
    ├── authors.txt
    ├── ai_classes.json
    ├── ai_answers.json
    ├── ai_scores.json
    └── k-southwestern-europe.json

Crowdsourced data

The answers folder contains the crowdsourced answers and the associated ground truths.
The crowdsourced answers are stored in the answers.json file, which gathers more than 6 million tasks with answers from 823 251 users. It is a nested JSON object: each top-level key is an observation ID, and its value maps the ID of each user who voted on that observation to the species label they voted for.

{
obsID: {userID: vote, userID2: vote,...},
...
}
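As an illustration of how this nested structure can be consumed, here is a minimal sketch that loads the file and computes a plain majority vote per observation. Majority vote is only a simple baseline chosen for the example, not the Pl@ntNet aggregation strategy. The function names and the toy IDs and labels are hypothetical; the sketch also assumes that votes are stored as class labels and relies on the fact that JSON object keys are read back as strings.

```python
import json
from collections import Counter


def load_answers(path):
    """Load the nested {obsID: {userID: vote}} mapping from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def majority_vote(answers):
    """Return {obsID: most frequent vote}; ties go to the vote seen first."""
    return {
        obs_id: Counter(votes.values()).most_common(1)[0][0]
        for obs_id, votes in answers.items()
        if votes  # skip observations without any vote
    }


# Toy data in the same shape as answers.json (IDs and labels are made up).
toy_answers = {
    "0": {"10": 3, "11": 3, "12": 7},
    "1": {"10": 5},
}
toy_labels = majority_vote(toy_answers)
```

On the real data, `load_answers("answers/answers.json")` would replace the toy dictionary, assuming the archive was extracted in the current directory.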

A list of 98 experts was created to gather a partial ground truth, stored in the ground_truth.txt file.
Each row represents an observation, and the associated class label is the ground truth currently accepted for that observation.
This file makes it possible to compute performance metrics such as the accuracy of a label aggregation strategy.
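The accuracy computation mentioned here could look like the sketch below. It rests on two assumptions that are not stated in this description and should be checked against the readme.md: that the row index in ground_truth.txt is the obsID, and that observations without expert ground truth are marked with a sentinel value (taken to be -1 here). The function names are hypothetical.

```python
def parse_ground_truth(lines, missing=-1):
    """Return {row index: label} for rows that carry a known label.

    `lines` is any iterable of text rows, e.g. an open file.
    `missing` is the assumed sentinel for observations without ground truth.
    """
    truth = {}
    for row, line in enumerate(lines):
        text = line.strip()
        if not text:
            continue
        label = int(text)
        if label != missing:
            truth[row] = label
    return truth


def accuracy(predicted, truth):
    """Share of ground-truth observations whose predicted label matches."""
    scored = [obs for obs in truth if obs in predicted]
    if not scored:
        return float("nan")
    return sum(predicted[obs] == truth[obs] for obs in scored) / len(scored)


# Toy example: three observations, the second one has no expert label.
toy_truth = parse_ground_truth(["3\n", "-1\n", "4\n"])
toy_predicted = {0: 3, 1: 5, 2: 9}
toy_accuracy = accuracy(toy_predicted, toy_truth)
```

Only observations with an expert label enter the score, so the metric is computed on the validated subset rather than on the full set of tasks. Keys loaded from a JSON file are strings and would need converting to integers before being compared with these row indices.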

Converters

The converters folder contains the mappings needed to recover the official Pl@ntNet observation numbers (the last part of the URL https://identify.plantnet.org/fr/k-world-flora/observations/<id>) from the obsID used in answers.json. This mapping is stored in the tasks.json file.
A similar dictionary converts the species proposed by users to a single label in {0, 1, 2, ...}.
This mapping is stored in classes.json.
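A minimal sketch of how the two converters might be used follows. It assumes that tasks.json maps each obsID to the official observation number and that classes.json maps each species name to its integer label; the direction of both mappings is an assumption, and the helper names and toy values are hypothetical.

```python
BASE_URL = "https://identify.plantnet.org/fr/k-world-flora/observations/"


def observation_url(obs_id, tasks):
    """Build the Pl@ntNet URL of an observation from its dataset obsID.

    `tasks` is the dictionary loaded from tasks.json (keys read as strings).
    """
    return f"{BASE_URL}{tasks[str(obs_id)]}"


def invert(mapping):
    """Swap keys and values, e.g. to go from integer label back to species."""
    return {value: key for key, value in mapping.items()}


# Toy stand-ins for tasks.json and classes.json (values are made up).
toy_tasks = {"0": 1000000001}
toy_classes = {"Genus alpha": 0, "Genus beta": 1}

toy_url = observation_url(0, toy_tasks)
toy_species = invert(toy_classes)[1]
```

Inverting the class dictionary once and reusing it avoids scanning the whole mapping each time a label has to be turned back into a species name.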

As plant species can have synonyms, we also release the two files used to clean the user answers. The species.json file contains a list of all the accepted species determinations from the World Checklist of Vascular Plants.
We then focused on the SWE flora and replaced synonyms with the corresponding accepted species using the k-southwestern-europe.json checklist from Plants of the World Online (POWO), maintained by the Royal Botanic Gardens, Kew. This checklist is written as follows:

[
   {
    "species": species name,
    "synonyms": [
        synonym1,
        synonym2,
        ...
        ]
   },
   ...
]
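The synonym replacement described above can be sketched as a lookup table built from the checklist. This is an illustrative reconstruction under the stated file format, not the cleaning code used by the authors; the function names and the placeholder species names are hypothetical.

```python
def build_synonym_map(checklist):
    """Map every accepted name and every synonym to the accepted species name."""
    mapping = {}
    # Accepted names first, so a synonym can never override an accepted name.
    for entry in checklist:
        mapping[entry["species"]] = entry["species"]
    for entry in checklist:
        for synonym in entry.get("synonyms", []):
            mapping.setdefault(synonym, entry["species"])
    return mapping


def normalize(name, synonym_map):
    """Return the accepted species for `name`, or None if it is not in the checklist."""
    return synonym_map.get(name)


# Toy checklist in the same shape as k-southwestern-europe.json (names are placeholders).
toy_checklist = [
    {"species": "Genus accepta", "synonyms": ["Genus synonyma", "Genus vetusta"]},
    {"species": "Genus altera", "synonyms": []},
]
toy_map = build_synonym_map(toy_checklist)
toy_clean = normalize("Genus synonyma", toy_map)
```

Returning None for names outside the checklist leaves the decision to the caller, who can either drop such votes or keep them unchanged.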

Files to run the Pl@ntNet label aggregation strategy

Running the Pl@ntNet label aggregation strategy available in the peerannot library requires several other pieces of information, which are located in the aggregation folder.

- First, we need to know for each task which user was the author (if they proposed an initial species determination).
This information is stored in the authors.txt file, where each row corresponds to an obsID and the value is the userID of the author. If the author did not propose any species, the value is set to -1.

- Then, to run the label aggregation strategies that take the AI vote into account, we extend the `classes.json` file with the AI-predicted classes in the ai_classes.json file. Each species is associated with a number, including the species newly introduced by the AI.
- Then, we need the AI predictions. The AI answers are stored in the ai_answers.json file, where each key is the obsID and each value is the class predicted by the AI. Synonyms were also removed using the k-southwestern-europe.json file.
- Finally, for strategies that take the prediction score into account, we release the ai_scores.json file, where each key is the obsID and each value is the probability assigned to the predicted class.
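To show how these files fit together, the sketch below adds the AI prediction as one extra voter when its score passes a threshold, then applies a plain majority vote. This is a deliberately simple stand-in and not the Pl@ntNet aggregation strategy implemented in peerannot. The voter ID `AI_USER`, the `min_score` threshold, and the function names are hypothetical, and the row index of authors.txt is assumed to be the obsID.

```python
from collections import Counter

AI_USER = "ai"  # hypothetical voter ID, not part of the dataset


def parse_authors(lines):
    """Map obsID (assumed to be the row index) to the author's userID, or None for -1."""
    authors = {}
    for obs_id, line in enumerate(lines):
        value = line.strip()
        if value:
            authors[str(obs_id)] = None if value == "-1" else value
    return authors


def add_ai_votes(answers, ai_answers, ai_scores, min_score=0.5):
    """Return a copy of `answers` where confident AI predictions count as one vote."""
    merged = {obs: dict(votes) for obs, votes in answers.items()}
    for obs, label in ai_answers.items():
        if ai_scores.get(obs, 0.0) >= min_score:
            merged.setdefault(obs, {})[AI_USER] = label
    return merged


def majority_vote(answers):
    """Return {obsID: most frequent vote}; ties go to the vote seen first."""
    return {
        obs: Counter(votes.values()).most_common(1)[0][0]
        for obs, votes in answers.items()
        if votes
    }


# Toy data in the shape of answers.json, ai_answers.json and ai_scores.json.
toy_answers = {"0": {"10": 3, "11": 7}, "1": {"12": 5}}
toy_ai_answers = {"0": 7, "1": 2}
toy_ai_scores = {"0": 0.9, "1": 0.2}

toy_merged = add_ai_votes(toy_answers, toy_ai_answers, toy_ai_scores)
toy_labels = majority_vote(toy_merged)
toy_authors = parse_authors(["10\n", "-1\n"])
```

In the toy data, the confident AI vote tips observation "0" toward label 7, while the low-score prediction for observation "1" is ignored.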

Files


Files (183.3 MB)

Name	Size
plantnet_swe.zip md5:037a6abccd51aa7cd9018e280d3bbdd5	183.3 MB
readme.md md5:330bb61c4ba2b45eb0aa30b6ea43335d	3.8 kB

Additional details

Pl@ntAgroEco 22-PEAE0009: Agence Nationale de la Recherche
IA CaMeLOt ANR-20-CHIA-0001-01: Agence Nationale de la Recherche; Grand Équipement National de Calcul Intensif (France)
GUARDEN 101060693: Centre de Coopération Internationale en Recherche Agronomique pour le Développement
MAMBO (Horizon EU) 101060639: European Union

