Pl@ntNet-CrowdSWE: Pl@ntNet collaborative learning with South-Western-Europe dataset
Creators
- 1. Institut Montpelliérain Alexander Grothendieck
- 2. National Institute for Research in Computer and Control Sciences
- 3. Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
- 4. Université de Montpellier
- 5. Centre National de la Recherche Scientifique
- 6. Centre de Coopération Internationale en Recherche Agronomique pour le Développement
- 7. French Agricultural Research Centre for International Development
- 8. UMR Botanique et Modélisation de l'Architecture des Plantes et des végétations
- 9. Institut Universitaire de France
Description
This repository contains the files for the Pl@ntNet South Western Europe (SWE) crowdsourced dataset.
It contains all species identifications and user votes for observations made between 2017 and 2023 in the SWE flora.
In total, 6 699 593 plant observations were labeled by 823 251 users between January 2017 and October 2023. In addition, 98 experts were selected to obtain ground truth values for 26 811 observations.
The structure of the dataset is described below, and a `readme.md` file is available in the record.
Directory structure
Pl@ntNet SWE dataset
├── answers
│   ├── answers.json
│   └── ground_truth.txt
├── converters
│   ├── tasks.json
│   └── classes.json
└── aggregation
    ├── authors.txt
    ├── ai_classes.json
    ├── ai_answers.json
    ├── ai_scores.json
    └── k-southwestern-europe.json
Crowdsourced data
The `answers` folder contains the crowdsourced answers and the associated ground truths.
The crowdsourced answers are stored in the `answers.json` file. It gathers more than 6 million tasks with answers from 823 251 users. It is formatted as a JSON object whose nested levels are the observation ID, the user IDs, and each user's vote for the species label.
{
obsID: {userID: vote, userID2: vote,...},
...
}
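As an illustration of how this structure can be consumed, here is a minimal Python sketch that loads the file and computes a plain majority vote per observation. The function names and the toy data are hypothetical, and majority voting is only a simple baseline, not the Pl@ntNet aggregation strategy.

```python
import json
from collections import Counter


def load_answers(path):
    """Load answers.json as {obsID: {userID: vote}} (JSON keys are strings)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def majority_vote(answers):
    """Return {obsID: most frequent vote}; ties go to the vote seen first."""
    return {
        obs_id: Counter(votes.values()).most_common(1)[0][0]
        for obs_id, votes in answers.items()
        if votes  # skip observations without any vote
    }


# Tiny made-up example mirroring the structure above (not real data).
toy_answers = {
    "0": {"12": 3, "48": 3, "7": 5},
    "1": {"12": 8},
}
print(majority_vote(toy_answers))  # {'0': 3, '1': 8}
```

With the real data, `majority_vote(load_answers("answers/answers.json"))` would apply the same baseline to every observation, assuming the archive was extracted in the current directory.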
A list of 98 experts was created to gather a partial ground truth, stored in the `ground_truth.txt` file.
Each row represents an observation, and the associated class label is the currently accepted ground truth.
This file makes it possible to compute performance metrics such as the accuracy of the label aggregation.
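The accuracy computation mentioned above could be sketched as follows. This is a hedged example: it assumes that row i of `ground_truth.txt` describes obsID i and that observations without an expert label hold the value -1. Neither assumption is stated in this description, so check the `readme.md` before relying on them. The helper names are hypothetical.

```python
def parse_ground_truth(lines, missing=-1):
    """Map row index (as a string obsID) to its expert label.

    Assumptions: row i describes obsID i, and rows without an expert
    label hold the `missing` value.
    """
    truth = {}
    for obs_id, line in enumerate(lines):
        text = line.strip()
        if not text:
            continue
        label = int(float(text))  # tolerates "3" as well as "3.0"
        if label != missing:
            truth[str(obs_id)] = label
    return truth


def accuracy(predictions, truth):
    """Fraction of expert-labeled observations whose prediction matches."""
    scored = [obs for obs in truth if obs in predictions]
    if not scored:
        return float("nan")
    return sum(predictions[obs] == truth[obs] for obs in scored) / len(scored)


# Made-up rows and predictions, for illustration only.
toy_truth = parse_ground_truth(["3", "-1", "8"])
toy_predictions = {"0": 3, "1": 5, "2": 7}
print(accuracy(toy_predictions, toy_truth))  # 0.5
```

Because `parse_ground_truth` accepts any iterable of lines, an open file handle on `answers/ground_truth.txt` can be passed directly.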
Converters
The `converters` folder contains the mappings needed to recover the official Pl@ntNet observation numbers (the last part of the URL `https://identify.plantnet.org/fr/k-world-flora/observations/<id>`) from the obsID used in `answers.json`. This mapping is stored in the `tasks.json` file.
A similar dictionary converts the species proposed by users to a single label in {0, 1, 2, ...}.
This mapping is stored in `classes.json`.
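A possible way to use the two converters is sketched below. The direction of each dictionary is an assumption: `tasks.json` is taken to map obsID to the Pl@ntNet observation number, and `classes.json` to map species name to integer label. If the files are stored the other way round, the inversion helper applies to the other file instead. All names and toy values are hypothetical.

```python
import json

BASE_URL = "https://identify.plantnet.org/fr/k-world-flora/observations/"


def load_json(path):
    """Read a JSON file into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def observation_url(obs_id, tasks):
    """Build the Pl@ntNet URL for an obsID.

    Assumption: `tasks` maps obsID (string key) to the official
    Pl@ntNet observation number.
    """
    return BASE_URL + str(tasks[str(obs_id)])


def label_to_species(classes):
    """Invert {species name: label} into {label: species name}."""
    return {label: name for name, label in classes.items()}


# Made-up entries, for illustration only.
toy_tasks = {"0": "1000000001"}
toy_classes = {"Species A": 0, "Species B": 1}

print(observation_url(0, toy_tasks))
print(label_to_species(toy_classes)[1])  # Species B
```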
As plant species can have synonyms, we also release the two files used to clean the user answers. The `species.json` file contains a list of all the accepted species determinations from the World Checklist of Vascular Plants.
Then, we focused on the SWE flora and replaced synonyms with the underlying species using the `k-southwestern-europe.json` checklist from Plants of the World Online (POWO), maintained by the Royal Botanic Gardens, Kew. This checklist is written as follows:
[
{
"species": species name,
"synonyms": [
synonym1,
synonym2,
...
]
},
...
]
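To show how a checklist in this format can be used to replace synonyms, here is a minimal sketch that builds a synonym lookup table. It is an illustration under the stated format only, not the cleaning code actually used for the dataset. The species names in the toy checklist are placeholders.

```python
def build_synonym_map(checklist):
    """Map every accepted name and synonym to its accepted species name."""
    mapping = {}
    for entry in checklist:
        accepted = entry["species"]
        mapping[accepted] = accepted
        for synonym in entry.get("synonyms", []):
            # Keep the first accepted species seen for a given synonym.
            mapping.setdefault(synonym, accepted)
    return mapping


def normalize(name, mapping):
    """Return the accepted name, or the input unchanged if it is unknown."""
    return mapping.get(name, name)


# Placeholder checklist following the structure above (not real taxonomy).
toy_checklist = [
    {"species": "Species A", "synonyms": ["Synonym A1", "Synonym A2"]},
    {"species": "Species B", "synonyms": []},
]
toy_map = build_synonym_map(toy_checklist)
print(normalize("Synonym A2", toy_map))  # Species A
```

Names absent from the checklist are returned unchanged here. A stricter pipeline might drop them instead, which is a design choice this description does not specify.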
Files to run the Pl@ntNet label aggregation strategy
To run the Pl@ntNet label aggregation strategy available in the peerannot library, several other pieces of information are needed; they are located in the `aggregation` folder.
- First, we need to know for each task which user was the author (if they proposed an initial species determination). This information is stored in the `authors.txt` file, where each row corresponds to an obsID and the value is the userID of the author. If the author did not propose any species, this value is set to -1.
- Then, to run the label aggregation strategies that take the AI vote into account, we extend the `classes.json` file with the AI-predicted classes in the `ai_classes.json` file. Each species is associated with a number, including species newly introduced by the AI.
- Then, we need the AI predictions. The AI answers are stored in the `ai_answers.json` file, where each key is the obsID and each value is the class predicted by the AI. Synonyms were also removed using the `k-southwestern-europe.json` file.
- Finally, for strategies that take the prediction score into account, we release the `ai_scores.json` file, where each key is the obsID and each value is the probability given to the predicted class.
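The files listed above can be combined in many ways. The sketch below shows one simple possibility: parsing the authors file and adding the AI prediction as an extra voter when its score reaches a threshold. This is not the Pl@ntNet strategy and does not use the peerannot API. The threshold, the `"AI"` pseudo-user, the function names, and the assumption that row i of `authors.txt` describes obsID i are all hypothetical.

```python
def parse_authors(lines, no_author=-1):
    """Map obsID (row index, as a string) to the author's userID.

    Assumption: row i describes obsID i. Rows equal to `no_author`
    (no initial determination proposed) are skipped.
    """
    authors = {}
    for obs_id, line in enumerate(lines):
        text = line.strip()
        if not text:
            continue
        user_id = int(float(text))
        if user_id != no_author:
            authors[str(obs_id)] = str(user_id)
    return authors


def add_ai_votes(answers, ai_answers, ai_scores, threshold=0.5, ai_user="AI"):
    """Return a copy of `answers` where confident AI predictions vote too."""
    merged = {obs: dict(votes) for obs, votes in answers.items()}
    for obs, label in ai_answers.items():
        if ai_scores.get(obs, 0.0) >= threshold:
            merged.setdefault(obs, {})[ai_user] = label
    return merged


# Made-up data, for illustration only.
toy_answers = {"0": {"12": 3, "48": 5}, "1": {"7": 2}}
toy_ai_answers = {"0": 3, "1": 9}
toy_ai_scores = {"0": 0.92, "1": 0.10}

toy_authors = parse_authors(["12", "-1"])
toy_merged = add_ai_votes(toy_answers, toy_ai_answers, toy_ai_scores)
print(toy_authors)  # {'0': '12'}
print(toy_merged)   # the AI votes on observation "0" only
```

The merged dictionary keeps the same `{obsID: {userID: vote}}` shape as `answers.json`, so any aggregation written for the crowdsourced votes can be reused unchanged.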
Files
plantnet_swe.zip
(183.3 MB)
Checksum | Size |
---|---|
md5:037a6abccd51aa7cd9018e280d3bbdd5 | 183.3 MB |
md5:330bb61c4ba2b45eb0aa30b6ea43335d | 3.8 kB |
Additional details
Funding
- Pl@ntAgroEco 22-PEAE0009
- Agence Nationale de la Recherche
- IA CaMeLOt ANR-20-CHIA-0001-01
- Agence Nationale de la Recherche
- Grand Équipement National de Calcul Intensif (France)
- GUARDEN 101060693
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement
- MAMBO (Horizon EU) 101060639
- European Union