Camille Garcin; Alexis Joly; Pierre Bonnet; Maximilien Servajean; Joseph Salmon
Pl@ntNet-300K is an image dataset for evaluating set-valued classification. It was built from the database of the Pl@ntNet citizen observatory and consists of 306,146 images covering 1,081 species. We highlight two particular features of the dataset, inherent to the way the images are acquired and to the intrinsic diversity of plant morphology:
i) The dataset exhibits a strong class imbalance, meaning that a few species account for most of the images.
ii) Many species are visually similar, making identification difficult even for the expert eye.
These two characteristics make the dataset a good candidate for evaluating set-valued classification methods and algorithms. We therefore recommend two set-valued evaluation metrics for the dataset (top-K and average-K), and we provide baseline results for a ResNet-50 trained with a cross-entropy loss. The full description of the dataset can be found in (to be provided soon).
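To make the two recommended metrics concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `top_k_accuracy` and `average_k_accuracy` are hypothetical, the inputs are assumed to be softmax probabilities of shape (n_samples, n_classes), and average-K is implemented with one common construction (a single global probability threshold chosen so that predicted sets contain K classes on average), ignoring ties.

```python
import numpy as np


def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Indices of the k largest probabilities in each row (order within the k is arbitrary).
    top_k = np.argpartition(-probs, k - 1, axis=1)[:, :k]
    return float(np.mean((top_k == labels[:, None]).any(axis=1)))


def average_k_accuracy(probs, labels, k):
    """Accuracy of thresholded sets whose size is k on average (assumed construction)."""
    n = probs.shape[0]
    # Global threshold: the (n * k)-th largest probability over all samples and classes,
    # so that exactly n * k (sample, class) pairs are selected when there are no ties.
    flat_sorted = np.sort(probs, axis=None)[::-1]
    threshold = flat_sorted[n * k - 1]
    # A sample is correct if its true class passes the threshold.
    true_probs = probs[np.arange(n), labels]
    return float(np.mean(true_probs >= threshold))


# Toy example with 4 samples and 3 classes (made-up numbers, not dataset results).
probs = np.array([
    [0.90, 0.06, 0.04],
    [0.48, 0.47, 0.05],
    [0.10, 0.60, 0.30],
    [0.36, 0.33, 0.31],
])
labels = np.array([0, 1, 1, 2])

print(top_k_accuracy(probs, labels, 1))      # 0.5
print(average_k_accuracy(probs, labels, 1))  # 0.75
```

In this toy example, top-1 returns exactly one class per image, while average-1 returns two classes for the ambiguous second image and none for the fourth, which is why it scores higher for the same average set size.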
The scientific publication (NeurIPS 2022) describing the dataset and providing baseline results can be found here: https://openreview.net/forum?id=eLYinD0TtIt
Utilities to load the data and train models with PyTorch can be found here: https://github.com/plantnet/PlantNet-300K/