Pl@ntNet-300K image dataset

Camille Garcin; Alexis Joly; Pierre Bonnet; Antoine Affouard; Jean-Christophe Lombardo; Mathias Chouet; Maximilien Servajean; Titouan Lorieul; Joseph Salmon

doi:10.5281/zenodo.5645731

Published April 29, 2021 | Version 1.1

Dataset Open

1. IMAG, Univ Montpellier, Inria, CNRS
2. Inria, LIRMM, Univ Montpellier, CNRS
3. CIRAD, AMAP
4. CIRAD, AMAP, Inria, LIRMM, Univ Montpellier, CNRS
5. LIRMM, AMIS, UPVM, Univ Montpellier, CNRS
6. IMAG, Univ Montpellier, CNRS

This paper presents a novel image dataset with high intrinsic ambiguity and a long-tailed distribution, built from the database of the Pl@ntNet citizen observatory. It consists of 306,146 plant images covering 1,081 species. We highlight two particular features of the dataset, inherent to the way the images are acquired and to the intrinsic diversity of plant morphology:

(i) the dataset has a strong class imbalance, i.e. a few species account for most of the images, and,

(ii) many species are visually similar, rendering identification difficult even for the expert eye.

These two characteristics make the present dataset well suited for the evaluation of set-valued classification methods and algorithms. Therefore, we recommend two set-valued evaluation metrics associated with the dataset (macro-average top-k accuracy and macro-average average-k accuracy) and we provide baseline results established by training deep neural networks using the cross-entropy loss.
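As an illustration of the first recommended metric, the following is a minimal sketch of macro-average top-k accuracy in plain Python: top-k accuracy is computed separately for each class and the per-class values are then averaged, so that rare species weigh as much as frequent ones. The function name, the input layout (one list of class scores per image) and the tie-breaking rule are assumptions made for this sketch, not the authors' implementation; classes that never appear among the true labels are left out of the average.

```python
from collections import defaultdict


def macro_average_top_k_accuracy(scores, labels, k):
    """Average of per-class top-k accuracies (hypothetical helper).

    scores: list of lists, scores[i][c] is the score of class c for sample i
    labels: list of true class indices, one per sample
    k:      number of classes retained per sample
    """
    hits = defaultdict(int)    # per-class count of samples whose label is in the top k
    counts = defaultdict(int)  # per-class count of samples
    for row, label in zip(scores, labels):
        # Indices of the k highest scores; ties go to the lower class index
        # because sorted() is stable (an assumption of this sketch).
        top_k = sorted(range(len(row)), key=lambda c: -row[c])[:k]
        counts[label] += 1
        hits[label] += int(label in top_k)
    # Average the per-class accuracies so that every class has equal weight.
    per_class = [hits[c] / counts[c] for c in counts]
    return sum(per_class) / len(per_class)


# Toy imbalanced example: three samples of class 0, one sample of class 1.
scores = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.3, 0.1],
]
labels = [0, 0, 0, 1]

print(macro_average_top_k_accuracy(scores, labels, k=1))  # 0.5
print(macro_average_top_k_accuracy(scores, labels, k=2))  # 1.0
```

In this toy example the usual (micro-average) top-1 accuracy would be 0.75, while the macro-average is 0.5, because the single misclassified sample belongs to the rare class.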

A full description of the dataset as well as baseline experiments can be found in the following publication:

"Pl@ntNet-300K: a plant image dataset with high label ambiguity and a long-tailed distribution", Camille Garcin, Alexis Joly, Pierre Bonnet, Antoine Affouard, Jean-Christophe Lombardo, Mathias Chouet, Maximilien Servajean, Titouan Lorieul and Joseph Salmon, in Proc. of the Thirty-fifth Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, 2021.

Please cite the above reference for any publication using the dataset.

Utilities to load the data and train models with PyTorch can be found here: https://github.com/plantnet/PlantNet-300K/

Files

Name	Size	MD5
plantnet_300K.zip	31.7 GB	db27d149f2a6c304b887353c07021687
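Since the archive is large, it can be worth checking the download against the MD5 checksum listed for plantnet_300K.zip. The sketch below uses only Python's standard `hashlib` module and reads the file in chunks so that the archive is never held in memory at once; the helper names and the local file path are hypothetical.

```python
import hashlib

# MD5 checksum as given in the file listing for plantnet_300K.zip.
EXPECTED_MD5 = "db27d149f2a6c304b887353c07021687"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved in the current directory, `verify_archive("plantnet_300K.zip")` returns `True` when the download is intact.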

Additional details

Funding

European Commission
COS4CLOUD – Co-designed Citizen Observatories Services for the EOS-Cloud (grant 863463)
