Dataset Open Access

Pl@ntNet-300K image dataset

Camille Garcin; Alexis Joly; Pierre Bonnet; Antoine Affouard; Jean-Christophe Lombardo; Mathias Chouet; Maximilien Servajean; Titouan Lorieul; Joseph Salmon


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.5645731", 
  "title": "Pl@ntNet-300K image dataset", 
  "issued": {
    "date-parts": [
      [
        2021, 
        4, 
        29
      ]
    ]
  }, 
  "abstract": "<p>This paper presents a novel image dataset with high intrinsic ambiguity and a long-tailed distribution built from the database of Pl@ntNet citizen observatory. It consists of 306146&nbsp;plant images covering 1081&nbsp;species. We highlight two particular features of the dataset, inherent to the way the images are acquired and to the intrinsic diversity of plants morphology:</p>\n\n<p>&nbsp; &nbsp; (i) the dataset has a strong class imbalance, i.e. a few species account for most of the images, and,</p>\n\n<p>&nbsp; &nbsp; (ii) many species are visually similar, rendering identification difficult even for the expert eye.</p>\n\n<p>&nbsp; &nbsp; These two characteristics make the present dataset well suited for the evaluation of set-valued classification methods and algorithms. Therefore, we recommend two set-valued evaluation metrics associated with the dataset (macro-average top-k accuracy&nbsp;and macro-average average-k accuracy) and we provide baseline results established by training deep neural networks using the cross-entropy loss.</p>\n\n<p>A full description of the dataset as well as baseline experiments can be found in the following&nbsp;publication:</p>\n\n<p>&quot;<a href=\"https://openreview.net/forum?id=eLYinD0TtIt\">Pl@ntNet-300K: a plant image dataset with high label ambiguity and a long-tailed distribution</a>&quot;, Camille Garcin, Alexis Joly, Pierre Bonnet, Antoine Affouard, Jean-Christophe Lombardo, Mathias Chouet, Maximilien Servajean,&nbsp;Titouan Lorieul and Joseph Salmon, in Proc. of Thirty-fifth Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, 2021.</p>\n\n<p>&nbsp;Please cite the above&nbsp;reference for any publication using the dataset.</p>\n\n<p>Utilities to load the data and train models with pytorch can be found here: <a href=\"https://github.com/plantnet/PlantNet-300K/\">https://github.com/plantnet/PlantNet-300K/</a></p>", 
  "author": [
    {
      "family": "Camille Garcin"
    }, 
    {
      "family": "Alexis Joly"
    }, 
    {
      "family": "Pierre Bonnet"
    }, 
    {
      "family": "Antoine Affouard"
    }, 
    {
      "family": "Jean-Christophe Lombardo"
    }, 
    {
      "family": "Mathias Chouet"
    }, 
    {
      "family": "Maximilien Servajean"
    }, 
    {
      "family": "Titouan Lorieul"
    }, 
    {
      "family": "Joseph Salmon"
    }
  ], 
  "version": "1.1", 
  "type": "dataset", 
  "id": "5645731"
}
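
The abstract recommends macro-average top-k accuracy as an evaluation metric for this long-tailed dataset. The following is a minimal sketch of how such a metric is commonly computed: top-k accuracy is measured separately for each class and then averaged over classes, so that rare species weigh as much as frequent ones. The function name, the plain-list input format, and the toy scores are assumptions made for illustration; they are not taken from the dataset's own utilities, and the exact definition used for the baselines is given in the referenced publication.

```python
from collections import defaultdict


def macro_average_top_k_accuracy(scores, labels, k):
    """Average of per-class top-k accuracies (illustrative helper, not the authors' code).

    scores: list of per-sample score lists, one score per class.
    labels: list of true class indices, one per sample.
    """
    hits = defaultdict(int)    # per-class count of samples whose label is in the top k
    counts = defaultdict(int)  # per-class count of samples
    for sample_scores, label in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        counts[label] += 1
        hits[label] += int(label in ranked[:k])
    # Average over the classes that appear in `labels`, not over samples.
    per_class = [hits[c] / counts[c] for c in counts]
    return sum(per_class) / len(per_class)


# Toy, made-up example with a strong class imbalance:
# class 0 has four samples, classes 1 and 2 have one sample each.
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
]
toy_labels = [0, 0, 0, 0, 1, 2]

top1 = macro_average_top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = macro_average_top_k_accuracy(toy_scores, toy_labels, k=2)
print(top1, top2)
```

On this toy input a classifier that always favors the dominant class gets four of six samples right at k=1, yet only one of the three classes is ever predicted correctly, so the macro average comes out much lower than the per-sample accuracy. That gap is the reason a macro-averaged metric suits a dataset in which a few species account for most of the images.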
                  All versions   This version
Views             3,948          1,309
Downloads         2,024          934
Data volume       64.1 TB        29.6 TB
Unique views      3,098          1,147
Unique downloads  1,088          339
