Dataset Open Access
Camille Garcin;
Alexis Joly;
Pierre Bonnet;
Antoine Affouard;
Jean-Christophe Lombardo;
Mathias Chouet;
Maximilien Servajean;
Titouan Lorieul;
Joseph Salmon
<?xml version='1.0' encoding='utf-8'?> <resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd"> <identifier identifierType="DOI">10.5281/zenodo.5645731</identifier> <creators> <creator> <creatorName>Camille Garcin</creatorName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-4504-7040</nameIdentifier> <affiliation>IMAG, Univ Montpellier, Inria, CNRS</affiliation> </creator> <creator> <creatorName>Alexis Joly</creatorName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-2161-9940</nameIdentifier> <affiliation>Inria, LIRMM, Univ Montpellier, CNRS</affiliation> </creator> <creator> <creatorName>Pierre Bonnet</creatorName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-2828-4389</nameIdentifier> <affiliation>CIRAD, AMAP</affiliation> </creator> <creator> <creatorName>Antoine Affouard</creatorName> <affiliation>CIRAD, AMAP, Inria, LIRMM, Univ Montpellier, CNRS</affiliation> </creator> <creator> <creatorName>Jean-Christophe Lombardo</creatorName> <affiliation>Inria, LIRMM, Univ Montpellier, CNRS</affiliation> </creator> <creator> <creatorName>Mathias Chouet</creatorName> <affiliation>CIRAD, AMAP, Inria, LIRMM, Univ Montpellier, CNRS</affiliation> </creator> <creator> <creatorName>Maximilien Servajean</creatorName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-9426-2583</nameIdentifier> <affiliation>LIRMM, AMIS, UPVM, Univ Montpellier, CNRS</affiliation> </creator> <creator> <creatorName>Titouan Lorieul</creatorName> <affiliation>Inria, LIRMM, Univ Montpellier, CNRS</affiliation> </creator> <creator> <creatorName>Joseph Salmon</creatorName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-3181-0634</nameIdentifier> 
<affiliation>IMAG, Univ Montpellier, CNRS</affiliation> </creator> </creators> <titles> <title>Pl@ntNet-300K image dataset</title> </titles> <publisher>Zenodo</publisher> <publicationYear>2021</publicationYear> <subjects> <subject>plant</subject> <subject>images</subject> <subject>identification</subject> <subject>classification</subject> <subject>Pl@ntNet</subject> <subject>species</subject> </subjects> <dates> <date dateType="Issued">2021-04-29</date> </dates> <resourceType resourceTypeGeneral="Dataset"/> <alternateIdentifiers> <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/5645731</alternateIdentifier> </alternateIdentifiers> <relatedIdentifiers> <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.4726652</relatedIdentifier> </relatedIdentifiers> <version>1.1</version> <rightsList> <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights> <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights> </rightsList> <descriptions> <description descriptionType="Abstract"><p>This paper presents a novel image dataset with high intrinsic ambiguity and a long-tailed distribution, built from the database of the Pl@ntNet citizen observatory. It consists of 306146 plant images covering 1081 species. We highlight two particular features of the dataset, inherent to the way the images are acquired and to the intrinsic diversity of plant morphology:</p> <p>(i) the dataset has a strong class imbalance, i.e., a few species account for most of the images, and</p> <p>(ii) many species are visually similar, rendering identification difficult even for the expert eye.</p> <p>These two characteristics make the present dataset well suited for the evaluation of set-valued classification methods and algorithms. 
Therefore, we recommend two set-valued evaluation metrics associated with the dataset (macro-average top-k accuracy and macro-average average-k accuracy) and we provide baseline results established by training deep neural networks using the cross-entropy loss.</p> <p>A full description of the dataset as well as baseline experiments can be found in the following publication:</p> <p>&quot;<a href="https://openreview.net/forum?id=eLYinD0TtIt">Pl@ntNet-300K: a plant image dataset with high label ambiguity and a long-tailed distribution</a>&quot;, Camille Garcin, Alexis Joly, Pierre Bonnet, Antoine Affouard, Jean-Christophe Lombardo, Mathias Chouet, Maximilien Servajean, Titouan Lorieul and Joseph Salmon, in Proc. of the Thirty-fifth Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, 2021.</p> <p>Please cite the above reference for any publication using the dataset.</p> <p>Utilities to load the data and train models with PyTorch can be found here: <a href="https://github.com/plantnet/PlantNet-300K/">https://github.com/plantnet/PlantNet-300K/</a></p></description> </descriptions> <fundingReferences> <fundingReference> <funderName>European Commission</funderName> <funderIdentifier funderIdentifierType="Crossref Funder ID">10.13039/100010661</funderIdentifier> <awardNumber awardURI="info:eu-repo/grantAgreement/EC/Horizon 2020 Framework Programme - Research and Innovation action/863463/">863463</awardNumber> <awardTitle>Co-designed Citizen Observatories Services for the EOS-Cloud</awardTitle> </fundingReference> </fundingReferences> </resource>
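The abstract recommends macro-average top-k accuracy as an evaluation metric. The sketch below is one possible reading of that metric, assuming it means computing top-k accuracy separately for each class and then averaging over the classes present in the evaluation set. The function name `macro_average_top_k_accuracy` and the toy scores are hypothetical; this is not the code from the PlantNet-300K repository, and it does not cover macro-average average-k accuracy.

```python
from collections import defaultdict


def macro_average_top_k_accuracy(scores, labels, k):
    """Mean over classes of the per-class top-k accuracy (illustrative sketch).

    scores: list of per-sample score lists, one score per class index.
    labels: list of true class indices, same length as scores.
    """
    hits = defaultdict(int)    # class -> samples whose true label is in the top k
    counts = defaultdict(int)  # class -> samples carrying that true label
    for sample_scores, label in zip(scores, labels):
        # Indices of the k highest-scoring classes (ties go to the lower index).
        top_k = sorted(range(len(sample_scores)),
                       key=lambda c: sample_scores[c], reverse=True)[:k]
        counts[label] += 1
        hits[label] += label in top_k
    # Averaging per-class accuracies gives rare classes the same weight
    # as frequent ones, which matters under strong class imbalance.
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Toy data (made up): class 0 is frequent, class 2 is rare and always missed.
scores = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.3, 0.1],
]
labels = [0, 0, 0, 2]
print(macro_average_top_k_accuracy(scores, labels, k=1))
```

On this toy data, three of the four samples are classified correctly, yet the macro-average top-1 accuracy is lower than that plain fraction because the single rare-class sample counts as a whole class.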
| | All versions | This version |
---|---|---|
Views | 3,948 | 1,309 |
Downloads | 2,024 | 934 |
Data volume | 64.1 TB | 29.6 TB |
Unique views | 3,098 | 1,147 |
Unique downloads | 1,088 | 339 |