Published October 26, 2020 | Version v1
Dataset Open

Data from: Integration and harmonization of trait data from plant individuals across heterogeneous sources

  • 1. University of Amsterdam
  • 2. New York Botanical Garden
  • 3. Universidade do Extremo Sul Catarinense
  • 4. Université de Yaoundé I
  • 5. Institut de Recherche pour le Développement

Description

Trait data represent the basis for ecological and evolutionary research and have relevance for biodiversity conservation, ecosystem management and earth system modelling. The collection and mobilization of trait data have increased strongly over the last decade, but many trait databases still provide only species-level, aggregated trait values (e.g. ranges, means) and lack the direct observations on which those data are based. Thus, the vast majority of trait data measured directly from individuals remains hidden and highly heterogeneous, impeding their discoverability, semantic interoperability, digital accessibility and (re-)use. Here, we integrate quantitative measurements of verbatim trait information from plant individuals (e.g. lengths, widths, counts and angles of stems, leaves, fruits and inflorescence parts) from multiple sources such as field observations and herbarium collections. We develop a workflow to harmonize heterogeneous trait measurements (e.g. trait names and their values and units) as well as additional information related to taxonomy, measurement or fact and occurrence. This data integration and harmonization builds on vocabularies and terminology from existing metadata standards and ontologies such as the Ecological Trait-data Standard (ETS), the Darwin Core (DwC), the Thesaurus Of Plant characteristics (TOP) and the Plant Trait Ontology (TO). A metadata form filled out by data providers enables the automated integration of trait information from heterogeneous datasets. We illustrate our tools with data from palms (family Arecaceae), a globally distributed (pantropical), diverse plant family that is considered a good model system for understanding the ecology and evolution of tropical rainforests. We mobilize nearly 140,000 individual palm trait measurements in an interoperable format, identify semantic gaps in existing plant trait terminology and provide suggestions for the future development of a thesaurus of plant characteristics.
Our work thereby promotes the semantic integration of plant trait data in a machine-readable way and shows how large amounts of small trait data sets and their metadata can be integrated into standardized data products.
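
As a rough illustration of what harmonizing trait names, values and units can involve, the following Python sketch maps one verbatim record to a harmonized one. It is not the workflow used to build this dataset. The lookup tables, the target unit (cm) and the function name `harmonize` are hypothetical, and the field names are only modeled on ETS-style terms; they should be checked against the standard and the Readme file before use.

```python
# Hypothetical lookup tables. In practice such mappings would be derived
# from the metadata supplied with each source dataset.
TRAIT_NAME_MAP = {
    "leaf length": "leaf_length",
    "Leaf_L": "leaf_length",
    "fruit width": "fruit_width",
}
FACTOR_TO_CM = {"mm": 0.1, "cm": 1.0, "m": 100.0}


def harmonize(record):
    """Return a copy of `record` with harmonized name, value and unit added.

    The verbatim fields are left untouched so that the original
    observation stays traceable.
    """
    name = record["verbatimTraitName"]
    unit = record["verbatimTraitUnit"]
    if name not in TRAIT_NAME_MAP:
        raise ValueError(f"no harmonized name for trait {name!r}")
    if unit not in FACTOR_TO_CM:
        raise ValueError(f"no conversion for unit {unit!r}")

    harmonized = dict(record)  # shallow copy; the input is not modified
    harmonized["traitName"] = TRAIT_NAME_MAP[name]
    harmonized["traitValue"] = float(record["verbatimTraitValue"]) * FACTOR_TO_CM[unit]
    harmonized["traitUnit"] = "cm"
    return harmonized
```

Raising an error for unmapped names or units, rather than silently passing them through, is one possible design choice: it makes gaps in the lookup tables visible early.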

Notes

See the Readme file and the accompanying publication in Ecological Informatics for details.

Funding provided by: Nederlandse Organisatie voor Wetenschappelijk Onderzoek
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100003246
Award Number: 824.15.007

Funding provided by: Universiteit van Amsterdam
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100001827
Award Number: starting grant

Funding provided by: Universiteit van Amsterdam
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100001827
Award Number: Faculty Research Cluster 'Global Ecology'

Files

core_table.csv

Files (17.9 MB)

  • md5:62ed906243a675d8208b3410331a9645 (15.9 MB)
  • md5:c25e04413e0dbcd0dbcf0766ef811ee1 (442.7 kB)
  • md5:1f02eb4ecb024d9746e6a99033d88062 (990.2 kB)
  • md5:3b2bafa004f6db493304d863e63aaf1e (477 Bytes)
  • md5:8fc038716db4566b3834ff52a79659fd (626.4 kB)
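
The md5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal Python sketch using only the standard library; the helper names `md5_of` and `matches_listing` are illustrative and not part of the dataset.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

A call such as `matches_listing("core_table.csv", "md5:62ed906243a675d8208b3410331a9645")` would return `True` only if that checksum belongs to that file. The listing here does not show which checksum belongs to which file name, so that pairing is an assumption to confirm on the record page.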