AusTraits data compilation - a curated plant trait database for the Australian flora
This document describes the structure of the AusTraits compilation, corresponding to Version 3.0.2 of the dataset. The information below is drawn from the file definitions.yml.
For details on access, structure and usage please visit https://doi.org/10.5281/zenodo.3568417
The compiled AusTraits database has the following main components:
austraits
├── traits
├── sites
├── contexts
├── methods
├── excluded_data
├── taxa
├── taxonomic_updates
├── definitions
├── contributors
├── sources
└── build_info
These elements include all the data and contextual information submitted with each contributed dataset. Each component is defined as follows:
traits
Description: A table containing measurements of plant traits.
Content:
| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| taxon_name | Currently accepted name of taxon in the Australian Plant Census or, for unplaced species, in the Australian Plant Names Index. |
| site_name | Name of site where individual was sampled. Cross-references between similar columns in sites and traits. |
| context_name | Name of contextual scenario where individual was sampled. Cross-references between similar columns in contexts and traits. |
| observation_id | A unique identifier for the observation, useful for joining traits coming from the same observation_id. These are assigned automatically, based on the dataset_id and row number of the raw data. |
| trait_name | Name of trait sampled. Allowable values specified in the table traits. |
| value | Measured value. |
| unit | Units of the sampled trait value after aligning with AusTraits standards. |
| date | Date sample was taken, in the format yyyy-mm-dd, but with days and months only when specified. |
| value_type | A categorical variable describing the type of trait value recorded. |
| replicates | Number of replicate measurements that comprise the data points for the trait for each measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the value type is expert_mean, expert_min, or expert_max the replicate value should be .na. |
| original_name | Name given to taxon in the original data supplied by the authors |
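The rules for the replicates column can be expressed as a small consistency check. The following is a minimal sketch, not part of the AusTraits build code: the function name `check_replicates`, the assumption that a range is written as `3-10`, and the assumption that every value type ending in `_mean`, `_min` or `_max` (other than the expert types) expects a count, range, or `unknown` are all illustrative.

```python
import re

# Value types for which the text says replicates should be ".na".
EXPERT_TYPES = {"expert_mean", "expert_min", "expert_max"}

def check_replicates(value_type: str, replicates: str) -> bool:
    """Return True if `replicates` is consistent with the rules in the table above."""
    if value_type == "raw_value":
        # A direct measurement always has exactly one replicate.
        return replicates == "1"
    if value_type in EXPERT_TYPES:
        return replicates == ".na"
    if value_type.endswith(("_mean", "_min", "_max")):
        # Assumed formats: a count ("5"), a range ("3-10"), or "unknown".
        return replicates == "unknown" or re.fullmatch(r"\d+(-\d+)?", replicates) is not None
    # Other value types (e.g. "unknown") are not constrained by the text.
    return True
```

For example, `check_replicates("site_mean", "unknown")` is accepted, while `check_replicates("raw_value", "3")` is flagged as inconsistent.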
sites
Description: A table containing observations of site characteristics associated with information in traits. Cross referencing between the two dataframes is possible using combinations of the variables dataset_id, site_name.
Content:
| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| site_name | Name of site where individual was sampled. Cross-references between similar columns in sites and traits. |
| site_property | The site characteristic being recorded. Name should include units of measurement, e.g. longitude (deg). Ideally we have at least these variables for each site - longitude (deg), latitude (deg), description. |
| value | Measured value. |
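Because the sites table is stored in long format (one row per site property), checking whether a site has the recommended properties requires grouping rows by site first. The sketch below assumes the table has been loaded as a list of dictionaries. The helper name `missing_site_properties` and the example rows are hypothetical and not taken from the dataset.

```python
# The three properties the table above says each site should ideally have.
RECOMMENDED = {"longitude (deg)", "latitude (deg)", "description"}

def missing_site_properties(rows):
    """Map (dataset_id, site_name) to the sorted list of recommended properties it lacks."""
    seen = {}
    for r in rows:
        key = (r["dataset_id"], r["site_name"])
        seen.setdefault(key, set()).add(r["site_property"])
    # Keep only sites that are missing at least one recommended property.
    return {key: sorted(RECOMMENDED - props)
            for key, props in seen.items() if RECOMMENDED - props}

# Illustrative rows only; the site names and values are made up.
example_sites = [
    {"dataset_id": "Falster_2005", "site_name": "site_A", "site_property": "longitude (deg)", "value": "150.0"},
    {"dataset_id": "Falster_2005", "site_name": "site_A", "site_property": "latitude (deg)", "value": "-33.0"},
    {"dataset_id": "Falster_2005", "site_name": "site_A", "site_property": "description", "value": "woodland"},
    {"dataset_id": "Falster_2005", "site_name": "site_B", "site_property": "latitude (deg)", "value": "-34.0"},
]
```

Here `missing_site_properties(example_sites)` reports only `site_B`, which lacks a description and a longitude.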
contexts
Description: A table containing observations of contextual characteristics associated with information in traits. Cross referencing between the two dataframes is possible using combinations of the variables dataset_id, context_name.
Content:
| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| context_name | Name of contextual scenario where individual was sampled. Cross-references between similar columns in contexts and traits. |
| context_property | The contextual characteristic being recorded. Name should include units of measurement, e.g. elevation (m). |
| value | Measured value. |
methods
Description: A table containing details on methods with which data were collected, including time frame and source.
Content:
| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| trait_name | Name of trait sampled. Allowable values specified in the table traits. |
| methods | A textual description of the methods used to collect the trait data. Whenever available, methods are taken near-verbatim from referenced source. Methods can include descriptions such as ‘measured on botanical collections’,‘data from the literature’, or a detailed description of the field or lab methods used to collect the data. |
| year_collected_start | The year data collection commenced. |
| year_collected_end | The year data collection was completed. |
| description | A 1-2 sentence description of the purpose of the study. |
| collection_type | A field to indicate where the majority of plants on which traits were measured were collected - in the field, lab, glasshouse, botanical collection, or literature. The latter should only be used when the data were sourced from the literature and the collection type is unknown. |
| sample_age_class | A field to indicate if the study was completed on adult or juvenile plants. |
| sampling_strategy | A written description of how study sites were selected and how study individuals were selected. When available, this information is lifted verbatim from a published manuscript. For botanical collections, this field ideally indicates which records were ‘sampled’ to measure a specific trait. |
| source_primary_citation | Citation for primary source. This detail is generated from the primary source in the metadata. |
| source_primary_key | Citation key for primary source in sources. The key is typically of format Surname_year. |
| source_secondary_citation | Citations for secondary source. This detail is generated from the secondary source in the metadata. |
| source_secondary_key | Citation key for secondary source in sources. The key is typically of format Surname_year. |
excluded_data
Description: A table of data that did not pass quality tests and so were excluded from the master dataset.
taxonomic_updates
Description: A table of all taxonomic changes implemented in the construction of AusTraits. Changes are determined by comparing against the APC (Australian Plant Census) and APNI (Australian Plant Names Index).
Content:
| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| original_name | Name given to taxon in the original data supplied by the authors |
| cleaned_name | Name of the taxon after implementing any changes encoded for this taxon in the metadata file for the corresponding dataset_id. |
| taxonIDClean | Where it could be identified, the taxonID of the cleaned_name for this taxon in the APC. |
| taxonomicStatusClean | Taxonomic status of the taxon identified by taxonIDClean in the APC. |
| alternativeTaxonomicStatusClean | The status of alternative records with the name cleaned_name in the APC. |
| acceptedNameUsageID | ID of the accepted name for taxon in the APC or APNI. |
| taxon_name | Currently accepted name of taxon in the Australian Plant Census or, for unplaced species, in the Australian Plant Names Index. |
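Since each row of this table links an original_name within a dataset_id to the currently accepted taxon_name, it can serve as a lookup for tracing how a submitted name was resolved. A minimal sketch follows, assuming the table is loaded as a list of dictionaries. The function name `build_name_lookup` and the placeholder taxon names are hypothetical.

```python
def build_name_lookup(updates):
    """Map (dataset_id, original_name) to the currently accepted taxon_name."""
    return {(u["dataset_id"], u["original_name"]): u["taxon_name"] for u in updates}

# Placeholder rows; "Examplea vetus" and "Examplea nova" are not real taxa.
example_updates = [
    {"dataset_id": "Falster_2005", "original_name": "Examplea vetus",
     "cleaned_name": "Examplea vetus", "taxon_name": "Examplea nova"},
    {"dataset_id": "Falster_2005", "original_name": "Examplea nova",
     "cleaned_name": "Examplea nova", "taxon_name": "Examplea nova"},
]

lookup = build_name_lookup(example_updates)
# Resolve the name as supplied by the authors to the accepted name.
accepted = lookup[("Falster_2005", "Examplea vetus")]
```

Keying on both dataset_id and original_name reflects that name changes are recorded per dataset in this table.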
taxa
Description: A table containing details on taxa associated with information in traits. This information has been sourced from the APC (Australian Plant Census) and APNI (Australian Plant Names Index) and is released under a CC-BY3 license.
Content:
| key | value |
|---|---|
| taxon_name | Currently accepted name of taxon in the Australian Plant Census or, for unplaced species, in the Australian Plant Names Index. |
| source | Source of taxonomic information, either APC or APNI. |
| acceptedNameUsageID | Identifier for the accepted name of the taxon. |
| scientificNameAuthorship | Authority for the accepted name of the taxon indicated under taxon_name. |
| taxonRank | Rank of the taxon. |
| taxonomicStatus | Taxonomic status of the taxon. |
| family | Family of the taxon. |
| genus | Genus of the taxon. |
| taxonDistribution | Known distribution of the taxon. |
| ccAttributionIRI | Source of taxonomic information. |
definitions
Description: A copy of the definitions for all tables and terms. Information included here was used to process data and generate any documentation for the study.
contributors
Description: A table of people contributing to each study.
Content:
| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| name | Name of contributor |
| institution | Last known institution or affiliation |
| role | Their role in the study |
sources
Description: BibTeX entries for all primary and secondary sources in the compilation.
build_info
Description: A description of the computing environment used to create this version of the dataset, including version number, git commit and R session_info.
The core organising unit behind AusTraits is the dataset_id. Records are organised as coming from a particular study, defined by the dataset_id. Our preferred format for dataset_id is the surname of the first author of any corresponding publication, followed by the year, as surname_year, e.g. Falster_2005. Wherever there are multiple studies with the same id, we add a suffix _2, _3 etc., e.g. Falster_2005, Falster_2005_2.
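The naming convention can be illustrated with a short helper that builds a dataset_id and appends a numeric suffix when the base id is already taken. This is only a sketch of the convention as described. The function `make_dataset_id` and its `existing` argument are hypothetical and do not correspond to any function in the AusTraits code base.

```python
def make_dataset_id(surname: str, year: int, existing: set) -> str:
    """Build surname_year, adding _2, _3, ... if the id is already in `existing`."""
    base = f"{surname}_{year}"
    if base not in existing:
        return base
    n = 2
    # Find the first unused suffix.
    while f"{base}_{n}" in existing:
        n += 1
    return f"{base}_{n}"

first = make_dataset_id("Falster", 2005, set())
second = make_dataset_id("Falster", 2005, {"Falster_2005"})
```

Here `first` is `Falster_2005` and `second` is `Falster_2005_2`, matching the examples in the text.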
As well as a dataset_id, each trait measurement has an associated observation_id. Observation IDs bind together related measurements within any dataset, and thereby allow transformation between long (e.g. with variables trait_name and value) and wide (e.g. with traits as columns) formats.
Generally, observation_id has the format dataset_id_XX where XX is a unique number within each dataset. For example, if multiple traits were collected on the same individual, the observation_id allows us to gather these together. For floras, which report species averages, the observation_id is assigned at the species level.
For datasets that arrive in wide format we assume each row has a unique observation_id. For datasets that arrive in long format, the observation_id is assigned based on a specified grouping variable. If missing, observation_id is assigned based on taxon_name.
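The long-to-wide transformation that observation_id enables can be sketched as follows. This assumes the traits table has been loaded as a list of dictionaries with the column names listed earlier. The function name `to_wide`, the trait names, the taxon name, and the values in the example rows are illustrative only.

```python
def to_wide(traits):
    """Collect long-format trait rows into one record per observation_id."""
    wide = {}
    for r in traits:
        # One output row per observation, created on first sight.
        row = wide.setdefault(
            r["observation_id"],
            {"dataset_id": r["dataset_id"], "taxon_name": r["taxon_name"]},
        )
        # Each trait becomes its own column.
        row[r["trait_name"]] = r["value"]
    return wide

# Illustrative rows: two traits measured on the same observation, one on another.
example_traits = [
    {"dataset_id": "Falster_2005", "observation_id": "Falster_2005_01",
     "taxon_name": "Examplea nova", "trait_name": "leaf_area", "value": "12.5"},
    {"dataset_id": "Falster_2005", "observation_id": "Falster_2005_01",
     "taxon_name": "Examplea nova", "trait_name": "plant_height", "value": "2.1"},
    {"dataset_id": "Falster_2005", "observation_id": "Falster_2005_02",
     "taxon_name": "Examplea nova", "trait_name": "leaf_area", "value": "10.0"},
]

wide = to_wide(example_traits)
```

The result has one entry per observation_id, with `leaf_area` and `plant_height` appearing side by side for `Falster_2005_01`.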
As well as dataset_id and observation_id, where appropriate, trait values are associated with a site_name. Unique combinations of dataset_id and site_name can be used to cross-match against the sites table, which provides further details on the site sampled.
As well as dataset_id, observation_id, and site_name, where appropriate, trait values are associated with a context_name. Unique combinations of dataset_id and context_name can be used to cross-match against the contexts table, which provides further details on the context sampled.
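The cross-matching described above amounts to a join on a two-column key. The sketch below shows this for sites. The same pattern would apply to contexts with context_name and context_property. The function name `attach_site_info` and the example rows are hypothetical.

```python
def attach_site_info(traits, sites):
    """Return trait rows with a 'site' dict of properties matched on (dataset_id, site_name)."""
    by_site = {}
    for s in sites:
        key = (s["dataset_id"], s["site_name"])
        by_site.setdefault(key, {})[s["site_property"]] = s["value"]
    # Trait rows without a matching site get an empty dict.
    return [
        {**t, "site": by_site.get((t["dataset_id"], t.get("site_name")), {})}
        for t in traits
    ]

# Illustrative rows only.
example_sites = [
    {"dataset_id": "Falster_2005", "site_name": "site_A",
     "site_property": "latitude (deg)", "value": "-33.0"},
]
example_traits = [
    {"dataset_id": "Falster_2005", "site_name": "site_A", "trait_name": "leaf_area", "value": "12.5"},
    {"dataset_id": "Falster_2005", "site_name": "site_B", "trait_name": "leaf_area", "value": "9.0"},
]

joined = attach_site_info(example_traits, example_sites)
```

Using the pair (dataset_id, site_name) rather than site_name alone matters because the text only guarantees uniqueness of the combination.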
Each record in the table of trait data has an associated value and value_type.
Traits are either numeric or categorical. For traits with numerical values, the recorded value has been converted into standardised units and we have checked that the value can be converted into a number and lies within the allowable range. For categorical variables, we only include records that are defined in the definitions. Moreover, we use a format whereby multi-word terms are joined with _, e.g. semi_deciduous, and multiple values are separated by a space, e.g. annual biennial for something which is either annual or biennial.
Each trait measurement also has an associated value_type, a categorical variable describing the type of trait value recorded. Possible values are:
| key | value |
|---|---|
| raw_value | Value is a direct measurement |
| site_min | Value is the minimum of measurements on multiple individuals of the taxon at a single site |
| site_mean | Value is the mean or median of measurements on multiple individuals of the taxon at a single site |
| site_max | Value is the maximum of measurements on multiple individuals of the taxon at a single site |
| multisite_min | Value is the minimum of measurements on multiple individuals of the taxon across multiple sites |
| multisite_mean | Value is the mean or median of measurements on multiple individuals of the taxon across multiple sites |
| multisite_max | Value is the maximum of measurements on multiple individuals of the taxon across multiple sites |
| expert_min | Value is the minimum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from flora that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
| expert_mean | Value is the mean observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from flora that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
| expert_max | Value is the maximum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from flora that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
| experiment_min | Value is the minimum of measurements from an experimental study either in the field or a glasshouse |
| experiment_mean | Value is the mean or median of measurements from an experimental study either in the field or a glasshouse |
| experiment_max | Value is the maximum of measurements from an experimental study either in the field or a glasshouse |
| individual_mean | Value is a mean of replicate measurements on an individual (usually for experimental ecophysiology studies) |
| individual_max | Value is a maximum of replicate measurements on an individual (usually for experimental ecophysiology studies) |
| literature_source | Value is a site or multi-site mean that has been sourced from an unknown literature source |
| unknown | Value type is not currently known |
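Returning to the categorical value format described before this table (underscores within a multi-word term, spaces between alternative values), a value can be split into its terms and checked against the allowed set. This is a minimal sketch. The function `parse_categorical` and the `ALLOWED` set are hypothetical stand-ins for the values defined in the definitions.

```python
def parse_categorical(value: str, allowed: set) -> list:
    """Split a space-separated categorical value and verify each term is allowed."""
    terms = value.split()
    unknown = [t for t in terms if t not in allowed]
    if unknown:
        raise ValueError(f"terms not in definitions: {unknown}")
    return terms

# Hypothetical allowed values for a single categorical trait.
ALLOWED = {"annual", "biennial", "perennial", "semi_deciduous"}

either = parse_categorical("annual biennial", ALLOWED)
single = parse_categorical("semi_deciduous", ALLOWED)
```

Because underscores are kept inside a term, `semi_deciduous` is treated as one value, whereas `annual biennial` yields two.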
AusTraits does not include intra-individual observations. When multiple measurements per individual are submitted to AusTraits, we take the mean of the values and record the value_type as individual_mean.
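The averaging step can be sketched as below. This is an illustration under assumptions rather than the actual processing code: the function `collapse_individuals` and the `individual` key are hypothetical, and treating an individual with a single measurement as a `raw_value` is an assumption based on the value type table.

```python
from statistics import mean

def collapse_individuals(measurements):
    """Reduce (individual_id, value) pairs to one record per individual."""
    grouped = {}
    for individual, value in measurements:
        grouped.setdefault(individual, []).append(value)
    records = []
    for individual, values in grouped.items():
        # Assumption: a lone measurement stays a raw_value; several are averaged.
        value_type = "individual_mean" if len(values) > 1 else "raw_value"
        records.append({"individual": individual, "value": mean(values), "value_type": value_type})
    return records

# Illustrative measurements: two on plant_1, one on plant_2.
records = collapse_individuals([("plant_1", 2.0), ("plant_1", 4.0), ("plant_2", 5.0)])
```

In this example `plant_1` is reduced to a single record with value 3.0 and value type `individual_mean`.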
Version 3.0.2 of AusTraits contains records for 28640 different taxa. We have aligned taxa with known taxonomic units in the Australian Plant Census (APC) and/or the Australian Plant Names Index (APNI). Of the 28640 taxa included, 27001 are aligned with known taxa.
The traits table reports both the original and the updated taxon name alongside each trait record.
The table taxa lists all taxa in the database, including additional information about the taxa.
The table taxonomic_updates provides details on all taxonomic name changes implemented in aligning with APC and APNI.
For each dataset in the compilation there is the option to list primary and secondary citations. The primary citation is the original study in which data were collected, while the secondary citation is a subsequent study where data were compiled or re-analysed and then made available. These references are included in two places: in the methods table, as formatted citations with their citation keys, and in sources, as BibTeX entries.
Following is a list of traits included in this version.