Published May 25, 2024 | Version 1.0.1
Dataset Open

TetrapodTraits Database

  • 1. Universidade Estadual de Campinas
  • 2. Federal University of Paraíba
  • 3. Yale University
  • 4. Universidade Federal do Ceará
  • 5. Universidade Federal de Goiás
  • 6. State University of New York
  • 7. University of Illinois Urbana-Champaign
  • 8. Universidade de Évora
  • 9. Florida International University
  • 10. National Institute of Amazonian Research
  • 11. Arizona State University
  • 12. University of Richmond
  • 13. University of Puerto Rico-Mayaguez
  • 14. University of Florida
  • 15. University of California, Berkeley
  • 16. George Washington University

Description

Abstract 

Tetrapods (amphibians, reptiles, birds and mammals) are model systems for global biodiversity science, but continuing data gaps, limited data standardisation, and ongoing flux in taxonomic nomenclature constrain integrative research on this group and potentially cause biassed inference. We combined and harmonised taxonomic, spatial, phylogenetic, and attribute data with phylogeny-based multiple imputation to provide a comprehensive data resource (TetrapodTraits 1.0.0) that includes values, predictions, and sources for body size, activity time, micro- and macrohabitat, ecosystem, threat status, biogeography, insularity, environmental preferences and human influence, for all 33,281 tetrapod species covered in recent fully sampled phylogenies. We assess gaps and biases across taxa and space, finding that shared data missing in attribute values increased with taxon-level completeness and richness across clades. Prediction of missing attribute values using multiple imputation revealed substantial changes in estimated macroecological patterns. These results highlight biases incurred by non-random missingness and strategies to best address them. While there is an obvious need for further data collection and updates, our phylogeny-informed database of tetrapod traits can support a more comprehensive representation of tetrapod species and their attributes in ecology, evolution, and conservation research.

Additional Information: This work is an output of the VertLife project. To flag errors, provide updates, or leave other comments, please go to vertlife.org. We aim to develop the database into a living resource at vertlife.org, and your feedback is essential to improve data quality and support community use.

Version 1.0.1 (25 May 2024). This minor release corrects a spelling inconsistency in the file Tetrapod_360.csv. The correction replaces white-space characters with underscore characters in the field Scientific.Name to match the spelling used in the file TetrapodTraits_1.0.0.csv. These corrections affect only 102 species considered extinct and 13 domestic species (Bos_frontalis, Bos_grunniens, Bos_indicus, Bos_taurus, Camelus_bactrianus, Camelus_dromedarius, Capra_hircus, Cavia_porcellus, Equus_caballus, Felis_catus, Lama_glama, Ovis_aries, Vicugna_pacos). All extinct and domestic species in TetrapodTraits now have their binomial names separated by underscore symbols instead of white space. Additionally, we have added the file GridCellShapefile.zip, which contains the shapefile required to map species presence across the 110 × 110 km equal-area grid cells (this file was previously provided through an external source).
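
The name harmonisation described for version 1.0.1 can be sketched as follows. This is only an illustration of the white-space-to-underscore rule, not the authors' code; the function name normalise_scientific_name is hypothetical.

```python
import re


def normalise_scientific_name(name: str) -> str:
    """Replace each run of white space in a binomial with one underscore."""
    return re.sub(r"\s+", "_", name.strip())


# Example using one of the domestic species named in the release note.
print(normalise_scientific_name("Bos taurus"))
```

Names that already use underscores pass through unchanged, so the function can be applied to the Scientific.Name field of either file before joining them.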

Version 1.0.0 (19 April 2024). TetrapodTraits, the full phylogenetically coherent database we developed, is being made publicly available to support a range of research applications in ecology, evolution, and conservation and to help minimise the impacts of biassed data in this model system. The database includes 24 species-level attributes linked to their respective sources across 33,281 tetrapod species. Specific fields clearly label data sources and imputations in TetrapodTraits, while additional tables record the 10K imputed values per missing entry per species.

  1. Taxonomy – includes 8 attributes that inform scientific names and respective higher-level taxonomic ranks, authority name, and year of species description. Field names: Scientific.Name, Genus, Family, Suborder, Order, Class, Authority, and YearOfDescription.
  2. Phylogenetic tree – includes 2 attributes that notify which fully-sampled phylogeny contains the species, along with whether the species placement was imputed or not in the phylogeny. Field names: TreeTaxon, TreeImputed.
  3. Body size – includes 7 attributes that inform length, mass, and data sources on species sizes, and details on the imputation of species length or mass. Field names: BodyLength_mm, LengthMeasure, ImputedLength, SourceBodyLength, BodyMass_g, ImputedMass, SourceBodyMass.
  4. Activity time – includes 5 attributes that describe period of activity (e.g., diurnal, nocturnal) as dummy (binary) variables, data sources, details on the imputation of species activity time, and a nocturnality score. Field names: Diu, Noc, ImputedActTime, SourceActTime, Nocturnality.
  5. Microhabitat – includes 8 attributes covering habitat use (e.g., fossorial, terrestrial, aquatic, arboreal, aerial) as dummy (binary) variables, data sources, details on the imputation of microhabitat, and a verticality score. Field names: Fos, Ter, Aqu, Arb, Aer, ImputedHabitat, SourceHabitat, Verticality.
  6. Macrohabitat – includes 19 attributes that reflect major habitat types according to the IUCN classification, the sum of major habitats, data source, and details on the imputation of macrohabitat. Field names: MajorHabitat_1 to MajorHabitat_10, MajorHabitat_12 to MajorHabitat_17, MajorHabitatSum, ImputedMajorHabitat, SourceMajorHabitat. MajorHabitat_11, representing the marine deep ocean floor (unoccupied by any species in our database), is not included here.
  7. Ecosystem – includes 6 attributes covering species ecosystem (e.g., terrestrial, freshwater, marine) as dummy (binary) variables, the sum of ecosystem types, data sources, and details on the imputation of ecosystem. Field names: EcoTer, EcoFresh, EcoMar, EcosystemSum, ImputedEcosystem, SourceEcosystem.
  8. Threat status – includes 3 attributes that inform the assessed threat statuses according to IUCN red list and related literature. Field names: IUCN_Binomial, AssessedStatus, SourceStatus.
  9. RangeSize – the number of 110 × 110 km grid cells covered by the species range map. Data derived from MOL.
  10. Latitude – coordinate centroid of the species range map.
  11. Longitude – coordinate centroid of the species range map.
  12. Biogeography – includes 8 attributes that present the proportion of species range within each WWF biogeographical realm. Field names: Afrotropic, Australasia, IndoMalay, Nearctic, Neotropic, Oceania, Palearctic, Antarctic.
  13. Insularity – includes 2 attributes that notify if a species is insular endemic (binary, 1 = yes, 0 = no), followed by the respective data source. Field names: Insularity, SourceInsularity.
  14. AnnuMeanTemp – Average within-range annual mean temperature (degrees Celsius). Data derived from CHELSA v. 1.2.
  15. AnnuPrecip – Average within-range annual precipitation (mm). Data derived from CHELSA v. 1.2.
  16. TempSeasonality – Average within-range temperature seasonality (Standard deviation × 100). Data derived from CHELSA v. 1.2.
  17. PrecipSeasonality – Average within-range precipitation seasonality (Coefficient of Variation). Data derived from CHELSA v. 1.2.
  18. Elevation – Average within-range elevation (metres). Data derived from topographic layers in EarthEnv.
  19. ETA50K – Average within-range estimated time to travel to cities with a population >50K in the year 2015. Data from Nelson et al. (2019).
  20. HumanDensity – Average within-range human population density in 2017. Data derived from HYDE v. 3.2.
  21. PropUrbanArea – Proportion of species range map covered by built-up area, such as towns, cities, etc. at year 2017. Data derived from HYDE v. 3.2.
  22. PropCroplandArea – Proportion of species range map covered by cropland area, identical to FAO's category 'Arable land and permanent crops' at year 2017. Data derived from HYDE v. 3.2.
  23. PropPastureArea – Proportion of species range map covered by pasture, defined as Grazing land with an aridity index > 0.5, assumed to be more intensively managed (converted in climate models) at year 2017. Data derived from HYDE v. 3.2.
  24. PropRangelandArea – Proportion of species range map covered by rangeland, defined as Grazing land with an aridity index < 0.5, assumed to be less or not managed (not converted in climate models) at year 2017. Data derived from HYDE v. 3.2.
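
As a small check on the field layout listed above, the macrohabitat field names can be generated programmatically. This is a minimal sketch: the helper major_habitat_sum and the example row are hypothetical, and treating MajorHabitatSum as the plain sum of the binary habitat fields is an assumption based on the description "the sum of major habitats".

```python
# Binary macrohabitat fields: IUCN major habitats 1 to 17, skipping 11
# (marine deep ocean floor, which the description says is not included).
major_habitat_fields = [f"MajorHabitat_{i}" for i in range(1, 18) if i != 11]

# The full macrohabitat group also has a sum, an imputation flag, and a source.
macrohabitat_fields = major_habitat_fields + [
    "MajorHabitatSum",
    "ImputedMajorHabitat",
    "SourceMajorHabitat",
]


def major_habitat_sum(row: dict) -> int:
    """Sum the binary habitat fields of one record (assumed definition)."""
    return sum(int(row[field]) for field in major_habitat_fields)


# Hypothetical record: a species recorded in major habitats 1 and 5 only.
example_row = {field: 0 for field in major_habitat_fields}
example_row["MajorHabitat_1"] = 1
example_row["MajorHabitat_5"] = 1

print(len(macrohabitat_fields))
print(major_habitat_sum(example_row))
```

The generated list contains 16 habitat fields plus three bookkeeping fields, which agrees with the 19 attributes stated for the macrohabitat group.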

File content

All files use UTF-8 encoding.

  • ImputedSets.zip – the phylogenetic multiple imputation framework applied to the TetrapodTraits database produced 10,000 imputed values per missing data entry (= 100 phylogenetic trees x 10 validation-folds x 10 multiple imputations). These imputations were specifically developed for four fundamental natural history traits: Body length, Body mass, Activity time, and Microhabitat. To facilitate the evaluation of each imputed value in a user-friendly format, we offer 10,000 tables containing both observed and imputed data for the 33,281 species in the TetrapodTraits database. Each table encompasses information about the four targeted natural history traits, along with designated fields (e.g., ImputedMass) that clearly indicate whether the trait value provided (e.g., BodyMass_g) corresponds to observed (e.g., ImputedMass = 0) or imputed (e.g., ImputedMass = 1) data. Given that the complete set of 10,000 tables necessitates nearly 17 GB of storage space, we have organized sets of 1,000 tables into separate zip files to streamline the download process.
    • ImputedSets_1K.zip, imputations for trees 1 to 10.
    • ImputedSets_2K.zip, imputations for trees 11 to 20.
    • ImputedSets_3K.zip, imputations for trees 21 to 30.
    • ImputedSets_4K.zip, imputations for trees 31 to 40.
    • ImputedSets_5K.zip, imputations for trees 41 to 50.
    • ImputedSets_6K.zip, imputations for trees 51 to 60.
    • ImputedSets_7K.zip, imputations for trees 61 to 70.
    • ImputedSets_8K.zip, imputations for trees 71 to 80.
    • ImputedSets_9K.zip, imputations for trees 81 to 90.
    • ImputedSets_10K.zip, imputations for trees 91 to 100.

  • TetrapodTraits_1.0.0.csv – the complete TetrapodTraits database, with missing data entries in natural history traits (body length, body mass, activity time, and microhabitat) replaced by the average across the 10K imputed values obtained through phylogenetic multiple imputation. Please note that imputed microhabitat (attribute fields: Fos, Ter, Aqu, Arb, Aer) and imputed activity time (attribute fields: Diu, Noc) are continuous variables within the 0-1 range interval. At the user's discretion, the types of microhabitat and activity time can be transformed into binary variables using a predefined threshold (e.g., 0.50), although we recommend utilizing the original imputed values.

  • Tetrapod_360.csv – spatial intersections of the 110 x 110 km quadrats shapefile (GridCellShapefile.zip) with species geographic range maps from https://mol.org.

  • GridCellShapefile.zip – contains grid cell shapefiles with a spatial resolution of 110 km, which are required to map the species listed in the Tetrapod_360.csv file. Please note that, because shapefile field names are limited in length, the field names of gridcells_110km.shp are displayed as ("Cl_I110", "Long", "Lat", "WWF_Rlm", "PrpLndA"). Remember to rename these fields to ("Cell_Id110", "Long", "Lat", "WWF_Realm", "PropLandArea") to match the terminology used for the 110 x 110 km grid cells in other files.
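
The file descriptions above imply a few small bookkeeping steps: locating the archive that holds a given tree's imputations, optionally binarising imputed activity-time or microhabitat values, and restoring the full grid-cell field names. A minimal sketch using only the Python standard library follows. The function names and the example record are hypothetical, the 0.50 threshold is just the example value mentioned above, and treating a value exactly at the threshold as 1 is an assumption.

```python
# Truncated shapefile field names mapped to the names used in the other
# files (taken from the GridCellShapefile.zip description).
SHAPEFILE_FIELD_NAMES = {
    "Cl_I110": "Cell_Id110",
    "Long": "Long",
    "Lat": "Lat",
    "WWF_Rlm": "WWF_Realm",
    "PrpLndA": "PropLandArea",
}


def archive_for_tree(tree: int) -> str:
    """Return the zip archive listed for a tree index (1 to 100)."""
    if not 1 <= tree <= 100:
        raise ValueError("tree index must be between 1 and 100")
    block = (tree - 1) // 10 + 1  # trees 1-10 -> 1, 11-20 -> 2, ...
    return f"ImputedSets_{block}K.zip"


def binarise(value: float, threshold: float = 0.50) -> int:
    """Turn a 0-1 imputed value into 0/1 (ties at the threshold count as 1)."""
    return 1 if value >= threshold else 0


def rename_fields(record: dict) -> dict:
    """Rename truncated shapefile fields; unknown fields are kept as-is."""
    return {SHAPEFILE_FIELD_NAMES.get(key, key): val for key, val in record.items()}


print(archive_for_tree(37))
print(binarise(0.73))
print(rename_fields({"Cl_I110": 1, "WWF_Rlm": "Neotropic"}))
```

Because the authors recommend keeping the original imputed values, a helper like binarise is best applied to a copy of the data rather than in place.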

External files

The R-code used for data analysis is available at 10.5281/zenodo.10582069.

Funding

São Paulo Research Foundation (FAPESP) for grants supporting MRM (#2021/11840-6 and #2022/12231-6), LFT (#2016/25358-3), KC (#2020/12558-0), and RZC (#2022/15247-0); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the fellowship to JJMG; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for research grants in support of FPW (#311504/2020-5) and LFT (#302834/2020-6); U.S. National Science Foundation (NSF) for grants supporting RAP (DEB-1441719), RCKB (DEB-1441652), and WJ (DEB-1441737 and DEB-1441719). WJ also acknowledges support from NASA grants 80NSSC17K0282 and 80NSSC18K0435, and from the E.O. Wilson Biodiversity Foundation.

Citation

Moura, M.R., Ceron, K., Guedes, J.J.M., Chen-Zhao, R., Sica, Y.V., Hart, J., Dorman, W., Gonzalez-del-Pliego, P., Ranipeta, A., Catenazzi, A., Werneck, F.P., Toledo, L.F., Upham, N.S., Tonini, J.F.R., Colston, T.J., Guralnick, R., Bowie, R.C.K., Pyron, R.A., Jetz, W. A phylogeny-informed characterisation of global tetrapod traits addresses data gaps and biases. bioRxiv, 2024, doi: 10.1101/2023.03.04.531098v3

Correspondence to: mariormoura@gmail.com

Files

Files (4.3 GB)

  • md5:ff130865dc41c274d2022a9e0ab35816 (1.3 MB)
  • md5:e96c78b882fdfa267c229bf7044e2d1d (416.1 MB)
  • md5:50f312df1eb775d4a646a3b0bb589e83 (415.4 MB)
  • md5:23593ec1999cec931908cdfa8e3144df (415.4 MB)
  • md5:4af9f9a69a20d6a01f95731f7ec56e6a (415.4 MB)
  • md5:e9d51bff0cd5a18d73ec56aeb44403d5 (415.4 MB)
  • md5:c726e112e33c1d90885f86e84c0bbcdb (415.6 MB)
  • md5:938c00082e61914e3dd0dd3b9f522678 (416.1 MB)
  • md5:51eee5619e87a027da39b0a2bd8d7e4c (416.0 MB)
  • md5:3ddba3915ad64be5afa68db113fc5afa (416.1 MB)
  • md5:7f29affa0c43643045d1fe7768be722b (416.1 MB)
  • md5:c1fabd2ef59a4e345a1d4d3ec1366fea (136.0 MB)
  • md5:f30c6272958a58d25578c5dbfb19cc01 (30.9 MB)
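
A downloaded file can be checked against the MD5 values listed above with Python's standard hashlib module. This is a minimal sketch: the function names are hypothetical, and because the listing does not show which file name belongs to which checksum, pairing a file with its listed value is left to the user.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file with a listed value such as 'md5:ff13...' or a bare hex digest."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low, which matters for the archives of roughly 416 MB each.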

Additional details

Related works

Is derived from
Preprint: 10.1101/2023.03.04.531098v3 (DOI)
Workflow: 10.5281/zenodo.10582070 (DOI)