TetrapodTraits Database
Creators
- Moura, Mario R. (1, 2, 3)
- Ceron, Karoline (4)
- Guedes, Jhonny J. M. (5)
- Chen-Zhao, Rosana (1)
- Sica, Yanina (3)
- Hart, Julie (6, 3)
- Dorman, Wendy (7, 3)
- Portmann, Julia M. (3)
- Gonzalez-del-Pliego, Pamela (8)
- Ranipeta, Ajay (3)
- Catenazzi, Alessandro (9)
- Werneck, Fernanda (10)
- Toledo, Luis Felipe (1)
- Upham, Nathan (11)
- Tonini, Joao F. R. (12)
- Colston, Timothy J. (13)
- Guralnick, Robert (14)
- Bowie, Rauri C. K. (15)
- Pyron, R. Alexander (16)
- Jetz, Walter (3)
- 1. Universidade Estadual de Campinas
- 2. Federal University of Paraíba
- 3. Yale University
- 4. Universidade Federal do Ceará
- 5. Universidade Federal de Goiás
- 6. State University of New York
- 7. University of Illinois Urbana-Champaign
- 8. Universidade de Évora
- 9. Florida International University
- 10. National Institute of Amazonian Research
- 11. Arizona State University
- 12. University of Richmond
- 13. University of Puerto Rico-Mayaguez
- 14. University of Florida
- 15. University of California, Berkeley
- 16. George Washington University
Description
Abstract
Tetrapods (amphibians, reptiles, birds and mammals) are model systems for global biodiversity science, but continuing data gaps, limited data standardisation, and ongoing flux in taxonomic nomenclature constrain integrative research on this group and potentially cause biassed inference. We combined and harmonised taxonomic, spatial, phylogenetic, and attribute data with phylogeny-based multiple imputation to provide a comprehensive data resource (TetrapodTraits 1.0.0) that includes values, predictions, and sources for body size, activity time, micro- and macrohabitat, ecosystem, threat status, biogeography, insularity, environmental preferences and human influence, for all 33,281 tetrapod species covered in recent fully sampled phylogenies. We assess gaps and biases across taxa and space, finding that shared data missing in attribute values increased with taxon-level completeness and richness across clades. Prediction of missing attribute values using multiple imputation revealed substantial changes in estimated macroecological patterns. These results highlight biases incurred by non-random missingness and strategies to best address them. While there is an obvious need for further data collection and updates, our phylogeny-informed database of tetrapod traits can support a more comprehensive representation of tetrapod species and their attributes in ecology, evolution, and conservation research.
Additional Information: This work is an output of the VertLife project. To flag errors, provide updates, or leave other comments, please go to vertlife.org. We aim to develop the database into a living resource at vertlife.org, and your feedback is essential to improve data quality and support community use.
Version 1.0.1 (25 May 2024). This minor release addresses a spelling error in the file Tetrapod_360.csv. The correction replaces white-space characters with underscore characters in the field Scientific.Name to match the spelling used in the file TetrapodTraits_1.0.0.csv. These corrections affect only 102 species considered extinct and 13 domestic species (Bos_frontalis, Bos_grunniens, Bos_indicus, Bos_taurus, Camelus_bactrianus, Camelus_dromedarius, Capra_hircus, Cavia_porcellus, Equus_caballus, Felis_catus, Lama_glama, Ovis_aries, Vicugna_pacos). All extinct and domestic species in TetrapodTraits have their binomial names separated by underscore symbols instead of white space. Additionally, we have added the file GridCellShapefile.zip, which contains the shapefile required to map species presence across the 110 × 110 km equal-area grid cells (this file was previously provided through an external source).
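For users who still hold a copy of Tetrapod_360.csv from version 1.0.0, the spelling correction described above can be sketched as follows. This is a minimal illustration, not the procedure used to build the release: the helper name `to_underscore_name` and the example input list are hypothetical.

```python
import re


def to_underscore_name(name: str) -> str:
    """Replace each run of white space in a binomial with a single underscore."""
    return re.sub(r"\s+", "_", name.strip())


# Hypothetical spellings as they might appear in an older copy of the file.
old_names = ["Bos taurus", "Felis catus", "Lama_glama"]
new_names = [to_underscore_name(n) for n in old_names]
print(new_names)  # ['Bos_taurus', 'Felis_catus', 'Lama_glama']
```

Names that already use underscores pass through unchanged, so the function can be applied to the whole Scientific.Name field.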
Version 1.0.0 (19 April 2024). TetrapodTraits, the full phylogenetically coherent database we developed, is being made publicly available to support a range of research applications in ecology, evolution, and conservation and to help minimise the impacts of biassed data in this model system. The database includes 24 species-level attributes linked to their respective sources across 33,281 tetrapod species. Specific fields clearly label data sources and imputations in TetrapodTraits, while additional tables record the 10,000 imputed values per missing entry per species.
- Taxonomy – includes 8 attributes that give scientific names and the respective higher-level taxonomic ranks, authority name, and year of species description. Field names: Scientific.Name, Genus, Family, Suborder, Order, Class, Authority, and YearOfDescription.
- Phylogenetic tree – includes 2 attributes that indicate which fully sampled phylogeny contains the species, and whether the species placement in the phylogeny was imputed. Field names: TreeTaxon, TreeImputed.
- Body size – includes 7 attributes that report length, mass, and data sources on species sizes, and details on the imputation of species length or mass. Field names: BodyLength_mm, LengthMeasure, ImputedLength, SourceBodyLength, BodyMass_g, ImputedMass, SourceBodyMass.
- Activity time – includes 5 attributes that describe period of activity (e.g., diurnal, nocturnal) as dummy (binary) variables, data sources, details on the imputation of species activity time, and a nocturnality score. Field names: Diu, Noc, ImputedActTime, SourceActTime, Nocturnality.
- Microhabitat – includes 8 attributes covering habitat use (e.g., fossorial, terrestrial, aquatic, arboreal, aerial) as dummy (binary) variables, data sources, details on the imputation of microhabitat, and a verticality score. Field names: Fos, Ter, Aqu, Arb, Aer, ImputedHabitat, SourceHabitat, Verticality.
- Macrohabitat – includes 19 attributes that reflect major habitat types according to the IUCN classification, the sum of major habitats, data source, and details on the imputation of macrohabitat. Field names: MajorHabitat_1 to MajorHabitat_10, MajorHabitat_12 to MajorHabitat_17, MajorHabitatSum, ImputedMajorHabitat, SourceMajorHabitat. MajorHabitat_11, representing the marine deep ocean floor (unoccupied by any species in our database), is not included here.
- Ecosystem – includes 6 attributes covering species ecosystem (e.g., terrestrial, freshwater, marine) as dummy (binary) variables, the sum of ecosystem types, data sources, and details on the imputation of ecosystem. Field names: EcoTer, EcoFresh, EcoMar, EcosystemSum, ImputedEcosystem, SourceEcosystem.
- Threat status – includes 3 attributes that report the assessed threat status according to the IUCN Red List and related literature. Field names: IUCN_Binomial, AssessedStatus, SourceStatus.
- RangeSize – the number of 110 × 110 km grid cells covered by the species range map. Data derived from MOL.
- Latitude – coordinate centroid of the species range map.
- Longitude – coordinate centroid of the species range map.
- Biogeography – includes 8 attributes that present the proportion of species range within each WWF biogeographical realm. Field names: Afrotropic, Australasia, IndoMalay, Nearctic, Neotropic, Oceania, Palearctic, Antarctic.
- Insularity – includes 2 attributes that indicate whether a species is an insular endemic (binary, 1 = yes, 0 = no), followed by the respective data source. Field names: Insularity, SourceInsularity.
- AnnuMeanTemp – Average within-range annual mean temperature (degrees Celsius). Data derived from CHELSA v. 1.2.
- AnnuPrecip – Average within-range annual precipitation (mm). Data derived from CHELSA v. 1.2.
- TempSeasonality – Average within-range temperature seasonality (Standard deviation × 100). Data derived from CHELSA v. 1.2.
- PrecipSeasonality – Average within-range precipitation seasonality (Coefficient of Variation). Data derived from CHELSA v. 1.2.
- Elevation – Average within-range elevation (metres). Data derived from topographic layers in EarthEnv.
- ETA50K – Average within-range estimated time to travel to cities with a population >50K in the year 2015. Data from Nelson et al. (2019).
- HumanDensity – Average within-range human population density in 2017. Data derived from HYDE v. 3.2.
- PropUrbanArea – Proportion of species range map covered by built-up area, such as towns, cities, etc. at year 2017. Data derived from HYDE v. 3.2.
- PropCroplandArea – Proportion of species range map covered by cropland area, identical to FAO's category 'Arable land and permanent crops' at year 2017. Data derived from HYDE v. 3.2.
- PropPastureArea – Proportion of species range map covered by pasture, defined as Grazing land with an aridity index > 0.5, assumed to be more intensively managed (converted in climate models) at year 2017. Data derived from HYDE v. 3.2.
- PropRangelandArea – Proportion of species range map covered by rangeland, defined as Grazing land with an aridity index < 0.5, assumed to be less or not managed (not converted in climate models) at year 2017. Data derived from HYDE v. 3.2.
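Because the macrohabitat fields skip MajorHabitat_11, a plain 1 to 17 loop over the field names would refer to a column that is not in the database. The sketch below builds the 16 field names listed above and counts the habitat types flagged in one record. It assumes, without confirmation from this description, that each field holds a 0/1 flag for observed data and that MajorHabitatSum is their row-wise sum; the function name and the example record are hypothetical.

```python
# Field names MajorHabitat_1 .. MajorHabitat_17, skipping MajorHabitat_11.
MAJOR_HABITAT_FIELDS = [f"MajorHabitat_{i}" for i in range(1, 18) if i != 11]


def count_major_habitats(row: dict) -> float:
    """Sum the macrohabitat fields of one record (assumed to be 0/1 flags)."""
    return sum(float(row[field]) for field in MAJOR_HABITAT_FIELDS)


# Hypothetical record: a species flagged for habitat types 1 and 14 only.
example_row = {field: "0" for field in MAJOR_HABITAT_FIELDS}
example_row["MajorHabitat_1"] = "1"
example_row["MajorHabitat_14"] = "1"

print(len(MAJOR_HABITAT_FIELDS))          # 16
print(count_major_habitats(example_row))  # 2.0
```

Values are read as strings here to mirror what Python's `csv.DictReader` returns; with a data-frame library the conversion step would be unnecessary.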
File content
All files use UTF-8 encoding.
- ImputedSets.zip – the phylogenetic multiple imputation framework applied to the TetrapodTraits database produced 10,000 imputed values per missing data entry (= 100 phylogenetic trees × 10 validation folds × 10 multiple imputations). These imputations were developed for four fundamental natural history traits: Body length, Body mass, Activity time, and Microhabitat. To facilitate the evaluation of each imputed value in a user-friendly format, we offer 10,000 tables containing both observed and imputed data for the 33,281 species in the TetrapodTraits database. Each table contains the four targeted natural history traits, along with designated fields (e.g., ImputedMass) that indicate whether the trait value provided (e.g., BodyMass_g) corresponds to observed (e.g., ImputedMass = 0) or imputed (e.g., ImputedMass = 1) data. Given that the complete set of 10,000 tables requires nearly 17 GB of storage space, we have organized sets of 1,000 tables into separate zip files to streamline the download process.
- ImputedSets_1K.zip, imputations for trees 1 to 10.
- ImputedSets_2K.zip, imputations for trees 11 to 20.
- ImputedSets_3K.zip, imputations for trees 21 to 30.
- ImputedSets_4K.zip, imputations for trees 31 to 40.
- ImputedSets_5K.zip, imputations for trees 41 to 50.
- ImputedSets_6K.zip, imputations for trees 51 to 60.
- ImputedSets_7K.zip, imputations for trees 61 to 70.
- ImputedSets_8K.zip, imputations for trees 71 to 80.
- ImputedSets_9K.zip, imputations for trees 81 to 90.
- ImputedSets_10K.zip, imputations for trees 91 to 100.
- TetrapodTraits_1.0.0.csv – the complete TetrapodTraits database, with missing data entries in natural history traits (body length, body mass, activity time, and microhabitat) replaced by the average across the 10K imputed values obtained through phylogenetic multiple imputation. Please note that imputed microhabitat (attribute fields: Fos, Ter, Aqu, Arb, Aer) and imputed activity time (attribute fields: Diu, Noc) are continuous variables within the 0-1 range interval. At the user's discretion, the types of microhabitat and activity time can be transformed into binary variables using a predefined threshold (e.g., 0.50), although we recommend utilizing the original imputed values.
- Tetrapod_360.csv – spatial intersections of the 110 x 110 km quadrats shapefile (GridCellShapefile.zip) with species geographic range maps from https://mol.org.
- GridCellShapefile.zip – contains grid cell shapefiles with a spatial resolution of 110 km, which are required to map the species listed in the Tetrapod_360.csv file. Please note that, because shapefile field names are limited in length, the field names of gridcells_110km.shp are abbreviated to ("Cl_I110", "Long", "Lat", "WWF_Rlm", "PrpLndA"). Rename these fields to ("Cell_Id110", "Long", "Lat", "WWF_Realm", "PropLandArea") to match the terminology used for the 110 x 110 km grid cells in other files.
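The optional conversion of imputed microhabitat and activity-time values to binary variables, mentioned above for TetrapodTraits_1.0.0.csv, could look like the sketch below. It is an illustration only: the function name `binarise`, the inclusive comparison (values equal to the threshold map to 1), and the example record are assumptions, and the original continuous values remain the recommended input for analyses.

```python
MICROHABITAT_FIELDS = ["Fos", "Ter", "Aqu", "Arb", "Aer"]
ACTIVITY_FIELDS = ["Diu", "Noc"]


def binarise(row: dict, fields: list, threshold: float = 0.5) -> dict:
    """Return a copy of row with the given fields mapped to 0 or 1.

    A value at or above the threshold becomes 1 (an assumed convention).
    """
    out = dict(row)
    for field in fields:
        out[field] = 1 if float(row[field]) >= threshold else 0
    return out


# Hypothetical record with imputed microhabitat values in the 0-1 range.
row = {
    "Scientific.Name": "Genus_species",
    "Fos": "0.12", "Ter": "0.81", "Aqu": "0.05", "Arb": "0.50", "Aer": "0.00",
    "ImputedHabitat": "1",
}
binary_row = binarise(row, MICROHABITAT_FIELDS)
print([binary_row[f] for f in MICROHABITAT_FIELDS])  # [0, 1, 0, 1, 0]
```

The input record is left untouched, so thresholded and continuous values can be compared side by side. The same call with `ACTIVITY_FIELDS` handles Diu and Noc.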
External files
The R-code used for data analysis is available at 10.5281/zenodo.10582069.
Funding
São Paulo Research Foundation (FAPESP) for grants supporting MRM (#2021/11840-6 and #2022/12231-6), LFT (#2016/25358-3), KC (#2020/12558-0), and RZC (#2022/15247-0); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the fellowship to JJMG; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for research grants in support of FPW (#311504/2020-5) and LFT (#302834/2020-6); U.S. National Science Foundation (NSF) for grants supporting RAP (DEB-1441719), RCKB (DEB-1441652), and WJ (DEB-1441737 and DEB-1441719). WJ also acknowledges support from NASA grants 80NSSC17K0282 and 80NSSC18K0435, and from the E.O. Wilson Biodiversity Foundation.
Citation
Moura, M.R., Ceron, K., Guedes, J.J.M., Chen-Zhao, R., Sica, Y.V., Hart, J., Dorman, W., Gonzalez-del-Pliego, P., Ranipeta, A., Catenazzi, A., Werneck, F.P., Toledo, L.F., Upham, N.S., Tonini, J.F.R., Colston, T.J., Guralnick, R., Bowie, R.C.K., Pyron, R.A., Jetz, W. A phylogeny-informed characterisation of global tetrapod traits addresses data gaps and biases. bioRxiv, 2024, doi: 10.1101/2023.03.04.531098v3
Correspondence to: mariormoura@gmail.com
Files (4.3 GB)
Checksum | Size |
---|---|
md5:ff130865dc41c274d2022a9e0ab35816 | 1.3 MB |
md5:e96c78b882fdfa267c229bf7044e2d1d | 416.1 MB |
md5:50f312df1eb775d4a646a3b0bb589e83 | 415.4 MB |
md5:23593ec1999cec931908cdfa8e3144df | 415.4 MB |
md5:4af9f9a69a20d6a01f95731f7ec56e6a | 415.4 MB |
md5:e9d51bff0cd5a18d73ec56aeb44403d5 | 415.4 MB |
md5:c726e112e33c1d90885f86e84c0bbcdb | 415.6 MB |
md5:938c00082e61914e3dd0dd3b9f522678 | 416.1 MB |
md5:51eee5619e87a027da39b0a2bd8d7e4c | 416.0 MB |
md5:3ddba3915ad64be5afa68db113fc5afa | 416.1 MB |
md5:7f29affa0c43643045d1fe7768be722b | 416.1 MB |
md5:c1fabd2ef59a4e345a1d4d3ec1366fea | 136.0 MB |
md5:f30c6272958a58d25578c5dbfb19cc01 | 30.9 MB |
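The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the chunk size are illustrative choices, and the file is read in chunks so that the larger archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given with or without the 'md5:' prefix."""
    expected_hex = expected.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:ff130865dc41c274d2022a9e0ab35816")` returns `True` only if the file at `local_path` (a placeholder for wherever the 1.3 MB file was saved) is byte-for-byte identical to the published one.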
Additional details
Related works
- Is derived from
- Preprint: 10.1101/2023.03.04.531098v3 (DOI)
- Workflow: 10.5281/zenodo.10582070 (DOI)