Published April 19, 2024 | Version v2
Software documentation Open

Data from: A phylogeny-informed characterisation of global tetrapod traits addresses data gaps and biases

  • 1. Universidade Estadual de Campinas
  • 2. Universidade Federal do Ceará
  • 3. Universidade Federal de Goiás
  • 4. Yale University
  • 5. State University of New York
  • 6. University of Illinois Urbana-Champaign
  • 7. Universidade de Évora
  • 8. Florida International University
  • 9. National Institute of Amazonian Research
  • 10. Arizona State University
  • 11. University of Richmond
  • 12. University of Puerto Rico-Mayaguez
  • 13. University of Florida
  • 14. University of California, Berkeley
  • 15. George Washington University



Raw data and R-code for the research approach on phylogenetic multiple imputation of natural history traits for tetrapod species. In brief, the code computes the phylogenetic filters; reproduces the grid search procedure used to tune XGBoost hyperparameters; computes phylogenetic multiple imputations for traits related to body length, body mass, activity time, and microhabitat; and builds the figures used in the main text and supplementary material. The original code was executed on a private cloud-processing service, which may limit its reproducibility by external users. However, the code provided here has been adapted to ensure reproducibility on local machines, using a subset of the data. The complete TetrapodTraits database is available at 10.5281/zenodo.10530617.

The modelling framework was built using the directory structure described in the README.pdf file. The R-code provided includes steps that replicate this directory structure, but understanding it first is a good starting point for navigating the outputs produced.

File content

This zip file represents the "Datasets" folder directory, as illustrated in the README.pdf. It contains 17 CSV files, encompassing genus- and family-level outputs (measures of trait completeness, relative change in average trait values per taxon), assemblage-level outputs (measures of richness, trait completeness, and relative change in average trait values per 110 x 110 km grid cell), tuned hyperparameters of XGBoost models, summary statistics of model performance across the 10,000 multiple phylogenetic imputations computed, and a table with data sources and their respective reference types. All CSV files are loaded from within the R scripts provided.

This zip file represents the "PhylogenySets" folder directory. It includes subsets of 100 fully-sampled phylogenetic trees published for each of the five major tetrapod groups (Amphibians, Chelonians and Crocodilians, Squamates, Birds, and Mammals) in the Nexus (.nex) format. All trees were extracted from VertLife, using the Phylogeny subsets tool.

This zip file represents the "PhylogeneticFilters" folder directory, as illustrated in the README.pdf. The phylogenetic filters were derived from the fully-sampled phylogenies available for tetrapod species in VertLife. For the computations of the TetrapodTraits database, we derived phylogenetic filters across a subset of 100 phylogenies for each major tetrapod group. The complete set of phylogenetic filters for each tree is available in the RData format, with the full set across the 500 trees exceeding 350 GB. Due to storage limitations, we provide phylogenetic filters for 10 trees per major tetrapod group. If you are interested in additional sets, please contact the corresponding author.

This zip file represents the "Shapefiles" folder directory, as illustrated in the README.pdf. It contains two shapefiles: (i) gridcells_110km.shp represents an equal-area grid at 110 km spatial resolution; and (ii) wwf_realms.shp represents the major terrestrial biogeographic realms (derived from [...]). Please note that, because shapefile field names are limited in length, the field names of gridcells_110km.shp are stored as ("Cl_I110", "Long", "Lat", "WWF_Rlm", "PrpLndA"). The R code that relies on this file immediately renames these fields to ("Cell_Id110", "Long", "Lat", "WWF_Realm", "PropLandArea") to match the terminology used for the 110 x 110 km grid cells.
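The field renaming described above can be illustrated with a minimal base R sketch. The helper name `rename_grid_fields` and the toy attribute table are hypothetical and are not taken from the provided scripts; only the abbreviated and full field names come from the description above.

```r
# Lookup from the abbreviated shapefile field names to the full names
# (both sets of names are taken from the description above).
field_map <- c(
  Cl_I110 = "Cell_Id110",
  Long    = "Long",
  Lat     = "Lat",
  WWF_Rlm = "WWF_Realm",
  PrpLndA = "PropLandArea"
)

# Hypothetical helper: rename only the columns listed in the lookup and
# leave any other column (for example a geometry column) untouched.
rename_grid_fields <- function(x, map = field_map) {
  hit <- names(x) %in% names(map)
  names(x)[hit] <- unname(map[names(x)[hit]])
  x
}

# Toy stand-in for the attribute table of gridcells_110km.shp (invented values).
toy <- data.frame(
  Cl_I110 = 1:2,
  Long    = c(-47.1, -46.1),
  Lat     = c(-22.9, -22.9),
  WWF_Rlm = c("Neotropic", "Neotropic"),
  PrpLndA = c(1, 0.8)
)

toy <- rename_grid_fields(toy)
print(names(toy))
```

Using a named lookup rather than positional assignment keeps the renaming correct even if the fields are read in a different order.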

R-scripts: three files provided.

  1. Moura_et_al_TetrapodTraits_Script1_PhylogeneticMultipleImputations.R: this script computes the phylogenetic filters, the grid search procedure to tune XGBoost hyperparameters, and conducts phylogenetic multiple imputations for traits related to body length, body mass, activity time, and microhabitat.
  2. Moura_et_al_TetrapodTraits_Script2_DataExploration.R: this script extracts patterns of trait completeness and biases in missing values at the genus, family, and assemblage (110 x 110 km grid cell) levels, and computes co-occurrence patterns of data missingness across clades.
  3. Moura_et_al_TetrapodTraits_Script3_Figures.R: this script builds the figures reported in the main text and supplementary material.
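As a rough illustration of the grid search step mentioned for Script 1, the sketch below enumerates hyperparameter combinations and keeps the one with the lowest score. It is not the authors' implementation: the candidate values are assumptions, `grid_search` is a hypothetical helper, and `toy_score` is a toy stand-in for the cross-validated error that an actual XGBoost fit would return. Only the names `eta`, `max_depth`, and `subsample` correspond to real XGBoost parameters.

```r
# Hypothetical helper: evaluate score_fun on every row of the grid and
# return the row with the lowest score.
grid_search <- function(grid, score_fun) {
  scores <- vapply(
    seq_len(nrow(grid)),
    function(i) score_fun(as.list(grid[i, , drop = FALSE])),
    numeric(1)
  )
  grid$score <- scores
  grid[which.min(scores), , drop = FALSE]
}

# Candidate values are assumptions chosen for illustration only.
grid <- expand.grid(
  eta       = c(0.01, 0.1, 0.3),
  max_depth = c(2, 4, 6),
  subsample = c(0.5, 1)
)

# Toy stand-in for a cross-validated error; a real run would fit a model
# for each combination and return its validation error instead.
toy_score <- function(p) {
  (p$eta - 0.1)^2 + (p$max_depth - 4)^2 + (p$subsample - 1)^2
}

best <- grid_search(grid, toy_score)
print(best)
```

In practice the scoring function is the expensive part, so the grid is usually kept small or evaluated in parallel.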

Tetrapoda_prunedToFamily.tre: a family-level tree for tetrapod species derived from the combination of the available global phylogenies for major tetrapod groups. The R function tree.merger of the RRphylo package was used to combine one fully-sampled phylogeny each for amphibians, turtles and crocodilians, squamates, birds, and mammals. This file should be stored in the Working Directory.

TetrapodTraits_datasample.csv: a subset of the TetrapodTraits database including 1,000 mammal species. This file should be stored in the Working Directory.

External files

To fully execute scripts 2 and 3, two additional files are necessary: (i) TetrapodTraits_1.0.0.csv and (ii) Tetrapod_360.csv. These files represent the TetrapodTraits database and the respective spatial intersections of species' geographic range maps across the 110 x 110 km grid cells. Both files are available at 10.5281/zenodo.10530618.
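Before running the scripts, it may help to confirm that the input files named above are in place. The sketch below is a hypothetical pre-flight check, not part of the provided scripts. It assumes that the two external files are stored in the same working directory as the tree and the data sample, which the description does not state explicitly.

```r
# Files named in the description; their location in the working directory
# is stated for the first two and assumed for the external ones.
required_files <- c("Tetrapoda_prunedToFamily.tre", "TetrapodTraits_datasample.csv")
external_files <- c("TetrapodTraits_1.0.0.csv", "Tetrapod_360.csv")

# Hypothetical helper: report which files are present in a directory.
check_inputs <- function(dir = getwd(),
                         files = c(required_files, external_files)) {
  present <- file.exists(file.path(dir, files))
  names(present) <- files
  present
}

status <- check_inputs()
missing_files <- names(status)[!status]
if (length(missing_files) > 0) {
  message("Missing from ", getwd(), ": ", paste(missing_files, collapse = ", "))
}
```

The function returns a named logical vector, so the result can also be used to decide whether scripts 2 and 3 can be run in full or only Script 1 with the data sample.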


São Paulo Research Foundation (FAPESP) for grants supporting MRM (#2021/11840-6 and #2022/12231-6), LFT (#2016/25358-3), KC (#2020/12558-0), and RZC (#2022/15247-0); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the fellowship to JJMG; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for research grants in support of FPW (#311504/2020-5) and LFT (#302834/2020-6); U.S. National Science Foundation (NSF) for grants supporting RAP (DEB-1441719), RCKB (DEB-1441652), and WJ (DEB-1441737 and DEB-1441719). WJ also acknowledges support from NASA grants 80NSSC17K0282 and 80NSSC18K0435, and from the E.O. Wilson Biodiversity Foundation.


Moura, M.R., Ceron, K., Guedes, J.J.M., Chen-Zhao, R., Sica, Y.V., Hart, J., Dorman, W., Gonzalez-del-Pliego, P., Ranipeta, A., Catenazzi, A., Werneck, F.P., Toledo, L.F., Upham, N.S., Tonini, J.F.R., Colston, T.J., Guralnick, R., Bowie, R.C.K., Pyron, R.A., Jetz, W. A phylogeny-informed characterization of global tetrapod traits addresses data gaps and biases. bioRxiv, 2024, doi: 10.1101/2023.03.04.531098v3

Correspondence to:



Files (37.9 GB)


Additional details

Related works

Is derived from
Preprint: 10.1101/2023.03.04.531098v3 (DOI)
Is part of
Dataset: 10.5281/zenodo.10530617 (DOI)