Published April 19, 2024 | Version v2
Software documentation Open

Data from: A phylogeny-informed characterisation of global tetrapod traits addresses data gaps and biases

  • 1. Universidade Estadual de Campinas
  • 2. Universidade Federal do Ceará
  • 3. Universidade Federal de Goiás
  • 4. Yale University
  • 5. State University of New York
  • 6. University of Illinois Urbana-Champaign
  • 7. Universidade de Évora
  • 8. Florida International University
  • 9. National Institute of Amazonian Research
  • 10. Arizona State University
  • 11. University of Richmond
  • 12. University of Puerto Rico-Mayaguez
  • 13. University of Florida
  • 14. University of California, Berkeley
  • 15. George Washington University

Description

Raw data and R-code for the research approach on phylogenetic multiple imputation of natural history traits for tetrapod species. In brief, the code computes the phylogenetic filters; reproduces the grid search procedure used to tune XGBoost hyperparameters; computes phylogenetic multiple imputations for traits related to body length, body mass, activity time, and microhabitat; and builds the figures used in the main text and supplementary material. The original code was executed on a private cloud-processing service, which may limit its reproducibility by external users. However, the code provided here has been adapted to ensure reproducibility on local machines, using a subset of the data. The complete TetrapodTraits database is available at 10.5281/zenodo.10530617.
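As a rough illustration of what a grid search over hyperparameters involves, the sketch below scores every combination in a small grid and keeps the best one. It is not taken from the scripts in this record: the candidate values, the `grid_search` helper, and the `toy_score` function are all hypothetical, and the toy scorer merely stands in for a cross-validated XGBoost fit.

```r
# Candidate values are made up for illustration; they are NOT the values
# used to build TetrapodTraits.
grid <- expand.grid(
  eta       = c(0.05, 0.1, 0.3),
  max_depth = c(2, 4, 6),
  KEEP.OUT.ATTRS = FALSE
)

# `score_fn` is a hypothetical user-supplied function: it receives one
# hyperparameter combination as a list and returns an error to minimise
# (for example, a cross-validated RMSE from a model fit).
grid_search <- function(grid, score_fn) {
  grid$score <- vapply(
    seq_len(nrow(grid)),
    function(i) score_fn(as.list(grid[i, , drop = FALSE])),
    numeric(1)
  )
  # Return the row with the lowest score
  grid[which.min(grid$score), , drop = FALSE]
}

# Toy scorer standing in for model fitting and validation
toy_score <- function(p) (p$eta - 0.1)^2 + (p$max_depth - 4)^2

best <- grid_search(grid, toy_score)
print(best)
```

In practice the scorer would wrap the model-fitting and validation step; the grid itself grows multiplicatively with each added hyperparameter, which is one reason such searches are often run on cloud infrastructure.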

The modelling framework was built using the directory structure described in the README.pdf file. The R-code provided includes steps that replicate this directory structure, but understanding it is a good starting point for navigating the outputs produced.
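One possible way to lay out the folders before running the scripts is sketched below. It assumes the four zip archives described under "File content" have been downloaded into the working directory; the helper names `unpack_archive` and `unpack_all` are hypothetical, and README.pdf remains the authoritative description of the structure.

```r
# Archive names as listed in this record; their location (`root`) is an assumption.
archives <- c("Datasets.zip", "PhylogenySets.zip",
              "PhylogeneticFilters.zip", "Shapefiles.zip")

unpack_archive <- function(zip_path, root = getwd()) {
  folder  <- tools::file_path_sans_ext(basename(zip_path))
  entries <- utils::unzip(zip_path, list = TRUE)$Name
  # If every entry already sits under "<folder>/", extract into `root`;
  # otherwise extract into "<root>/<folder>" so files do not scatter.
  nested <- all(startsWith(entries, paste0(folder, "/")))
  utils::unzip(zip_path, exdir = if (nested) root else file.path(root, folder))
  file.path(root, folder)
}

unpack_all <- function(zips = archives, root = getwd()) {
  # Only archives that are actually present are unpacked
  found <- zips[file.exists(file.path(root, zips))]
  vapply(found, function(z) unpack_archive(file.path(root, z), root),
         character(1))
}

extracted <- unpack_all()
print(extracted)
```

The documentation of `utils::unzip` notes limits for very large entries, so a system unzip tool may be the safer choice for multi-gigabyte archives.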

File content

Datasets.zip: this zip file represents the "Datasets" folder directory, as illustrated in the README.pdf. This zip file contains 17 CSV files, encompassing outputs at the genus- and family-level (measures of trait completeness, relative change in average trait values per taxon), assemblage-level outputs (measures of richness, trait completeness, and relative change in average trait values per 110 x 110 km grid cell), tuned hyperparameters of XGBoost models, summary statistics of model performance across the 10,000 multiple phylogenetic imputations computed, and a table listing each data source and its reference type. All CSV files are loaded from within the R scripts provided.

PhylogenySets.zip: this zip file represents the "PhylogenySets" folder directory. It includes subsets of 100 fully-sampled phylogenetic trees published for each of the five major tetrapod groups (Amphibians, Chelonians and Crocodilians, Squamates, Birds, and Mammals) in the Nexus (.nex) format. All trees were extracted from VertLife, using the Phylogeny subsets tool.
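A minimal sketch for inspecting the tree sets outside the provided scripts is given below. It assumes the archive has been unpacked into a "PhylogenySets" folder in the working directory and that the ape package is installed; the helper names are hypothetical, and the individual file names are discovered rather than assumed.

```r
# Find every Nexus file under the folder (file names are not assumed)
list_nexus_files <- function(dir = "PhylogenySets") {
  list.files(dir, pattern = "\\.nex$", full.names = TRUE,
             recursive = TRUE, ignore.case = TRUE)
}

# Read each file with ape::read.nexus; returns a named list of tree objects
read_tree_sets <- function(dir = "PhylogenySets") {
  if (!requireNamespace("ape", quietly = TRUE)) {
    stop("Package 'ape' is required to read Nexus files.")
  }
  files <- list_nexus_files(dir)
  trees <- lapply(files, ape::read.nexus)
  names(trees) <- basename(files)
  trees
}

if (dir.exists("PhylogenySets") && requireNamespace("ape", quietly = TRUE)) {
  tree_sets <- read_tree_sets()
  # Number of trees per file (a single-tree file is read as one "phylo" object)
  n_trees <- vapply(tree_sets,
                    function(t) if (inherits(t, "multiPhylo")) length(t) else 1L,
                    integer(1))
  print(n_trees)
}
```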

PhylogeneticFilters.zip: this zip file represents the "PhylogeneticFilters" folder directory, as illustrated in the README.pdf. The phylogenetic filters were derived from the fully-sampled phylogenies available for tetrapod species in VertLife. For the computations of the TetrapodTraits database, we derived phylogenetic filters across a subset of 100 phylogenies for each major tetrapod group. The complete set of phylogenetic filters for each tree is available in the RData format, with the full set across the 500 trees exceeding 350 GB. Due to storage limitations, we provide phylogenetic filters for 10 trees per major tetrapod group. If you are interested in additional sets, please contact the corresponding author.

Shapefiles.zip: this zip file represents the "Shapefiles" folder directory, as illustrated in the README.pdf. This zip file contains two shapefiles: (i) gridcells_110km.shp is an equal-area grid cell shapefile at 110 km spatial resolution; and (ii) wwf_realms.shp represents the major terrestrial biogeographic realms (derived from https://ecoregions.appspot.com). Please note that, because shapefile field names are limited in length, the field names of gridcells_110km.shp are stored as ("Cl_I110", "Long", "Lat", "WWF_Rlm", "PrpLndA"). The R code that relies on this file immediately renames these fields to ("Cell_Id110", "Long", "Lat", "WWF_Realm", "PropLandArea") to match the terminology used to refer to the 110 x 110 km grid cells.
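For readers who load gridcells_110km.shp outside the provided scripts, the renaming step could look like the sketch below. The helper `rename_grid_fields` is hypothetical and is not the code used in the scripts; only the two sets of field names come from the description above. The demonstration uses a small made-up table in place of the real attribute table.

```r
# Map the truncated shapefile field names to the full names
rename_grid_fields <- function(x) {
  lookup <- c(Cl_I110 = "Cell_Id110", Long = "Long", Lat = "Lat",
              WWF_Rlm = "WWF_Realm", PrpLndA = "PropLandArea")
  hit <- names(x) %in% names(lookup)
  names(x)[hit] <- unname(lookup[names(x)[hit]])
  x  # columns not in the lookup (e.g. a geometry column) are left untouched
}

# Made-up stand-in for the attribute table; values are illustrative only
demo <- data.frame(Cl_I110 = 1:2,
                   Long    = c(-47.1, -46.1),
                   Lat     = c(-22.9, -22.9),
                   WWF_Rlm = c("Neotropic", "Neotropic"),
                   PrpLndA = c(1, 0.8))
demo <- rename_grid_fields(demo)
print(names(demo))
```

With the sf package installed, the same helper could be applied to the real file, for example `rename_grid_fields(sf::st_read("Shapefiles/gridcells_110km.shp"))`, where the relative path is an assumption about where the archive was unpacked.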

R-scripts: three files provided.

  1. Moura_et_al_TetrapodTraits_Script1_PhylogeneticMultipleImputations.R: this script computes the phylogenetic filters, runs the grid search procedure to tune XGBoost hyperparameters, and conducts phylogenetic multiple imputations for traits related to body length, body mass, activity time, and microhabitat.
  2. Moura_et_al_TetrapodTraits_Script2_DataExploration.R: this script extracts patterns of trait completeness and biases in missing values at the genus, family, and assemblage (110 x 110 km grid cell) levels, and computes co-occurrence patterns of data missingness across clades.
  3. Moura_et_al_TetrapodTraits_Script3_Figures.R: this script builds the figures reported in the main text and supplementary material.

Tetrapoda_prunedToFamily.tre: a family-level tree for tetrapod species derived from the combination of the available global phylogenies for major tetrapod groups. The R function tree.merger of the RRphylo package was used to combine one fully-sampled phylogeny for each group (amphibians, turtles and crocodilians, squamates, birds, and mammals). This file should be stored in the Working Directory.

TetrapodTraits_datasample.csv: a subset of the TetrapodTraits database including 1,000 mammal species. This file should be stored in the Working Directory.
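Before running the scripts, it may help to confirm that both files are in the working directory. The check below is a minimal sketch; `missing_inputs` is a hypothetical helper, and reading the sample with default `read.csv` settings is an assumption about the file's delimiter and encoding.

```r
# File names as given in this record
required <- c("Tetrapoda_prunedToFamily.tre", "TetrapodTraits_datasample.csv")

# Return the names of any required files not found in `dir`
missing_inputs <- function(files = required, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

absent <- missing_inputs()
if (length(absent) > 0) {
  message("Not found in ", getwd(), ": ", paste(absent, collapse = ", "))
} else {
  # Default read.csv settings are assumed to suit this file
  traits <- read.csv("TetrapodTraits_datasample.csv")
  print(dim(traits))
}
```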

External files

To fully execute scripts 2 and 3, two additional files are necessary: (i) TetrapodTraits_1.0.0.csv and (ii) Tetrapod_360.csv. These files represent the TetrapodTraits database and the respective spatial intersections of species' geographic range maps across the 110 x 110 km grid cells. Both files are available at 10.5281/zenodo.10530618.

Funding

São Paulo Research Foundation (FAPESP) for grants supporting MRM (#2021/11840-6 and #2022/12231-6), LFT (#2016/25358-3), KC (#2020/12558-0), and RZC (#2022/15247-0); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the fellowship to JJMG; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for research grants in support of FPW (#311504/2020-5) and LFT (#302834/2020-6); U.S. National Science Foundation (NSF) for grants supporting RAP (DEB-1441719), RCKB (DEB-1441652), and WJ (DEB-1441737 and DEB-1441719). WJ also acknowledges support from NASA grants 80NSSC17K0282 and 80NSSC18K0435, and from the E.O. Wilson Biodiversity Foundation.

Citation

Moura, M.R., Ceron, K., Guedes, J.J.M., Chen-Zhao, R., Sica, Y.V., Hart, J., Dorman, W., Gonzalez-del-Pliego, P., Ranipeta, A., Catenazzi, A., Werneck, F.P., Toledo, L.F., Upham, N.S., Tonini, J.F.R., Colston, T.J., Guralnick, R., Bowie, R.C.K., Pyron, R.A., Jetz, W. A phylogeny-informed characterization of global tetrapod traits addresses data gaps and biases. bioRxiv, 2024, doi: 10.1101/2023.03.04.531098v3

Correspondence to: mariormoura@gmail.com

Files

README.pdf

Files (37.9 GB)

Checksum (size)
md5:79edfc52903b1d30f9638d84e90f0562 (16.9 MB)
md5:ab79a56cfcb8dfbf665c3a5e2773115e (103.5 kB)
md5:025d9e2e9600dd8b67989f1cc11d4133 (56.7 kB)
md5:6e924d7bec2263c0833f1d5d5201adf5 (219.4 kB)
md5:c2a3d88c74808821c6272145c49a3098 (7.0 GB)
md5:a0c612594b136e7d4aa21613179c65b8 (13.5 GB)
md5:871833c2551853429900b884cad08f05 (4.7 GB)
md5:7909e655c47cebd52b044a8622da1f0a (12.7 GB)
md5:8bacd0c6e0fff324e48148e7d1b0e4ec (19.2 MB)
md5:5d62516e75820a8d83a532b33b898ee7 (63.6 MB)
md5:3af1c17e3ae397f12fbfbb5f4d40b42b (162.9 kB)
md5:d52c945b783414437cbc123f0a0f665b (14.8 MB)
md5:20d8f6061332d61c005a428c6722238c (19.7 kB)
md5:ac343d7b607acab736da086546d72b85 (1.0 MB)
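
Given the size of some archives, it can be worth checking a download against its MD5 checksum before unpacking. The sketch below uses `tools::md5sum` from base R; `verify_md5` is a hypothetical helper, and because the listing above does not pair checksums with file names, the demonstration runs on a temporary file with a known digest rather than on a file from this record.

```r
# Compare a file's MD5 digest with an expected value ("md5:" prefix optional)
verify_md5 <- function(path, expected) {
  expected <- tolower(sub("^md5:", "", expected))
  actual   <- unname(tools::md5sum(path))  # NA if the file cannot be read
  !is.na(actual) && identical(actual, expected)
}

# Demonstration on a temporary file containing the three bytes "abc"
demo_file <- tempfile()
writeBin(charToRaw("abc"), demo_file)
print(verify_md5(demo_file, "md5:900150983cd24fb0d6963f7d28e17f72"))
```

For a downloaded archive, the call would take the local path and the matching checksum copied from the record's file list.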

Additional details

Related works

Is derived from
Preprint: 10.1101/2023.03.04.531098v3 (DOI)
Is part of
Dataset: 10.5281/zenodo.10530617 (DOI)