There is a newer version of the record available.

Published January 30, 2024 | Version v1
Software documentation Open

Data from: A phylogeny-informed characterisation of global tetrapod traits addresses data gaps and biases

  • 1. Universidade Estadual de Campinas
  • 2. Universidade Federal de Goiás
  • 3. Yale University
  • 4. State University of New York
  • 5. University of Illinois Urbana-Champaign
  • 6. Universidade de Évora
  • 7. Florida International University
  • 8. National Institute of Amazonian Research
  • 9. Arizona State University
  • 10. University of Richmond
  • 11. University of Puerto Rico-Mayaguez
  • 12. University of Florida
  • 13. University of California, Berkeley
  • 14. George Washington University

Description

Raw data and R code for phylogenetic multiple imputation of natural history traits for tetrapod species. In brief, the code reproduces the grid search procedure used to tune XGBoost hyperparameters, computes phylogenetic multiple imputations for traits related to body length, body mass, activity time, and microhabitat, and builds the figures used in the main text and supplementary material. The original code was executed on a private cloud-processing service, which may limit its reproducibility by external users. However, the code provided here has been adapted to ensure reproducibility on local machines, using a subset of the data.
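
As a rough illustration of what a grid search does, the sketch below evaluates every combination of candidate hyperparameter values and keeps the one with the lowest error. It is written in Python, not taken from the R scripts in this record, and makes several assumptions: the function names `grid_search` and `toy_error` are hypothetical, the candidate values are illustrative and are not those tuned in the study, and the error function is a stand-in for a cross-validated model fit. Only the parameter names (`eta`, `max_depth`, `subsample`) are real XGBoost options.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Return (best_params, best_score) over every combination in param_grid.

    `evaluate` is a caller-supplied function that maps a parameter dict to a
    validation error, where lower is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: illustrative values only, not the grid used in the study.
example_grid = {
    "eta": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
    "subsample": [0.5, 0.75, 1.0],
}


def toy_error(params):
    # Stand-in for a cross-validated error; a real run would fit a model here.
    return (
        abs(params["eta"] - 0.1)
        + abs(params["max_depth"] - 6)
        + abs(params["subsample"] - 0.75)
    )


best, err = grid_search(example_grid, toy_error)
```

In a real run, `toy_error` would be replaced by a function that fits a model with the given parameters and returns its validation error.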

The modelling framework was built using the directory structure described in the README.pdf file. The R code provided includes steps that replicate this directory structure, but understanding it first is a good starting point for navigating the outputs produced.

File content

Datasets.zip: this zip file holds the contents of the "Datasets" folder illustrated in the README.pdf. It contains 14 CSV files, encompassing genus- and family-level outputs (measures of trait completeness, relative change in average trait values per taxon), assemblage-level outputs (measures of richness, trait completeness, and relative change in average trait values per 110 x 110 km grid cell), tuned hyperparameters of XGBoost models, and summary statistics of model performance across the 10,000 multiple phylogenetic imputations computed. All CSV files are loaded from within the R scripts provided.

PhylogeneticFilters.zip: this zip file represents the "PhylogeneticFilters" folder directory, as illustrated in the README.pdf. The phylogenetic filters were derived from the fully-sampled phylogenies available for tetrapod species in VertLife. For the computations of the TetrapodTraits database, we derived phylogenetic filters across a subset of 100 phylogenies for each major tetrapod group. The complete set of phylogenetic filters for each tree is available in the RData format, with the full set across the 500 trees encompassing more than 350GB. To address storage limitations in the digital data repository and ensure reproducibility, we provide phylogenetic filters for three mammalian trees. If you are interested in additional sets, please contact the corresponding author.

Shapefiles.zip: this zip file holds the contents of the "Shapefiles" folder illustrated in the README.pdf. It contains two shapefiles: (i) gridcells_110km.shp is an equal-area grid cell shapefile at 110 km spatial resolution; and (ii) wwf_realms.shp represents the major terrestrial biogeographic realms (derived from https://ecoregions.appspot.com).
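
For users who prefer to unpack the three archives above before running the R scripts, a minimal sketch in Python follows. It assumes that each zip file sits in the working directory and that its contents belong directly inside a folder of the same name. The README.pdf remains the authoritative description of the layout, and the function name `prepare_folders` is hypothetical.

```python
import zipfile
from pathlib import Path

# Folder names taken from the archive descriptions above.
FOLDERS = ("Datasets", "PhylogeneticFilters", "Shapefiles")


def prepare_folders(working_dir):
    """Create the three folders and extract any matching zip found beside them.

    Returns the names of the archives that were actually extracted.
    Assumption: each archive stores its files at the top level; if an archive
    already contains a folder of the same name, the result will be nested.
    """
    working_dir = Path(working_dir)
    extracted = []
    for name in FOLDERS:
        target = working_dir / name
        target.mkdir(parents=True, exist_ok=True)
        archive = working_dir / f"{name}.zip"
        if archive.is_file():
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
            extracted.append(name)
    return extracted
```

Calling `prepare_folders(".")` from the working directory would create any missing folders and report which archives were unpacked.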

R-scripts: three files provided.

  1. Moura_et_al_TetrapodTraits_Script1_PhylogeneticMultipleImputations.R: this script computes the grid search procedure to tune XGBoost hyperparameters and conducts phylogenetic multiple imputations for traits related to body length, body mass, activity time, and microhabitat.
  2. Moura_et_al_TetrapodTraits_Script2_DataExploration.R: this script extracts patterns of trait completeness and biases in missing values at the genus, family, and assemblage (110 x 110 km grid cell) levels, and computes co-occurrence patterns of data missingness across clades.
  3. Moura_et_al_TetrapodTraits_Script3_Figures.R: this script builds the figures reported in the main text and supplementary material.
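
To make the idea of trait completeness in script 2 concrete, the sketch below computes, for each group, the share of records with a non-missing trait value. This is an assumed reading of the measure, written in Python and not taken from the R scripts. The column names `Genus` and `BodyMass`, the function name, and the toy rows are all invented for illustration and do not come from the TetrapodTraits database.

```python
from collections import defaultdict

# Values treated as missing in this sketch (an assumption).
MISSING = (None, "", "NA")


def completeness_by_group(rows, group_key, trait_key):
    """Return {group: share of rows in that group with a recorded trait}."""
    totals = defaultdict(int)
    filled = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row.get(trait_key) not in MISSING:
            filled[group] += 1
    return {group: filled[group] / totals[group] for group in totals}


# Invented toy rows, not real data.
toy_rows = [
    {"Genus": "GenusA", "BodyMass": "20.5"},
    {"Genus": "GenusA", "BodyMass": "NA"},
    {"Genus": "GenusB", "BodyMass": "250"},
]

toy_completeness = completeness_by_group(toy_rows, "Genus", "BodyMass")
```

The same function works at the family or grid-cell level by passing a different `group_key`.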

Tetrapoda_prunedToFamily.tre: a family-level tree for tetrapod species derived from the combination of the available global phylogenies for major tetrapod groups. The R function tree.merger of the RRphylo package was used to combine one fully-sampled phylogeny for each of amphibians, turtles and crocodilians, squamates, birds, and mammals. This file should be stored in the Working Directory.

TetrapodTraits_datasample.csv: a subset of the TetrapodTraits database including 1,000 mammal species. This file should be stored in the Working Directory.

External files

To fully execute scripts 2 and 3, two additional files are necessary: (i) TetrapodTraits_1.0.0.csv and (ii) Tetrapod_360.csv. These files represent the TetrapodTraits database and the respective spatial intersections of species' geographic range maps across the 110 x 110 km grid cells. For instructions on downloading these files, please refer to the Data Availability section in Moura et al. (2024); note that a temporary embargo may still be in place.
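
Before running the scripts, it can help to confirm that the expected files are in place. The following Python sketch lists which of the files named in this record are absent from a given directory. It assumes that the two external files are stored in the Working Directory alongside the tree and the data sample, which this record does not state explicitly. The function name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names as given in this record.
REQUIRED_FILES = (
    "Tetrapoda_prunedToFamily.tre",
    "TetrapodTraits_datasample.csv",
    "TetrapodTraits_1.0.0.csv",  # external file, needed for scripts 2 and 3
    "Tetrapod_360.csv",          # external file, needed for scripts 2 and 3
)


def missing_files(working_dir, required=REQUIRED_FILES):
    """Return the required file names not found in working_dir."""
    working_dir = Path(working_dir)
    return [name for name in required if not (working_dir / name).is_file()]
```

An empty list from `missing_files(".")` means all four files were found.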

Funding

São Paulo Research Foundation (FAPESP) for grants supporting MRM (#2021/11840-6 and #2022/12231-6), LFT (#2016/25358-3), KC (#2020/12558-0), and RZC (#2022/15247-0); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the fellowship to JJMG; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for research grants in support of FPW (#311504/2020-5) and LFT (#302834/2020-6); U.S. National Science Foundation (NSF) for grants supporting RAP (DEB-1441719), RCKB (DEB-1441652), and WJ (DEB-1441737 and DEB-1441719). WJ also acknowledges support from NASA grants 80NSSC17K0282 and 80NSSC18K0435.

Citation

Moura, M.R., Ceron, K., Guedes, J.J.M., Chen-Zhao, R., Sica, Y.V., Hart, J., Dorman, W., Gonzalez-del-Pliego, P., Ranipeta, A., Catenazzi, A., Werneck, F.P., Toledo, L.F., Upham, N.S., Tonini, J.F.R., Colston, T.J., Guralnick, R., Bowie, R.C.K., Pyron, R.A., Jetz, W. A phylogeny-informed characterization of global tetrapod traits addresses data gaps and biases. bioRxiv, 2024, doi: 10.1101/2023.03.04.531098v3

Correspondence to: mariormoura@gmail.com

Files

README.pdf

Files (1.4 GB)

Checksum and size of each file:

  • md5:91db06acfdbe70aae048029b0573b951 (16.8 MB)
  • md5:0ffe57caf10293ace9c637e0d9e38509 (98.4 kB)
  • md5:f0a0bfd1bff89c3f4317cc2f35d664a5 (56.7 kB)
  • md5:b14748f7d80de3fd4208880eae69d20a (214.1 kB)
  • md5:18acac0deef6439e4dcd041d472b59fc (1.4 GB)
  • md5:2312f775267512fd2ee8d522a3b0621d (159.1 kB)
  • md5:0d53b2d62ade8c6b6ab0d4d382a25ecb (14.7 MB)
  • md5:20d8f6061332d61c005a428c6722238c (19.7 kB)
  • md5:ac343d7b607acab736da086546d72b85 (1.0 MB)
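
The md5 values listed above can be used to check that a download completed intact. A minimal Python sketch follows, using only the standard library. The function names are hypothetical, and which checksum belongs to which file must be read from the repository page, since the file names are not shown next to the checksums here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:91db06ac...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low for the largest archive (1.4 GB).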

Additional details

Related works

Is derived from
Preprint: 10.1101/2023.03.04.531098v3 (DOI)