Data from: A phylogeny-informed characterisation of global tetrapod traits addresses data gaps and biases
Creators
- Moura, Mario R. (1)
- Ceron, Karoline (2)
- Guedes, Jhonny J. M. (3)
- Chen-Zhao, Rosana (1)
- Sica, Yanina (4)
- Hart, Julie (5)
- Dorman, Wendy (6)
- Portmann, Julia M. (4)
- Gonzalez-del-Pliego, Pamela (7)
- Ranipeta, Ajay (4)
- Catenazzi, Alessandro (8)
- Werneck, Fernanda (9)
- Toledo, Luis Felipe (1)
- Upham, Nathan (10)
- Tonini, Joao F. R. (11)
- Colston, Timothy J. (12)
- Guralnick, Robert (13)
- Bowie, Rauri C. K. (14)
- Pyron, R. Alexander (15)
- Jetz, Walter (4)
- 1. Universidade Estadual de Campinas
- 2. Universidade Federal do Ceará
- 3. Universidade Federal de Goiás
- 4. Yale University
- 5. State University of New York
- 6. University of Illinois Urbana-Champaign
- 7. Universidade de Évora
- 8. Florida International University
- 9. National Institute of Amazonian Research
- 10. Arizona State University
- 11. University of Richmond
- 12. University of Puerto Rico-Mayaguez
- 13. University of Florida
- 14. University of California, Berkeley
- 15. George Washington University
Description
Raw data and R code for the research approach on phylogenetic multiple imputation of natural history traits for tetrapod species. In brief, the code computes the phylogenetic filters; reproduces the grid search procedure used to tune XGBoost hyperparameters; computes phylogenetic multiple imputations for traits related to body length, body mass, activity time, and microhabitat; and builds the figures used in the main text and supplementary material. The original code was executed on a private cloud-processing service, which may limit its reproducibility by external users. However, the code provided here has been adapted to ensure reproducibility on local machines, using a subset of the data. The complete TetrapodTraits database is available at 10.5281/zenodo.10530617.
The modelling framework was built using the directory structure described in the README.pdf file. The R code provided includes steps that replicate this directory structure, but understanding it is a good starting point for navigating the outputs produced.
File content
Datasets.zip: this zip file represents the "Datasets" folder directory, as illustrated in the README.pdf. This zip file contains 17 CSV files, encompassing genus- and family-level outputs (measures of trait completeness, relative change in average trait values per taxon), assemblage-level outputs (measures of richness, trait completeness, and relative change in average trait values per 110 x 110 km grid cell), tuned hyperparameters of XGBoost models, summary statistics of model performance across the 10,000 multiple phylogenetic imputations computed, and a table of data sources and their respective reference types. All CSV files are loaded from within the R scripts provided.
PhylogenySets.zip: this zip file represents the "PhylogenySets" folder directory. It includes subsets of 100 fully-sampled phylogenetic trees published for all five major tetrapod groups (Amphibians, Chelonians and Crocodilians, Squamates, Birds, and Mammals) in the Nexus (.nex) format. All trees were extracted from VertLife, using the Phylogeny subsets tool.
PhylogeneticFilters.zip: this zip file represents the "PhylogeneticFilters" folder directory, as illustrated in the README.pdf. The phylogenetic filters were derived from the fully-sampled phylogenies available for tetrapod species in VertLife. For the computations of the TetrapodTraits database, we derived phylogenetic filters across a subset of 100 phylogenies for each major tetrapod group. The complete set of phylogenetic filters for each tree is available in the RData format, and the full set across the 500 trees exceeds 350 GB. Due to storage limitations, we provide phylogenetic filters for 10 trees per major tetrapod group. If you are interested in additional sets, please contact the corresponding author.
Shapefiles.zip: this zip file represents the "Shapefiles" folder directory, as illustrated in the README.pdf. This zip file contains two shapefiles: (i) gridcells_110km.shp represents an equal-area grid cell shapefile at 110 km spatial resolution; and (ii) wwf_realms.shp represents the major terrestrial biogeographic realms (derived from https://ecoregions.appspot.com). Please note that, because shapefile field names are limited in length, the field names of gridcells_110km.shp are stored as ("Cl_I110", "Long", "Lat", "WWF_Rlm", "PrpLndA"). The R code that relies on this file immediately renames these fields to ("Cell_Id110", "Long", "Lat", "WWF_Realm", "PropLandArea") to match the terminology used to refer to the 110 x 110 km grid cells.
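For readers who open gridcells_110km.shp outside the provided R scripts, the field renaming described above amounts to a simple lookup. The snippet below is a minimal Python illustration, not the authors' R code: the name pairs come from the description above, while the helper `restore_field_names` and the sample record are hypothetical.

```python
# Truncated shapefile field names mapped to the full names used by the scripts
# (pairs taken from the description of gridcells_110km.shp).
FIELD_NAMES = {
    "Cl_I110": "Cell_Id110",
    "Long": "Long",
    "Lat": "Lat",
    "WWF_Rlm": "WWF_Realm",
    "PrpLndA": "PropLandArea",
}


def restore_field_names(record):
    """Return a copy of `record` with truncated field names expanded.

    Hypothetical helper: fields with no entry in FIELD_NAMES are kept as-is.
    """
    return {FIELD_NAMES.get(name, name): value for name, value in record.items()}


# Hypothetical attribute record for one grid cell (values are made up).
example = {"Cl_I110": 1, "Long": -47.06, "Lat": -22.9, "WWF_Rlm": "Neotropic", "PrpLndA": 1.0}
print(restore_field_names(example))
```

Keeping the mapping in one dictionary makes it easy to confirm that every abbreviated field has a full-name counterpart before joining the grid to other tables.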
R-scripts: three files provided.
- Moura_et_al_TetrapodTraits_Script1_PhylogeneticMultipleImputations.R: this script computes the phylogenetic filters, the grid search procedure to tune XGBoost hyperparameters, and conducts phylogenetic multiple imputations for traits related to body length, body mass, activity time, and microhabitat.
- Moura_et_al_TetrapodTraits_Script2_DataExploration.R: this script extracts patterns of trait completeness and biases in missing values at the genus, family, and assemblage (110 x 110 km grid cell) levels, and computes co-occurrence patterns of data missingness across clades.
- Moura_et_al_TetrapodTraits_Script3_Figures.R: this script builds the figures reported in the main text and supplementary material.
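As a rough illustration of the kind of summary Script 2 produces, trait completeness for a taxon can be read as the share of its species with a non-missing value for a given trait. The Python sketch below works under that assumption only; the column names (`Genus`, `BodyMass`), the missing-value markers, and the toy rows are hypothetical and do not come from the TetrapodTraits files, and the computation in the R script may differ.

```python
from collections import defaultdict


def completeness_by_group(rows, group_col, trait_col, missing=("", "NA")):
    """Share of rows per group whose trait value is not missing."""
    total = defaultdict(int)
    observed = defaultdict(int)
    for row in rows:
        group = row[group_col]
        total[group] += 1
        value = row.get(trait_col)
        # A value counts as observed unless it is absent or a missing marker.
        if value is not None and value not in missing:
            observed[group] += 1
    return {group: observed[group] / total[group] for group in total}


# Toy rows with hypothetical column names and made-up values.
rows = [
    {"Genus": "Hyla", "BodyMass": "4.2"},
    {"Genus": "Hyla", "BodyMass": "NA"},
    {"Genus": "Bufo", "BodyMass": "55.0"},
]
print(completeness_by_group(rows, "Genus", "BodyMass"))
```

The same function could be applied with a family or grid-cell identifier as the grouping column, since only the grouping key changes.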
Tetrapoda_prunedToFamily.tre: a family-level tree for tetrapod species derived from the combination of the available global phylogenies for major tetrapod groups. The R function tree.merger of the RRphylo package was used to combine single fully-sampled phylogenies of amphibians, turtles and crocodilians, squamates, birds, and mammals. This file should be stored in the Working Directory.
TetrapodTraits_datasample.csv: a subset of the TetrapodTraits database including 1,000 mammal species. This file should be stored in the Working Directory.
External files
To fully execute scripts 2 and 3, two additional files are necessary: (i) TetrapodTraits_1.0.0.csv and (ii) Tetrapod_360.csv. These files represent the TetrapodTraits database and the respective spatial intersections of species' geographic range maps across the 110 x 110 km grid cells. Both files are available at 10.5281/zenodo.10530618.
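Before running the scripts, it can help to confirm that the files named above are in place. The Python sketch below is one way to do that and is not part of the published code. The file names are those given in this record; the helper `missing_files` is hypothetical, and the sketch simply checks whichever directory is passed in, since the exact location expected for each file is described in README.pdf rather than here.

```python
from pathlib import Path

# File names as given in the record description.
WORKING_DIR_FILES = ["Tetrapoda_prunedToFamily.tre", "TetrapodTraits_datasample.csv"]
EXTERNAL_FILES = ["TetrapodTraits_1.0.0.csv", "Tetrapod_360.csv"]


def missing_files(directory, names):
    """Return the names from `names` that are not regular files inside `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Checks the current directory; pass another path if the files live elsewhere.
    for label, names in [("working-directory", WORKING_DIR_FILES), ("external", EXTERNAL_FILES)]:
        absent = missing_files(".", names)
        status = "all present" if not absent else "missing: " + ", ".join(absent)
        print(f"{label} files: {status}")
```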
Funding
São Paulo Research Foundation (FAPESP) for grants supporting MRM (#2021/11840-6 and #2022/12231-6), LFT (#2016/25358-3), KC (#2020/12558-0), and RZC (#2022/15247-0); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the fellowship to JJMG; Conselho Nacional de Desenvolvimento Científico - CNPq for research grants in support of FPW (#311504/2020-5) and LFT (#302834/2020-6); U.S. National Science Foundation (NSF) for grants supporting RAP (DEB-1441719), RCKB (DEB-1441652), and WJ (DEB-1441737 and DEB-1441719). WJ also acknowledges support from NASA grants 80NSSC17K0282 and 80NSSC18K0435; and the E.O. Wilson Biodiversity Foundation.
Citation
Moura, M.R., Ceron, K., Guedes, J.J.M., Chen-Zhao, R., Sica, Y.V., Hart, J., Dorman, W., Gonzalez-del-Pliego, P., Ranipeta, A., Catenazzi, A., Werneck, F.P., Toledo, L.F., Upham, N.S., Tonini, J.F.R., Colston, T.J., Guralnick, R., Bowie, R.C.K., Pyron, R.A., Jetz, W. A phylogeny-informed characterization of global tetrapod traits addresses data gaps and biases. bioRxiv, 2024, doi: 10.1101/2023.03.04.531098v3
Correspondence to: mariormoura@gmail.com
Files (37.9 GB)
README.pdf
MD5 checksum | Size |
---|---|
md5:79edfc52903b1d30f9638d84e90f0562 | 16.9 MB |
md5:ab79a56cfcb8dfbf665c3a5e2773115e | 103.5 kB |
md5:025d9e2e9600dd8b67989f1cc11d4133 | 56.7 kB |
md5:6e924d7bec2263c0833f1d5d5201adf5 | 219.4 kB |
md5:c2a3d88c74808821c6272145c49a3098 | 7.0 GB |
md5:a0c612594b136e7d4aa21613179c65b8 | 13.5 GB |
md5:871833c2551853429900b884cad08f05 | 4.7 GB |
md5:7909e655c47cebd52b044a8622da1f0a | 12.7 GB |
md5:8bacd0c6e0fff324e48148e7d1b0e4ec | 19.2 MB |
md5:5d62516e75820a8d83a532b33b898ee7 | 63.6 MB |
md5:3af1c17e3ae397f12fbfbb5f4d40b42b | 162.9 kB |
md5:d52c945b783414437cbc123f0a0f665b | 14.8 MB |
md5:20d8f6061332d61c005a428c6722238c | 19.7 kB |
md5:ac343d7b607acab736da086546d72b85 | 1.0 MB |
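Because several of the archives are multiple gigabytes, it may be worth checking a download against the MD5 checksums listed in the file table above. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and since the file names are not shown next to the checksums here, the pairing of each file with its checksum has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or as plain hex."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat even for the largest archives. A call such as `matches_checksum("downloaded_file.zip", "md5:<hex from the table>")`, where the file name is a placeholder, returns `True` when the download is intact.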
Additional details
Related works
- Is derived from
- Preprint: 10.1101/2023.03.04.531098v3 (DOI)
- Is part of
- Dataset: 10.5281/zenodo.10530617 (DOI)