Date: 25 August, 2024

Dataset Title: Latitudinal gradients in host-associated microbial diversity and interaction types

Dataset Creators: Iris A Holmes, José G Martínez-Fonseca, Rudolf von May, Briana A Sealey, Peter A Cerda, Maggie R Grundler, Erin P Westeen, Daniel Nondorf, Joanna G Larson, Christopher R Myers, Tory A Hendry

Dataset Contact: Iris Holmes, iah6@cornell.edu


Research Overview: In this project, we compared cloacal microbiome communities from reptiles from six localities that spanned a gradient in host taxonomic diversity. We found that diversity of the bacterial communities was relatively stable across locations, but that host-specific taxa were overrepresented in high host diversity communities. ASVs in high host diversity communities were more likely to have strong interactions with other bacterial lineages across multiple hosts. Host generalist bacteria were more likely to be from a generalist phylum.

Methodology: We used a 16S rRNA metabarcoding approach to quantify bacterial community diversity. We processed the sequencing output fastq files using qiime2, then imported the resulting qiime2 output files into R using the qiime2R package. We performed our analyses in R. Processed input files and scripts are described below.


This upload contains:

A) Processed data files:
1) Table S1: table of hosts and associated metadata with the following columns: 1) sample name, 2) UMMZ voucher accession number, where available (voucher specimens accessioned by the Museo de San Marcos are denoted by 'MUSM'), 3) locality name, 4) host order, 5) host family, 6) host genus, 7) host species, 8) sample latitude (where available), 9) sample longitude (where available), 10) sampling year, 11) sampling month, 12) sampling day (where available)
2) lg_tab.txt: a table of hosts (rows) and ASVs (columns) populated by read counts
3) lg_asv.tre: phylogenetic tree of ASVs generated with the 'sepp' fragment insertion method in qiime2
4) lg_host.tre: phylogenetic tree of hosts downsampled from Tonini et al. 2016
5) lat.grad.tax.match.csv: matching of host species in our dataset to nearest neighbor representative species in the Tonini tree
6) six files titled [locality]_hostxotu, one per locality, each containing a standardized host-by-ASV matrix populated by read counts. Together these form the consistent subsampled dataset used in our diversity analyses
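
A minimal sketch of reading a host-by-ASV count table such as lg_tab.txt into R is given here. It assumes a tab-delimited file with ASV identifiers in the header row and host sample names in the first column; the file's actual delimiter is not stated in this document, so `sep` may need adjusting. The helper name `read_host_asv` and the toy data are hypothetical and are not taken from the deposited scripts.

```r
# Hypothetical helper: read a host-by-ASV count table such as lg_tab.txt.
# Assumes a tab-delimited file with ASV IDs in the header row and host
# sample names in the first column; adjust `sep` if the file differs.
read_host_asv <- function(path, sep = "\t") {
  tab <- read.table(path, header = TRUE, row.names = 1, sep = sep,
                    check.names = FALSE, comment.char = "")
  as.matrix(tab)
}

# Toy stand-in for the real file so the sketch runs on its own.
toy <- matrix(c(10, 0, 3,
                 0, 5, 2),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("host_A", "host_B"),
                              c("asv1", "asv2", "asv3")))
toy_path <- tempfile(fileext = ".txt")
write.table(toy, toy_path, sep = "\t", quote = FALSE, col.names = NA)

counts <- read_host_asv(toy_path)
reads_per_host <- rowSums(counts)       # total reads per host sample
asv_prevalence <- colSums(counts > 0)   # number of hosts each ASV occurs in
```

To use the real data, replace `toy_path` with the path to lg_tab.txt. The two tree files are in a format that tree-reading functions such as `ape::read.tree` are commonly used for, assuming they are Newick files.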


B) Script 1 (lg_diversity_ISME.R): contains analyses for the "Microbial diversity and host specialization across sites" section of the paper. Generates figures S1-S4, host richness data for Table 1, and figures 1-4.

Sections include:
- input data processing, including generation of the subsampled data and generation of Figure S1
- plotting parameters, including locality and bacteria phylum colors
- generating consistent subsampled datasets ([locality]_hostxotu files)
- making Figure S2, the phylum breakdown of the subsampled dataset ASVs
- making Figure S3 (host diversity map) and the host diversity portion of Table 1
- getting the sampled host taxonomic diversity for Table 1
- creating the function to get host taxonomic clustering
- making Figure 1 and Figure S4, showing prevalence vs. host specialization
- doing the PERMANOVA analysis and making Figure 2
- making Figure 3, species accumulation curves
- generating Shannon diversity metrics, Faith's phylogenetic diversity metrics, and host phylogenetic clustering metrics, and comparing values across localities. Making Figure 4
- writing out tables S2 and S3 (per locality and locality pairwise comparison results)
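
The subsampling and Shannon diversity steps listed above can be illustrated with a short base R sketch. This is not the code in lg_diversity_ISME.R: the function names `subsample_counts` and `shannon`, the subsampling depth, and the toy counts are all assumptions made for illustration, and the script may rely on a package for these steps instead.

```r
# Subsample one host's read counts to a fixed depth, without replacement.
subsample_counts <- function(counts, depth) {
  stopifnot(sum(counts) >= depth)
  reads <- rep(seq_along(counts), times = counts)   # one entry per read
  kept <- reads[sample.int(length(reads), depth)]   # draw `depth` reads
  out <- tabulate(kept, nbins = length(counts))     # recount reads per ASV
  names(out) <- names(counts)
  out
}

# Shannon diversity H = -sum(p * log(p)) over ASVs with nonzero counts.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

set.seed(1)
host <- c(asv1 = 40, asv2 = 40, asv3 = 20)
sub <- subsample_counts(host, depth = 50)   # toy depth, chosen arbitrarily
h_sub <- shannon(sub)                       # diversity of the subsample
h_even <- shannon(c(asv1 = 25, asv2 = 25))  # two equally abundant ASVs: log(2)
```

Subsampling every host to the same depth keeps diversity estimates comparable across hosts that were sequenced to different depths.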

C) Script 2 (lg_interactions_ISME.R): contains analyses for the "Microbial interactions and host specialization" section of the paper. Generates figures 5-7.

Sections include:
- input data processing and plotting parameters
- creating a second function to get host taxonomic clustering, adapted to the bipartite module format
- getting cooccurrence results for 50 subsampled datasets for each locality, along with matched null datasets
- processing outputs to find the mean numbers of positively and negatively co-occurring pairs, the prevalence of the ASVs in the interacting pairs, the phylogenetic distance between the members of each interacting pair, and the phylogenetic clustering of the hosts occupied by both members of the pair. These analyses are also applied to the null datasets and, for the host and ASV phylogenetic analyses, to the prevalence-matched datasets.
- statistical tests comparing values across localities for the real data, and between real and null datasets within localities
- code for Figure 5 (number of pairs and ASV prevalence)
- code for Figure 6 (host and ASV phylogenetic clustering)
- getting phyla for ASVs in interacting pairs and testing for an excess of Proteobacteria-Proteobacteria pairs
- code for Figure 7 (comparison of numbers of pairs in different phylogenetic categories)
- code to get bipartite modules
- processing outputs to get number of modules, number of ASVs in modules, and number of hosts in modules (real and null data) and statistics to compare real to null datasets and across localities in real data
- processing outputs to get ASV phylogenetic distance and host phylogenetic clustering and statistics to compare real to null datasets and across localities in real data
- writing out tables S2 and S3 (per locality and locality pairwise comparison results)
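
As a rough illustration of comparing observed co-occurrence against a null dataset, the base R sketch below counts the hosts shared by each ASV pair and compares one pair against shuffled data. The null model used here (shuffling each ASV's presences among hosts, which preserves its prevalence) is an assumption chosen for simplicity. The function names and toy matrix are hypothetical, and lg_interactions_ISME.R may use a different co-occurrence method and null construction.

```r
# Observed co-occurrence: number of hosts shared by each pair of ASVs.
cooccurrence_counts <- function(pa) {
  crossprod(pa)  # t(pa) %*% pa; off-diagonal entries are shared-host counts
}

# Assumed null model: shuffle each ASV's presences among hosts, which keeps
# every ASV's prevalence fixed but breaks any association between ASVs.
shuffle_within_asv <- function(pa) {
  out <- apply(pa, 2, sample)
  dimnames(out) <- dimnames(pa)  # restore host and ASV labels
  out
}

# Toy presence/absence matrix: hosts in rows, ASVs in columns.
pa <- matrix(c(1, 1, 0,
               1, 1, 0,
               0, 0, 1,
               1, 1, 1),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("host", 1:4),
                             c("asv1", "asv2", "asv3")))

obs <- cooccurrence_counts(pa)

set.seed(42)
null_counts <- replicate(
  200,
  cooccurrence_counts(shuffle_within_asv(pa))["asv1", "asv2"]
)

# Fraction of null draws with at least as many shared hosts as observed.
p_upper <- mean(null_counts >= obs["asv1", "asv2"])
```

A pair whose observed shared-host count sits in the upper tail of the null draws would be a candidate positive co-occurrence; the lower tail would indicate a negative one.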