Summary of Dryad submission for:
Bolnick et al 2014 "Major Histocompatibility Complex class IIb polymorphism influences gut microbiota composition and diversity" Molecular Ecology

The following data files are provided in this Dryad submission:

Filename: cedar_prot_dist_nostops.csv
Description: A .csv file containing a matrix of pairwise protein distances between MHC alleles. Columns and rows are identified by MHC protein motifs based on MHC class II beta exon II sequence (unique motifs are identified by the letter P followed by a number). Elements of the matrix represent the pairwise difference between two motifs' amino acid sequences. Motifs with stop codons were removed in advance and are not considered here.

Filename: dna2prot.csv
Description: A .csv file containing the translation between the unique MHC DNA sequence identifier ('allele') and the unique protein sequence ('protID'). For example, MHC allele M195 corresponds to motif P2.

Filename: microbiome_MHC_AA_data_nostops.csv
Description: A .csv file containing MHC protein motif presence/absence for each individual fish. Columns are:
    Nreads: Number of 454 sequence reads per individual fish
    Fish_id: the identifier for the individual fish
    NDNAseq: the number of unique MHC DNA sequences identified by STC
    Nmotifs: the number of unique protein motif sequences identified
    Columns labeled 173 through 95571: columns indicating the presence/absence of particular MHC DNA sequences in individual fish. 0 indicates absence, 1 indicates presence
    Columns labeled P1 through P125: columns indicating the presence/absence of particular MHC amino acid sequence motifs in individual fish. 0 indicates absence, 1 indicates presence
    N_aa: Number of unique amino acid sequences
    avg_dis: mean pairwise protein sequence divergence among all motifs within an individual fish.
Mean pairwise protein divergences are calculated from the file cedar_prot_dist_nostops.csv

Filename: TAs.aln
Description: An alignment of all MHC DNA sequences inferred by STC to be present in stickleback from this sample.

Filename: TAs_prot.aln
Description: An alignment of all MHC amino acid motif sequences inferred by STC to be present in stickleback from this sample.

Filename: Wildstickle_chao1.txt
Description: Rarefaction curves for microbial alpha diversity, as measured by the metric chao1, for each individual fish (columns D onwards) at various levels of rarefaction. Columns include:
    sequences_per_sample: The number of sequence reads subsampled per individual fish for rarefaction to control for sequence depth. This ranges from 10 up to 10,000.
    Iteration: For each individual fish, we randomly drew a given number of sequences per sample for 10 replicate rarefaction events. Iteration identifies which iteration of the sub-sampling is being used.
    Columns labeled Stickleback#.### indicate the individual fish being examined. The first integer indicates the library prep a given fish was in. The numbers after the decimal indicate the individual fish ID number, to match to MHC and metadata. Samples with insufficient sequencing depth to allow a high rarefaction count are given 'NA' values for those high rarefaction depths. Numbers under a given column indicate that individual fish's microbial alpha diversity at a given rarefaction depth, for a given iteration.

Filename: Wildstickle_metadata.txt
Description: The metadata table (tab delimited text) for gut microbiota samples of Cedar Lake stickleback. Columns include:
    SampleID: Stickleback#.### indicates the individual fish being examined. The first integer indicates the library prep a given fish was in. The numbers after the decimal indicate the individual fish ID number, to match to MHC and metadata.
    ID: The ID number after the decimal in SampleID
    Barcode: the unique barcode used for each individual during library prep for sequencing the gut microbiota
    Strip: The plate number used in library preparation for sequencing gut microbiota
    Tube: The tube used in pooling library preparation for sequencing gut microbiota
    Mass: fish mass in grams
    SL: Standard Length (mm) from upper mandible to the end of the spinal cord.
    Nitrogen: d15N stable isotope ratio as a measure of diet
    Carbon: d13C stable isotope ratio as a measure of diet
    Bodywidth: Body width in mm, measured at the operculum
    Sex: M / F indicates male/female
    Openbuccal: Length of the buccal cavity (mm)
    Hyoidlength: length of the hyoid (mm)
    Gapewidth: Width of the gape, closed (mm)
    GRN: number of gill rakers
    GRL: gill raker length (ocular micrometer units, used only with correlational PCA so units are irrelevant for our analyses)
    lnMass: log Mass (see above)
    lnSL: log Standard length (see above)
    lnBWL: log body width (see above)
    lnBuccal: log buccal length (see above)
    lnHL: log hyoid length (see above)
    lnGW: log gape width (see above)
    lnRL: log raker length (see above)
    mass_resid: residuals of log mass regressed on log SL with sex as a factor and a sex*SL interaction.
    BW_resid: residuals of log buccal width regressed on log SL
    HL_resid: residuals of log hyoid length regressed on log SL
    GW_resid: residuals of log gape width regressed on log SL
    RL_resid: residuals of log raker length regressed on log SL
    Description: species identity.

Filename: Wildstickle_PD_whole_tree.txt
Description: Rarefaction curves for microbial phylogenetically weighted alpha diversity (PD), for each individual fish (columns D onwards) at various levels of rarefaction. Columns include:
    sequences_per_sample: The number of sequence reads subsampled per individual fish for rarefaction to control for sequence depth. This ranges from 10 up to 10,000.
    Iteration: For each individual fish, we randomly drew a given number of sequences per sample for 10 replicate rarefaction events. Iteration identifies which iteration of the sub-sampling is being used.
    Columns labeled Stickleback#.### indicate the individual fish being examined. The first integer indicates the library prep a given fish was in. The numbers after the decimal indicate the individual fish ID number, to match to MHC and metadata. Samples with insufficient sequencing depth to allow a high rarefaction count are given 'NA' values for those high rarefaction depths. Numbers under a given column indicate that individual fish's microbial alpha diversity at a given rarefaction depth, for a given iteration.

Filename: Wildstickle_richness.txt
Description: Rarefaction curves for microbial OTU richness (number of OTUs per fish), for each individual fish (columns D onwards) at various levels of rarefaction. Columns include:
    sequences_per_sample: The number of sequence reads subsampled per individual fish for rarefaction to control for sequence depth. This ranges from 10 up to 10,000.
    Iteration: For each individual fish, we randomly drew a given number of sequences per sample for 10 replicate rarefaction events. Iteration identifies which iteration of the sub-sampling is being used.
    Columns labeled Stickleback#.### indicate the individual fish being examined. The first integer indicates the library prep a given fish was in. The numbers after the decimal indicate the individual fish ID number, to match to MHC and metadata. Samples with insufficient sequencing depth to allow a high rarefaction count are given 'NA' values for those high rarefaction depths. Numbers under a given column indicate that individual fish's microbial alpha diversity at a given rarefaction depth, for a given iteration.
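
Example (illustrative only, not part of the original analysis): the three rarefaction tables (Wildstickle_chao1.txt, Wildstickle_PD_whole_tree.txt, Wildstickle_richness.txt) share one layout, so a single value per fish can be obtained by averaging the replicate iterations at a chosen rarefaction depth. The Python sketch below assumes the files are tab delimited with a header row and that the depth column is named sequences_per_sample as listed above. The function name mean_diversity_at_depth and the demo values are hypothetical, and the header of the real files may differ.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_diversity_at_depth(handle, depth, depth_col="sequences_per_sample"):
    """Average each fish's diversity over iterations at one rarefaction depth.

    'NA' cells (insufficient sequencing depth) are skipped, so fish with no
    usable values at this depth are left out of the result.
    Returns a dict mapping sample ID (e.g. 'Stickleback1.101') to the mean.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    values = defaultdict(list)
    for row in reader:
        # Keep only the rows rarefied to the requested depth.
        if int(float(row[depth_col])) != depth:
            continue
        for name, cell in row.items():
            # Fish columns are assumed to start with 'Stickleback'.
            if name and name.startswith("Stickleback") and cell not in ("", "NA", None):
                values[name].append(float(cell))
    return {name: mean(vals) for name, vals in values.items()}


# Made-up demo table in the assumed layout (values are NOT from the dataset).
demo = (
    "file\tsequences_per_sample\tIteration\tStickleback1.101\tStickleback2.202\n"
    "a\t10\t0\t4.0\t5.0\n"
    "a\t10\t1\t6.0\t7.0\n"
    "b\t10000\t0\t40.0\tNA\n"
    "b\t10000\t1\t44.0\tNA\n"
)
shallow = mean_diversity_at_depth(io.StringIO(demo), 10)
deep = mean_diversity_at_depth(io.StringIO(demo), 10000)
print(shallow)
print(deep)
```

To use this on a real file, pass an open file handle, for example open("Wildstickle_chao1.txt", newline=""), in place of the io.StringIO object.
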

Filename: Wildstickle_uwPCoAEigen.txt
Description: A tab delimited text file containing the eigenvalues (row named eigvals), the percent variation explained by individual PCoA axes (row named precvar), and the cumulative percent variation explained by PCoA axes 1 through i (row named cumulativevar). These data are provided for unweighted PCoA axes 1 through 183.

Filename: Wildstickleotus.txt
Description: A tab delimited text file containing the counts of individual microbial OTUs (rows) for individual fish (columns). Microbial OTUs' taxonomic affiliations are listed in the column labeled 'taxonomy'. Individual fish IDs (columns 2 onwards) are formatted to list the species (Stickleback), followed by a number (1 or 2) indicating the library, followed by a decimal and another number. The numbers after the decimal indicate the fish ID that matches the MHC and other files' fish identification columns. Entries in each cell correspond to the number of DNA sequence reads obtained that have at least 97% similarity to the OTU, using closed-reference OTU picking as described in the paper's methods.

Filename: WildstickleuwPCoAscores.txt
Description: A tab delimited text file containing the unweighted PCoA scores obtained from UniFrac distances of stickleback gut microbiota from Cedar Lake. Columns include the individual FishID, and PCoA axes 1 through 183. Individual fish IDs in the FishID column are formatted to list the species (Stickleback), followed by a number (1 or 2) indicating the library, followed by a decimal and another number. The numbers after the decimal indicate the fish ID that matches the MHC and other files' fish identification columns. Entries in each cell represent that fish's PCoA score for that PCoA axis. Note that there are PCoA scores for more fish than were used in this study, as only a subset of the sample was used for MHC genotyping, but we provide all samples for context.
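
Example (illustrative only, not part of the original analysis): because the microbiota files identify fish as Stickleback#.### while the MHC file uses the fish ID alone, matching the two requires splitting the sample identifier. The Python sketch below shows one way to do this and to keep only the fish that were MHC genotyped. The function name split_sample_id and all demo values are hypothetical. The fish ID is kept as text because reading "1.120" as a decimal number would silently turn it into 1.12; how the ID is stored in the MHC file's Fish_id column is an assumption to check against the data.

```python
import re

# Species name, library number (1 or 2), a decimal point, then the fish ID.
SAMPLE_ID = re.compile(r"^Stickleback(\d+)\.(\d+)$")


def split_sample_id(sample_id):
    """Split 'Stickleback1.120' into (library, fish_id) = (1, '120').

    The fish ID is returned as a string so that trailing zeros survive.
    """
    match = SAMPLE_ID.match(sample_id.strip())
    if match is None:
        raise ValueError(f"unexpected sample identifier: {sample_id!r}")
    return int(match.group(1)), match.group(2)


# Made-up PCoA axis 1 scores keyed by FishID (values are NOT from the dataset).
pcoa_axis1 = {
    "Stickleback1.101": 0.12,
    "Stickleback2.202": -0.05,
    "Stickleback2.303": 0.31,
}
# Made-up set of fish IDs present in the MHC genotype file.
genotyped_fish = {"101", "303"}

# Keep only fish that have both a PCoA score and an MHC genotype.
matched = {}
for sample_id, score in pcoa_axis1.items():
    library, fish_id = split_sample_id(sample_id)
    if fish_id in genotyped_fish:
        matched[fish_id] = score
print(matched)
```
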

Filename: WildsticklewPCoAEigen.txt
Description: A tab delimited text file containing the eigenvalues (row named eigvals), the percent variation explained by individual PCoA axes (row named precvar), and the cumulative percent variation explained by PCoA axes 1 through i (row named cumulativevar). These data are provided for weighted PCoA axes 1 through 183. Note that unlike the file Wildstickle_uwPCoAEigen.txt, this contains weighted PCoA as opposed to unweighted PCoA eigenvalues.

Filename: WildsticklewPCoAscores.txt
Description: A tab delimited text file containing the weighted PCoA scores obtained from UniFrac distances of stickleback gut microbiota from Cedar Lake. Note that unlike the file WildstickleuwPCoAscores.txt, this contains weighted PCoA as opposed to unweighted PCoA scores. Columns include the individual FishID, and PCoA axes 1 through 183. Individual fish IDs in the FishID column are formatted to list the species (Stickleback), followed by a number (1 or 2) indicating the library, followed by a decimal and another number. The numbers after the decimal indicate the fish ID that matches the MHC and other files' fish identification columns. Entries in each cell represent that fish's PCoA score for that PCoA axis. Note that there are PCoA scores for more fish than were used in this study, as only a subset of the sample was used for MHC genotyping, but we provide all samples for context.

Associated datasets stored elsewhere:
The raw sequence data for gut microbiota sequencing, and host metadata, can be accessed via accession numbers ERP006029 and ERP006030 at the European Bioinformatics Institute, http://www.ebi.ac.uk.

Whom to contact with questions:
Dr. Daniel Bolnick
Department of Integrative Biology
University of Texas at Austin
Austin TX 78712, USA
danbolnick@utexas.edu
512-471-2824