Molecular exploration of fossil eggshell uncovers hidden lineage of giant extinct bird

________ ________ ________ ________ ________ ________ ________ ________ ________ ________

#Supplementary Data 10
Genetic_data_minus_filtered_reads: Filtered reads can be found on NCBI's Short Read Archive. The remaining folders contain the reference used for mapping, mapping files, reconstructed mitochondrial genomes, mitochondrial alignments used for phylogenetic analysis, phylogenetic analyses, and molecular dating analyses.

> 1_Filtered_reads_[See SRA]: This is a placeholder folder indicating that filtered reads for each eggshell specimen (by MB#; see publication for metadata) have been deposited on NCBI's Short Read Archive under the BioProject PRJNA880433 and the Submission number SUB12049986.
> 2a_Reference: This folder contains the reference mitochondrial genome (.fasta) to which reads from other specimens were mapped in order to reconstruct their mitochondrial genomes.
> 2b_Mapping_files: This folder contains the mapping files from mapping filtered reads to the reference mitochondrial genome.
    > *.bam: the mapping file from the first round of mapping to the reference mitochondrial genome (see folder 2a_Reference)
    > *.fasta: the consensus sequence from the first round of mapping
    > *_sorted.bam: the mapping file after sorting reads using Picard
    > *_nodupes.bam: the mapping file after removing duplicates using Picard
    > *_metrics.txt: the output metrics file from Picard after removing duplicates
> 3_Mitochondrial_genomes_[See_Also_GenBank]
    > *_MT_CONSENSUS_6OCT17.fasta: the final reconstructed mitochondrial genome for each eggshell specimen (by AD#; see publication for metadata). Annotated versions are also available on GenBank under the accession numbers OP413790-OP413810.
    > Excluded_samples: samples with very poor mitochondrial recovery (low-quality/high-missingness genomes), which were not used for downstream analyses
> 4_Alignments: DNA sequence alignments used to reconstruct the phylogeny. Note that the suffix "gblocks" indicates that unstable positions have been removed using Gblocks (see publication).
    > 17tax_m1_nuclear.phy: alignment of nuclear protein-coding genes for 17 taxa described in Grealy et al. 2017, partitioned by codon position (first codon position)
    > 17tax_m2_nuclear.phy: alignment of nuclear protein-coding genes for 17 taxa described in Grealy et al. 2017, partitioned by codon position (second codon position)
    > 17tax_m3_nuclear.phy: alignment of nuclear protein-coding genes for 17 taxa described in Grealy et al. 2017, partitioned by codon position (third codon position)
    > 35tax_all_partitions_MAY18_gblocks.phy: alignment of mitochondrial genomes for 35 taxa (all elephant birds, kiwis, and Casuariiformes) with all partitions concatenated
    > 35tax_loops_MAY18_gblocks.phy: alignment of mitochondrial RNA genes for 35 taxa (all elephant birds, kiwis, and Casuariiformes), partitioned by loops
    > 35tax_m1_MAY18_gblocks.phy: alignment of mitochondrial protein-coding genes for 35 taxa (all elephant birds, kiwis, and Casuariiformes), partitioned by first codon position
    > 35tax_m2_MAY18_gblocks.phy: alignment of mitochondrial protein-coding genes for 35 taxa (all elephant birds, kiwis, and Casuariiformes), partitioned by second codon position
    > 35tax_m3_MAY18_gblocks.phy: alignment of mitochondrial protein-coding genes for 35 taxa (all elephant birds, kiwis, and Casuariiformes), partitioned by third codon position
    > 35tax_stems_MAY18_gblocks.phy: alignment of mitochondrial RNA genes for 35 taxa (all elephant birds, kiwis, and Casuariiformes), partitioned by stems
    > 57tax_all_partitions_MAY18_gblocks.phy: alignment of mitochondrial genomes for all 57 taxa (all elephant birds, palaeognaths and neognaths) with all partitions concatenated
    > 57tax_loops_MAY18_gblocks.phy: alignment of mitochondrial RNA genes for all 57 taxa (all elephant birds, palaeognaths and neognaths), partitioned by loops
    > 57tax_m1_MAY18_gblocks.phy: alignment of mitochondrial protein-coding genes for all 57 taxa (all elephant birds, palaeognaths and neognaths), partitioned by first codon position
    > 57tax_m2_MAY18_gblocks.phy: alignment of mitochondrial protein-coding genes for all 57 taxa (all elephant birds, palaeognaths and neognaths), partitioned by second codon position
    > 57tax_m3_MAY18_gblocks.phy: alignment of mitochondrial protein-coding genes for all 57 taxa (all elephant birds, palaeognaths and neognaths), partitioned by third codon position
    > 57tax_stems_MAY18_gblocks.phy: alignment of mitochondrial RNA genes for all 57 taxa (all elephant birds, palaeognaths and neognaths), partitioned by stems
> 5_Phylogeny_generation
    > ModelTest: the output of ModelTest for each mitochondrial partition (RNA loops, RNA stems, codon 1, codon 2, codon 3)
        > 34tax_gblocks_oct17.nex: the input nexus file used for ModelTest
        > *model.scores: output of PAUP*, input for ModelTest
        > *modelfit.log: ModelTest log file
        > *modeltest: ModelTest output
        > modelblock.nex: input model block to ModelTest
    > MrBayes: input and output files of interest are described below; the remainder are other files generated by the MrBayes analysis.
        > infile.nex and 32tax_MAY18_gblocks.nex: input alignment file used for MrBayes. 32 taxa are included rather than 35 (one Casuariiformes outgroup rather than four).
        > *32tax_MAY18_gblocks_EB2MT.con.tre: the consensus tree output from MrBayes
    > RaxML: input and output files of interest are described below; the remainder are other files generated by the RAxML analysis.
        > *RAxML_bipartitions.35tax_MAY18_gblocks.tre: the output consensus tree from RAxML used to construct Figure 2 in the manuscript
        > 35tax_MAY18_gblocks.phy: the input alignment file used for RAxML
        > 35tax_nt_MAY18_part.txt: the input partition file used for RAxML
> 6_Molecular_dating
    > **37tax_m3ry_nuc_oct17_auto_TM_TRIMMED_TO_ONE_EB_PER_TAXON_FOR_ASR.trees: the molecular dated tree described below, trimmed to include only the best elephant bird specimen per clade (four) for use in ancestral state reconstruction. See Supplementary Figure 12I.
    > **37tax_oct17: molecular dating analysis using 37 taxa (the two best mitochondrial genomes of each elephant bird taxon)
        > **37tax_TM_correlated: input tree topology has tinamous and moas (TM) as the deepest among Notopalaeognathae; autocorrelated rates model used
            > **Run1: molecular dating analysis (Matt Phillips)
                > **37tax_m3ry_nuc_oct17_auto_TM.trees: molecular dated tree used for Figure 2b of the manuscript; output of MCMCtree. Tree topology and dates were also used for ancestral state reconstruction (see Supplementary Figures 11 and 12III).
                > 37tax_m3ry_nuc_oct17_TM_tre: input tree topology to MCMCtree
                > 37tax_m3ry_nuc_oct17.phy: input alignment file
                > in(TM).BV: input .BV file to MCMCtree
                > mcmctree.ctl: input MCMCtree control file
            > Run2 and Run3: the same analysis as Run1, repeated two more times by Alicia Grealy. Input files are as above, except that the mcmctree.ctl file initially specified usedata = 3 to generate an out.BV file. This file was then copied to the folder "part2" and renamed "in.BV", and usedata was changed to 2 in the mcmctree.ctl file to re-run MCMCtree. FigTree.tre is the resulting molecular dated tree.
        > 37tax_RH_correlated: input tree topology has rheas (RH) as the deepest among Notopalaeognathae; autocorrelated rates model used (Matt Phillips). Files are described as above.
        > 37tax_RH_independent: input tree topology has rheas (RH) as the deepest among Notopalaeognathae; independent rates model used (Matt Phillips). Files are described as above.
        > 37tax_TM_independent: input tree topology has tinamous and moas (TM) as the deepest among Notopalaeognathae; independent rates model used (Matt Phillips). Files are described as above.
    > 40tax_3eggs_per_taxon_for_ASR: analysis conducted using three eggshell specimens per taxon (one for A. hildebrandti) for use in ancestral state reconstruction (see "EBeggall.txt" and "Newwholetree.alleggs.nex" below). See also Supplementary Figure 12II. Files are described as above.

________ ________ ________ ________ ________ ________ ________ ________ ________ ________

#Supplementary Data 11
Ancestral_state_reconstruction: input files used to conduct various ancestral state reconstruction analyses, and the R code used to conduct those analyses:

> EBboneonly.txt: the minimum, maximum, and average body mass (g) of bone specimens (taken from Hansford and Turvey 2018), where n08AEP07 is "Vorombe titan", n07AEP05 is "Aepyornis maximus", and n07AEP06 is "Mullerornis modestus". Note that this analysis does not appear in the Supplementary Information.
> EBeggall.txt: eggshell thickness (µm) for different eggshell specimens (MB#; see published article for metadata). These include approximately three specimens per taxon, representing approximately the minimum, maximum, and average thickness for those taxa. See Supplementary Figure 12II.
> EBeggonly.txt: eggshell thickness (µm) for different eggshell specimens (MB#; see published article for metadata). These include one specimen per taxon, with the actual thickness for that specimen ("Eggshell_thickness_real") and the average thickness for the taxon ("Eggshell_thickness_average"). See Supplementary Figure 12I and Figure 2b.
> Newwholetree.alleggs.nex: a phylogenetic tree from Legendre et al. 2020 that includes three specimens per taxon whose thicknesses represent approximately the minimum, maximum, and average for those taxa (corresponding to EBeggall.txt). Topology and dates come from "40tax_3eggs_per_taxon_for_ASR" (see above). See Supplementary Figure 12II.
> Newwholetree.bones.nex: a phylogenetic tree from Legendre et al. 2020 that includes only bone specimens (corresponding to EBboneonly.txt). Topology and dates come from "37tax_m3ry_nuc_oct17_auto_TM_TRIMMED_TO_ONE_EB_PER_TAXON_FOR_ASR.trees" (see above), but bone specimens from each clade were substituted for eggshell specimens. Note that this analysis does not appear in the Supplementary Information.
> Newwholetree.eggs.nex: a phylogenetic tree from Legendre et al. 2020 that includes one specimen per taxon (corresponding to EBeggonly.txt). Topology and dates come from "37tax_m3ry_nuc_oct17_auto_TM_TRIMMED_TO_ONE_EB_PER_TAXON_FOR_ASR.trees" (see above). See Supplementary Figure 12I and Figure 2b.
> Palaeognathdata_maleBM_nonewdata.txt: eggshell thickness (µm), male body mass (g), and egg mass (g) for the outgroup taxa in the phylogenetic tree
> Supplementary Code 1*: R code used to conduct the ancestral state reconstruction analysis

________ ________ ________ ________ ________ ________ ________ ________ ________ ________

#Supplementary Data 12
Micro_CT_raw_data: Results from the micro-CT scans of eggshell specimens for each of the following taxa:

> Mullerornis
> Northern_Aepyornis
> Southern_Aepyornis_thick
> Southern_Aepyornis_thin

Subfolders within each of these folders that are labelled with an "AD" prefix refer to the eggshell specimen number (see the published article for metadata).
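As a rough illustration of this layout, the sketch below collects the "AD"-prefixed specimen subfolders under each taxon folder. The function name `list_specimen_folders` and the `root` argument (the local path to Micro_CT_raw_data) are assumptions made for the example and are not part of the deposited data.

```python
from pathlib import Path

# Taxon folder names as listed in this README.
TAXON_FOLDERS = [
    "Mullerornis",
    "Northern_Aepyornis",
    "Southern_Aepyornis_thick",
    "Southern_Aepyornis_thin",
]


def list_specimen_folders(root: Path) -> dict[str, list[str]]:
    """Map each taxon folder found under root to its 'AD'-prefixed subfolders."""
    specimens = {}
    for taxon in TAXON_FOLDERS:
        taxon_dir = root / taxon
        if not taxon_dir.is_dir():
            continue  # skip taxon folders that are absent locally
        specimens[taxon] = sorted(
            p.name
            for p in taxon_dir.iterdir()
            if p.is_dir() and p.name.startswith("AD")
        )
    return specimens
```

Subfolders without the "AD" prefix are ignored, since only the prefixed ones are described as specimen folders.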
Within each specimen folder are the processed results of the micro-CT scans for that specimen:

> ROI Selection.tif: the region of interest measured for features
> snapshot*.tif: translucent snapshots of the specimen from different angles
> Pore structure*.tif: snapshots of just the pores from different angles
> Inner surface*.tif: a snapshot of the inner surface of the eggshell specimen
> Outer surface*.tif: a snapshot of the outer surface of the eggshell specimen
> *_1.11mm_2__rec_voi_*_i2d.csv: statistics for the volume of interest
> *_1.11mm_2__rec_voi_.ctan.csv: settings used and results for features in the region of interest
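
The two-step MCMCtree procedure described for Run2 and Run3 (under 6_Molecular_dating in Supplementary Data 10) can be sketched roughly as follows. This is a minimal sketch, not the authors' script: the function name `prepare_second_step` is hypothetical, and it assumes the control file contains a single line of the form "usedata = 3". It only prepares the files for the second step and does not run MCMCtree itself.

```python
import re
import shutil
from pathlib import Path


def prepare_second_step(step1_dir: Path, step2_dir: Path) -> Path:
    """Prepare the second-step folder (e.g. "part2") from first-step outputs.

    Hypothetical helper: copies out.BV to in.BV and writes a copy of
    mcmctree.ctl in which usedata is changed from 3 to 2.
    """
    step2_dir.mkdir(parents=True, exist_ok=True)

    # out.BV from the usedata = 3 run becomes in.BV for the usedata = 2 run.
    shutil.copyfile(step1_dir / "out.BV", step2_dir / "in.BV")

    ctl_text = (step1_dir / "mcmctree.ctl").read_text()
    # Change only the value on the usedata line; the rest of the line is kept.
    new_text, n_changed = re.subn(
        r"^([ \t]*usedata[ \t]*=[ \t]*)3\b",
        r"\g<1>2",
        ctl_text,
        flags=re.MULTILINE,
    )
    if n_changed != 1:
        raise ValueError("expected exactly one 'usedata = 3' line in mcmctree.ctl")

    ctl_path = step2_dir / "mcmctree.ctl"
    ctl_path.write_text(new_text)
    return ctl_path
```

The alignment and tree files named in the control file are not copied here; they would also need to be reachable from the second-step folder before MCMCtree is re-run.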