README
This DRYAD package contains the input files and data used for the project Andrew J. Mason, Felipe G. Grazziotin, Hussam Zaher, Alan R. Lemmon, Emily Moriarty Lemmon, and Christopher Parkinson (2019) Reticulate evolution in Nuclear Middle America causes discordance in the phylogeny of palm-pitvipers (Viperidae: Bothriechis).

Please cite this publication, the DRYAD accession, and the SRA numbers for specimens when relevant.

This DRYAD package contains 838 files, in addition to this README, placed in 9 directories and 2 subdirectories. These files account for the input data for all analyses.

If errors, inconsistencies, or problems are found, please contact Andrew Mason at ajmason@clemson.edu

******************************************************************************************

The directory structure and files contained therein are as follows:

Directories:
	Bothriechis_alignments_and_RAXML_Commands
	Bothriechis_BioGeoBEARS_analyses
	Bothriechis_Phylonet
	Bothriechis_SNaQ
	Bothriechis_Treemix
	Crotalinae_Alignments_and_RAXML_Commands
	Crotalinae_ConcatenatedTree_and_RAXML_Commands
	Dtest
	MCMCTree_Dating
		Branchlength_estimation
		Dating
		
Directory and File descriptions:

1) Bothriechis_alignments_and_RAXML_Commands - This directory contains the files relevant to the generation of RAXML trees that were used in discordance analyses of the Bothriechis dataset. Specifically, it contains nucleotide alignments for each locus and a shell script detailing the commands used for RAXML. File descriptions are as follows:
	(a) Files L2-L512_Bothriechis_final.fas contain nucleotide alignments of Bothriechis specimens, each corresponding to one anchored locus. Each of these alignments was processed using the pipeline described in Breinholt et al. (2017) "Resolving relationships among the megadiverse butterflies and moths with a novel pipeline for Anchored Phylogenomics".
	(b) RAXML_Bothrichis_Commands.sh - contains commands given on the Clemson University Palmetto cluster to run RAXML on each of the anchored loci.
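
As a minimal sketch (not part of the deposited scripts), the per-locus alignments can be listed in numeric locus order with Python, for example before looping over them. The file-name pattern L<number>_Bothriechis_final.fas is an assumption inferred from the description above, and the function name locus_alignments is hypothetical.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from "L2-L512_Bothriechis_final.fas".
LOCUS_PATTERN = re.compile(r"^L(\d+)_Bothriechis_final\.fas$")


def locus_alignments(directory):
    """Return (locus_number, path) pairs sorted by locus number."""
    found = []
    for path in Path(directory).iterdir():
        match = LOCUS_PATTERN.match(path.name)
        if match:
            # Convert to int so that L10 sorts after L2, not before it.
            found.append((int(match.group(1)), path))
    return sorted(found)
```

Calling locus_alignments("Bothriechis_alignments_and_RAXML_Commands") from the package root should then yield one entry per locus alignment, provided the files follow the assumed pattern.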
	
2) Bothriechis_BioGeoBEARS_analyses - This directory contains the initial time-calibrated tree, the regional occupation of Bothriechis species, and the dispersal matrix used to weight dispersal probabilities, as well as the raw R code used. File descriptions are as follows:
	(a) FigTree_edited.tre - Time calibrated tree from MCMCTree dating (see files below), with modifications to allow it to be effectively read into and manipulated in R.
	(b) Bothri_Biogeo.R - R script that details all commands used in BioGeoBEARS analyses
	(c) Bothri_dispersal_4states.matrix - regional dispersal modifiers used for BioGeoBEARS analyses
	(d) Bothri_region.data - Regional occurrences for Bothriechis species. Region A corresponds to northern South America, Region B to the southern Middle American Isthmus, Region C to the Chortis block, and Region D to northern Middle America.
	
3) Bothriechis_Phylonet - This directory contains the input files for phylonet analyses. The three files (Phylonet_STnet_0reticulation_500.nexus, Phylonet_STnet_1reticulation_500.nexus, and Phylonet_STnet_2reticulation_500.nexus) each contain the input gene trees, as well as the specific commands given to phylonet allowing for zero, one, and two reticulations, respectively.

4) Bothriechis_SNaQ - This directory contains the input files for SNaQ analyses, specifically a Bothriechis_SNaQ_commands.txt file, a SNaQ_tree.tre file, and a StartingTree_Species.tre file. File descriptions are as follows:
	(a) Bothriechis_SNaQ_commands.txt - the commands used in the Julia programming language to allow SNaQ to infer networks with varying numbers of reticulations
	(b) SNaQ_trees.tree - a file of unrooted gene trees obtained through RAXML analyses of each anchored locus for the Bothriechis-only dataset (see the Bothriechis_alignments_and_RAXML_Commands directory above)
	(c) StartingTree_Species.tre - a newick starting tree given as input to SNaQ
	
5) Bothriechis_Treemix - This directory contains the input files for Treemix analyses, including a Bothri_Pops file and a Bothri_Treemix.gz file. File descriptions are as follows:
	(a) Bothri_Pops - Population/species designations required by Treemix.
	(b) Bothri_Treemix.gz - a gzip-compressed file of the Treemix-formatted SNP data for 368 SNPs extracted from independent anchored loci
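
As a minimal sketch, a Treemix input file can be inspected in Python without decompressing it first. This assumes the standard Treemix layout: a header line of whitespace-separated population names, followed by one line per SNP giving the two allele counts for each population as "count1,count2". The function name read_treemix is hypothetical and is not part of the deposited files.

```python
import gzip


def read_treemix(path):
    """Parse a gzip-compressed Treemix input file.

    Returns (populations, snps), where each SNP is a list of
    (allele1_count, allele2_count) tuples, one per population.
    """
    with gzip.open(path, "rt") as handle:
        populations = handle.readline().split()
        snps = []
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != len(populations):
                raise ValueError("column count does not match header")
            snps.append([tuple(int(c) for c in f.split(",")) for f in fields])
    return populations, snps
```

For example, populations, snps = read_treemix("Bothri_Treemix.gz") would be expected to give len(snps) equal to 368 if the file matches the description above.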

6) Crotalinae_Alignments_and_RAXML_Commands - This directory contains the files used to generate RAXML gene trees for each anchored locus for the Crotalinae dataset. Specifically it contains an Astral_commands.txt file, a RAXML_Crotalinae_Commands.sh file, and nucleotide alignments for each locus. File descriptions are as follows:
	(a) Astral_commands.txt - a file containing the commands passed to ASTRAL to generate a coalescent-based species tree from a file of RAXML gene trees that were concatenated (i.e., combined into one file)
	(b) RAXML_Crotalinae_Commands.sh - contains commands given on the Clemson University Palmetto cluster to run RAXML on each of the anchored loci.
	(c) Files L2-L512_Crotalinae_final.fas contain nucleotide alignments of Crotalinae specimens, each corresponding to one anchored locus. Each of these alignments was processed using the pipeline described in Breinholt et al. (2017) "Resolving relationships among the megadiverse butterflies and moths with a novel pipeline for Anchored Phylogenomics".
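
As a minimal sketch of the gene-tree concatenation step mentioned in (a), the per-locus Newick trees can be combined into a single file with one tree per line. The function name combine_gene_trees and the output file name used below are hypothetical; the glob pattern RAxML_bestTree.* assumes RAxML's default output naming and may not match how the authors named or combined their files.

```python
from pathlib import Path


def combine_gene_trees(tree_files, output_path):
    """Write one Newick tree per line to output_path; return the tree count."""
    count = 0
    with open(output_path, "w") as out:
        for tree_file in tree_files:
            for line in Path(tree_file).read_text().splitlines():
                newick = line.strip()
                if newick:  # ignore empty lines in the input files
                    out.write(newick + "\n")
                    count += 1
    return count
```

A call such as combine_gene_trees(sorted(Path(".").glob("RAxML_bestTree.*")), "gene_trees.tre") would then produce a single multi-tree file of the kind a summary method like ASTRAL takes as input.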
	
7) Crotalinae_ConcatenatedTree_and_RAXML_Commands - This directory contains the concatenated nucleotide alignment and a file of the RAXML commands used to generate a species tree from the concatenated gene alignments.

8) Dtest - This directory contains the input files used to calculate D-statistics for all possible combinations of Bothriechis populations. Our method relied on an R package that calculates a D-statistic from an input fasta file containing SNPs and the populations of interest. To create fastas representing all possible combinations, we used the provided python script MakeDstatFastas.py, which is described below. File descriptions are as follows:
	(a) Bothri_comparisons.csv - comma delimited file where each row specifies a comparison of four populations for which we wanted to calculate D-statistics.
	(b) ABBA-BABA.R - R code used to calculate a D-statistic for each D-statistic fasta (generated after running the MakeDstatFastas.py script)
	(c) MakeDstatFastas.py - python script that writes fasta files for the population comparisons specified in a comparisons.csv file, using an initial SNP fasta of all taxa/populations. Files are output into a Fastas directory and are named by the population comparison. Running "python MakeDstatFastas.py -h" will list the options necessary to run the script.
	(d) All_SNP.fasta - Fasta file of all SNPs extracted from aligned anchored loci for the Bothriechis specific dataset.
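
For readers unfamiliar with the statistic, the following is a minimal Python sketch of the ABBA-BABA calculation, D = (nABBA - nBABA) / (nABBA + nBABA), for four aligned sequences ordered as P1, P2, P3, and outgroup. It is not the deposited ABBA-BABA.R code and does not reproduce the R package used. It assumes one sequence per population and skips any site containing a gap or ambiguity code; the function names read_fasta and d_statistic are hypothetical.

```python
def read_fasta(path):
    """Return a dict mapping sequence name to sequence string."""
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                seqs[name] = []
            elif line and name is not None:
                seqs[name].append(line.upper())
    return {key: "".join(parts) for key, parts in seqs.items()}


def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D from four aligned sequences of equal length."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if not {a, b, c, o} <= set("ACGT"):
            continue  # skip gaps and ambiguity codes
        if a == o and b == c and a != b:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != b:
            baba += 1  # P1 and P3 share the derived allele
    total = abba + baba
    # D is undefined when no informative sites are present.
    return (abba - baba) / total if total else float("nan")
```

Given seqs = read_fasta(...) for one of the comparison fastas, the four sequences would be passed in the order P1, P2, P3, outgroup. A positive D indicates an excess of ABBA sites (allele sharing between P2 and P3), and a negative D indicates an excess of BABA sites (allele sharing between P1 and P3).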
	
9) MCMCTree_Dating - This directory contains the information and data used in MCMCTree dating analyses. It contains a Branchlength_estimation directory used to estimate branch lengths with MCMCTree and BaseML, and a Dating directory for estimating node ages. These directories and their contents are described here:
	(a) Branchlength_estimation - contains files used in the initial estimation of branch lengths for the tree
		(i)  45Taxa_concat.tre - phylip formatted file containing the topology of the best tree recovered from RAXML estimation of a concatenated alignment of anchored loci for the 45 terminals used in dating analyses and the calibration distributions passed to mcmctree.
		(ii) MCMCTree_seqs.phy - phylip formatted alignment of anchored loci partitioned based on the recommendation of PartitionFinder2.py.
		(iii) mcmctree.ctl - control file passed to mcmctree to start analyses.
	(b) Dating - contains files used to estimate node ages
		(i)  45Taxa_concat.tre - phylip formatted file containing the topology of the best tree recovered from RAXML estimation of a concatenated alignment of anchored loci for the 45 terminals used in dating analyses and the calibration distributions passed to mcmctree.
		(ii) MCMCTree_seqs.phy - phylip formatted alignment of anchored loci partitioned based on the recommendation of PartitionFinder2.py.
		(iii) mcmctree.ctl - control file passed to mcmctree to start analyses.
		(iv) in.BV - branchlength and substitution parameters recovered in Branchlength estimation and passed to mcmctree to determine dates.
	
	
	
	
	
	
	