Title: Behavioural and morphological precursors predict the origin of parental care
Authors: Alexandre V. Palaoro, Daniel S. Caetano, and Glauco Machado
Contact about code and analyses: dcaetano@towson.edu

Extract the .zip file to find the information we refer to below.

## Code and data to run the phylogenetic comparative analyses.

Here you will find the code and the data to run the analyses. Please note that the code is not fully automated: each step of the analyses requires running multiple scripts. Also note that the parameters were tuned for the server on which we ran the code and might not be the best for your particular setup.

Each folder is dedicated to a different scenario:

 - biology_root_D_test_dated : has the posterior predictive test using the dated super-tree and the root state fixed as parental care absent.
 - biology_root_D_test_unit_branch_length : has the posterior predictive test using the super-tree with all branch lengths equal to 1 and the root state fixed as parental care absent.

 - free_root_D_test_dated : has the posterior predictive test using the dated super-tree and the root state as estimated by the model (with prior of 50/50).
 - free_root_D_test_unit_branch_lengths : has the posterior predictive test using the super-tree with all branch lengths equal to 1 and the root state as estimated by the model (with prior of 50/50).

 - compute_marginal_asr_prec : has code to compute the marginal ancestral reconstruction given the parameter estimates.
 - data : has information on the data and the analyses we used to date the super-tree.
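
The unit-branch-length scenarios differ from the dated ones only in the branch lengths of the super-tree. A minimal sketch of that transformation with the ape package is shown here; the toy tree and the object names are hypothetical and stand in for the actual super-tree.

```r
library(ape)

# Toy tree standing in for the super-tree (hypothetical example data).
toy_tree <- read.tree(text = "((A:2.5,B:2.5):1.0,(C:0.7,D:0.7):2.8);")

# Set every branch length to 1 while keeping the topology unchanged.
unit_tree <- toy_tree
unit_tree$edge.length <- rep(1, nrow(unit_tree$edge))
```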

## ##################################################################################

## About the D test folders.

Each D test folder has a similar structure, so here we explain only "biology_root_D_test_dated". The same logic applies to the other "*D_test*" folders.

Before starting the analyses, the "Prepare_dated_tree.R" script prepares the super-tree, making sure the tip labels match the data and performing other important checks. This script produces the "BEAST_MCC_dated_tree.nex" file, which is already available in this folder.
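
As an illustration of the kind of check involved, the following sketch matches the tip labels of a tree against a trait table using ape. It is not the content of "Prepare_dated_tree.R"; the toy tree, the table, and its column names are hypothetical.

```r
library(ape)

# Toy tree and trait table (hypothetical; not the actual super-tree or data).
toy_tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
toy_data <- data.frame(species = c("A", "B", "C", "E"),
                       care = c(0, 1, 0, 1),
                       stringsAsFactors = FALSE)

# Species present in only one of the two sources.
only_in_tree <- setdiff(toy_tree$tip.label, toy_data$species)
only_in_data <- setdiff(toy_data$species, toy_tree$tip.label)

# Keep only the shared species and order the data to follow the tip labels.
shared <- intersect(toy_tree$tip.label, toy_data$species)
pruned_tree <- keep.tip(toy_tree, shared)
pruned_data <- toy_data[match(pruned_tree$tip.label, toy_data$species), ]
```

Reporting `only_in_tree` and `only_in_data` before pruning makes it easy to spot spelling mismatches between the tree and the data.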

The "run_pipeline.R" script shows the correct order to run the scripts in this folder.

First, the "run_simmap_mcmc_care_females.R", "run_simmap_mcmc_care_males.R", "run_simmap_mcmc_precursor_females.R", and "run_simmap_mcmc_precursor_males.R" scripts independently estimate the distribution of stochastic mappings for two kinds of traits: the candidate precursors of maternal and paternal care, and the presence or absence of maternal and paternal care. The results from these analyses are used to conduct the posterior predictive simulations.
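
To illustrate how the root-state treatment differs between the "biology_root" and "free_root" scenarios, here is a minimal stochastic mapping sketch using `make.simmap` from the phytools package. Whether the scripts in this folder use these exact calls is an assumption; the toy tree, the trait vector, the "ER" model, and the number of simulations are all hypothetical choices for the example.

```r
library(phytools)

set.seed(42)
# Toy tree and binary trait (hypothetical; not the actual super-tree or data).
toy_tree <- pbtree(n = 20)
care <- setNames(rep(c("absent", "present"), 10), toy_tree$tip.label)

# Root fixed as parental care absent (as in the "biology_root" scenarios).
maps_fixed <- make.simmap(toy_tree, care, model = "ER", nsim = 5,
                          pi = c(absent = 1, present = 0))

# Root state with an equal (50/50) prior (as in the "free_root" scenarios).
maps_free <- make.simmap(toy_tree, care, model = "ER", nsim = 5, pi = "equal")
```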

Second, the "make_DD_test_females_informed_root.R" and "make_DD_test_males_informed_root.R" will read the results from the stochastic mapping simulations and perform the simulations for the posterior predictive test.

Note that the "Dtest_function_informed_root.R" script has all the custom functions used in this analysis scenario and is called by the other scripts.

Finally, the "Read_Dtest_results.R" can be used to read the results of the posterior predictive test and extract the percent of associations and the p values for the analyses.

## ##################################################################################

## About the data folder.

The "data" directory has the super-tree used in these analyses and the code to make the secondary calibration of the tree.

The "BEAST_calibration_results" directory has the .xml files to run BEAST with the exact configuration that we used and the posterior distribution of trees.
 
The "get_calibrated_starting_tree" directory has the script and data necessary to generate the starting tree for the BEAST 2 analyses. Note that divergence time estimation in BEAST 2 often requires a starting tree whose node ages fall within the prior distribution of every dated node, so that the likelihood of the model can be computed at the first step of the MCMC chain.

Inside the "get_calibrated_starting_tree" directory you will find the "generate_BEAST_calibration_block.R" script, which generates the "calibration_info_BEAST.txt" file with the information that the BEAST 2 xml configuration file needs to date the nodes. This is a simple R script to facilitate the generation of a large number of priors.
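
The general idea of generating many priors from a table can be sketched as follows. This is not the content of "generate_BEAST_calibration_block.R": the table columns, the ids, the tree id, and the uniform age priors are hypothetical, and the `spec` string for the MRCA prior is an assumption that depends on the BEAST 2 version (compare it with an xml file produced by BEAUti for your installation).

```r
# Hypothetical calibration table: one row per dated node.
calib <- data.frame(
  node_id = c("cladeA", "cladeB"),
  taxa    = c("sp1 sp2 sp3", "sp4 sp5"),
  min_age = c(10, 25),
  max_age = c(20, 40),
  stringsAsFactors = FALSE
)

# Build the xml lines of one MRCA prior with a uniform age prior.
# The spec string and tree id are assumptions; adjust to your BEAST 2 setup.
make_prior_block <- function(node_id, taxa, min_age, max_age,
                             tree_id = "Tree.t:tree") {
  taxon_lines <- sprintf('        <taxon idref="%s"/>', strsplit(taxa, " ")[[1]])
  c(
    sprintf('<distribution id="%s.prior" spec="beast.math.distributions.MRCAPrior" monophyletic="true" tree="@%s">',
            node_id, tree_id),
    sprintf('    <taxonset id="%s" spec="TaxonSet">', node_id),
    taxon_lines,
    '    </taxonset>',
    sprintf('    <Uniform id="Uniform.%s" name="distr" lower="%s" upper="%s"/>',
            node_id, min_age, max_age),
    '</distribution>'
  )
}

# One block per row of the table, concatenated into a single character vector.
blocks <- unlist(lapply(seq_len(nrow(calib)), function(i) {
  make_prior_block(calib$node_id[i], calib$taxa[i],
                   calib$min_age[i], calib$max_age[i])
}))

out_file <- file.path(tempdir(), "calibration_block_example.txt")
writeLines(blocks, out_file)
```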

The "generate_empty_data_matrix.R" script will produce an empty data matrix with tip labels and number of species matching the super-tree and in the correct format to import to BEAST 2.
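
A minimal sketch of such an empty matrix, assuming a NEXUS file with a single all-missing site per species written with `write.nexus.data` from ape. The toy tree, the file name, the "dna" format, and the one-site width are hypothetical choices and may differ from what "generate_empty_data_matrix.R" does.

```r
library(ape)

# Toy tree standing in for the super-tree (hypothetical example data).
toy_tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

# One missing-data character ("?") for every tip, named by tip label.
empty_matrix <- setNames(rep(list("?"), Ntip(toy_tree)), toy_tree$tip.label)

out_file <- file.path(tempdir(), "empty_matrix_example.nex")
write.nexus.data(empty_matrix, file = out_file, format = "dna",
                 interleaved = FALSE)
```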

Finally, the "get_chronos_calibration.R" script uses the information from the "calibration_table_complete.csv" table to produce a penalized likelihood estimate of the calibrated super-tree (saved as "start_calib_tree_solved.tre"). Note that we need to produce a binary super-tree because BEAST 2 will not work with the multifurcating tree. The original super-tree can be found in the "start_calib_tree.tre" file.
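
The two steps described above (resolving polytomies and dating by penalized likelihood) can be sketched with ape as follows. This is an illustration under stated assumptions, not the content of "get_chronos_calibration.R": the toy tree, the single root calibration, and the `lambda` and `model` values are hypothetical. Giving a small positive length to the zero-length branches created by `multi2di` is also an assumption, made only to keep the toy example numerically well-behaved.

```r
library(ape)

# Toy super-tree with a polytomy (hypothetical; B, C and D form a multifurcation).
toy_tree <- read.tree(text = "((A:1,(B:1,C:1,D:1):1):1,(E:1,F:1):2);")

# Resolve polytomies at random to obtain a binary tree.
set.seed(1)
binary_tree <- multi2di(toy_tree)

# Assumption: replace zero-length branches with a small positive length.
binary_tree$edge.length[binary_tree$edge.length == 0] <- 0.01

# One secondary calibration: the root is between 10 and 20 time units old.
root_node <- getMRCA(binary_tree, c("A", "E"))
calib <- makeChronosCalib(binary_tree, node = root_node,
                          age.min = 10, age.max = 20)

# Penalized likelihood dating; lambda and model are illustrative choices.
dated_tree <- chronos(binary_tree, lambda = 1, model = "correlated",
                      calibration = calib, quiet = TRUE)
```

With a real calibration table, each row would supply one `node`, `age.min`, and `age.max` entry to `makeChronosCalib`.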