There is a newer version of the record available.

Published December 12, 2022 | Version v1
Working paper Open

The Effect of Mutation Subtypes on the Allele Frequency Spectrum and Population Genetics Inference

  • 1. University of Michigan
  • 2. University of Texas

Description

Analysis for paper: Effect of Mutation Subtypes on the Allele Frequency Spectrum and Population Genetics Inference

Step 1) Run tar -xvzf afs_analysis.tar to extract the archive. The main files and directories are:

a) ./data

  • ./data/mst_sfs/: One text file for each of the 96 subtypes. The file all_subtypes_afs.txt contains all 96 AFS in a single file
  • ./gw_sites_info.txt: Text file with information on every genome-wide variant used in the analysis
  • ./all_MST_100kb_counts.txt: Text file with the data for the 100Kb window analysis (for each window: D, mst counts, etc.)
  • ./data/DaDi/bootstrap_SFS: 100 bootstrapped AFS for each subtype
  • ./data/simulated_neutral/all_*.txt: Simulated neutral AFS using theta estimates for A_C.AAA and C_T.ACG
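For orientation, a neutral AFS can be derived from a theta estimate as sketched below. This is only a minimal illustration that assumes the standard neutral model (constant population size, infinite sites), under which the expected number of sites with derived-allele count i is theta / i. The procedure actually used to produce the files in ./data/simulated_neutral is not described here and may differ. The function name and the values of theta and n are hypothetical.

```python
def expected_neutral_afs(theta, n):
    """Expected unfolded AFS for n sampled chromosomes.

    Assumes the standard neutral model. Entry i-1 of the returned list is
    the expected number of sites whose derived allele is seen i times,
    for i = 1 .. n-1.
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    if theta < 0:
        raise ValueError("theta must be non-negative")
    return [theta / i for i in range(1, n)]


# Hypothetical inputs, not values taken from the archive.
afs = expected_neutral_afs(theta=100.0, n=10)
singleton_to_doubleton = afs[0] / afs[1]
```

Under this model the singleton/doubleton ratio is 2 regardless of theta, which gives a simple baseline for comparing against observed subtype AFS.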

b) ./output

  • sigma_mat.txt: Large sigma matrix used for covariance calculations in D-2 statistic
  • ./output/D2_stat/by_subtype: Output of D-2 statistic for each subtype's AFS
  • ./output/DaDi: Output from running dadi 10 times on each subtype's AFS.

c) ./to_submit_scripts

  • entire_analysis.R: R script containing entire pipeline sourcing each relevant analysis file
  • afs_paper_functions.R: R script with functions used

Step 2) Install the software needed to run the entire analysis from scratch.

Step 3) Run analysis by going through entire_analysis.R in sections.

  • Create an AFS for each of the 96 mutation subtypes. Outputs a separate txt file for each subtype and a single file with all 96 subtypes.
  • Create a dataframe of genome-wide AFS statistics for each of the 96 subtypes. Includes D, D-2, % singletons, doubletons, tripletons, etc.
  • AFS heterogeneity across subtypes and signals of recurrent mutations and gene conversion shaping the AFS
    • Signals of mutation rate/recurrent mutations on AFS. Look at singleton/doubleton ratio by mutation rate
    • Signals of gBGC on AFS. Split into WS, SW, and indifferent mutation types. Compare D and D-2 across subtypes
  • DaDi Analysis: Effect of genome wide AFS heterogeneity across subtypes on demographic inference
  • Regional 100Kb Analysis: Effect of subtype composition and local genomic factors on the regional AFS
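The per-subtype summary statistics listed above can be illustrated with a short sketch. It assumes that D refers to Tajima's D and that each AFS is an unfolded vector of site counts for derived-allele counts 1 .. n-1. It is written in Python for illustration and is not the code in afs_paper_functions.R. The D-2 statistic is not covered, since its definition is not given here. The function name and the example counts are hypothetical.

```python
import math


def afs_summary(afs):
    """Summarize an unfolded AFS: Tajima's D, singleton share, singleton/doubleton ratio.

    afs[i-1] is the number of sites whose derived allele appears i times
    among n = len(afs) + 1 sampled chromosomes.
    """
    n = len(afs) + 1
    s = sum(afs)  # number of segregating sites
    if n < 4 or s <= 0:
        raise ValueError("need n >= 4 and at least one segregating site")

    # Mean pairwise differences computed directly from the spectrum.
    pairs = n * (n - 1) / 2
    pi = sum(c * i * (n - i) for i, c in enumerate(afs, start=1)) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    d = (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return {
        "tajimas_d": d,
        "singleton_fraction": afs[0] / s,
        "singleton_doubleton_ratio": afs[0] / afs[1] if afs[1] else float("inf"),
    }


# Hypothetical counts with an excess of singletons (n = 10 chromosomes).
stats = afs_summary([50, 5, 3, 2, 1, 1, 1, 1, 1])
```

An excess of rare variants, as in the hypothetical counts, pushes Tajima's D below zero, while a spectrum proportional to 1/i gives a value of approximately zero.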

Files

Files (781.0 MB)

md5:68147fbbfbd249b8f05baa8432904b72
781.0 MB
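Before extracting, the download can be checked against the md5 checksum listed above. The sketch below is one way to do that with the Python standard library. It assumes the downloaded file is named afs_analysis.tar, as in Step 1; the helper names are hypothetical.

```python
import hashlib

# Checksum as listed for this record.
EXPECTED_MD5 = "68147fbbfbd249b8f05baa8432904b72"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """True if the file's md5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Example call (file name assumed from Step 1):
# archive_is_intact("afs_analysis.tar")
```

Reading in chunks keeps memory use small for an archive of this size.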