README

Below is a description of the directory structure of the uploaded data, scripts and outputs.

There are three input files (ASE_data_frame.csv, compiled_ddPCR_results.csv, dilution_data.csv). The input data are described in detail below; the methods used to produce them are described in the associated manuscript.

The file "Howe_et_al_Rscript.Rmd" contains all code used to produce the results and figures in Howe et al. 2020. It is written in RMarkdown format, so that a more readable HTML output is produced. A more complete description of its contents is given within the file. To run the file, a number of R packages may need to be installed. The HTML output is also uploaded so that the results can be seen without running the script.

The output files include 'figures.pdf', where the figures produced for the manuscript are saved, and two supplementary tables (S4 and S5).

Files uploaded:

input_data:
|- ASE_data_frame.csv
|- compiled_ddPCR_results.csv
|- dilution_data.csv
scripts:
|- Howe_et_al_Rscript.Rmd
outputs:
|- figures.pdf *
|- table_s4.csv *
|- table_s5.csv *
|- Howe_et_al_Rscript.html *

* indicates files generated by the R script; they are also uploaded here to aid usability.

Description of files and variables:

ASE_data_frame.csv
#data for allele-specific expression
Qiye Li and Zongji Wang of BGI processed the raw RNA and DNA data to output a table of SNPs where the relative frequency of any given SNP-allele differed significantly between a paired DNA and RNA sample. This output table is used as the input here. Queries regarding the production of this table should be directed to them.
Columns:
- Chr #scaffold on which the locus is located
- Locus #position of the SNP on the scaffold
- Reference #base of the reference allele
- Caste #caste of the ants collected for the sample
- Colony #ant colony the sample was taken from
- Region #region within the gene in which the SNP is located
- Gene #name of gene
- BaseDNA1 #base (A, T, C or G) of first DNA SNP-allele (order of alleles is arbitrary)
- CountDNA1 #number of reads with BaseDNA1 that map to this locus
- BaseDNA2 #base (A, T, C or G) of second DNA SNP-allele (order of alleles is arbitrary)
- CountDNA2 #number of reads with BaseDNA2 that map to this locus
- BaseDNA3 #base (A, T, C or G) of third DNA SNP-allele (order of alleles is arbitrary)
- CountDNA3 #number of reads with BaseDNA3 that map to this locus
- BaseRNA1 #base (A, T, C or G) of first RNA SNP-allele (order of alleles is arbitrary)
- CountRNA1 #number of reads with BaseRNA1 that map to this locus
- BaseRNA2 #base (A, T, C or G) of second RNA SNP-allele (order of alleles is arbitrary)
- CountRNA2 #number of reads with BaseRNA2 that map to this locus
- BaseRNA3 #base (A, T, C or G) of third RNA SNP-allele (order of alleles is arbitrary)
- CountRNA3 #number of reads with BaseRNA3 that map to this locus
- BaseRNA4 #base (A, T, C or G) of fourth RNA SNP-allele (order of alleles is arbitrary)
- CountRNA4 #number of reads with BaseRNA4 that map to this locus
- DNAreadsTotal #total number of DNA reads aligned to this locus
- RNAreadsTotal #total number of RNA reads aligned to this locus
- DNAProp #proportion of DNAreadsTotal carrying BaseDNA1
- RNAProp #proportion of RNAreadsTotal carrying BaseRNA1

compiled_ddPCR_results.csv
#data for gene-specific ASE, generated from ddPCR experiments using DNA and RNA isolated from pooled bodies and heads of individuals collected from A. echinatior colonies

Columns:
- Sample #description of sample
- Target #gene that is targeted
- Ch1+Ch2+ #number of droplets positive in both channel 1 and channel 2
- Ch1+Ch2- #number of droplets positive for channel 1, but negative for channel 2
- Ch1-Ch2+ #number of droplets positive for channel 2, but negative for channel 1
- Ch1-Ch2- #number of droplets positive for neither channel 1 nor channel 2
- fraction #fraction of template that is in channel 1: (Ch1+Ch2+ + Ch1+Ch2-)/(Ch1+Ch2+ + Ch1+Ch2- + Ch1-Ch2-)
- min #lower bound of 95% confidence interval of fraction, based on Poisson distribution of templates among droplets
- max #upper bound of 95% confidence interval of fraction, based on Poisson distribution of templates among droplets
#note: droplet counts for MRJP3 are unavailable due to a recording error, but fractions and CIs are present

dilution_data.csv
#data for determining relative concentration of genes in samples of mixed DNA

Columns:
- Sample #sample name, contains concentration info
- Target #gene name
- Concentration #concentration of template per uL, based on number of positive droplets
- CopiesPer20uLWell #total number of template copies estimated per well
- PoissonConfMax #upper limit of 95% CI, based on Poisson distribution of templates among droplets
- PoissonConfMin #lower limit of 95% CI, based on Poisson distribution of templates among droplets
- Positives #number of positive droplets
- Negatives #number of negative droplets

Howe_et_al_Rscript.Rmd
#RMarkdown script that runs all analyses presented in the manuscript

Howe_et_al_Rscript.html
#output of the RMarkdown script in HTML format

figures.pdf
#figures output from the R script

table_s4.csv
#supplementary table 4: the number of loci consistent with imprinting at different levels of alpha. Produced by Howe_et_al_Rscript.Rmd, but uploaded so that the script does not need to be run to generate it.
Columns:
- Gene #name of gene
- Chr #chromosome where gene lies
- alpha_0 #number of loci consistent at alpha = 0
- alpha_02 #number of loci consistent at alpha = 0.2
- alpha_08 #number of loci consistent at alpha = 0.8
- alpha_1 #number of loci consistent at alpha = 1
- total #total number of loci with ASE in the gene

table_s5.csv
#supplementary table 5: the number of samples that were inconsistent with imprinting for each gene at varying levels of alpha. Produced by Howe_et_al_Rscript.Rmd, but uploaded so that the script does not need to be run to generate it.

Columns:
- Gene #name of gene
- Chr #chromosome where gene lies
- Locus #location on chromosome
- Complete_Paternal #number of samples inconsistent with exclusively patrigenic expression
- Biased_Paternal #number of samples inconsistent with biased patrigenic expression
- Biased_Maternal #number of samples inconsistent with biased matrigenic expression
- Complete_Maternal #number of samples inconsistent with exclusively matrigenic expression
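
Example: checking DNAProp and RNAProp in ASE_data_frame.csv

The column descriptions for ASE_data_frame.csv state that DNAProp is the proportion of DNAreadsTotal carrying BaseDNA1, and likewise for RNAProp. The sketch below recomputes both proportions from the counts and compares them with the stored values. It is not part of Howe_et_al_Rscript.Rmd and is written in Python only for illustration. The function names, the comparison tolerance and the example row are assumptions made here, as is the expectation that every value in these columns parses as a number.

```python
import csv
import math


def allele_proportion(count, total):
    """Proportion of reads carrying the first allele; None if there are no reads."""
    if total == 0:
        return None
    return count / total


def check_row(row, tol=1e-6):
    """Compare the stored DNAProp/RNAProp of one row with recomputed values.

    `tol` is an assumed tolerance; stored values may be rounded differently.
    """
    results = {}
    for mol in ("DNA", "RNA"):
        recomputed = allele_proportion(
            float(row[f"Count{mol}1"]), float(row[f"{mol}readsTotal"])
        )
        stored = float(row[f"{mol}Prop"])
        results[mol] = recomputed is not None and math.isclose(
            recomputed, stored, abs_tol=tol
        )
    return results


def check_file(path="input_data/ASE_data_frame.csv"):
    """Run check_row on every row of the CSV (path taken from the file tree)."""
    with open(path, newline="") as handle:
        return [check_row(row) for row in csv.DictReader(handle)]


# Made-up row for illustration only; these are not values from the dataset.
example = {
    "CountDNA1": "30", "DNAreadsTotal": "60", "DNAProp": "0.5",
    "CountRNA1": "45", "RNAreadsTotal": "50", "RNAProp": "0.9",
}
print(check_row(example))
```

Calling check_file() from the top-level directory would apply the same check to each row of the real file. Rows that come back False may simply reflect rounding or missing values rather than an error in the data.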