Published 2025 | Version v2

Uncovering methylation-dependent genetic effects on regulatory element function in diverse genomes

  • 1. Vanderbilt University

Contributors

  • 1. Broad Institute
  • 2. Vanderbilt University

Description

This project contains datasets related to:

Uncovering methylation-dependent genetic effects on regulatory element function in diverse genomes 
Rachel M. Petersen, Christopher M. Vockley, Amanda J. Lea

A preprint of this work can be found here: https://www.biorxiv.org/content/10.1101/2024.08.23.609412v1

Specifically, the data provided here are:

1) replicateinfo.txt contains metadata for each mSTARR-seq replicate, including replicate number, pool number, sample type (DNA vs RNA) and methylation status

2) rnadnacounts_400bpwin.txt contains a count matrix with the number of DNA and RNA reads falling within each 400 bp genomic window for each replicate. Columns are replicate names, rows are genomic windows.

3) Joint_genotyping.vcf contains joint genotyping results for DNA sequences generated in the current study from 25 individuals accessed through the 1000 Genomes Project.

4) ASE_data.zip contains

  • ASE_totalcounts.txt: counts matrix of the total number of DNA and RNA reads in each replicate for each variant
  • ASE_refcounts.txt: counts matrix of the number of DNA and RNA reads for the reference allele in each replicate for each variant
  • ASE_mashr_inputsites.txt: sites that were tested for methylation-dependent allele-specific expression using mashr
  • WASP_ASE_sites.txt: variant sites that were retained after using the WASP mappability pipeline (Van De Geijn et al. 2015)

5) model_results.zip contains 

  • model1_methonly_results.txt: results from linear modeling to identify windows with regulatory function in the methylated condition
  • model1_unmethonly_results.txt: results from linear modeling to identify windows with regulatory function in the unmethylated condition
  • model2_mashr_results.txt: results from mashr analysis to identify windows with methylation-dependent regulatory function
  • ASE_meth_results.txt: results from allele-specific expression (ASE) analysis to identify ASE in the methylated condition
  • ASE_unmeth_results.txt: results from allele-specific expression (ASE) analysis to identify ASE in the unmethylated condition
  • ASE_mashr_results.txt: results from mashr analysis to identify sites with methylation-dependent ASE

6) Comparison_datasets.zip contains

  • Johnston_eLife_mSTARR_counts_K562.txt: counts matrix from Johnston et al. 2024, adapted to use 200 bp windows. Original dataset can be found here: https://zenodo.org/records/7949036#.ZGZ5UnbMJq9
  • Lea_eLife_mSTARR_counts.txt: counts matrix from Lea et al. 2018

7) GWAS_EWAS_overlap_files.zip contains

  • GWAShits_siteformat.txt: GWAS associations accessed through the NHGRI-EBI catalog in March 2024, formatted for use in R
  • EWAS_Atlas_associations.tsv: EWAS associations accessed through the EWAS Open Platform Data Hub in March 2024
  • EWAS_Atlas_probe_annotations.tsv: genomic locations of EWAS probes 
  • ASE_mashr_GWASOverlap.bed: methylation-dependent genetic effect sites that are located within 400 bp of a GWAS hit (results of bedtools intersect)
  • ASE_mashr_EWASOverlap.bed: methylation-dependent genetic effect sites that are located within 400 bp of an EWAS hit (results of bedtools intersect)
  • blood_gwas_overlaps.rds: methylation-dependent genetic effect sites that are located within 400 bp of a GWAS hit for 20 quantitative immune-related blood traits from Pan-UK Biobank
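As an illustration of how the two ASE count matrices described above relate to each other, the following minimal sketch computes the fraction of reads carrying the reference allele for each variant and replicate. The file layout is an assumption, not something stated in this record: the sketch assumes tab-delimited text with a header row, variant identifiers in the first column, and integer counts. The function names and the toy data are hypothetical.

```python
import csv
import io


def read_count_matrix(handle, delimiter="\t"):
    """Read a count matrix into {row_name: {column_name: int}}.

    Assumes a header row whose first field labels the row-name column.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    columns = next(reader)[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = {col: int(val) for col, val in zip(columns, row[1:])}
    return matrix


def reference_allele_fraction(total, ref):
    """Return ref / total per variant and replicate; None where total is 0."""
    fractions = {}
    for site, totals in total.items():
        ref_row = ref.get(site, {})
        fractions[site] = {
            rep: (ref_row[rep] / n if n > 0 and rep in ref_row else None)
            for rep, n in totals.items()
        }
    return fractions


# Toy stand-ins for ASE_totalcounts.txt and ASE_refcounts.txt (invented values).
TOTAL_TXT = "site\trep1_RNA\trep2_RNA\nchr1:1000\t10\t0\nchr1:2000\t8\t4\n"
REF_TXT = "site\trep1_RNA\trep2_RNA\nchr1:1000\t5\t0\nchr1:2000\t2\t4\n"

total = read_count_matrix(io.StringIO(TOTAL_TXT))
ref = read_count_matrix(io.StringIO(REF_TXT))
fractions = reference_allele_fraction(total, ref)
```

To apply this to the real files, replace the `io.StringIO` handles with `open(...)` on the extracted files and adjust the delimiter or header handling if the actual layout differs.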

Files

Files (937.6 MB)

Checksum Size
md5:155d88fbe4d5a535d211a54961d42d1e 2.8 MB
md5:0f5d15bf316c41d6809565a5ed9ca2fe 29.2 MB
md5:d908bc36955671aaa6bfcd3197cc687e 75.2 MB
md5:641476a8fd973bfc949a6e936203ad5e 736.8 MB
md5:a22fb5b3380bb851ed93164469717d06 47.0 MB
md5:2ad1e0f563623b243dd029755fd66bd3 499 Bytes
md5:e71a0072fd210635f05f553b3e065ccf 46.5 MB
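The md5 values listed above can be used to check that a download completed without corruption. The sketch below is one way to do that with the Python standard library. The helper names are hypothetical, and because the listing does not pair checksums with file names, the caller must supply the pairing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("ASE_data.zip", "md5:<value from the listing>")` returns True only if the local file matches the checksum that belongs to that file.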

Additional details

Dates

Submitted
2025-04-01
R1 submission to Genome Research