PMD hypomethylation human (hg19) neural network scores
Contributors
Data manager:
Description
Global loss of DNA methylation in mammalian genomes occurs cumulatively as a mitotic process during aging and cancer, primarily in Partially Methylated Domains (PMDs). It has been shown that local sequence context (100bp) has a strong effect on the rate of demethylation of individual CpG dinucleotides within PMDs. Here, we train a deep learning model to characterize this sequence dependence further, finding that methylation loss can be predicted from a CpG’s 150bp sequence context alone with an AUC of 0.95. We use re-methylation rates of newly synthesized DNA to show that CpGs with fast-loss sequence context are inefficiently re-methylated. Interestingly, we find that the 10% of CpGs predicted to have the “slowest” rate of loss lose almost no DNA methylation in healthy cell types. These same slow-loss CpGs lose a significant amount of DNA methylation in cancer, suggesting that they could be responsible for deregulation of genes and transposable elements that are associated with DNA hypomethylation in cancer.
This directory contains the Nov. 18, 2020 version of the human (hg19) CpG hypomethylation neural network (NN) scores in a single tab-delimited (bedGraph) file:
multitissue-nn-scores.allCGs.0based.hg19.bedgraph.gz
with the following columns:
1: chromosome (hg19)
2: start coord (hg19, 0-based)
3: end coord (hg19, 0-based)
4: multi-tissue NN score (0-1). Scores close to 0 are classified as slow-loss CpGs; scores close to 1 are classified as fast-loss CpGs
5: number of CpGs in the 150 bp window (including the central CpG, so the minimum is 1)
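The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the released software: the names `CpGScore` and `read_scores` are hypothetical, and skipping `track`, `browser`, and `#` lines is an assumption, since the description does not say whether the file carries a header.

```python
import gzip
from typing import Iterator, NamedTuple


class CpGScore(NamedTuple):
    """One row of the bedGraph file (hypothetical container, for illustration)."""
    chrom: str
    start: int    # column 2: start coordinate (hg19, 0-based)
    end: int      # column 3: end coordinate, as given in the file
    score: float  # column 4: multi-tissue NN score in [0, 1]
    n_cpgs: int   # column 5: CpGs in the 150 bp window, central CpG included


def read_scores(path: str) -> Iterator[CpGScore]:
    """Yield one CpGScore per data line of the gzipped, tab-delimited file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Assumption: tolerate blank lines and optional bedGraph header lines.
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            chrom, start, end, score, n_cpgs = line.rstrip("\n").split("\t")[:5]
            yield CpGScore(chrom, int(start), int(end), float(score), int(n_cpgs))
```

A call such as `read_scores("multitissue-nn-scores.allCGs.0based.hg19.bedgraph.gz")` streams the file row by row, so the full set of CpGs never has to be held in memory. Any cutoff used to call a CpG slow-loss or fast-loss is left to the user, since the description only says "close to 0" and "close to 1".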
The full version of the NN scores, with additional details, is in the file zhou-bian.allCGs.1based.hg19.tsv.gz
Each row is a CpG and provides (1) chromosome, (2) the coordinate of the corresponding C on the forward (Watson) strand of the reference genome, in one-based coordinates, (3) neural network score, (4) number of CpGs within the 150 bp sequence centered on this CpG, including the center CpG, (5) whether the CpG is within a CpG island (0, no; 1, yes), and (6) whether the CpG is within the ENCODE blacklist (0, no; 1, yes).
Here the CpG islands are the union of the Irizarry (Irizarry et al. 2009, Nat Genet), Takai-Jones (Takai et al. 2002, PNAS), and Gardiner-Garden (Gardiner-Garden et al. 1987, J Mol Biol) CGIs. The blacklist was downloaded from https://github.com/Boyle-Lab/Blacklist/tree/master/lists.
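As an illustration of the full table described above, the sketch below parses its rows and drops CpGs that fall in CpG islands or blacklisted regions. It assumes the six columns appear in the order listed and that any header line can be recognised by a non-numeric second field. The names `CpGRecord`, `read_full_scores`, and `keep_for_analysis` are hypothetical, and the filter is only one possible choice, not a step the authors prescribe.

```python
import gzip
from typing import Iterator, NamedTuple


class CpGRecord(NamedTuple):
    """One row of the full score table (hypothetical container)."""
    chrom: str
    pos: int            # 1-based coordinate of the C on the forward strand
    score: float        # neural network score
    n_cpgs: int         # CpGs in the 150 bp window, center CpG included
    in_cgi: bool        # True if the CpG lies in a CpG island
    in_blacklist: bool  # True if the CpG lies in the ENCODE blacklist


def read_full_scores(path: str) -> Iterator[CpGRecord]:
    """Yield one CpGRecord per data row of the gzipped TSV."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Assumption: rows with a non-numeric coordinate are header lines.
            if len(fields) < 6 or not fields[1].isdigit():
                continue
            chrom, pos, score, n_cpgs, cgi, black = fields[:6]
            yield CpGRecord(chrom, int(pos), float(score), int(n_cpgs),
                            cgi == "1", black == "1")


def zero_based_start(record: CpGRecord) -> int:
    """Convert the 1-based C coordinate to a 0-based start coordinate."""
    return record.pos - 1


def keep_for_analysis(record: CpGRecord) -> bool:
    """Example filter: keep CpGs outside CpG islands and blacklisted regions."""
    return not (record.in_cgi or record.in_blacklist)
```

Subtracting 1 from the one-based C coordinate gives the 0-based start used by BED-style files, which is the usual way to line the two coordinate systems up.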
Additional files are included here:
zhou_pmds.0based.hg19.bed.gz: Input PMD CpGs from the Zhou (multi-tissue) dataset
bian_pmds.crc01.0based.hg19.bed.gz: Input PMD CpGs from the Bian (intra-tumor) dataset
zhou_bian_train_test_data.tar.gz: All training and test CpGs, including labels and sequence windows.
Files
(2.2 GB)
| MD5 checksum | Size |
|---|---|
| 1f7b977f3a003fdc58e2f4b34f56107a | 32.5 MB |
| d7244cc8c4f888437c108748d34b9386 | 19.2 MB |
| e1a0953b4937fde7c62bc2d1ee0fe0c1 | 16.1 MB |
| 308d38ec9d5be6351d2faad3afa02779 | 14.9 MB |
| d6900d8a706295fa8f368fdbdbbc0c47 | 432.4 MB |
| 6fd670dbeb70a54940817cd4919e583b | 694.0 MB |
| d67cffd08da657b3b259095a54979389 | 911.9 MB |
| 648d6689cdbd8eb76b2c6d8774b52fb3 | 38.9 MB |
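The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library; `md5sum` and `verify_download` are hypothetical helper names. Because the listing does not say which checksum belongs to which file, the expected value has to be matched to the file by the user.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with a listed value such as 'md5:1f7b...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

A result of `False` from `verify_download` usually indicates a truncated or corrupted download, or a checksum paired with the wrong file.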
Additional details
Related works
- Is derived from software: https://github.com/methylgrammarlab/pmd_hypometh_classifier