There is a newer version of the record available.

Published April 22, 2022 | Version: Sep. 2020
Dataset Open

PMD hypomethylation human (hg19) neural network scores

  • 1. Hebrew University

Contributors

Data manager:

Description

Global loss of DNA methylation in mammalian genomes occurs cumulatively as a mitotic process during aging and cancer, primarily in Partially Methylated Domains (PMDs). It has been shown that local sequence context (100bp) has a strong effect on the rate of demethylation of individual CpG dinucleotides within PMDs. Here, we train a deep learning model to characterize this sequence dependence further, finding that methylation loss can be predicted from a CpG’s 150bp sequence context alone with an AUC of 0.95. We use re-methylation rates of newly synthesized DNA to show that CpGs with fast-loss sequence context are inefficiently re-methylated. Interestingly, we find that the 10% of CpGs predicted to have the “slowest” rate of loss lose almost no DNA methylation in healthy cell types. These same slow-loss CpGs lose a significant amount of DNA methylation in cancer, suggesting that they could be responsible for deregulation of genes and transposable elements that are associated with DNA hypomethylation in cancer.

This directory contains the Sep. 20, 2020 version of the human (hg19) CpG hypomethylation neural network scores, in one gzip-compressed TSV file per chromosome.

The Sep. 2020 neural network score is the predicted probability that a CpG, given its sequence context, is a fast-hypomethylation CpG. The scores were produced by a neural network model trained on two independent input datasets.

Files included in this directory:

  - chr*.tsv.gz: neural network score of each CpG in each chromosome, using hg19 coordinates. chrX and chrY are omitted.

 
Each row describes one CpG and provides six columns:

  - (1) chromosome
  - (2) coordinate of the corresponding C on the forward (Watson) strand of the reference genome, in one-based coordinates
  - (3) neural network score
  - (4) number of CpGs within the 150bp sequence centered on this CpG, including the center CpG
  - (5) whether the CpG is within a CpG island (0, no; 1, yes)
  - (6) whether the CpG is within the ENCODE blacklist (0, no; 1, yes)
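As an illustration of the column layout, the following is a minimal sketch of a reader for one per-chromosome file. The field names (`chrom`, `pos`, `score`, and so on) and the function names are hypothetical and chosen here for readability. The sketch also assumes the files have no header line, which the description does not state; pass `has_header=True` if the first row turns out to be a header.

```python
import csv
import gzip
from typing import Iterator, NamedTuple, Tuple


class CpGScore(NamedTuple):
    chrom: str          # (1) chromosome, e.g. "chr1"
    pos: int            # (2) one-based coordinate of the C on the forward strand
    score: float        # (3) neural network score
    n_cpg_150bp: int    # (4) CpGs in the 150bp window, including the center CpG
    in_cgi: bool        # (5) within a CpG island
    in_blacklist: bool  # (6) within the ENCODE blacklist


def read_scores(path: str, has_header: bool = False) -> Iterator[CpGScore]:
    """Yield one CpGScore per row of a gzip-compressed TSV file.

    Assumption: six tab-separated columns in the order described above.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if has_header:
            next(reader, None)  # skip the header row if there is one
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            chrom, pos, score, n_cpg, cgi, blacklist = row[:6]
            yield CpGScore(chrom, int(pos), float(score), int(n_cpg),
                           cgi == "1", blacklist == "1")


def to_bed_interval(record: CpGScore) -> Tuple[str, int, int]:
    """Convert the one-based C coordinate to a zero-based, half-open
    interval (BED convention) covering the whole CG dinucleotide."""
    return (record.chrom, record.pos - 1, record.pos + 1)
```

Because column (2) is one-based, tools that expect BED-style zero-based coordinates need the conversion shown in `to_bed_interval`.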

Here, the CpG islands are the union of the Irizarry (Irizarry et al. 2009, Nat Genet), Takai-Jones (Takai and Jones 2002, PNAS), and Gardiner-Garden (Gardiner-Garden and Frommer 1987, J Mol Biol) CGIs. The blacklist was downloaded from https://github.com/Boyle-Lab/Blacklist/tree/master/lists.
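One possible use of the CpG island and blacklist flags is to drop flagged CpGs before ranking by score. The sketch below does this and then keeps the lowest-scoring fraction. It rests on two assumptions that the description does not confirm: that a lower score corresponds to slower predicted methylation loss (inferred from the score being the probability of a fast-hypomethylation CpG), and that excluding CpG islands and blacklisted regions is appropriate for the analysis at hand. The function name and the tuple layout are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (chrom, pos, score, in_cgi, in_blacklist)
Record = Tuple[str, int, float, bool, bool]


def slowest_percent(records: Iterable[Record], percent: int = 10) -> List[Record]:
    """Return the `percent` lowest-scoring CpGs after removing CpGs that
    fall in a CpG island or in the ENCODE blacklist.

    Assumption: low score means slow predicted loss.
    """
    kept = [r for r in records if not r[3] and not r[4]]
    kept.sort(key=lambda r: r[2])       # ascending score
    n = len(kept) * percent // 100      # integer count, rounded down
    return kept[:n]
```

For genome-wide use, the records from all chromosome files would need to be pooled before ranking, since a per-chromosome cutoff is not the same as a genome-wide one.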

Files

Files (327.1 MB)

Checksum                              Size
md5:1819352f78425aa7565f7029d5b1d27b  27.8 MB
md5:65dd5147ce775b4d71d654f1f384153a  16.7 MB
md5:3b23535b1f0de83f18e2fe12bd569ac0  15.8 MB
md5:71c170eed032b9c3edf7e76b7f42c04c  15.8 MB
md5:44c9a036ff2b45203de81a30d29afdbb  10.0 MB
md5:80b12309e85e9e9323ed782dfcdd18fc  10.6 MB
md5:96561e7c1813a5464d8cde6ee423eed4  10.7 MB
md5:971f090fa5bf243ee33ad0dfadfd042c  13.2 MB
md5:0d5c34b1dd85071afdf1f0e028872ce4  13.8 MB
md5:ab1e2b26a4f8fecae11513bed5c6b0cd  8.4 MB
md5:7b7be61daa6acee62b402776a63f2ff9  12.3 MB
md5:ee21a853b6b97306f3533932e00a46a7  26.6 MB
md5:284a576ce0cebdded5fad83564e8ea10  8.7 MB
md5:35aa79a29cbf0bbc33184fada8d3d786  4.7 MB
md5:f6b0ddd36b4a7503682ed45f9b87053f  7.0 MB
md5:b3b27e44de0fde65fef57fb2c8e16793  20.1 MB
md5:4184393751df611044833a4f42d4e15c  18.2 MB
md5:7b150b393672675736a21d026a264baa  18.5 MB
md5:a1bdc162ceb19fc81974a3ffe8dc8eb0  18.1 MB
md5:d7244cc8c4f888437c108748d34b9386  19.2 MB
md5:e1a0953b4937fde7c62bc2d1ee0fe0c1  16.1 MB
md5:308d38ec9d5be6351d2faad3afa02779  14.9 MB
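
The md5 checksums in the listing can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names are hypothetical. Since the listing here does not show which file name belongs to which checksum, the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks so that
    large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a checksum written either as 'md5:<hex>'
    (the form used in the file listing) or as a bare hex digest."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(path_to_download, "md5:<hex digest from the listing>")` returns `True` only when the local file's digest equals the listed one.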

Additional details

Related works