There is a newer version of the record available.

Published October 24, 2025 | Version v1
Dataset Open

Dataset for "KnowYourCG: Facilitating base-level sparse methylome interpretation" --- hg38

Authors/Creators

  • 1. Children's Hospital of Philadelphia

Description

KYCG Knowledgebase Sets (hg38)

Overview

This repository contains comprehensive knowledgebase sets for the KnowYourCG (KYCG) framework, designed for functional DNA methylation analysis at base-level resolution. These databases enable rapid enrichment testing and interpretation of diverse methylation datasets, including sparse sequencing data (low-pass, single-cell), 5-hydroxymethylation (5hmC) profiles, spatial methylomes, and array-based EWAS datasets.

Citation: Goldberg DC, Fu H, Atkins D, Moyer E, Lee CN, Deng Y, Zhou W. (2025). KnowYourCG: Facilitating base-level sparse methylome interpretation. Science Advances 11(43). DOI: 10.1126/sciadv.adw3027

Reference Coordinates

cpg_nocontig.cr

  • Complete reference coordinates for all CpG sites in hg38 (excluding contigs)
  • Essential baseline for enrichment testing and coordinate mapping
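As a rough illustration of what enrichment testing against a reference universe involves, the sketch below computes the overlap, fold enrichment, and a one-sided hypergeometric p-value for a query CpG set against a single knowledgebase set. It is not the KYCG implementation: the function name `enrichment`, the toy CpG indices, and the choice of test are assumptions made for illustration, with the universe taken to be the CpGs in the reference coordinates.

```python
from math import comb


def enrichment(query, feature, universe_size):
    """One-sided hypergeometric test for overlap between two CpG sets.

    query, feature: iterables of CpG identifiers drawn from the same universe.
    universe_size:  total number of CpGs in the reference (hypothetical here).
    """
    q, f = set(query), set(feature)
    k = len(q & f)                      # observed overlap
    n, K, N = len(q), len(f), universe_size

    # P(X >= k) for X ~ Hypergeometric(N, K, n), summed exactly with integers.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    p_value = tail / comb(N, n)

    expected = n * K / N                # overlap expected by chance
    fold = k / expected if expected > 0 else float("nan")
    return {"overlap": k, "fold": fold, "p_value": p_value}


# Toy data: CpGs are identified by their integer index in the reference.
universe_size = 20
feature = {0, 1, 2, 3, 4}   # hypothetical knowledgebase set
query = {0, 1, 2, 10}       # hypothetical query set
result = enrichment(query, feature, universe_size)
print(result)
```

Exact integer arithmetic keeps the sketch dependency-free; for genome-scale universes a statistics library with a log-space hypergeometric tail would be the more practical choice.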

I. Sequence Features

  • nFlankCG.20220321.cm - CpG count in flanking regions (standard window)
  • nFlankCG50.20231025.cm - CpG count within 50bp flanking regions
  • nFlankCG100.20231025.cm - CpG count within 100bp flanking regions
  • Tetranuc2.20220321.cm - Four-base sequence context surrounding CpG sites
  • CGI.20220904.cm - CpG island annotations
  • rmsk1.20220307.cm + .idx - RepeatMasker annotations (class 1)
  • rmsk2.20220321.cm + .idx - RepeatMasker annotations (class 2)

II. Genomic Features

  • Chromosome.20221129.cm - Basic chromosome annotations
  • ChromosomeXY.20230901.cm - Sex chromosome-specific features
  • Centromere.20221129.cm - Centromeric regions
  • Win100k.20220228.cm - 100kb genomic window annotations
  • ABCompartment.20220911.cm - A/B compartment annotations (open/closed chromatin)
  • PMD.20220911.cm - Partially Methylated Domains
  • CTCFbind.20220911.cm - CTCF binding sites (chromatin loop anchors)
  • ChromHMM.20220303.cm - Standard ChromHMM state annotations
  • ChromHMMfullStack.20230515.cm - Comprehensive ChromHMM states across multiple cell types
  • REMCChromHMM.20220911.cm - Roadmap Epigenomics ChromHMM states
  • HM.20221013.cm + .idx - Comprehensive histone modification marks (H3K4me3, H3K27ac, H3K9me3, H3K27me3, etc.)
  • MetagenePC.20220911.cm + .idx - Positional information relative to gene features (promoters, gene bodies, 3'UTRs)
  • TFBS.20220921.Part1.cm + .idx - TFBS collection Part 1
  • TFBS.20220921.Part2.cm + .idx - TFBS collection Part 2
  • TFBSrm.20221005.cm + .idx - Roadmap Epigenomics TFBS (~1,188 transcription factors)
  • RoadMapPosGeneExpCpG.20220814.cm - CpGs positively correlated with gene expression
  • RoadMapNegGeneExpCpG.20220814.cm - CpGs negatively correlated with gene expression

III. Trait Associates

  • TiSigBLUEPRINT.20221209.cm + .idx - Hematopoietic cell type signatures (blood lineages)
  • TiSigBrain.20221209.cm + .idx - Brain cell type signatures (neurons, glia)
  • TiSigLoyfer.20221209.cm + .idx - Broad tissue and cell type atlas
  • ImprintingDMR.20220818.cm - Genomically imprinted differentially methylated regions
  • IntermediateMeth.20221121.cm - CpGs with intermediate methylation levels (25-75%)
  • IntermediateMethS.20221121.cm - Stable intermediate methylation sites
  • XCILinkedWGBS.20221121.cm - X-chromosome inactivation-associated CpGs
  • XCILinkedWGBSSorted.20221121.cm - Sorted XCI-linked sites
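The 25-75% range given for IntermediateMeth can be read as a simple threshold on methylation fractions. The sketch below applies that reading to made-up values; the function name, the sample identifiers, and the inclusive treatment of both boundaries are assumptions, not details taken from the dataset.

```python
def is_intermediate(beta, low=0.25, high=0.75):
    """Return True if a methylation fraction lies in [low, high] (inclusive by assumption)."""
    return low <= beta <= high


# Hypothetical methylation fractions for four CpGs.
betas = {"cpg_a": 0.05, "cpg_b": 0.40, "cpg_c": 0.75, "cpg_d": 0.92}
intermediate = sorted(name for name, beta in betas.items() if is_intermediate(beta))
print(intermediate)
```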

IV. Technical Associates

  • Blacklist.20220304.cm - Problematic genomic regions for filtering (high coverage artifacts, repeats)

Resources

Documentation:

Downloads:

Funding: NIH/NIGMS 5R35GM146978

Files (380.8 MB)

Checksum                              Size
md5:90f1f9dc9ebecbbaa7374578eb324193  9.8 kB
md5:5d48b97a5d98ac8533b8dc64ef2b5c19  3.1 kB
md5:eb50d3b6180c29ab8211e9ee71f562f0  347 Bytes
md5:741817a0082a3cedc3d0a729d826cfbb  206.7 kB
md5:05b4eb34d1f150fe6b1392ac5f19bb6a  529.4 kB
md5:f84d7ece52eed5f83193824365b31fac  7.1 MB
md5:9c9b292a1d735a6678735c023f2a6f9b  298 Bytes
md5:2022aa05b649669a8ceff11fc39b615b  99 Bytes
md5:a2b95dfe540c9a0d0c9b6a668d39a2fa  30.3 MB
md5:122772e0f91e0ca9b67ecdc980e1387b  163.0 kB
md5:0c03acdc3ece2646e4eeb93c01e5ef75  7.5 MB
md5:7f85db8f030ab9aed392b0466193dfef  1.7 kB
md5:18e8e6193490dcbfc7ebeb0861f34584  907 Bytes
md5:42cc6b515ecefae9bf54c33d59db46ca  26.7 kB
md5:28a5c318474b3b69525a07bb8488e0bf  18.5 kB
md5:009833d8a7bb18f46119f1978d91430b  3.0 MB
md5:d229ad93b186b1ef25e6aa2c5e6ab10c  712 Bytes
md5:0a07b27eb1975b0c3c509ddbe26981bd  11.5 MB
md5:506d2accb9175e9ab8b714bc2d9d7d49  10.1 MB
md5:572fb0311f9bb13c3fd087b9804a771e  7.6 MB
md5:a826de908298e80a3dd89f86aca0bdb7  18.3 kB
md5:aeb9de1f624ca7dd38cbaa547bcc4a62  490.8 kB
md5:975e3006fe3be9bc126a27bc31ccfe08  4.6 MB
md5:8bd4095d38d0100b168849b7a5f140e1  361 Bytes
md5:4408ae42796acbd0bd39ca2eec160d56  5.7 MB
md5:af07e45ef029f0bc74d1c1e180846bc0  1.2 kB
md5:cf2f707d1d4b80067ff614cb7ab8751d  463.3 kB
md5:f2783610e674103366a58f55ed899990  570.1 kB
md5:cc1f47634c1cdd6e130d11c118d32dbe  9.1 MB
md5:6a47951f90de4d33d051fca1dc7cad6a  48.0 MB
md5:aaeb4cd0c34e5bc12d21f1953256007a  13.8 kB
md5:a2f3e269313dc35d515499ef7a993d2d  43.0 MB
md5:107ade1ee5bbb3925ffa726bc139038d  13.0 kB
md5:6ef6168b7d52f71e7276910cbd0554da  82.2 MB
md5:8ab435b6d153f6fcd9cc343c17556c80  23.7 kB
md5:e92e9d5be2471dc7335319e7200f4adc  26.9 MB
md5:342986b147135752133ef91364848788  22.4 kB
md5:ac4868ae97a3fa0190279e6e7871d8e1  52.4 MB
md5:d81a5f10dff90786e5298bbdcfe6745d  13.4 kB
md5:ed694945023f072e798243880459bbad  28.8 MB
md5:abeb6166698da4063a054dba57ade57e  16.4 kB
md5:a211f4873295189178b2af294e9a348e  169.1 kB
md5:990f6f305aadf8053d960004a6b79372  12.6 kB
md5:8e1b64ca0d4ef11ff3a92b8a5d32911c  16.7 kB
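
Because each file is listed with an MD5 checksum, a download can be verified locally before use. The sketch below uses Python's standard `hashlib` and reads the file in chunks so that the larger sets do not need to fit in memory. The helper names and the demo file are hypothetical; the demo writes a small temporary file rather than one of the real sets.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:90f1...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demo on a temporary file whose content is the bytes b"hello".
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.cm")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    demo_ok = matches_checksum(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")

print(demo_ok)
```

To check a real download, pass the local path and the corresponding `md5:` value from the list to `matches_checksum`.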

Additional details

Related works

Is supplement to
Dataset: 10.1126/sciadv.adw3027 (DOI)