Published June 1, 2024 | Version v1
Dataset Open

RepliChrom: Interpretable machine learning predicts cancer-associated enhancer-promoter interactions using DNA replication timing

  • 1. Nanyang Technological University
  • 2. University of Electronic Science and Technology of China
  • 3. University of Electronic Science and Technology of China

Description

This dataset accompanies the study "RepliChrom: Interpretable machine learning predicts cancer-associated enhancer-promoter interactions using DNA replication timing". The study introduces RepliChrom, a computational framework that predicts enhancer–promoter interactions (EPIs) from multi-scale replication timing (RT) signals. This approach addresses the fundamental challenge of distinguishing gene targets regulated by distal enhancers from those activated by proximal transcriptional activity, a key problem in understanding the causal basis of complex diseases.

Despite recent advances in high-throughput technologies such as Hi-C, ChIA-PET, and Hi-TrAC that allow genome-wide reconstruction of 3D chromatin architecture, the role of DNA replication timing in mediating these spatial interactions remains underexplored. RepliChrom fills this gap by using cell-type-specific RT profiles as predictive features for chromatin interaction inference.

To support model development, training, and evaluation, we provide a multi-omics dataset covering six human cell lines (K562, GM12878, HeLaS3, IMR90, NHEK, and HUVEC), comprising:

  • Hi-C datasets: Processed chromatin interaction loops used to define positive and negative enhancer–promoter interaction pairs.
  • ChIA-PET datasets: Interaction data anchored around transcription factor binding, including POLR2A and CTCF, used for model validation across different interaction types.
  • Hi-TrAC datasets: Targeted chromatin accessibility-derived interaction data, offering complementary validation of the model on alternate platforms.
  • Replication Timing (RT) data: Processed RT signal profiles for each cell line, used to extract multi-scale temporal features as inputs for RepliChrom.

These datasets enable reproducibility of the model training process and serve as benchmark resources for future research into DNA replication–mediated regulation of 3D genome architecture.
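As a rough illustration of what "multi-scale" RT features could look like, the sketch below averages a binned RT signal over windows of increasing width around one genomic bin. This record does not specify the feature definition, so the binned-list representation, the window half-widths, the use of a mean, and the function name `multiscale_rt_features` are all assumptions made for illustration; the GitHub repository remains the reference for the actual feature extraction.

```python
from statistics import fmean


def multiscale_rt_features(rt_signal, center_bin, half_widths=(1, 5, 25)):
    """Mean RT signal in windows of increasing width around one bin.

    Hypothetical helper: `rt_signal` is assumed to be a list of RT values,
    one per fixed-size genomic bin on a single chromosome. The half-widths
    (in bins) are illustrative defaults, not values from RepliChrom.
    """
    if not 0 <= center_bin < len(rt_signal):
        raise IndexError("center_bin lies outside the RT signal")

    features = []
    for half_width in half_widths:
        # Clip the window at the chromosome ends instead of padding.
        start = max(0, center_bin - half_width)
        stop = min(len(rt_signal), center_bin + half_width + 1)
        features.append(fmean(rt_signal[start:stop]))
    return features


if __name__ == "__main__":
    toy_signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    print(multiscale_rt_features(toy_signal, center_bin=5, half_widths=(1, 2)))
```

Under these assumptions, an enhancer–promoter pair could be described by concatenating the feature vectors computed at the enhancer bin and at the promoter bin.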

Included Files

  • Hi-C_datasets.zip (2.23 MB): Processed Hi-C interaction pairs for six cell types.
  • ChIA-PET_datasets.zip (818.18 KB): CTCF and POLR2A ChIA-PET interactions across multiple lines.
  • Hi-TrAC_datasets.zip (196 bytes): Hi-TrAC-based chromatin interaction training datasets across multiple lines.
  • Cellline_RT_data.zip (86.17 MB): Replication timing signal data across multiple human cell types for multi-scale replication timing feature extraction.

Usage

All datasets are intended for academic, non-commercial use. The provided files can be directly used to reproduce the training and evaluation of RepliChrom, and may also support broader applications in enhancer–promoter modeling, replication-timing analysis, and 3D genomics studies. For detailed usage instructions and code implementation, please refer to the GitHub repository: https://github.com/DaoFuying/RepliChrom
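Before running any training code, it can help to look at what each archive actually contains. The following is a minimal sketch using only the Python standard library; the helper names `summarize_archive` and `extract_archive` and the target directory are hypothetical, and the internal layout of the archives is not described in this record.

```python
import zipfile
from pathlib import Path


def summarize_archive(zip_path):
    """Return (member name, uncompressed size in bytes) for each entry."""
    with zipfile.ZipFile(zip_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def extract_archive(zip_path, target_dir):
    """Extract an archive and return the relative paths of the files found."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(
        path.relative_to(target).as_posix()
        for path in target.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Archive names as listed under "Included Files"; "data" is an
    # arbitrary output directory chosen for this example.
    for name in ("Hi-C_datasets.zip", "ChIA-PET_datasets.zip",
                 "Hi-TrAC_datasets.zip", "Cellline_RT_data.zip"):
        if Path(name).is_file():
            print(name, summarize_archive(name))
            extract_archive(name, Path("data") / Path(name).stem)
```

Listing members first is a cheap sanity check, particularly for Hi-TrAC_datasets.zip, whose listed size is only 196 bytes.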

Files

Files (89.2 MB)

  • Cellline_RT_data.zip (86.2 MB): md5:c567bde62e931b12c9cdf92b8063e7dd
  • ChIA-PET_datasets.zip (818.2 kB): md5:2b72b8daee1f773f6e677a1c25b556f5
  • Hi-C_datasets.zip (2.2 MB): md5:3d05dc281d21e80f99c7108b129ef53b
  • Hi-TrAC_datasets.zip (196 Bytes): md5:0546039ddbe4f37896fba353883f8bb3
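The MD5 checksums in this record can be used to confirm that each download is intact. The sketch below is one way to do that with the standard library. It assumes the archives sit in the current directory, and it pairs each checksum with a file name by matching the size shown next to the checksum against the sizes under "Included Files"; that pairing is an inference from this record, not something the record states explicitly.

```python
import hashlib
from pathlib import Path

# File name to checksum pairing inferred by matching the listed sizes.
EXPECTED_MD5 = {
    "Cellline_RT_data.zip": "c567bde62e931b12c9cdf92b8063e7dd",
    "ChIA-PET_datasets.zip": "2b72b8daee1f773f6e677a1c25b556f5",
    "Hi-C_datasets.zip": "3d05dc281d21e80f99c7108b129ef53b",
    "Hi-TrAC_datasets.zip": "0546039ddbe4f37896fba353883f8bb3",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Return 'ok', 'mismatch', or 'missing' for each expected archive."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        print(f"{name}: {status}")
```

A "mismatch" result usually points to an interrupted or corrupted download, in which case fetching the file again is the first thing to try.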

Additional details

References

  • Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014 Dec 18;159(7):1665-80. doi: 10.1016/j.cell.2014.11.021. Epub 2014 Dec 11. Erratum in: Cell. 2015 Jul 30;162(3):687-8. PMID: 25497547; PMCID: PMC5635824.
  • Liu, S., Cao, Y., Cui, K. et al. Hi-TrAC reveals division of labor of transcription factors in organizing chromatin loops. Nat Commun 13, 6679 (2022). https://doi.org/10.1038/s41467-022-34276-8
  • Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, Trzaskoma P, Magalska A, Wlodarczyk J, Ruszczycki B et al. 2015. CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription. Cell 163: 1611-1627.
  • Sanyal, A., Lajoie, B., Jain, G. et al. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012). https://doi.org/10.1038/nature11279
  • Ryba T, Battaglia D, Chang BH, Shirley JW, Buckley Q, Pope BD, Devidas M, Druker BJ, Gilbert DM. 2012. Abnormal developmental control of replication-timing domains in pediatric acute lymphoblastic leukemia. Genome Res 22: 1833-1844.