RepliChrom: Interpretable machine learning predicts cancer-associated enhancer-promoter interactions using DNA replication timing
Authors/Creators
Description
This dataset accompanies the study "RepliChrom: Interpretable machine learning predicts cancer-associated enhancer-promoter interactions using DNA replication timing". The study introduces RepliChrom, a computational framework that predicts enhancer–promoter interactions (EPIs) from multi-scale replication timing (RT) signals. The approach addresses a key problem in understanding the causal basis of complex diseases: distinguishing genes regulated by distal enhancers from genes activated by proximal transcriptional activity.
Despite recent advances in high-throughput technologies such as Hi-C, ChIA-PET, and Hi-TrAC that allow genome-wide reconstruction of 3D chromatin architecture, the role of DNA replication timing in mediating these spatial interactions remains underexplored. RepliChrom fills this gap by using cell-type-specific RT profiles as predictive features for chromatin interaction inference.
To support model development, training, and evaluation, we provide a multi-omics dataset covering six human cell lines (K562, GM12878, HeLa-S3, IMR90, NHEK, and HUVEC), comprising:
Hi-C datasets: Processed chromatin interaction loops used to define positive and negative enhancer–promoter interaction pairs.
ChIA-PET datasets: Protein-anchored interaction data for CTCF and RNA polymerase II (POLR2A), used to validate the model across different interaction types.
Hi-TrAC datasets: Interaction data between accessible chromatin regions, providing complementary validation of the model on an alternative platform.
Replication Timing (RT) data: Processed RT signal profiles for each cell line, used to extract multi-scale temporal features as inputs for RepliChrom.
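This record does not define the exact RT features RepliChrom extracts; those are specified in the repository. As a rough illustration of what "multi-scale" RT features for an enhancer–promoter pair could look like, the sketch below assumes an RT track stored as a list of equally sized genomic bins on one chromosome, with each anchor already mapped to a bin index. It averages RT in windows of several radii around each anchor, then concatenates the enhancer features, the promoter features, and their absolute differences. The window radii and the function names (`window_mean`, `multiscale_features`, `pair_features`) are hypothetical choices, not RepliChrom's.

```python
from statistics import mean

# Hypothetical helpers for illustration; not part of the RepliChrom codebase.
# Assumes `rt` is a list of RT values for consecutive, equally sized bins on one chromosome.


def window_mean(rt, center, radius):
    """Mean RT over bins center-radius..center+radius, clipped to the track edges."""
    lo = max(0, center - radius)
    hi = min(len(rt), center + radius + 1)
    return mean(rt[lo:hi])


def multiscale_features(rt, center, radii=(0, 1, 5, 10)):
    """One mean-RT value per window radius (measured in bins) around an anchor bin."""
    return [window_mean(rt, center, r) for r in radii]


def pair_features(rt, enhancer_bin, promoter_bin, radii=(0, 1, 5, 10)):
    """Enhancer features, then promoter features, then their absolute differences."""
    enh = multiscale_features(rt, enhancer_bin, radii)
    pro = multiscale_features(rt, promoter_bin, radii)
    diff = [abs(e - p) for e, p in zip(enh, pro)]
    return enh + pro + diff


if __name__ == "__main__":
    toy_rt = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy values, not real RT data
    print(pair_features(toy_rt, enhancer_bin=1, promoter_bin=4, radii=(0, 1)))
    # prints [2.0, 2.0, 5.0, 5.0, 3.0, 3.0]
```

The zero-radius window keeps the RT value of the anchor bin itself, while wider windows summarize the surrounding replication domain.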
These datasets allow the model training process to be reproduced and serve as benchmark resources for future research into DNA replication–mediated regulation of 3D genome architecture.
Included Files
Hi-C_datasets.zip (2.23 MB): Processed Hi-C interaction pairs for six cell types.
ChIA-PET_datasets.zip (818.18 KB): CTCF and POLR2A ChIA-PET interactions across multiple cell lines.
Hi-TrAC_datasets.zip (196 bytes): Hi-TrAC-based chromatin interaction training datasets across multiple cell lines.
Cellline_RT_data.zip (86.17 MB): Replication timing signal data across multiple human cell types for multi-scale replication timing feature extraction.
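Only the archive names and sizes are documented here; the layout of the files inside each archive is not. A minimal way to inspect the archives before working with them, using only Python's standard `zipfile` module and assuming the four archives sit in the current working directory, is sketched below. The helper names `list_archive` and `extract_archive` are illustrative and not part of the RepliChrom code.

```python
import zipfile
from pathlib import Path

# Archive names as listed in this record; their internal layout is not documented here.
ARCHIVES = [
    "Hi-C_datasets.zip",
    "ChIA-PET_datasets.zip",
    "Hi-TrAC_datasets.zip",
    "Cellline_RT_data.zip",
]


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for every file in a zip archive."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist() if not info.is_dir()]


def extract_archive(path, dest):
    """Extract an archive into dest/<archive name without .zip>/ and return that directory."""
    out = Path(dest) / Path(path).stem
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(path) as zf:
        zf.extractall(out)
    return out


if __name__ == "__main__":
    for name in ARCHIVES:
        if Path(name).exists():
            for member, size in list_archive(name):
                print(f"{name}: {member} ({size} bytes)")
        else:
            print(f"{name}: not found in the current directory")
```

Listing members first is cheap even for the 86 MB RT archive, since only the zip directory is read, not the compressed data.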
Usage
All datasets are intended for academic, non-commercial use. The provided files can be directly used to reproduce the training and evaluation of RepliChrom, and may also support broader applications in enhancer–promoter modeling, replication-timing analysis, and 3D genomics studies. For detailed usage instructions and code implementation, please refer to the GitHub repository: https://github.com/DaoFuying/RepliChrom
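This record does not state which metrics are used to evaluate RepliChrom. Because the Hi-C data define positive and negative EPI pairs, one common way to score a binary classifier on such pairs is the area under the ROC curve (AUROC). The sketch below computes it from the Mann-Whitney U statistic without third-party libraries. Both the choice of metric and the function name `auroc` are assumptions for illustration; consult the repository for the evaluation actually used.

```python
# Hypothetical evaluation helper; RepliChrom's own metrics are defined in its repository.


def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for a positive (interacting) pair, 0 for a negative pair.
    scores: model scores, where higher means more likely to interact.
    Tied scores receive their average rank, so a tie counts as half a correct ordering.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of positions order[i..j] that share the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The rank-based form avoids choosing score thresholds and gives the same value as integrating the ROC curve with trapezoids.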
Additional details
Software
- Repository URL: https://github.com/DaoFuying/RepliChrom
References
- Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665-1680. doi:10.1016/j.cell.2014.11.021. Erratum in: Cell. 2015;162(3):687-688. PMID: 25497547; PMCID: PMC5635824.
- Liu S, Cao Y, Cui K, et al. Hi-TrAC reveals division of labor of transcription factors in organizing chromatin loops. Nat Commun. 2022;13:6679. doi:10.1038/s41467-022-34276-8.
- Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, Trzaskoma P, Magalska A, Wlodarczyk J, Ruszczycki B, et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell. 2015;163:1611-1627.
- Sanyal A, Lajoie B, Jain G, et al. The long-range interaction landscape of gene promoters. Nature. 2012;489:109-113. doi:10.1038/nature11279.
- Ryba T, Battaglia D, Chang BH, Shirley JW, Buckley Q, Pope BD, Devidas M, Druker BJ, Gilbert DM. Abnormal developmental control of replication-timing domains in pediatric acute lymphoblastic leukemia. Genome Res. 2012;22:1833-1844.