Predicting A/B compartments from histone modifications using deep learning
Authors/Creators
Description
This dataset contains the preprocessed training data for CoRNN (Compartment prediction using Recurrent Neural Networks), a deep learning method that predicts chromatin A/B compartments from histone modification ChIP-seq data.
Dataset Description
The dataset includes histone modification enrichment data and corresponding A/B compartment annotations for six human cell lines (GM12878, K562, IMR90, HUVEC, HMEC, NHEK) at 100kb resolution. The data comprises:
- Input features: Six histone modification ChIP-seq signals (H3K27ac, H3K36me3, H3K4me1, H3K4me3, H3K9me3, H3K27me3)
- Target labels: A/B compartment assignments derived from Hi-C eigenvector analysis
- Additional features: Mean eigenvector values for cross-cell-type training
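As a rough illustration of how compartment labels relate to eigenvector values, the sketch below assigns a bin to compartment A when its value is positive and to B otherwise. The helper name `assign_compartments`, the example values, and the sign convention (positive means A) are assumptions made for illustration; the labels shipped in this dataset are precomputed, and the orientation used by the authors is not stated on this page.

```python
def assign_compartments(evec_values, a_is_positive=True):
    """Map per-bin eigenvector values to 'A'/'B' labels by sign.

    Hypothetical helper: the positive-means-A convention is an assumption,
    so the mapping can be flipped with a_is_positive=False.
    """
    labels = []
    for value in evec_values:
        is_positive = value > 0
        # A bin is 'A' when its sign matches the chosen orientation.
        labels.append("A" if is_positive == a_is_positive else "B")
    return labels


# Made-up values for four consecutive 100kb bins.
example_values = [0.8, -0.3, 0.1, -1.2]
print(assign_compartments(example_values))
```

In practice the sign of a Hi-C eigenvector is arbitrary, so its orientation is usually fixed against an independent signal before labels are assigned.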
Data Format
Data are stored under the 6_cell_input_updated/6_cell_input_updated_100kb/ directory, with separate files for each cell line. Each file contains genomic bins with their histone modification signals and compartment labels.
Usage Instructions
To use this dataset:
- Download and extract the dataset to your local machine
- Visit the GitHub repository at https://github.com/rsinghlab/CoRNN for:
  - Complete installation instructions and dependencies
  - Detailed usage examples and command-line arguments
  - Model training and evaluation scripts
  - Cross-validation and grid search utilities
- Basic training example:
  python code/hm2ab.py --data_dir "data/6_cell_input_updated/6_cell_input_updated_100kb/" --task "cla" --model "gru" --epoch 10 --resolution "100kb" --cross_validation True --add_mean_evec True
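When the training run is launched from a script instead of a shell, the same command can be assembled as an argument list. The sketch below only reuses the script path and flags shown in the basic training example. The helper name `build_training_command` and its default values are hypothetical, and it assumes the command is run from the root of a local clone of the repository.

```python
import shlex


def build_training_command(data_dir, task="cla", model="gru", epochs=10,
                           resolution="100kb", cross_validation=True,
                           add_mean_evec=True):
    """Return the hm2ab.py invocation as a list of arguments.

    Hypothetical helper: it mirrors the flags from the basic training
    example and does not validate them against the script itself.
    """
    return [
        "python", "code/hm2ab.py",
        "--data_dir", data_dir,
        "--task", task,
        "--model", model,
        "--epoch", str(epochs),
        "--resolution", resolution,
        "--cross_validation", str(cross_validation),
        "--add_mean_evec", str(add_mean_evec),
    ]


cmd = build_training_command(
    "data/6_cell_input_updated/6_cell_input_updated_100kb/"
)
# Print the shell-quoted command for inspection before running it.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the run. Consult the repository documentation for the full set of supported arguments.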
Citation
If you use this dataset, please cite our paper:
Zheng, S., Thakkar, N., Harris, H.L., et al. "Predicting A/B compartments from histone modifications using deep learning." iScience 27, 109367 (2024). https://doi.org/10.1016/j.isci.2024.109367
Related Resources
- GitHub Repository: https://github.com/rsinghlab/CoRNN
- Paper: https://www.sciencedirect.com/science/article/pii/S2589004224007922
For questions about implementation, training procedures, or model architecture, please refer to the GitHub repository documentation and example scripts.
Files
| Name | Size | MD5 |
|---|---|---|
| data.zip | 680.7 MB | 8f7eaac882c87ab6a11def5495af672b |
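The MD5 checksum listed for data.zip can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The function names `md5_of_file` and `verify_download` are hypothetical, and the local file path passed to them is an assumption about where the archive was saved.

```python
import hashlib

# Checksum as listed for data.zip in the file table.
EXPECTED_MD5 = "8f7eaac882c87ab6a11def5495af672b"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("data.zip")` returns `True` only if the local archive matches the published checksum. Reading in chunks keeps memory use low for an archive of this size.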
Additional details
Identifiers
- PMID: 38646172
Software
- Repository URL: https://github.com/rsinghlab/CoRNN