Published May 17, 2024 | Version v1
Dataset Open

Predicting A/B compartments from histone modifications using deep learning

Authors/Creators

Description

This dataset contains the preprocessed training data for CoRNN (Compartment prediction using Recurrent Neural Networks), a deep learning method that predicts chromatin A/B compartments from histone modification ChIP-seq data.

Dataset Description

The dataset includes histone modification enrichment data and corresponding A/B compartment annotations for six human cell lines (GM12878, K562, IMR90, HUVEC, HMEC, NHEK) at 100kb resolution. The data comprises:

  • Input features: Six histone modification ChIP-seq signals (H3K27ac, H3K36me3, H3K4me1, H3K4me3, H3K9me3, H3K27me3)
  • Target labels: A/B compartment assignments derived from Hi-C eigenvector analysis
  • Additional features: Mean eigenvector values for cross-cell-type training
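
As an illustration of how compartment labels relate to eigenvector values and 100kb bins, here is a minimal sketch. It assumes the common convention that bins with a positive eigenvector value are labeled A and all others B; the orientation of an eigenvector is arbitrary, so the actual sign convention, the handling of zero or missing values, and the function names below are assumptions and not necessarily what was used to build this dataset.

```python
BIN_SIZE = 100_000  # 100kb resolution, as stated for this dataset


def bin_index(position, bin_size=BIN_SIZE):
    """Return the 0-based index of the bin containing a 0-based position."""
    return position // bin_size


def compartment_labels(eigenvector):
    """Label each bin 'A' if its value is positive, otherwise 'B'.

    Assumption: positive values mark the A compartment. Treating zero as B
    is an arbitrary choice made only for this sketch.
    """
    return ["A" if value > 0 else "B" for value in eigenvector]


# Made-up eigenvector values for four consecutive bins (illustration only).
example_values = [0.8, 0.1, -0.3, -0.9]
example_labels = compartment_labels(example_values)
```

For a classification task the labels would be used directly as targets, while a regression task would presumably use the eigenvector values themselves.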

Data Format

The data is organized under the 6_cell_input_updated/6_cell_input_updated_100kb/ directory, with separate files for each cell line. Each file contains genomic bins with their histone modification signals and compartment labels.

Usage Instructions

To use this dataset:

  1. Download and extract the dataset to your local machine
  2. Visit the GitHub repository at https://github.com/rsinghlab/CoRNN for:
    • Complete installation instructions and dependencies
    • Detailed usage examples and command-line arguments
    • Model training and evaluation scripts
    • Cross-validation and grid search utilities
  3. Basic training example:
     
    python code/hm2ab.py --data_dir "data/6_cell_input_updated/6_cell_input_updated_100kb/" \
        --task "cla" --model "gru" --epoch 10 --resolution "100kb" \
        --cross_validation True --add_mean_evec True
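
Before training, step 1 (download and extract) can be sketched in Python as follows. This is a minimal example that checks the downloaded data.zip against the MD5 published on this record and unpacks it only if the checksum matches. The helper names and the destination directory are hypothetical, and the directory layout inside the archive is not stated here, so the extracted path may need adjusting to match the --data_dir used above.

```python
import hashlib
import zipfile
from pathlib import Path

# MD5 published on this record for data.zip.
EXPECTED_MD5 = "8f7eaac882c87ab6a11def5495af672b"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, dest, expected_md5=EXPECTED_MD5):
    """Extract `archive` into `dest` only if its MD5 matches `expected_md5`."""
    actual = md5_of_file(archive)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return Path(dest)


# Example call (assumes data.zip sits in the current directory):
# verify_and_extract("data.zip", ".")
```

A failed check raises ValueError before anything is written to disk, which makes a truncated or corrupted download easy to spot.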

Citation

If you use this dataset, please cite our paper:

Zheng, S., Thakkar, N., Harris, H.L., et al. "Predicting A/B compartments from histone modifications using deep learning." iScience 27, 109367 (2024). https://doi.org/10.1016/j.isci.2024.109367

Related Resources

For questions about implementation, training procedures, or model architecture, please refer to the GitHub repository documentation and example scripts.

Files

data.zip (680.7 MB)
md5:8f7eaac882c87ab6a11def5495af672b
