Published May 17, 2024 | Version v1

Predicting A/B compartments from histone modifications using deep learning

Description

This dataset contains the preprocessed training data for CoRNN (Compartment prediction using Recurrent Neural Networks), a deep learning method that predicts chromatin A/B compartments from histone modification ChIP-seq data.

Dataset Description

The dataset includes histone modification enrichment data and corresponding A/B compartment annotations for six human cell lines (GM12878, K562, IMR90, HUVEC, HMEC, NHEK) at 100kb resolution. The data comprises:

  • Input features: Six histone modification ChIP-seq signals (H3K27ac, H3K36me3, H3K4me1, H3K4me3, H3K9me3, H3K27me3)
  • Target labels: A/B compartment assignments derived from Hi-C eigenvector analysis
  • Additional features: Mean eigenvector values for cross-cell-type training
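As an illustration of how eigenvector values relate to the labels and the mean-eigenvector feature listed above, here is a minimal sketch. It assumes the common convention that a positive first-eigenvector value marks compartment A and a negative value marks compartment B, and it assumes the mean is taken bin by bin across cell lines. The function names and toy values are hypothetical and are not taken from the CoRNN code, which may orient signs or handle missing bins differently.

```python
import math


def is_missing(value):
    """Return True for None or NaN, which are treated as unmapped bins."""
    return value is None or math.isnan(value)


def assign_compartments(eigenvector):
    """Label each bin 'A' (value > 0) or 'B' (value < 0).

    Assumption: positive values correspond to compartment A. Bins that are
    missing or exactly zero get None instead of a label.
    """
    labels = []
    for value in eigenvector:
        if is_missing(value) or value == 0:
            labels.append(None)
        else:
            labels.append("A" if value > 0 else "B")
    return labels


def mean_eigenvector(tracks):
    """Bin-wise mean over cell lines, skipping missing values.

    `tracks` maps a cell line name to a list of per-bin eigenvector values;
    all lists are assumed to have the same length.
    """
    n_bins = len(next(iter(tracks.values())))
    means = []
    for i in range(n_bins):
        values = [t[i] for t in tracks.values() if not is_missing(t[i])]
        means.append(sum(values) / len(values) if values else None)
    return means


# Toy values for illustration only; these are not taken from the dataset.
toy_tracks = {
    "GM12878": [1.0, -0.5, float("nan")],
    "K562": [0.5, -1.5, -0.25],
}
toy_labels = assign_compartments(toy_tracks["GM12878"])
toy_mean = mean_eigenvector(toy_tracks)
```

In practice the sign of a Hi-C eigenvector is arbitrary and is usually oriented with an external signal such as gene density or active histone marks before labels are assigned; the sketch assumes that orientation has already been done.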

Data Format

The data are stored under the 6_cell_input_updated/6_cell_input_updated_100kb/ directory, with separate files for each cell line. Each file contains genomic bins with their histone modification signals and compartment labels.
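To check what the extracted directory actually contains before training, a small inventory script can help. The sketch below assumes that each file or subdirectory name includes the cell line name; that naming scheme is a guess and is not documented here, so files that match no cell line are simply left out. The helper name `files_by_cell_line` is hypothetical.

```python
from pathlib import Path

# The six cell lines listed in the dataset description.
CELL_LINES = ["GM12878", "K562", "IMR90", "HUVEC", "HMEC", "NHEK"]


def files_by_cell_line(data_dir):
    """Group files under data_dir by the cell line named in their relative path.

    Assumption: the cell line name appears somewhere in the file or folder
    name (matched case-insensitively).
    """
    root = Path(data_dir)
    grouped = {name: [] for name in CELL_LINES}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix().upper()
        for name in CELL_LINES:
            if name in relative:
                grouped[name].append(path)
    return grouped


if __name__ == "__main__":
    data_dir = "data/6_cell_input_updated/6_cell_input_updated_100kb/"
    for cell_line, paths in files_by_cell_line(data_dir).items():
        print(cell_line, len(paths), "file(s)")
```

If every cell line reports zero files, the naming assumption does not hold for this archive and the directory should be inspected by hand.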

Usage Instructions

To use this dataset:

  1. Download and extract the dataset to your local machine
  2. Visit the GitHub repository at https://github.com/rsinghlab/CoRNN for:
    • Complete installation instructions and dependencies
    • Detailed usage examples and command-line arguments
    • Model training and evaluation scripts
    • Cross-validation and grid search utilities
  3. Basic training example:
     
    python code/hm2ab.py --data_dir "data/6_cell_input_updated/6_cell_input_updated_100kb/" \
        --task "cla" --model "gru" --epoch 10 --resolution "100kb" \
        --cross_validation True --add_mean_evec True
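Step 1 above can be sketched in Python as follows: the script checks the downloaded archive against the md5 checksum listed for data.zip on this record and then extracts it. The helper names and the destination directory are hypothetical, and the internal layout of the archive is not assumed, so the path passed to --data_dir may need adjusting after extraction.

```python
import hashlib
import zipfile
from pathlib import Path

# Checksum listed for data.zip on this record.
EXPECTED_MD5 = "8f7eaac882c87ab6a11def5495af672b"


def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, dest, expected_md5=EXPECTED_MD5):
    """Extract `archive` into `dest` only if its md5 matches `expected_md5`."""
    actual = md5sum(archive)
    if actual != expected_md5:
        raise ValueError(f"md5 mismatch: expected {expected_md5}, got {actual}")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return Path(dest)


if __name__ == "__main__" and Path("data.zip").exists():
    # "." is a placeholder destination; choose whatever location suits you.
    print("Extracted to", verify_and_extract("data.zip", "."))
```

Reading the file in chunks keeps memory use low for an archive of this size.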

Citation

If you use this dataset, please cite our paper:

Zheng, S., Thakkar, N., Harris, H.L., et al. "Predicting A/B compartments from histone modifications using deep learning." iScience 27, 109367 (2024). https://doi.org/10.1016/j.isci.2024.109367

Related Resources

For questions about implementation, training procedures, or model architecture, please refer to the GitHub repository documentation and example scripts.

Files (680.7 MB)

Name      Size      Checksum
data.zip  680.7 MB  md5:8f7eaac882c87ab6a11def5495af672b
