Published January 12, 2022 | Version 1.0.3
Dataset Open

The adapted Activity-By-Contact model for enhancer-gene assignment and its application to single-cell data

  • 1. Institute of Cardiovascular Regeneration, Goethe University and University Hospital Frankfurt
  • 2. School of Computation, Information and Technology, Technical University of Munich

Description

In this work, we implemented the ABC-model and showed that a single assay for measuring the openness of enhancers is sufficient. We further propose a generalised calculation of the ABC-score that describes enhancer activity in a gene-specific manner and includes all TSSs, without requiring any additional data. We combined our implementation of the ABC-score with an approach for quantifying TF binding affinity into STARE, a framework for deriving TF affinities to genes. STARE was also designed for potential application to single-cell data. The code is available in our GitHub repository, and more details can be found in our publication.
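For orientation, the regular ABC-score of Fulco et al. (2019) divides the product of an enhancer's activity and its contact frequency with a gene by the sum of that product over all candidate enhancers in the gene's window. The following is a minimal Python sketch of that regular score only. It is not the STARE implementation and does not cover the generalised score; the function name and the toy values are hypothetical.

```python
import numpy as np


def abc_scores(activity, contact):
    """Regular ABC-score for all candidate enhancers of one gene.

    activity: enhancer activities A_e within the gene's window
    contact:  contact frequencies C_(e,g) between each enhancer and the gene
    Both inputs are hypothetical toy arrays of equal length.
    """
    activity = np.asarray(activity, dtype=float)
    contact = np.asarray(contact, dtype=float)
    weighted = activity * contact          # A_e * C_(e,g) per enhancer
    total = weighted.sum()                 # sum over all enhancers in the window
    if total == 0:
        return np.zeros_like(weighted)     # no signal: every score is zero
    return weighted / total


# Toy example: three candidate enhancers of a single gene
scores = abc_scores([10.0, 5.0, 1.0], [0.5, 0.2, 0.1])

# Apply a score cut-off such as the 0.02 used for the ABC-based STARE runs
kept = scores >= 0.02
```

By construction the scores of one gene sum to 1, so a cut-off selects the enhancers that contribute a minimum share of the gene's total activity-contact signal.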

We provide the data for the validation of our ABC-implementation on two CRISPRi-screens. We also provide the results of our analysis of single-cell data of the human heart with STARE. All data are based on the hg19 genome assembly.
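A validation of this kind is commonly summarised as a precision-recall curve: interactions are ranked by score, and precision and recall are computed at each rank against the interactions confirmed by the screen. The sketch below illustrates that calculation in Python under stated assumptions. The function name, the input layout and the toy values are hypothetical, the column names of the provided files are not assumed, and tied scores are not treated specially.

```python
import numpy as np


def precision_recall(scores, is_validated):
    """Precision and recall at every rank of a score-sorted interaction list.

    scores:       one score per enhancer-gene interaction (hypothetical input)
    is_validated: True where the screen confirmed the interaction
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_validated, dtype=bool)
    if labels.sum() == 0:
        raise ValueError("at least one validated interaction is required")
    order = np.argsort(-scores, kind="stable")   # highest score first
    hits = np.cumsum(labels[order])              # true positives up to each rank
    ranks = np.arange(1, len(scores) + 1)        # number of predictions so far
    precision = hits / ranks
    recall = hits / labels.sum()
    return precision, recall


# Toy example with four interactions, two of them validated
precision, recall = precision_recall(
    [0.9, 0.4, 0.3, 0.05],
    [True, False, True, False],
)
```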

Content:

  • CRISPRi_screens: One file for each CRISPRi-screen with interactions that were used to plot precision-recall curves, containing columns for different ABC scoring versions.
  • Enformer: Similar to the CRISPRi_screens, but containing columns for different calculations for Enformer's predicted expression change upon in silico mutagenesis of the enhancer region.
  • K562_CandidateEnhancer: K562 candidate enhancers with the enhancer activity in the 4th column; one file for each activity representation that was measured.
  • K562_ABC_Predictions: Regular ABC-scores and generalised ABC-scores for each activity measurement. The files contain all scored interactions for a 10 MB window, without any cut-off. We also included the results of the implementation of the ABC-score of Fulco et al. (2019).
  • STARE_Hocker_*: Whole STARE output for human heart single-cell data, one each for regular ABC, generalised ABC, generalised ABC with average Hi-C matrix, and an approach based on co-accessibility analysis. All approaches were run with a 5 MB window (except for GeneralisedABC500kb), and the ABC-based runs with a score cut-off of 0.02. Each folder contains two subdirectories, one for the ABC-scoring and one for the Gene-TF affinity matrices. The 'ABC_output' also contains a GeneInfo file for each cell type, summarising different attributes per gene.
  • INVOKE_Hocker_*: Folder with the input and output of INVOKE (see https://github.com/schulzlab/tepic), based on the STARE runs. CS genes stands for cell type-specific genes, defined as genes with a z-score across cell types of ≥ 2 and TPM ≥ 0.5. The INVOKE commands were as follows:
    • Rscript INVOKE.R --dataDir=<TF-Gene matrix> --outDir=<out_path> --response=Expression --regularization=E --performance=TRUE --outerCV=10 --seed=1234
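The definition of cell type-specific (CS) genes given above can be illustrated with a short Python sketch. It rests on assumptions that are not stated in this record: the z-score is computed per gene across cell types with the population standard deviation, and the TPM threshold is applied to the same cell type in which the z-score is evaluated. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def cell_type_specific(tpm, z_min=2.0, tpm_min=0.5):
    """Boolean genes x cell-types mask of cell type-specific genes.

    tpm: expression matrix in TPM, rows are genes, columns are cell types.
    """
    tpm = np.asarray(tpm, dtype=float)
    mean = tpm.mean(axis=1, keepdims=True)
    std = tpm.std(axis=1, keepdims=True)        # population std (ddof=0), an assumption
    safe_std = np.where(std > 0, std, 1.0)      # avoid dividing by zero
    z = np.where(std > 0, (tpm - mean) / safe_std, 0.0)
    return (z >= z_min) & (tpm >= tpm_min)


# Toy matrix: three genes across six cell types
tpm = [
    [50.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # strongly specific to the first cell type
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],    # uniformly expressed
    [0.4, 0.0, 0.0, 0.0, 0.0, 0.0],    # specific pattern, but below the TPM threshold
]
mask = cell_type_specific(tpm)
```

Note that with few cell types the attainable z-score is bounded, so a threshold of 2 can only be reached when enough cell types are compared.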

Importantly, the results are based on data from the following publications:

  • CRISPRi-screens:
    • Gasperini, Molly, Andrew J. Hill, José L. McFaline-Figueroa, Beth Martin, Seungsoo Kim, Melissa D. Zhang, Dana Jackson, et al. “A Genome-Wide Framework for Mapping Gene Regulation via Cellular Genetic Screens.” Cell 176, no. 1–2 (January 2019): 377-390.e19. https://doi.org/10.1016/j.cell.2018.11.029.
    • Schraivogel, Daniel, Andreas R. Gschwind, Jennifer H. Milbank, Daniel R. Leonce, Petra Jakob, Lukas Mathur, Jan O. Korbel, Christoph A. Merten, Lars Velten, and Lars M. Steinmetz. “Targeted Perturb-Seq Enables Genome-Scale Genetic Screens in Single Cells.” Nature Methods 17, no. 6 (June 2020): 629–35. https://doi.org/10.1038/s41592-020-0837-5.

    • Fulco, Charles P., Joseph Nasser, Thouis R. Jones, Glen Munson, Drew T. Bergman, Vidya Subramanian, Sharon R. Grossman, et al. “Activity-by-Contact Model of Enhancer–Promoter Regulation from Thousands of CRISPR Perturbations.” Nature Genetics 51, no. 12 (December 2019): 1664–69. https://doi.org/10.1038/s41588-019-0538-0.

  • Enformer model: Avsec, Žiga, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam, Agnieszka Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, and David R. Kelley. “Effective Gene Expression Prediction from Sequence by Integrating Long-Range Interactions.” Nature Methods 18, no. 10 (October 2021): 1196–1203. https://doi.org/10.1038/s41592-021-01252-x.
  • K562 predictions and average Hi-C matrix: Fulco, Charles P., Joseph Nasser, Thouis R. Jones, Glen Munson, Drew T. Bergman, Vidya Subramanian, Sharon R. Grossman, et al. “Activity-by-Contact Model of Enhancer–Promoter Regulation from Thousands of CRISPR Perturbations.” Nature Genetics 51, no. 12 (December 2019): 1664–69. https://doi.org/10.1038/s41588-019-0538-0.
  • Hi-C matrix for K562 predictions: Rao, S. et al. (2014). A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell, 159(7), 1665–1680
  • STARE and INVOKE runs: Hocker, J. D. et al. (2021). Cardiac cell type–specific gene regulatory programs and disease risk association. Science Advances, 7(20), eabf1444
  • H3K27ac HiChIP for STARE runs: Anene-Nzelu, C. G. et al. (2020). Assigning Distal Genomic Enhancers to Cardiac Disease–Causing Genes. Circulation, 142(9), 910–912
  • INVOKE software: Schmidt et al. (2016). Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Research. doi: 10.1093/nar/gkw1061

 

Files

Files (22.9 GB)

  • md5:16a78a74ce98f20d7cf442c1198c4a8d (3.6 MB)
  • md5:3db69dc7ad4042cfbc60f283aabbcdd7 (633.0 kB)
  • md5:93e0ea135e2779dda638094939eb31f9 (1.5 GB)
  • md5:1513ca9e25b2dbf79e563c0efe749c42 (6.6 MB)
  • md5:5c582d9de247c04d2aa121c2664704c7 (141.2 MB)
  • md5:a47deb15dff2f8ad959ed342b2dacbdf (5.9 GB)
  • md5:ce95fb2d0fd4aada5f0fbdedbdeba621 (6.2 GB)
  • md5:fca832889496b7372fb957a1029b928a (834.4 MB)
  • md5:1fcb70aa2f2dc782c2708e820e4a6291 (2.1 GB)
  • md5:c0bcf8e8c5bea88c4ce4dbadd152972c (2.0 GB)
  • md5:d72631309629709b562ec51cd158c7e1 (2.0 GB)
  • md5:5658da027a8d6b8a95c7e61b29be769c (2.3 GB)
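
The md5 checksums listed above can be used to check a download for corruption. The following Python sketch shows one way to do so with the standard library; the helper names are hypothetical, and the demonstration runs on a temporary file rather than on one of the archives.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a temporary file containing the bytes b"hello"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
demo_ok = matches_record(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
os.remove(demo_path)
```

Reading in chunks keeps memory use low, which matters for the multi-gigabyte archives in this record.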

Additional details

Related works

Is cited by
Preprint: 10.1101/2022.01.28.478202 (DOI)