The adapted Activity-By-Contact model for enhancer-gene assignment and its application to single-cell data
Authors/Creators
- 1. Institute of Cardiovascular Regeneration, Goethe University and University Hospital Frankfurt
- 2. School of Computation, Information and Technology, Technical University of Munich
Description
In our work, we implemented the ABC-model and show that a single assay measuring the openness of enhancers is sufficient as the activity input. Further, we propose a generalised calculation of the ABC-score that describes enhancer activity in a gene-specific manner and includes all transcription start sites (TSSs), without requiring any additional data. We combined our implementation of the ABC-score with an approach to quantify TF binding affinity into STARE, a framework to derive TF affinities to genes. STARE was also designed for application to single-cell data. The code is available in our GitHub repository, and more details can be found in our publication.
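For orientation, the regular ABC-score of Fulco et al. (2019) divides the product of an enhancer's activity and its contact frequency with a gene by the sum of that product over all candidate enhancers in the gene's window. The following Python sketch illustrates only this regular score on toy arrays. The function name `abc_scores` and the array layout are illustrative assumptions and not STARE's implementation; the generalised score is not shown.

```python
import numpy as np


def abc_scores(activity, contact):
    """Regular ABC-score on toy inputs (illustrative sketch, not STARE's code).

    activity: shape (n_enhancers,), activity of each candidate enhancer.
    contact:  shape (n_enhancers, n_genes), contact frequency between each
              enhancer and each gene; set to 0 outside the gene's window.
    Returns an array of shape (n_enhancers, n_genes).
    """
    activity = np.asarray(activity, dtype=float)
    contact = np.asarray(contact, dtype=float)
    # Numerator: activity times contact for every enhancer-gene pair.
    weighted = activity[:, None] * contact
    # Denominator: sum of activity times contact over all enhancers of a gene.
    totals = weighted.sum(axis=0, keepdims=True)
    # Genes without any contact keep a score of 0 instead of dividing by zero.
    return np.divide(weighted, totals, out=np.zeros_like(weighted), where=totals > 0)


# Hypothetical toy data: three enhancers, two genes.
toy_scores = abc_scores([2.0, 1.0, 1.0], [[1.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
```

By construction, the scores of all enhancers of a gene sum to 1, so a cut-off such as 0.02 can be read as a minimum share of the gene's total activity-weighted contact.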
We provide the data used to validate our ABC implementation on two CRISPRi screens. We also provide the results of our analysis of single-cell data of the human heart with STARE. All data use the hg19 genome assembly.
Content:
- CRISPRi_screens: One file for each CRISPRi-screen with interactions that were used to plot precision-recall curves, containing columns for different ABC scoring versions.
- Enformer: Similar to the CRISPRi_screens, but containing columns for different calculations for Enformer's predicted expression change upon in silico mutagenesis of the enhancer region.
- K562_CandidateEnhancer: Candidate enhancers for K562 with the enhancer activity in the 4th column, one file for each activity representation that was measured.
- K562_ABC_Predictions: Regular and generalised ABC-scores for each activity measurement. The files contain all scored interactions within a 10 MB window, without any cut-off. We also include the results of the ABC-score implementation of Fulco et al. (2019).
- STARE_Hocker_*: Complete STARE output for the human heart single-cell data, with one folder each for regular ABC, generalised ABC, generalised ABC with the average Hi-C matrix, and a co-accessibility-based analysis. All approaches were run with a 5 MB window (except for GeneralisedABC500kb); the ABC-based runs used a score cut-off of 0.02. Each folder contains two subdirectories, one for the ABC-scoring and one for the Gene-TF affinity matrices. The 'ABC_output' subdirectory also contains a GeneInfo file for each cell type, summarising different attributes per gene.
- INVOKE_Hocker_*: Folder with the input and output of INVOKE (see https://github.com/schulzlab/tepic), based on the STARE runs. CS genes stands for cell type-specific genes, defined as genes with a z-score across cell types of ≥ 2 and TPM ≥ 0.5. The INVOKE commands were as follows:
- Rscript INVOKE.R --dataDir=<TF-Gene matrix> --outDir=<out_path> --response=Expression --regularization=E --performance=TRUE --outerCV=10 --seed=1234
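The definition of cell type-specific (CS) genes given above (z-score across cell types of ≥ 2 and TPM ≥ 0.5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `cell_type_specific_genes` is hypothetical, the z-score is computed per gene on untransformed TPM values with the population standard deviation, and the TPM threshold is applied to the same cell type as the z-score. The original analysis may differ in these details.

```python
import numpy as np


def cell_type_specific_genes(tpm, z_min=2.0, tpm_min=0.5):
    """Boolean mask (genes x cell types) of cell type-specific genes.

    tpm: array of shape (n_genes, n_cell_types) with expression in TPM.
    Assumption: z-scores are taken per gene across cell types on raw TPM,
    using the population standard deviation (ddof=0).
    """
    tpm = np.asarray(tpm, dtype=float)
    mean = tpm.mean(axis=1, keepdims=True)
    std = tpm.std(axis=1, keepdims=True)
    # Genes with constant expression get a z-score of 0 in every cell type.
    z = np.divide(tpm - mean, std, out=np.zeros_like(tpm), where=std > 0)
    return (z >= z_min) & (tpm >= tpm_min)


# Hypothetical toy matrix: three genes across nine cell types.
toy_tpm = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 10.0],  # high in the last cell type only
    [1, 1, 1, 1, 1, 1, 1, 1, 1.0],   # constant expression
    [0, 0, 0, 0, 0, 0, 0, 0, 0.4],   # specific, but below the TPM threshold
])
toy_mask = cell_type_specific_genes(toy_tpm)
```

With the population standard deviation, a z-score of 2 can only be reached when at least five cell types are compared, which is worth keeping in mind when applying the same thresholds to smaller data sets.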
Importantly, the results are based on data from the following publications:
- CRISPRi-screens:
- Gasperini, Molly, Andrew J. Hill, José L. McFaline-Figueroa, Beth Martin, Seungsoo Kim, Melissa D. Zhang, Dana Jackson, et al. “A Genome-Wide Framework for Mapping Gene Regulation via Cellular Genetic Screens.” Cell 176, no. 1–2 (January 2019): 377-390.e19. https://doi.org/10.1016/j.cell.2018.11.029.
- Schraivogel, Daniel, Andreas R. Gschwind, Jennifer H. Milbank, Daniel R. Leonce, Petra Jakob, Lukas Mathur, Jan O. Korbel, Christoph A. Merten, Lars Velten, and Lars M. Steinmetz. “Targeted Perturb-Seq Enables Genome-Scale Genetic Screens in Single Cells.” Nature Methods 17, no. 6 (June 2020): 629–35. https://doi.org/10.1038/s41592-020-0837-5.
- Fulco, Charles P., Joseph Nasser, Thouis R. Jones, Glen Munson, Drew T. Bergman, Vidya Subramanian, Sharon R. Grossman, et al. “Activity-by-Contact Model of Enhancer–Promoter Regulation from Thousands of CRISPR Perturbations.” Nature Genetics 51, no. 12 (December 2019): 1664–69. https://doi.org/10.1038/s41588-019-0538-0.
- Enformer model: Avsec, Žiga, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam, Agnieszka Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, and David R. Kelley. “Effective Gene Expression Prediction from Sequence by Integrating Long-Range Interactions.” Nature Methods 18, no. 10 (October 2021): 1196–1203. https://doi.org/10.1038/s41592-021-01252-x.
- K562 predictions and average Hi-C matrix: Fulco, Charles P., Joseph Nasser, Thouis R. Jones, Glen Munson, Drew T. Bergman, Vidya Subramanian, Sharon R. Grossman, et al. “Activity-by-Contact Model of Enhancer–Promoter Regulation from Thousands of CRISPR Perturbations.” Nature Genetics 51, no. 12 (December 2019): 1664–69. https://doi.org/10.1038/s41588-019-0538-0.
- Hi-C matrix for K562 predictions: Rao, S. et al. (2014). A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell, 159(7), 1665–1680
- STARE and INVOKE runs: Hocker, J. D. et al. (2021). Cardiac cell type–specific gene regulatory programs and disease risk association. Science Advances, 7(20), eabf1444
- H3K27ac HiChIP for STARE runs: Anene-Nzelu, C. G. et al. (2020). Assigning Distal Genomic Enhancers to Cardiac Disease–Causing Genes. Circulation, 142(9), 910–912
- INVOKE software: Schmidt et al. (2016). Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Research. https://doi.org/10.1093/nar/gkw1061
Files (22.9 GB)
| md5 checksum | Size |
|---|---|
| 16a78a74ce98f20d7cf442c1198c4a8d | 3.6 MB |
| 3db69dc7ad4042cfbc60f283aabbcdd7 | 633.0 kB |
| 93e0ea135e2779dda638094939eb31f9 | 1.5 GB |
| 1513ca9e25b2dbf79e563c0efe749c42 | 6.6 MB |
| 5c582d9de247c04d2aa121c2664704c7 | 141.2 MB |
| a47deb15dff2f8ad959ed342b2dacbdf | 5.9 GB |
| ce95fb2d0fd4aada5f0fbdedbdeba621 | 6.2 GB |
| fca832889496b7372fb957a1029b928a | 834.4 MB |
| 1fcb70aa2f2dc782c2708e820e4a6291 | 2.1 GB |
| c0bcf8e8c5bea88c4ce4dbadd152972c | 2.0 GB |
| d72631309629709b562ec51cd158c7e1 | 2.0 GB |
| 5658da027a8d6b8a95c7e61b29be769c | 2.3 GB |
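Since several of the archives are multiple gigabytes in size, it can be worth checking a download against its listed md5 checksum. The following Python sketch shows one way to do so with the standard library; the function names `md5_of_file` and `matches_checksum` are illustrative, and the path passed to them is whatever name the file was saved under locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's md5 digest with the checksum listed for it."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A call such as `matches_checksum(local_path, "16a78a74ce98f20d7cf442c1198c4a8d")` returns `True` only if the file at `local_path` is the one belonging to that checksum and was downloaded without corruption.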
Additional details
Related works
- Is cited by
- Preprint: 10.1101/2022.01.28.478202 (DOI)