Code - Exonic enhancers are a widespread class of dual-function regulatory elements
This is the GitHub repository of ExonEnhancer.
Full Changelog: https://github.com/benoitballester/ExonEnhancer/commits/v2
This repository contains the scripts used for data curation, analysis, and figure generation [Zenodo Data] in our manuscript: "Exonic enhancers are a widespread class of dual-function regulatory elements."
In this study, we explore the role of exonic regions in gene regulation in four species. We identify and characterize many protein-coding exons as candidate exonic enhancers (cEEs), a previously underappreciated class of cis-regulatory elements embedded within exons. By integrating TF ChIP-seq, chromatin accessibility data (DNase-seq/ATAC-seq), high-throughput enhancer-reporter assays (STARR-seq, luciferase), and CRISPR-based validations, we show that exonic enhancers (EEs) play crucial roles in gene regulatory networks while retaining their protein-coding function.
Supplementary Data is available on Zenodo.
Key Findings
- Identification of EEs Across Multiple Species
  - Systematic discovery of EEs using TF ChIP-seq, chromatin accessibility, and STARR-seq data.
  - Many protein-coding exons exhibit enhancer activity.
- Dual Coding and Regulatory Roles
  - EEs retain protein-coding functions while simultaneously acting as cis-regulatory elements.
  - Both synonymous and nonsynonymous variants can disrupt EE activity and downstream gene expression.
- Long-Range Interactions and Target Gene Regulation
  - Promoter capture Hi-C and eQTL analyses confirm interactions between EEs and gene promoters.
  - CRISPR-based inactivation of EEs demonstrates regulatory effects on both host and distal target genes.
- Clinical and Evolutionary Implications
  - Pan-cancer (TCGA) analyses show that mutations in EEs correlate with altered gene expression and clinical outcomes.
  - Evolutionary conservation indicates that EEs are functionally constrained yet contribute to species-specific regulatory innovations.
Repository Contents
Below is a brief description of each folder, organized by theme for easy navigation and reproducibility:
- EE_selection/
  Scripts used to define candidate exonic enhancers (cEEs) based on TF ChIP-seq peaks and additional filtering criteria.
- Control_selection/
  Procedures for generating negative/positive control sets, ensuring unbiased comparisons with cEEs.
- Chromatin_accessibility/
  Scripts to curate DNase-seq, ATAC-seq, and histone mark datasets assessing open chromatin in exons and cEEs across multiple species.
- Conservation_and_structure/
  Scripts for multi-species and pairwise alignment, phyloP scores, AlphaFold predictions, MobiDB disorder rates, and gene-age analyses for cEEs.
- Randomisation/
  Scripts used to validate enrichment within cEEs through randomization and permutation tests.
- TFBS_in_EE/
  Motif analysis pipeline (e.g., JASPAR-based TFBS predictions) overlapping with candidate exonic enhancers.
- STARR-seq_experiment/
  Analysis scripts for STARR-seq data, including the read-processing pipeline and SNP analysis.
- STARR-seq_catalog/
  Data curation of STARR-seq peaks from public data sources, as well as the definition of cEE biotype signatures.
- G-quadruplex/
  Scripts examining G4-forming sequences (G-quadruplex) in cEEs vs. control exons.
- Interaction_data/
  Integration of promoter capture Hi-C, eQTL (GTEx), and ENCODE-rE2G resources to identify robust cEE–target gene interactions.
- gnomADv3_analysis/
  Variant filtering/annotation pipelines and constraint analyses for common variants in gnomAD v3 that intersect with candidate exonic enhancers.
- GWAS_analysis/
  Overlaps of known GWAS loci with cEEs to reveal potential trait- and disease-associated variants within coding enhancer regions.
- PanCancer_analysis/
  Curation and analysis scripts (differential expression and survival analysis) for the TCGA PanCanAtlas.
- Genes_specificity/
  Scripts used to infer the expression tendencies of cEE host and target genes.
- CRISPRi_plots/
  Scripts used to generate the CRISPRi figures.
- UCSC_trackhub/
  Configurations for easily visualizing cEEs, TF binding, and variant positions in the UCSC Genome Browser.
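To give a rough idea of the kind of operation behind the EE_selection/ step (defining cEEs from TF ChIP-seq peaks), here is a minimal, standard-library-only sketch that flags exons overlapped by a minimum number of peaks. It is an illustration under stated assumptions, not the repository's actual pipeline: the function names, the `min_peaks` threshold, and the toy coordinates are all hypothetical, and the real scripts apply additional filtering criteria not shown here.

```python
from collections import defaultdict


def count_peak_overlaps(exons, peaks):
    """Count, for each exon, how many peaks overlap it by at least 1 bp.

    Intervals are (chrom, start, end) tuples in BED-style 0-based,
    half-open coordinates.
    """
    # Group peaks by chromosome so each exon is only compared to
    # peaks on the same chromosome.
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))

    counts = {}
    for chrom, start, end in exons:
        counts[(chrom, start, end)] = sum(
            1
            for p_start, p_end in peaks_by_chrom[chrom]
            # Two half-open intervals overlap when each one starts
            # before the other ends.
            if p_start < end and start < p_end
        )
    return counts


def select_candidates(exons, peaks, min_peaks=2):
    """Return exons overlapped by at least `min_peaks` peaks.

    `min_peaks` is a hypothetical threshold chosen for illustration.
    """
    counts = count_peak_overlaps(exons, peaks)
    return [exon for exon in exons if counts[exon] >= min_peaks]


# Toy data (hypothetical coordinates).
exons = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks = [
    ("chr1", 150, 250),
    ("chr1", 190, 300),
    ("chr1", 600, 700),  # touches the second exon's end but does not overlap
    ("chr2", 50, 120),
]
candidates = select_candidates(exons, peaks, min_peaks=2)
print(candidates)
```

For genome-scale inputs, a sorted sweep or an interval-tree approach (for example via bedtools or GenomicRanges) would be preferable to this quadratic comparison.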
Usage & Reproducibility
- Environment Requirements
  - Scripts were primarily run on Python (>= 3.9) and R (>= 4.0).
  - Required dependencies include common genomic libraries (e.g., Bioconductor packages in R such as GenomicRanges) and Python packages such as pandas, numpy, and pybedtools.
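Before running the Python scripts, it can help to confirm that the interpreter and packages listed above are available. The following is a small sketch of such a check; the helper name `check_environment` is hypothetical and not part of the repository, and it only tests whether the packages can be found, not their versions. R and Bioconductor dependencies are not covered.

```python
import importlib.util
import sys

# Minimum Python version and packages named in the requirements above.
MIN_PYTHON = (3, 9)
PACKAGES = ["pandas", "numpy", "pybedtools"]


def check_environment(min_python=MIN_PYTHON, packages=PACKAGES):
    """Return (python_ok, missing), where `missing` lists packages not found."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when a top-level module cannot be located.
    missing = [
        name for name in packages if importlib.util.find_spec(name) is None
    ]
    return python_ok, missing


python_ok, missing = check_environment()
print("Python version OK:", python_ok)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Note that pybedtools additionally relies on a bedtools installation being available on the system, which this check does not verify.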
How to Cite
If you use this dataset, software, or any derived resources, please cite:
Mouren, J.-C., Torres, M., van Ouwerkerk, A., Manosalva, I., Gallardo, F., Spicuglia, S., & Ballester, B.
“Supplementary Data — Exonic enhancers are a widespread class of dual-function regulatory elements.”
Zenodo. https://doi.org/10.5281/zenodo.15079251
And the corresponding manuscript:
Mouren, J.-C., Torres, M., van Ouwerkerk, A., Manosalva, I., Gallardo, F., Spicuglia, S., & Ballester, B.
“Exonic enhancers are a widespread class of dual-function regulatory elements.” (Manuscript in preparation.)
Contact
For questions regarding the data, scripts, or methods, please contact:
Jean-Christophe Mouren & Benoit Ballester
Aix Marseille University, INSERM, TAGC, UMR 1090
Additional details
Related works
- Is supplement to: Software: https://github.com/benoitballester/ExonEnhancer/tree/v2 (URL)
Software
- Repository URL: https://github.com/benoitballester/ExonEnhancer