Iterative improvement of deep learning models using synthetic regulatory genomics
- 1. Institute for Systems Genetics, NYU Grossman School of Medicine
- 2. Department of Pathology, NYU Grossman School of Medicine
Description
Model weights for three fine-tuned Enformer models trained on experimentally evaluated synthetic constructs delivered to genomic loci. We developed a fine-tuning strategy that improves performance by incorporating synthetic regulatory genomics datasets. We added a new, independent output layer that uses the baseline Enformer feature extraction trunk to predict expression data from our synthetic assays. The new output layer consists of a self-attention layer, which captures relevant features independently of position, and a dense layer, which combines the resulting signal into a single prediction value. We evaluated three configurations of the new self-attention output layer: SingleHead 64/64, SingleHead 64/128, and MultiHead 64/64. SingleHead 64/64 applies a single projection with key and value matrices of size 64; SingleHead 64/128 applies a single projection with a key matrix of size 64 and a value matrix of size 128; MultiHead 64/64 applies four independent projections, each with key and value matrices of size 64. The final interaction weights of all three configurations are included in `finetune_weights.zip`.
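The output layer described above can be illustrated with a minimal NumPy sketch. It is not the released implementation (see the repository listed under Software for that). The function names, the random initialization, the mean pooling over positions before the dense layer, and the small feature dimensions in the usage lines are all assumptions made for illustration; only the key/value sizes and head counts of the three configurations come from the description.

```python
import numpy as np

# Key/value sizes and head counts as given in the description.
CONFIGS = {
    "SingleHead 64/64": dict(key_size=64, value_size=64, num_heads=1),
    "SingleHead 64/128": dict(key_size=64, value_size=128, num_heads=1),
    "MultiHead 64/64": dict(key_size=64, value_size=64, num_heads=4),
}


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def init_head(rng, channels, key_size, value_size, num_heads):
    """Randomly initialize a hypothetical output head (illustration only)."""
    scale = 1.0 / np.sqrt(channels)
    return {
        "wq": rng.normal(0.0, scale, (num_heads, channels, key_size)),
        "wk": rng.normal(0.0, scale, (num_heads, channels, key_size)),
        "wv": rng.normal(0.0, scale, (num_heads, channels, value_size)),
        "w_out": rng.normal(0.0, scale, (num_heads * value_size,)),
        "b_out": 0.0,
    }


def predict_expression(features, params):
    """Map trunk features of shape (positions, channels) to one scalar."""
    q = np.einsum("lc,hck->hlk", features, params["wq"])
    k = np.einsum("lc,hck->hlk", features, params["wk"])
    v = np.einsum("lc,hcv->hlv", features, params["wv"])
    # Scaled dot-product attention with no positional encoding.
    scores = np.einsum("hlk,hmk->hlm", q, k) / np.sqrt(q.shape[-1])
    attended = np.einsum("hlm,hmv->hlv", softmax(scores), v)
    # Concatenate heads, then average over positions (assumed pooling).
    merged = attended.transpose(1, 0, 2).reshape(features.shape[0], -1)
    pooled = merged.mean(axis=0)
    # Dense layer combining the signal into a single prediction value.
    return float(pooled @ params["w_out"] + params["b_out"])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for trunk features; real dimensions would be much larger.
    features = rng.normal(size=(16, 32))
    for name, cfg in CONFIGS.items():
        params = init_head(rng, channels=32, **cfg)
        print(name, predict_expression(features, params))
```

Because this sketch uses no positional encoding and averages over positions, its prediction is unchanged when the input positions are shuffled, which is one way to read "independently of position"; the released models may achieve this differently.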
We also include weights for a new output head trained to predict CAGE signal for five mESC assays in `mESC_cage_weights.zip`. Previously published mESC CAGE tracks CNhs14104 and CNhs14109 were taken from the FANTOM website at https://fantom.gsc.riken.jp/5/datafiles/reprocessed/mm10_latest/basic/mouse.timecourse.hCAGE/ (Fraser et al. 2015), and GSM3852792, GSM3852793, and GSM3852794 were taken from GEO series GSE132191 (Bonetti et al. 2020).
Files
finetune_weights.zip (2.6 GB)
| Name | Size |
|---|---|
| md5:004a43fc16c6d584ce277fcacd851b69 | 2.6 GB |
| md5:9637d6f770f6a86b317f1c6c2b86bb46 | 58.6 kB |
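The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path is whatever name the archive was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum such as 'md5:004a43fc...'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("path/to/downloaded.zip", "md5:004a43fc16c6d584ce277fcacd851b69")` returns `True` only if the local file is byte-identical to the 2.6 GB archive listed above. Reading in chunks keeps memory use low for an archive of that size.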
Additional details
Related works
- Is supplement to
- Preprint: 10.1101/2025.02.04.636130 (DOI)
Funding
- National Institutes of Health
- CEGS: Center for Synthetic Regulatory Genomics - Renewal 5RM1HG009491-07
- National Institutes of Health
- Dissection of noncoding repeats in psychiatric genetics using synthetic regulatory genomics - Resubmission 1R01MH136353-01A1
Software
- Repository URL
- https://github.com/mauranolab/finetune-enformer
- Programming language
- Python