Designing realistic regulatory DNA with autoregressive language models
Description
This repository contains code used to perform all experiments reported in the paper "Designing realistic regulatory DNA with autoregressive language models" by Avantika Lal, David Garfield, Tommaso Biancalani, and Gokcen Eraslan, along with trained model weights and synthetic regulatory elements designed by various methods.
The folder structure is:
- yeast_promoters: Notebooks and models related to the experiments on yeast promoter sequence generation.
- human_enhancers: Notebooks and models related to the experiments on human enhancer sequence generation.
- other_human_models: Notebooks related to the additional models used to validate synthetic human enhancers.
- scripts: Python scripts and functions used in both experiments.
The trained regLM models are:
- yeast_promoters/04_reglm/yeast_reglm.ckpt : regLM model trained on yeast promoter sequences
- human_enhancers/04_reglm/human_reglm.ckpt : regLM model trained on human enhancer sequences
The trained regression models are found in the following folders:
- yeast_promoters/02_regression_paired/ : sequence-to-expression regression models for yeast promoters trained on the same data as the regLM model
- yeast_promoters/03_regression_separate/ : sequence-to-expression regression models for yeast promoters trained on data separate from that used for the regLM model
- human_enhancers/02_regression_paired/ : sequence-to-expression regression models for human enhancers trained on the same data as the regLM model
- human_enhancers/03_regression_separate/ : sequence-to-expression regression models for human enhancers trained on data separate from that used for the regLM model
Code to train, load and test these models is available in the corresponding experiment folders (yeast_promoters and human_enhancers).
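After extracting the archive, it can help to confirm that the model files and folders listed above are where the notebooks expect them. The following is a minimal sketch using only the Python standard library. The `missing_paths` helper and the `root` argument are hypothetical names introduced here, not part of the repository, and the sketch assumes the human enhancer folder is named `human_enhancers`, as in the folder structure list.

```python
from pathlib import Path

# Relative paths taken from the model listings above.
# Assumption: the human enhancer folder is spelled "human_enhancers".
EXPECTED_PATHS = [
    "yeast_promoters/04_reglm/yeast_reglm.ckpt",
    "human_enhancers/04_reglm/human_reglm.ckpt",
    "yeast_promoters/02_regression_paired",
    "yeast_promoters/03_regression_separate",
    "human_enhancers/02_regression_paired",
    "human_enhancers/03_regression_separate",
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under `root`.

    `root` is the directory the archive was extracted into (hypothetical).
    """
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_paths("path/to/extracted/archive")` returns an empty list when every listed checkpoint and folder is present.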
Files
(15.6 GB)
Checksum | Size |
---|---|
md5:8ded4a12cc6e99df7057db4245e1450a | 15.4 GB |
md5:fb54644e7fa0b5ca6ff157bc72cd223e | 10.9 kB |
md5:04c18bf45a8ef0a72d3404fc45428847 | 7.0 kB |
md5:527f7fdc351ffca94c2a718b1a05add5 | 270.5 MB |
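Because the largest file is over 15 GB, it is worth checking a download against the MD5 checksums listed above before extracting it. The following is a minimal sketch using the standard-library `hashlib` module. The function names are hypothetical, and since the file names are not shown in the listing, the check only tests whether a file's digest matches any of the listed values.

```python
import hashlib

# MD5 digests copied from the file listing above.
LISTED_MD5 = {
    "8ded4a12cc6e99df7057db4245e1450a",
    "fb54644e7fa0b5ca6ff157bc72cd223e",
    "04c18bf45a8ef0a72d3404fc45428847",
    "527f7fdc351ffca94c2a718b1a05add5",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte downloads do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_download(path):
    """Return True if the file's MD5 digest matches one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

A file for which `is_listed_download` returns `False` was likely truncated or corrupted in transit and should be downloaded again.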