Published 2025 | Version v3
Peer review Open

Designing synthetic regulatory elements using DNA-Diffusion, a generative AI framework

  • 1. Harvard University
  • 2. Massachusetts General Hospital
  • 3. Broad Institute

Description

File: supplementary_tables.zip

Manuscript Supplementary tables a-g

 

Supplementary Data 1

File: JASPAR_motifs.txt.zip

Description: JASPAR MOODS motif matches by sequence ID and cell type in the training, validation, test, and generated sets (with positions and scores)

 

Supplementary Data 2

File: Co_occurence_motifs_JASPAR.zip

Description: MOODS motif co-occurrence for training, test, validation, and generated sequences, separated by cell type (GM12878, HepG2, K562)
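
For readers who want to recompute a co-occurrence table from the motif matches in Supplementary Data 1, the following is a minimal sketch. It assumes co-occurrence means "two motifs matched in the same sequence" and that matches are available as (sequence ID, motif name) pairs; the record does not state the exact definition or column layout used in the archive, so the function name, input shape, and motif names below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def motif_cooccurrence(matches):
    """Count, for each unordered motif pair, the sequences containing both.

    `matches` is an iterable of (sequence_id, motif_name) tuples.
    This input layout is an assumption, not the archive's documented format.
    """
    # Collect the distinct motifs seen in each sequence.
    motifs_by_sequence = defaultdict(set)
    for sequence_id, motif in matches:
        motifs_by_sequence[sequence_id].add(motif)

    # Each sequence contributes one count to every motif pair it contains.
    pair_counts = Counter()
    for motifs in motifs_by_sequence.values():
        for pair in combinations(sorted(motifs), 2):
            pair_counts[pair] += 1
    return pair_counts


# Toy input with made-up sequence IDs and motif names.
example_matches = [
    ("seq1", "GATA1"), ("seq1", "TAL1"), ("seq1", "GATA1"),
    ("seq2", "TAL1"), ("seq2", "GATA1"),
    ("seq3", "HNF4A"),
]
example_counts = motif_cooccurrence(example_matches)
```

Repeated matches of the same motif within a sequence are counted once here; a position-aware definition would need the match coordinates as well.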

 

Supplementary Data 3

File: Enformer_and_Chrombpnet_predictions_enhancer_promoters_different_loci.zip

Description: Tables containing Enformer and ChromBPNet (DNase, CAGE, H3K4me3) predictions for enhancers and gene promoters at different genomic loci.

 

Supplementary Data 4

File: EXTRASEQ_counts.tab.zip

Description: EXTRA-Seq barcode raw counts for mRNA and DNA for five different replicates

 

Supplementary Data 5

File: DNA-Diffusion.pt

DNA-Diffusion PyTorch weights (see: https://github.com/pinellolab/DNA-Diffusion)

 

Supplementary Data 6

File: mpra_model_best-epoch_07_val_MalinoisMPRA_mean_SpearmanR_0_73185.ckpt

Weights of the MPRA predictor model trained on the Malinois MPRA data.

 

Supplementary Data 7

File: DeepMEL_training.zip

Weights and code to train and sample DeepMEL, trained on the DHS index data for GM12878, K562, and HepG2

 

Supplementary Data 8

File: wgan_training.zip

Weights and code to train the WGANs for the different cell types. The models were trained on the DHS index data, one model per cell type: GM12878, K562, and HepG2

 

Supplementary Data 9

File: CODA_training.zip

Weights and code to train and sample CODA, trained on the DHS index data for GM12878, K562, and HepG2

 

Supplementary Data 10

File: EXTRASEQ_AND_STARRSEQ_R_mpra_voom_scripts.zip

Scripts to process EXTRA-Seq and STARR-Seq data. They take the DNA and mRNA counts and compute the mRNA/DNA log fold change across the experimental replicates.
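
To illustrate the quantity these scripts report, here is a simplified Python sketch of an mRNA/DNA log2 fold change averaged over replicates. It is not a port of the supplied R scripts, which (as the file name suggests) rely on mpra/voom modelling; the pseudocount, the plain averaging, and the function names are assumptions made only for illustration, and no library-size normalization is applied.

```python
import math
from statistics import mean


def log2_fold_change(mrna_count, dna_count, pseudocount=1.0):
    """log2 of the mRNA/DNA ratio for one barcode in one replicate.

    The pseudocount (an assumed value) avoids division by zero and log(0).
    """
    return math.log2((mrna_count + pseudocount) / (dna_count + pseudocount))


def mean_log2_fold_change(mrna_counts, dna_counts, pseudocount=1.0):
    """Average the per-replicate log2 ratios for one barcode."""
    if len(mrna_counts) != len(dna_counts):
        raise ValueError("need one mRNA and one DNA count per replicate")
    return mean(
        log2_fold_change(m, d, pseudocount)
        for m, d in zip(mrna_counts, dna_counts)
    )


# Toy counts for one barcode across two replicates (made-up numbers).
example_lfc = mean_log2_fold_change([7, 3], [1, 1])
```

For real analyses the voom-based scripts in the archive should be preferred, since they account for sequencing depth and the mean-variance trend that this sketch ignores.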

 

Supplementary Data 11

File: DNA-Diffusion-0.0.2.zip

DNA-Diffusion library source code

 

Supplementary Data 12

File: MPRA_predictions.txt

MPRA predictions for the different cell types


Files

Files (7.0 GB)

Checksum                              Size
md5:0a26b747e73e4717662a887168d2c574  2.8 GB
md5:6fc0e2019d889e82dcecb153b401c604  105.5 MB
md5:0013ccde5e121d73a341e6cd5f303f61  956.3 MB
md5:7c9b8947a9d7ffb22d8719e9817ff6a4  38.1 MB
md5:b6e54edfc308e6834c7f4e2dd63bce9e  48.8 MB
md5:6724df51f2007b6170dcdc11a3fb9714  1.5 GB
md5:77db5157ea111f07446107e5c84de17c  426.2 MB
md5:7138ed833870202b7af0cb50bb66214a  5.5 kB
md5:9530e11640d92f90fd2739ce80dbc66a  11.1 MB
md5:20617d74f3debb68b40837c0f01502c3  482.6 MB
md5:b700ce077ca24608456403a472cfd0a5  142.8 MB
md5:9b11aa5dc72a58e97d1853a64f6e3067  59.9 kB
md5:608184500c533d0295a3ccc48726a405  411.0 MB
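
Because several of these files are hundreds of megabytes or larger, it can be worth checking a download against its listed MD5 checksum. The sketch below uses only the Python standard library; the helper names are hypothetical, and since the listing above does not pair checksums with file names, the caller has to supply the matching checksum for each downloaded file.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected_hex


# Usage (placeholders, to be filled in by the reader):
# matches_checksum("<downloaded file>", "md5:<checksum from the listing>")
```

Reading in chunks keeps memory use flat even for the multi-gigabyte archives. `str.removeprefix` requires Python 3.9 or later.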

Additional details

Software

Repository URL
https://github.com/pinellolab/DNA-Diffusion
Programming language
Python, Jupyter Notebook
Development Status
Active