Published May 15, 2023 | Version 1.0
Dataset Open

Datasets for manuscript - Dirichlet diffusion score model for biological sequence generation.

  • 1. University of Texas Southwestern Medical Center

Description

This repository holds the trained Dirichlet Diffusion Score models for various datasets.

best_models.tar.gz

It also contains all input data required to train your own models with the scripts provided in the GitHub repository.

data.tar.gz

This archive contains the following folders: 

  • satnet_sudoku contains the dataset of sudoku examples that we used to evaluate the sudoku model.
  • promoter_design contains the dataset used to train the promoter design model, as well as the Sei model weights. Please read the provided readme file before using this data with the training scripts.

Notes

This work is supported by Cancer Prevention and Research Institute of Texas grant RR190071, NIH grant DP2GM146336, and the UT Southwestern Endowed Scholars Program.

Files (10.4 GB)

Checksum Size
md5:8a4940b23e275e52fe237babddc8e18f 199.8 MB
md5:aa53f13ddcf1474962548bc02cbb9b93 10.2 GB
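
The MD5 checksums listed above can be used to confirm that a download completed without corruption before unpacking it. The following is a minimal sketch in Python, not part of the original record: the function names `md5_of_file` and `verify_and_extract` are hypothetical helpers, and the mapping from each checksum to an archive name is left to the caller because the listing does not state it.

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, dest="."):
    """Extract a .tar.gz archive into dest only if its MD5 matches.

    expected_md5 may be given with or without the "md5:" prefix used
    in the file listing. Raises ValueError on a checksum mismatch.
    """
    expected = expected_md5.split(":", 1)[-1].strip().lower()
    actual = md5_of_file(archive)
    if actual != expected:
        raise ValueError(
            f"MD5 mismatch for {archive}: expected {expected}, got {actual}"
        )
    Path(dest).mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where the Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
    return True
```

A call would look like `verify_and_extract("data.tar.gz", "md5:<checksum from the listing>", dest="data")`, where the checksum is whichever of the two values belongs to that archive. Reading in chunks keeps memory use flat, which matters for the 10.2 GB file.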