Published March 11, 2022 | Version v1
Dataset Open

Data for "Generative and interpretable machine learning for aptamer design and analysis of in vitro sequence selection"

Description

Once decompressed, the archive yields a folder containing:

  • The files "s100_Nth.fasta" (where "N" is 5, 6, 7 or 8), which are the output of the SELEX experiment described in the paper with DOI: 10.1002/cbic.201900265. They are standard FASTA files, and the descriptor of each sequence has the form "seqX-Y", where "X" is an increasing label and "Y" is the number of times "seqX" was obtained (the read count of "seqX").
  • The file "Aptamer_Exp_Results.csv", which contains the sequences tested experimentally for the paper "Generative and interpretable machine learning for aptamer design and analysis of in vitro sequence selection" (preprint available at https://doi.org/10.1101/2022.03.12.484094), with the following experimental results for each sequence: (i) whether the sequence was able to bind thrombin ('B' for binders, 'NB' for non-binders); (ii) the thrombin exosite used for binding ('I' for exosite I, 'II' for exosite II, 'n/a' for sequences not tested).
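As an illustration of the "seqX-Y" descriptor convention used in the FASTA files, the following is a minimal sketch of how the labels and counts could be read with the Python standard library. The function name `parse_counts` and the two example records are hypothetical and are not taken from the dataset; the sketch also assumes that the descriptor is the entire header line after ">".

```python
from typing import Dict, Iterable, Tuple


def parse_counts(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map each sequence label to (sequence, count) from FASTA lines."""
    records: Dict[str, Tuple[str, int]] = {}
    label, count, chunks = None, 0, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if label is not None:
                records[label] = ("".join(chunks), count)
            # Assumed header form ">seqX-Y": split on the last hyphen.
            name, _, n = line[1:].rpartition("-")
            label, count, chunks = name, int(n), []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if label is not None:
        records[label] = ("".join(chunks), count)
    return records


# Made-up records that only mimic the descriptor format.
example = [">seq1-120", "ACGTACGT", ">seq2-3", "TTGACC"]
records = parse_counts(example)
total = sum(count for _, count in records.values())
```

For a real file, the same function can be applied to an open file handle, for example `parse_counts(open("s100_8th.fasta"))`, since it only iterates over lines.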

Examples of usage of the data are available at https://github.com/adigioacchino/RBMsForAptamers.

Notes

If these data are used for academic research, please consider citing the following papers: https://doi.org/10.1101/2022.03.12.484094; https://doi.org/10.1002/cbic.201900265.

Files

Files (41.2 MB)

md5:3a0360e94ca423599a1a93be1d2a0942 (41.2 MB)
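
The MD5 checksum listed above can be used to check that the download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the path of the downloaded archive has to be supplied by the user because its name is not given here.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "3a0360e94ca423599a1a93be1d2a0942"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str) -> bool:
    """True if the file at `path` has the checksum listed for this record."""
    return md5_of_file(path) == EXPECTED_MD5
```

A result of `False` from `matches_expected` suggests an incomplete or corrupted download, in which case downloading the archive again is the simplest remedy.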