Published August 19, 2023 | Version 1.1.0
Dataset | Open access

UTexas Aptamer Dataset

Description

The deposited dataset is a snapshot of the data in the active and growing UTexas Aptamer Database, https://sites.utexas.edu/aptamerdatabase/. It collects aptamer data extracted from the literature for every year since aptamer selections were first reported. It includes multiple aptamer sequences from a given paper rather than only the tightest-binding sequences. In all, the collection includes 1,415 aptamer sequences from 489 papers published between 1990 and 2022. Because the dataset includes multiple sequences that emerged from a given selection experiment, it necessarily includes sequences that may not have been individually tested for binding activity. This is similar to including all rRNA sequences in a metagenomic analysis of an environmental sample. This metagenomic approach gives informaticians a much wider range of sequences for subsequent analysis, while still providing tools to find high-affinity aptamers for future use.

For each aptamer sequence, the dataset includes information about the publication (year of publication, DOI, full citation, and corresponding author(s)), the aptamer target, and the following details about the specific aptamer:

  • nucleic acid composition
  • name assigned in the original publication
  • sequence
  • GC percentage
  • sequence length
  • binding affinity (Kd)
  • binding/selection buffer
  • application as quoted in the referenced paper (e.g., drug delivery, biosensor)
  • original nucleic acid pool used in the aptamer selection
  • post-selection modifications (if any)
  • additional information
  • our internally assigned serial number

We used simple Excel formulas in each aptamer record to calculate the GC content and length of each aptamer sequence.
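The exact Excel formulas are not given here. For users who want to recompute these two columns outside Excel, the following is a minimal Python sketch. It assumes each sequence is stored as a plain string of A/C/G/T/U letters, possibly with whitespace. It takes GC percentage as (G + C) / length × 100 and ignores any modified-base notation a record might contain. The function names are hypothetical.

```python
def clean_sequence(seq: str) -> str:
    """Remove all whitespace and upper-case the sequence (assumed plain A/C/G/T/U letters)."""
    return "".join(seq.split()).upper()


def sequence_length(seq: str) -> int:
    """Number of nucleotides after whitespace is removed."""
    return len(clean_sequence(seq))


def gc_percentage(seq: str) -> float:
    """Percentage of G and C residues: (G + C) / length * 100."""
    s = clean_sequence(seq)
    if not s:
        raise ValueError("empty sequence")
    gc = s.count("G") + s.count("C")
    return 100.0 * gc / len(s)
```

For an illustrative 15-nt sequence such as `GGTTGGTGTGGTTGG`, `sequence_length` returns 15 and `gc_percentage` returns 60.0, because 9 of the 15 residues are G. Applying the same functions to each sequence column can serve as a cross-check against the spreadsheet values.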

 

Version 1.1.0 changes:

  • Added more than 25 aptamer sequences.
  • Added the "Parent sequence serial number" data field/column.
  • Fixed "Application as quoted in the referenced paper" data formatting/alignment error.

Notes

This research was funded by The University of Texas Freshman Research Initiative, which was supported by the Howard Hughes Medical Institute (#52008124, concluded in 2021), and by the College of Natural Sciences. Ali Akari's work was partially supported by the UT-Austin S-STEM Sophomore Scholars Program, which is funded by National Science Foundation Award #1742548.

Files (1.1 MB)

  • 511.0 kB, md5:65b1430f6bad16bec09e72ebf202a824
  • 579.8 kB, md5:39e4838d14d7837be021f24daa87021d
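Each file is listed with an MD5 checksum. After downloading, you can confirm that a file arrived intact by hashing it and comparing the digest to the listed value. The following is a minimal sketch using Python's standard `hashlib` module. The function names are hypothetical. The file is read in chunks so large files do not have to fit in memory, and an optional `md5:` prefix is accepted because that is how the checksums appear on this page.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks of chunk_size bytes."""
    h = hashlib.md5()
    with Path(path).open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected value such as 'md5:65b1...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_md5(Path("path/to/downloaded_file"), "md5:65b1430f6bad16bec09e72ebf202a824")` should return `True` for the 511.0 kB file. The path is a placeholder because the original file names are not preserved here.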