UTexas Aptamer Dataset

Askari, Ali; Kota, Sumedha; Ferrell, Hailey; Swamy, Shriya; Goodman, Kayla S.; Okoro, Christine C.; Spruell Crenshaw, Isaiah C.; Hernandez, Daniela K.; Oliphant, Taylor E.; Badrayani, Akshata A.; Ellington, Andrew D.; Stovall, Gwendolyn M.

doi:10.5281/zenodo.8387047

Published August 19, 2023 | Version 1.1.0

Dataset Open

1. Freshman Research Initiative, The University of Texas at Austin
2. Institute for Molecular Biosciences, The University of Texas at Austin

Contributors

Data collectors:

1. Freshman Research Initiative, The University of Texas at Austin

The deposited dataset is a snapshot of the active and growing UTexas Aptamer Database, https://sites.utexas.edu/aptamerdatabase/. It is a collection of aptamer data extracted from the literature for every year since the inception of aptamer selections, and it includes multiple aptamer sequences from a given paper (as opposed to only the sequences with the tightest binding). In all, the collection includes 1,415 aptamer sequences from 489 papers published between 1990 and 2022. Because the dataset includes multiple sequences that emerged from a given selection experiment, it necessarily includes sequences that may not have been individually tested for binding activity, much as a metagenomic analysis of an environmental sample includes all rRNA sequences. By taking this metagenomic approach, we provide informaticians with a much wider range of sequences for subsequent analysis while still providing tools to find high-affinity aptamers for future use.

For each aptamer sequence, the dataset includes information about the aptamer publication (year of publication, DOI, full citation, and corresponding author(s)) and the aptamer target. It also includes the following information about the specific aptamer: nucleic acid composition, name assigned in the original publication, sequence, GC percentage, sequence length, binding affinity (K_d), binding/selection buffer, application as quoted in the referenced paper (e.g., drug delivery, biosensor), original nucleic acid pool used in the aptamer selection, post-selection modifications (if any), additional information, and our internally assigned serial number. We used simple Excel formulas to calculate the GC content and length of each aptamer sequence.
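The record says that GC content and length were computed with simple Excel formulas but does not give them. As a rough Python equivalent, the sketch below assumes that GC percentage means the number of G and C letters divided by the total sequence length, and that each sequence is a plain string of base letters. The function names are illustrative and are not part of the dataset.

```python
def sequence_length(sequence: str) -> int:
    """Return the number of characters in a sequence, ignoring outer whitespace."""
    return len(sequence.strip())


def gc_percentage(sequence: str) -> float:
    """Return the percentage of G and C letters in a sequence.

    Assumption: the sequence contains only base letters (upper or lower case).
    Any other character still counts toward the total length.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)
```

Sequences that carry modification annotations or spacing characters would likely need cleaning before either function gives a meaningful value.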

Version 1.1.0:

Added 25+ aptamer sequences.
Added the "Parent sequence serial number" data field/column.
Fixed a data formatting/alignment error in the "Application as quoted in the referenced paper" field.

Notes

This research was funded by The University of Texas Freshman Research Initiative, which was supported by the Howard Hughes Medical Institute (#52008124, concluded in 2021) and the College of Natural Sciences. Ali Askari's work was partially supported by the UT-Austin S-STEM Sophomore Scholars Program, which is funded by National Science Foundation award #1742548.

Files

Files (1.1 MB)

Name	MD5	Size
UTexas Aptamer Database dataset.xlsx	65b1430f6bad16bec09e72ebf202a824	511.0 kB
UTexas Aptamer Database dataset_Sept2023.xlsx	39e4838d14d7837be021f24daa87021d	579.8 kB
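The MD5 checksums listed with the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names are hypothetical, and it assumes the downloaded files keep the names shown in the listing.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "UTexas Aptamer Database dataset.xlsx": "65b1430f6bad16bec09e72ebf202a824",
    "UTexas Aptamer Database dataset_Sept2023.xlsx": "39e4838d14d7837be021f24daa87021d",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """Return True if the file's MD5 equals the checksum listed for its name.

    Raises KeyError if the file name is not one of the listed files.
    """
    expected = EXPECTED_MD5[Path(path).name]
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("UTexas Aptamer Database dataset_Sept2023.xlsx")` should return `True` for an uncorrupted copy saved under that name in the current directory.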
