Published October 26, 2023 | Version v1
Poster Open

Protein identification with structural barcode readout from nanopores using machine learning

Description

Protein analysis and identification are crucial to understanding biological processes and disease causation. This work focuses on identifying proteins using nanopore-sensing technology, which has various applications in healthcare, such as disease diagnosis and precision medicine. Nanopore sensing relies on the fact that when a molecule passes through a nanopore, the current across the nanopore changes because of the blockage caused by the molecule. This change in current carries information about the translocating molecule. Instead of analysing the protein directly, DNA barcodes designed to be protein-specific are used as surrogates. Each DNA barcode consists of a DNA scaffold onto which nanostructures are embedded. The identification task is completed in two steps: first, each DNA translocation event is detected and extracted; then the extracted single-translocation events are analysed and the barcode is inferred. This work focuses on the second step, decoding the barcode for each translocation event. Two approaches, a hidden Markov model (HMM) and a transformer network, are tested on events produced by a synthetic DNA barcode generator. There are eight possible barcode combinations. With the HMM, the sequence of nanopore states is first decoded from the current signal, and the barcode is then inferred from that sequence. With the transformer, there is no intermediate step: the barcode is inferred directly from the current signal. Both methods performed well on the synthetic events, with an F1 score of 99.9%. As a next step, we plan to test both methods on real nanopore events and assess their efficacy.
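The two-stage HMM route described above (decode nanopore states from the current, then infer the barcode) can be illustrated with a minimal Python sketch. This is not the poster's model or its synthetic generator. Everything here is an assumption made for illustration: the three states (open pore, bare scaffold, nanostructure), the current levels and noise, the fixed "stay" probability, the equal-length segments, and the reading of the eight combinations as three binary sites. The names `synth_event`, `viterbi`, and `read_barcode` are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical current levels (arbitrary units); not taken from the poster.
# State 0: open pore, state 1: bare DNA scaffold, state 2: nanostructure site.
LEVELS = np.array([0.0, -1.0, -2.0])
NOISE_SD = 0.2  # assumed Gaussian noise on the current


def synth_event(bits, rng, seg=40):
    """Toy event: baseline, then (scaffold, site) per bit, scaffold, baseline."""
    states = [0] * seg
    for b in bits:
        states += [1] * seg + [2 if b else 1] * seg
    states += [1] * seg + [0] * seg
    states = np.array(states)
    signal = LEVELS[states] + rng.normal(0.0, NOISE_SD, states.size)
    return signal, states


def viterbi(signal, levels, sd, p_stay=0.99):
    """Most likely state path for a Gaussian-emission HMM with sticky states."""
    n = levels.size
    log_trans = np.full((n, n), np.log((1.0 - p_stay) / (n - 1)))
    np.fill_diagonal(log_trans, np.log(p_stay))
    # Gaussian log-likelihood of each sample under each state (constants dropped).
    log_emit = -0.5 * ((signal[:, None] - levels[None, :]) / sd) ** 2
    T = signal.size
    score = np.empty((T, n))
    back = np.zeros((T, n), dtype=int)
    score[0] = log_emit[0] - np.log(n)  # uniform prior over the start state
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: move from i to j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def read_barcode(path, n_bits=3):
    """Split the blocked part of the path into slots; odd slots are the sites."""
    inside = np.flatnonzero(path != 0)
    if inside.size == 0:
        return None  # no translocation found in this trace
    event = path[inside[0]: inside[-1] + 1]
    slots = np.array_split(event, 2 * n_bits + 1)
    return tuple(int(np.mean(slots[2 * i + 1] == 2) > 0.5) for i in range(n_bits))


rng = np.random.default_rng(0)
for bits in itertools.product([0, 1], repeat=3):
    signal, _ = synth_event(bits, rng)
    print(bits, "->", read_barcode(viterbi(signal, LEVELS, NOISE_SD)))
```

The slot-based readout only works because the toy events move at constant speed. Real translocations vary in velocity, so the barcode inference step would need to be more tolerant than this fixed split. A transformer, by contrast, would map the raw `signal` to one of the eight classes without the intermediate `path`.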

Files

fears_2023_poster.pdf

Size: 2.6 MB
md5:882d2ee896bc195852c67fd94396fdb0