Comprehensive benchmark and architectural analysis of deep learning models for Nanopore sequencing basecalling

Pagès-Gallego, M.; de Ridder, J.

doi:10.5281/zenodo.7657037

Published February 20, 2023 | Version 1.0.0

Software Open


1. UMC Utrecht

Nanopore-based DNA sequencing relies on basecalling the electric current signal. Basecalling
requires neural networks to achieve competitive accuracies. To improve sequencing accuracy further,
new models are continuously proposed. However, benchmarking is currently not standardized, and
evaluation metrics and datasets used are defined on a per-publication basis, impeding progress in
the field. To standardize the process of benchmarking, we unified existing benchmarking datasets
and defined a rigorous set of evaluation metrics. We benchmarked the latest seven basecaller models
and analyzed their deep learning architectures. Our results show that overall Bonito has the best
architecture for basecalling. We find, however, that species bias in training can have a large impact
on performance. Our comprehensive evaluation of 90 novel architectures demonstrates that different
models excel at reducing different types of errors and that using RNNs (LSTM) and a CRF decoder are
the main drivers of high-performing models.

Files (98.6 MB)

Name	Size	MD5
basecalling_architectures-main.zip	89.4 MB	04948567d5fa8f46d4287d15cbcf8239
nanopore_benchmark-main.zip	9.2 MB	a9d461e7489fdd1a119e9b8f1093dfb0
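The MD5 checksums listed for the two archives can be used to confirm that a download is intact. The following is a minimal sketch, assuming both zip files were saved under their listed names in one directory. The function names `md5_of_file` and `verify_downloads` are illustrative and are not part of either archive.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "basecalling_architectures-main.zip": "04948567d5fa8f46d4287d15cbcf8239",
    "nanopore_benchmark-main.zip": "a9d461e7489fdd1a119e9b8f1093dfb0",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch), or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, outcome in verify_downloads().items():
        print(f"{name}: {labels[outcome]}")
```

Run from the download directory, the script prints one status line per archive. A mismatch most likely indicates an incomplete or corrupted download, or a file from a different version of the record.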

	All versions	This version
Views	245	241
Downloads	81	81
Data volume	4.1 GB	4.1 GB
