Campolina: A Deep Neural Framework for Accurate Segmentation of Nanopore Signals

Bakić, Sara; Friganovic, Kresimir; Hooi, Bryan; Sikic, Mile

doi:10.5281/zenodo.15626806

Published June 9, 2025 | Version v1

Model Open

1. National University of Singapore
2. Genome Institute of Singapore
3. University of Zagreb
4. University of Zagreb, Faculty of Electrical Engineering and Computing

This repository contains trained models for Campolina, a deep learning framework for event border detection in nanopore sequencing signals. We provide models trained for R10.4.1 (R10_model.pth) and R9.4.1 (R9_model.pth) nanopore versions.

Campolina Framework Overview:

a) Feature Extraction Pipeline:
Raw signal data is split into non-overlapping chunks of fixed length (L = 6000 samples). Each chunk is z-normalized and augmented with four point-wise statistical descriptors:
1. Rolling mean
2. Rolling standard deviation
3. First-order difference
4. Rolling-window t-statistic
This results in a 5-channel input per chunk.
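
The feature extraction steps above can be sketched in plain Python as follows. This is a minimal illustration, not the Campolina implementation: the window size (WINDOW = 8), the trailing-window convention, the zero padding of the first difference, and the two-window form of the t-statistic are all assumptions, since the description does not specify them. All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev, pvariance

CHUNK_LEN = 6000  # chunk length L stated in the description
WINDOW = 8        # hypothetical rolling-window size (not given in the description)
EPS = 1e-8        # small constant to avoid division by zero


def split_chunks(signal, chunk_len=CHUNK_LEN):
    """Split a 1-D signal into non-overlapping chunks, dropping the incomplete tail."""
    n_full = len(signal) // chunk_len
    return [signal[i * chunk_len:(i + 1) * chunk_len] for i in range(n_full)]


def z_normalize(chunk):
    """Shift a chunk to zero mean and scale it to unit standard deviation."""
    mu = fmean(chunk)
    sigma = pstdev(chunk) or 1.0  # guard against a constant chunk
    return [(x - mu) / sigma for x in chunk]


def rolling(values, window, func):
    """Apply func to the trailing window ending at each position (shorter at the start)."""
    return [func(values[max(0, i - window + 1):i + 1]) for i in range(len(values))]


def t_statistic(values, window):
    """Two-window t-statistic comparing the samples left and right of each point."""
    out = [0.0] * len(values)
    for i in range(window, len(values) - window + 1):
        left, right = values[i - window:i], values[i:i + window]
        pooled = (pvariance(left) + pvariance(right)) / window
        out[i] = (fmean(right) - fmean(left)) / math.sqrt(pooled + EPS)
    return out


def extract_features(chunk, window=WINDOW):
    """Return 5 channels: the normalized signal plus the four descriptors."""
    z = z_normalize(chunk)
    mean = rolling(z, window, fmean)
    std = rolling(z, window, pstdev)
    diff = [0.0] + [b - a for a, b in zip(z, z[1:])]  # padded to keep the length
    tstat = t_statistic(z, window)
    return [z, mean, std, diff, tstat]
```

Calling extract_features on each chunk returned by split_chunks gives a list of five equal-length channels, which corresponds to the 5-channel input described above.
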
b) Model Architecture:
The 5-channel input is processed by a convolutional neural network comprising six convolutional blocks with non-decreasing channel sizes (32, 64, 64, 128, 128). The first block uses a kernel size of 3; all others use a kernel size of 31. All activations are GELU. A linear classification head outputs one logit per sample point (an unnormalized score rather than a probability) indicating how likely that point is to be an event border. For each chunk, the model outputs a sequence of shape (6000 × 1).
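
Because the model emits logits rather than probabilities, a post-processing step is needed to turn a chunk's output into border positions. The sketch below shows one plausible way to do this: apply a sigmoid and keep positions above a threshold. The threshold of 0.5 and the function names are assumptions for illustration; the description does not state how Campolina decodes its output.

```python
import math

THRESHOLD = 0.5  # hypothetical decision threshold, not taken from the description


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def borders_from_logits(logits, chunk_index=0, chunk_len=6000, threshold=THRESHOLD):
    """Return absolute sample positions whose border probability exceeds threshold."""
    offset = chunk_index * chunk_len  # chunks are non-overlapping, so offsets are multiples of L
    return [offset + i for i, z in enumerate(logits) if sigmoid(z) > threshold]
```

The chunk_index argument maps positions inside a chunk back to positions in the full signal, which is straightforward because the chunks do not overlap.
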

Training Objective:

Campolina is trained to identify signal positions that correspond to event borders. Ground truth event borders are generated through a two-step process:

1. The signal is basecalled using Dorado’s “super accurate” basecaller and aligned to a reference using minimap2. This produces an initial segmentation.
2. The initial borders and k-mer model are refined using Remora, producing high-accuracy event border annotations.
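
Since the model predicts one value per sample point, the refined border annotations presumably have to be turned into per-sample targets for each chunk. The sketch below assumes a simple 0/1 encoding in which a sample is labelled 1 when it is an annotated border. That encoding and the function name are assumptions, not details taken from the description.

```python
def border_targets(border_positions, chunk_index, chunk_len=6000):
    """Build a 0/1 target vector for one chunk from absolute border positions."""
    start = chunk_index * chunk_len
    targets = [0] * chunk_len
    for pos in border_positions:
        # Keep only the borders that fall inside this chunk.
        if start <= pos < start + chunk_len:
            targets[pos - start] = 1
    return targets
```
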

Loss Function:

The model is trained using a composite loss function:

Focal Loss: Classification loss that handles class imbalance due to the rarity of event borders.
Huber Loss: Constrains the total number of predicted event borders.
Consecutive Loss: Prevents over-prediction of consecutive borders, which can occur due to variable translocation speed in nanopore sequencing.
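
To make the three terms concrete, here is a minimal sketch of how such a composite loss could be assembled. Only the focal and Huber formulas are standard. The way the border count is estimated (sum of sigmoid probabilities), the form of the consecutive term (product of adjacent probabilities), the weights, and the hyperparameters gamma, alpha, and delta are all assumptions for illustration and may differ from what Campolina uses.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Mean binary focal loss; gamma and alpha are common defaults, assumed here."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        p_t = p if y == 1 else 1.0 - p              # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))
    return total / len(logits)


def huber(x, delta=1.0):
    """Standard Huber function: quadratic near zero, linear in the tails."""
    a = abs(x)
    return 0.5 * x * x if a <= delta else delta * (a - 0.5 * delta)


def count_loss(logits, targets, delta=1.0):
    """Huber penalty on the gap between expected and true border counts (assumed form)."""
    expected = sum(sigmoid(z) for z in logits)
    return huber(expected - sum(targets), delta)


def consecutive_loss(logits):
    """Mean product of adjacent border probabilities (assumed form)."""
    probs = [sigmoid(z) for z in logits]
    return sum(a * b for a, b in zip(probs, probs[1:])) / max(len(probs) - 1, 1)


def composite_loss(logits, targets, w_count=1.0, w_consec=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (focal_loss(logits, targets)
            + w_count * count_loss(logits, targets)
            + w_consec * consecutive_loss(logits))
```

In this sketch, a prediction that fires on several adjacent samples is penalized by both the count term and the consecutive term, in addition to the focal term.
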

Files

Files (7.7 MB)

Name	Size	MD5
R10_model.pth	3.8 MB	011f49d42edff9a4d7e701c796baaff7
R9_model.pth	3.8 MB	a1f99e6e605089883914344ffbd1f776
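
The MD5 digests listed for the two files can be used to check a download before loading it. The following sketch uses only the Python standard library. The helper names md5sum and verify are hypothetical, and the code assumes the files keep their original names.

```python
import hashlib
from pathlib import Path

# MD5 digests as listed for the two model files in this record.
EXPECTED_MD5 = {
    "R10_model.pth": "011f49d42edff9a4d7e701c796baaff7",
    "R9_model.pth": "a1f99e6e605089883914344ffbd1f776",
}


def md5sum(path, block_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's MD5 matches the digest listed for its name."""
    return md5sum(path) == EXPECTED_MD5[Path(path).name]
```

For example, verify("R10_model.pth") should return True for an intact download placed in the current directory.
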

Additional details

Created: 2025-06-09

