Campolina: A Deep Neural Framework for Accurate Segmentation of Nanopore Signals
Description
This repository contains trained models for Campolina, a deep learning framework for event border detection in nanopore sequencing signals. We provide models trained for R10.4.1 (R10_model.pth) and R9.4.1 (R9_model.pth) nanopore versions.
Campolina Framework Overview:
a) Feature Extraction Pipeline:
Raw signal data is split into non-overlapping chunks of fixed length (L = 6000 samples). Each chunk is z-normalized and augmented with four point-wise statistical descriptors:
- Rolling mean
- Rolling standard deviation
- First-order difference
- Rolling-window t-statistic
This results in a 5-channel input per chunk: the z-normalized signal plus the four descriptors.
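The feature extraction step can be sketched in plain Python as below. This is a minimal illustration, not Campolina's implementation. The description gives neither the rolling-window size nor the exact form of the t-statistic, so several choices here are assumptions: the hypothetical `window` parameter, the trailing-window convention, a Welch t-statistic computed between adjacent left and right windows, and dropping a trailing partial chunk.

```python
import math
from typing import List

CHUNK_LEN = 6000  # chunk length L from the description


def chunk_signal(signal: List[float], chunk_len: int = CHUNK_LEN) -> List[List[float]]:
    """Split a read into non-overlapping chunks; a trailing partial chunk is dropped (assumption)."""
    n_full = len(signal) // chunk_len
    return [signal[i * chunk_len:(i + 1) * chunk_len] for i in range(n_full)]


def z_normalize(x: List[float]) -> List[float]:
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    std = std if std > 0 else 1.0  # guard against a perfectly flat chunk
    return [(v - mean) / std for v in x]


def rolling_mean_std(x: List[float], window: int):
    """Trailing-window mean and population std; windows are truncated at the chunk start."""
    means, stds = [], []
    for i in range(len(x)):
        win = x[max(0, i - window + 1): i + 1]
        m = sum(win) / len(win)
        means.append(m)
        stds.append(math.sqrt(sum((v - m) ** 2 for v in win) / len(win)))
    return means, stds


def first_difference(x: List[float]) -> List[float]:
    # x[i] - x[i-1], padded with 0 at the first sample to keep length L
    return [0.0] + [x[i] - x[i - 1] for i in range(1, len(x))]


def rolling_t_statistic(x: List[float], window: int) -> List[float]:
    """Hypothetical Welch t-statistic between x[i-window:i] and x[i:i+window].

    Positions where either window would be incomplete are set to 0.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a variance")
    out = [0.0] * len(x)
    for i in range(window, len(x) - window + 1):
        left, right = x[i - window:i], x[i:i + window]
        ml, mr = sum(left) / window, sum(right) / window
        vl = sum((v - ml) ** 2 for v in left) / (window - 1)
        vr = sum((v - mr) ** 2 for v in right) / (window - 1)
        denom = math.sqrt(vl / window + vr / window)
        out[i] = (mr - ml) / denom if denom > 0 else 0.0
    return out


def extract_features(chunk: List[float], window: int = 5) -> List[List[float]]:
    """Return 5 channels of length len(chunk): z-signal, mean, std, diff, t-stat."""
    z = z_normalize(chunk)
    mean, std = rolling_mean_std(z, window)
    return [z, mean, std, first_difference(z), rolling_t_statistic(z, window)]
```

The rolling statistics are computed naively in O(L × window) for readability; a production pipeline would more likely use cumulative sums or a vectorized library.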
b) Model Architecture:
The 5-channel input is processed by a convolutional neural network comprising six convolutional blocks with increasing channel sizes (32, 64, 64, 128, 128). The first block uses a kernel size of 3; all others use a kernel size of 31. All activations are GELU. A linear classification head outputs one logit (an unnormalized log-odds score; applying a sigmoid gives a probability) per sample, indicating the likelihood that the sample is an event border. For each chunk, the model therefore outputs a sequence of shape (6000 × 1).
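The description states only that the head emits one logit per sample. One hypothetical way to turn those logits into border positions is to apply a sigmoid, keep samples whose probability exceeds a threshold, and retain only local maxima. The 0.5 threshold (equivalent to a logit of 0), the peak-picking rule, and the helper names are assumptions, not documented Campolina behavior.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    # numerically stable logistic function (avoids overflow for large |z|)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits_to_borders(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical decoding: indices above `threshold` that are local probability maxima.

    On a plateau of equal probabilities, only the rightmost sample is kept.
    """
    probs = [sigmoid(z) for z in logits]
    borders = []
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else -1.0
        right = probs[i + 1] if i + 1 < len(probs) else -1.0
        if p > threshold and p >= left and p > right:
            borders.append(i)
    return borders
```

For example, `logits_to_borders([-5.0, -5.0, 3.0, 4.0, -5.0, -5.0, 2.0, -5.0])` keeps indices 3 and 6: index 2 also exceeds the threshold but is suppressed by its stronger neighbor.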
Training Objective:
Campolina is trained to identify signal positions that correspond to event borders. Ground truth event borders are generated through a two-step process:
- The signal is basecalled using Dorado’s “super accurate” basecaller and aligned to a reference using minimap2, which produces an initial segmentation.
- The initial borders and the k-mer model are refined using Remora, producing high-accuracy event border annotations.
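For training, the refined border annotations have to be aligned with the fixed-length chunks. The sketch below assumes the borders are available as absolute sample indices into the raw signal (the actual Remora output format is not described here). It converts them into one binary label vector per 6000-sample chunk and, like the chunking step, ignores any trailing partial chunk. `borders_to_chunk_labels` is a hypothetical helper.

```python
from typing import Iterable, List


def borders_to_chunk_labels(
    border_positions: Iterable[int], signal_len: int, chunk_len: int = 6000
) -> List[List[int]]:
    """Map absolute border indices to per-sample 0/1 labels, one list per full chunk."""
    n_chunks = signal_len // chunk_len
    labels = [[0] * chunk_len for _ in range(n_chunks)]
    for pos in border_positions:
        # skip negative indices and borders that fall in the dropped tail
        if 0 <= pos < n_chunks * chunk_len:
            chunk_idx, offset = divmod(pos, chunk_len)
            labels[chunk_idx][offset] = 1
    return labels
```

Because borders are rare, each label vector is mostly zeros. This is the class imbalance that the focal loss term is meant to address.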
Loss Function:
The model is trained using a composite loss function:
- Focal Loss: a classification loss that addresses the class imbalance caused by the rarity of event borders.
- Huber Loss: constrains the total number of predicted event borders.
- Consecutive Loss: prevents over-prediction of consecutive borders, which can occur due to variable translocation speed in nanopore sequencing.
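A composite loss of this shape can be sketched as follows. Only the three component names come from the description, and everything else is an assumption. The focal term uses the standard binary form with hypothetical `alpha` and `gamma`. The count term applies a Huber loss to the difference between the soft predicted count (sum of sigmoid probabilities) and the true border count. The consecutive term is a hypothetical penalty on the product of probabilities at neighboring samples. The weights `w_focal`, `w_count`, and `w_consec` are placeholders.

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    # log(1 + e^z), computed stably for large |z|
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def probs_from_logits(logits: Sequence[float]):
    # sigmoid(z) = exp(-softplus(-z)); never overflows
    return [math.exp(-softplus(-z)) for z in logits]


def focal_loss(logits: Sequence[float], labels: Sequence[int],
               alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Mean binary focal loss computed from logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        log_p = -softplus(-z)   # log sigmoid(z)
        log_1mp = -softplus(z)  # log(1 - sigmoid(z))
        p = math.exp(log_p)
        if y == 1:
            total += -alpha * (1.0 - p) ** gamma * log_p
        else:
            total += -(1.0 - alpha) * p ** gamma * log_1mp
    return total / len(logits)


def huber(a: float, delta: float = 1.0) -> float:
    a = abs(a)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def count_loss(logits: Sequence[float], labels: Sequence[int], delta: float = 1.0) -> float:
    # Huber penalty on (soft predicted count - true count); assumed formulation
    return huber(sum(probs_from_logits(logits)) - sum(labels), delta)


def consecutive_loss(logits: Sequence[float]) -> float:
    # hypothetical: mean product of probabilities at adjacent samples
    probs = probs_from_logits(logits)
    pairs = [probs[i] * probs[i + 1] for i in range(len(probs) - 1)]
    return sum(pairs) / max(len(pairs), 1)


def composite_loss(logits: Sequence[float], labels: Sequence[int],
                   w_focal: float = 1.0, w_count: float = 0.01,
                   w_consec: float = 1.0) -> float:
    return (w_focal * focal_loss(logits, labels)
            + w_count * count_loss(logits, labels)
            + w_consec * consecutive_loss(logits))
```

The adjacent-product penalty is one simple way to discourage two neighboring samples from both being marked as borders; the actual Campolina term may be defined differently.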
Files
Total size: 7.7 MB
| MD5 checksum | Size |
|---|---|
| 011f49d42edff9a4d7e701c796baaff7 | 3.8 MB |
| a1f99e6e605089883914344ffbd1f776 | 3.8 MB |
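After downloading, the model files can be checked against the published checksums. The table does not state which checksum belongs to `R10_model.pth` and which to `R9_model.pth`, so this sketch only tests whether a file matches either listed value. `md5_of` and `is_published_model` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the Files table (file-to-checksum mapping not given)
PUBLISHED_MD5 = {
    "011f49d42edff9a4d7e701c796baaff7",
    "a1f99e6e605089883914344ffbd1f776",
}


def md5_of(path, block_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB blocks."""
    h = hashlib.md5()
    with Path(path).open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def is_published_model(path) -> bool:
    return md5_of(path) in PUBLISHED_MD5
```

For example, `is_published_model("R10_model.pth")` should return `True` for an intact download of that file.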
Additional details
Dates
- Created: 2025-06-09