Published June 9, 2025 | Version v1
Model Open

Campolina: A Deep Neural Framework for Accurate Segmentation of Nanopore Signals

  • 1. National University of Singapore
  • 2. Genome Institute of Singapore
  • 3. University of Zagreb
  • 4. University of Zagreb, Faculty of Electrical Engineering and Computing

Description

This repository contains trained models for Campolina, a deep learning framework for event border detection in nanopore sequencing signals. We provide models trained for R10.4.1 (R10_model.pth) and R9.4.1 (R9_model.pth) nanopore versions.

Campolina Framework Overview:

  • a) Feature Extraction Pipeline:
    Raw signal data is split into non-overlapping chunks of fixed length (L = 6000 samples). Each chunk is z-normalized and augmented with four point-wise statistical descriptors:

    1. Rolling mean

    2. Rolling standard deviation

    3. First-order difference

    4. Rolling-window t-statistic

    Together with the normalized signal itself, this results in a 5-channel input per chunk.
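    The feature extraction steps above can be sketched as follows. This is a minimal NumPy illustration, not the Campolina implementation: the window size (`WINDOW = 10`), the edge padding, the channel order, the handling of a trailing partial chunk, and the exact form of the t-statistic (a Welch-style comparison of the samples before and after each point) are all assumptions, since the description does not specify them. The function names are hypothetical.

```python
import numpy as np

CHUNK_LEN = 6000  # chunk length L from the description
WINDOW = 10       # ASSUMPTION: rolling-window size is not stated in the description


def rolling_mean_std(x, w):
    """Centered rolling mean and standard deviation (edge-padded, same length as x)."""
    padded = np.pad(x, (w // 2, w - 1 - w // 2), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, w)
    return windows.mean(axis=1), windows.std(axis=1)


def rolling_t_stat(x, w, eps=1e-8):
    """ASSUMED definition: Welch-style t-statistic between the w samples
    before each point and the w samples starting at that point."""
    padded = np.pad(x, (w, w), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, w)
    left = windows[: len(x)]         # covers x[i - w : i]
    right = windows[w : w + len(x)]  # covers x[i : i + w]
    num = left.mean(axis=1) - right.mean(axis=1)
    den = np.sqrt((left.var(axis=1) + right.var(axis=1)) / w + eps)
    return num / den


def chunk_features(signal):
    """Split a raw signal into non-overlapping chunks and build 5-channel inputs."""
    signal = np.asarray(signal, dtype=np.float64)
    n_chunks = len(signal) // CHUNK_LEN  # ASSUMPTION: trailing partial chunk is dropped
    chunks = signal[: n_chunks * CHUNK_LEN].reshape(n_chunks, CHUNK_LEN)
    out = np.empty((n_chunks, 5, CHUNK_LEN))
    for i, chunk in enumerate(chunks):
        z = (chunk - chunk.mean()) / (chunk.std() + 1e-8)  # z-normalization
        mean, std = rolling_mean_std(z, WINDOW)
        diff = np.diff(z, prepend=z[0])                    # first-order difference
        # ASSUMED channel order: signal, mean, std, difference, t-statistic
        out[i] = np.stack([z, mean, std, diff, rolling_t_stat(z, WINDOW)])
    return out
```

    Calling `chunk_features` on a signal of 13,000 samples would yield an array of shape (2, 5, 6000) under these assumptions.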

  • b) Model Architecture:
    The 5-channel input is processed by a convolutional neural network comprising six convolutional blocks with non-decreasing channel sizes (32, 64, 64, 128, 128). The first block uses a kernel size of 3, while all others use a kernel size of 31. All activations are GELU. A linear classification head outputs one logit per sample point: an unnormalized score that a sigmoid maps to the probability of an event border at that position. The model outputs a sequence of shape (6000 × 1).
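
To make the output format concrete, the sketch below turns a (6000 × 1) array of per-sample logits into border positions by applying a sigmoid and a cutoff. The function name `logits_to_borders` and the 0.5 threshold are assumptions for illustration; the description does not state how predictions are post-processed.

```python
import numpy as np


def logits_to_borders(logits, threshold=0.5):
    """Convert per-sample logits into sample indices predicted as event borders.

    ASSUMPTION: a plain probability cutoff; the actual post-processing used by
    Campolina is not described in this record.
    """
    logits = np.asarray(logits, dtype=np.float64).reshape(-1)  # (6000, 1) -> (6000,)
    # Clip before exponentiating to avoid overflow warnings on extreme logits.
    probs = 1.0 / (1.0 + np.exp(-np.clip(logits, -50.0, 50.0)))
    return np.flatnonzero(probs > threshold)


# Usage with a synthetic output: two confident borders in an otherwise quiet chunk.
example = np.full((6000, 1), -5.0)
example[[100, 2500]] = 4.0
borders = logits_to_borders(example)
```

Because a logit of 0 corresponds to a probability of 0.5, thresholding probabilities at 0.5 is equivalent to keeping positions with positive logits.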

Training Objective:

Campolina is trained to identify signal positions that correspond to event borders. Ground truth event borders are generated through a two-step process:

  1. The signal is basecalled using Dorado’s “super accurate” basecaller and aligned to a reference using minimap2. This produces an initial segmentation.

  2. The initial borders and k-mer model are refined using Remora, producing high-accuracy event border annotations.
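
As an illustration of how such annotations could become training targets, the sketch below converts border positions given in signal coordinates into a per-sample binary label vector for one chunk. The function name, its arguments, and the 0/1 encoding are assumptions; the description does not specify how labels are stored.

```python
import numpy as np


def borders_to_labels(border_positions, chunk_start, chunk_len=6000):
    """Build a 0/1 target vector for the chunk covering
    [chunk_start, chunk_start + chunk_len).

    ASSUMPTION: one label per sample, 1 where an annotated event border falls.
    """
    labels = np.zeros(chunk_len, dtype=np.float32)
    offsets = np.asarray(border_positions, dtype=np.int64) - chunk_start
    inside = offsets[(offsets >= 0) & (offsets < chunk_len)]  # drop borders outside the chunk
    labels[inside] = 1.0
    return labels


# Usage: borders at signal positions 5990, 6000, 6017 and 12000, second chunk.
labels = borders_to_labels([5990, 6000, 6017, 12000], chunk_start=6000)
```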

Loss Function:

The model is trained using a composite loss function:

  • Focal Loss: Classification loss that handles class imbalance due to the rarity of event borders.

  • Huber Loss: Constrains the total number of predicted event borders.

  • Consecutive Loss: Prevents over-prediction of consecutive borders, which can occur due to variable translocation speed in nanopore sequencing.
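
The three terms can be sketched in NumPy as shown below. Only the focal loss and the Huber function follow their standard definitions; how the Huber term is applied to border counts, the form of the consecutive penalty, the weights, and all parameter defaults are assumptions made for illustration, not the formulation used to train the released models.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -50.0, 50.0)))


def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Standard binary focal loss; alpha and gamma defaults are assumptions."""
    p = sigmoid(logits)
    p_t = np.where(targets == 1, p, 1.0 - p)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-12, 1.0))))


def huber(x, delta=1.0):
    """Standard Huber function of a scalar residual."""
    a = abs(x)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def count_loss(logits, targets, delta=1.0):
    """ASSUMED form: Huber penalty on (expected predicted borders - true borders)."""
    return float(huber(sigmoid(logits).sum() - targets.sum(), delta))


def consecutive_loss(logits):
    """ASSUMED form: mean product of border probabilities at adjacent samples."""
    p = sigmoid(logits)
    return float(np.mean(p[:-1] * p[1:]))


def composite_loss(logits, targets, w_count=1.0, w_consec=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (focal_loss(logits, targets)
            + w_count * count_loss(logits, targets)
            + w_consec * consecutive_loss(logits))
```

Under these assumed forms, a prediction that places high probability on two adjacent samples is penalized more by `consecutive_loss` than one that places the same two borders far apart.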

Files

Files (7.7 MB)

md5:011f49d42edff9a4d7e701c796baaff7 (3.8 MB)
md5:a1f99e6e605089883914344ffbd1f776 (3.8 MB)

Additional details

Dates

Created
2025-06-09