Published March 20, 2026 | Version v1.0.0
Software Open

UKBOL sample and sequence data pre-processing

  • 1. Natural History Museum

Description

This repository provides a standard operating procedure (SOP) and associated scripts for pre-processing raw genome skim data generated from museum specimens. It was developed for the UK Barcode of Life (UKBOL) project at the Natural History Museum, London. The workflow takes sample metadata and non-base-called paired-end short-read sequencing data from the Element Biosciences AVITI platform as input. It proceeds through basecalling and demultiplexing, sequence data transfer and quality assurance, lane merging, and sample sheet preparation. It culminates in the recovery and validation of barcoding genes using BeeGees and the assembly and validation of mitochondrial genomes using a modified skim2mito pipeline.
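
As an illustration of the lane merging step, the sketch below concatenates per-lane gzipped FASTQ files into one file per sample and read direction. It is not taken from the repository's scripts. The file naming pattern (`<sample>_L<lane>_<R1|R2>.fastq.gz`) and the function names `group_by_sample` and `merge_lanes` are assumptions made for this example. It relies on the fact that concatenated gzip streams form a valid multi-member gzip file.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical naming convention: <sample>_L<lane>_<R1|R2>.fastq.gz
LANE_PATTERN = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d+)_(?P<read>R[12])\.fastq\.gz$"
)


def group_by_sample(fastq_dir):
    """Group per-lane FASTQ files by (sample, read direction)."""
    groups = defaultdict(list)
    # Sorting keeps the lane order identical for R1 and R2,
    # so read pairs stay in sync after merging.
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = LANE_PATTERN.match(path.name)
        if match:
            groups[(match["sample"], match["read"])].append(path)
    return groups


def merge_lanes(fastq_dir, out_dir):
    """Concatenate the lanes of each sample into <sample>_<read>.fastq.gz."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    for (sample, read), paths in sorted(group_by_sample(fastq_dir).items()):
        target = out_dir / f"{sample}_{read}.fastq.gz"
        with open(target, "wb") as out_handle:
            for path in paths:
                # Byte-level concatenation; no decompression is needed.
                with open(path, "rb") as in_handle:
                    shutil.copyfileobj(in_handle, out_handle)
        merged.append(target)
    return merged
```

A call such as `merge_lanes("demultiplexed", "merged")` would write one merged file per sample and read direction into `merged`; both directory names are placeholders.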

Utility scripts are provided for automated sample sheet generation, NCBI taxonomic identifier resolution, and data transfer with integrity verification. The SOP is designed for execution on high-performance computing infrastructure using SLURM job scheduling and conda-managed environments, with optional data transfer to the Crop Diversity (Gruffalo) cluster. This resource supports scalable, reproducible genomic processing of natural history collections material for biodiversity barcoding initiatives.
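
Data transfer with integrity verification can be pictured as comparing checksums before and after a copy. The following is a minimal sketch under that assumption, not the repository's own utility script. The function names `md5_of_file` and `verify_transfer` are hypothetical, and MD5 is chosen here only because it is a common transfer checksum.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(source, destination):
    """Return True if source and destination have identical MD5 digests."""
    return md5_of_file(source) == md5_of_file(destination)
```

In practice the source digest would usually be computed once before transfer, stored alongside the data, and compared against the digest computed on the receiving system.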

Files

museomics/sample-preprocessing-v1.0.0.zip

Files (20.2 kB)

md5:f654908a9a66b5f68f8ca93a84b4b3aa

Additional details

Related works