There is a newer version of the record available.

Published November 21, 2025 | Version v1
Journal article Open

Sample-specific haplotype-resolved isoform characterization via long-read RNA-seq-based proteogenomics

  • 1. University of Zurich
  • 2. University of Virginia
  • 3. SIB Swiss Institute of Bioinformatics

Description

Bottom-up mass spectrometry (MS) supports disease research through the detection of protein isoforms. Recently, personalized proteomes have enabled more sensitive MS searches. In proteogenomics approaches, long-read RNA-seq (lrRNA-seq) data can be leveraged to create a sample-specific proteome that integrates genomic variation and alternative splicing events. We benchmark several algorithms for variant phasing on PacBio lrRNA-seq data and show that incorporating lrRNA-seq-based phased variants can increase peptide and protein isoform detection in MS-based searches. For this purpose, we develop a pipeline that constructs haplotype-resolved sample-specific proteomes, followed by MS search and annotation. Our workflow can be applied to samples with matched MS and lrRNA-seq data. We apply our workflow to a WTC11 sample and a ten-day osteoblast differentiation, highlighting the applicability of our work to both single samples and more complex experimental designs. We show that searching against sample-specific haplotype-resolved proteomes enables better detection and characterization of protein isoforms and supports the detection of linked variants. Consistent with previous work, genetic variation was a much greater contributor to proteomic complexity than alternative splicing in the WTC11 sample we considered. Our open-source Snakemake pipeline aims to support research and applications of haplotype-resolved MS searching based on lrRNA-seq data.
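To make the idea of a haplotype-resolved proteome concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes a single coding sequence, phased single-nucleotide variants given in 0-based CDS coordinates with VCF-style genotypes such as `1|0`, and the standard genetic code. The function names (`translate`, `apply_phased_snvs`, `haplotype_proteins`) and the toy sequence are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_phased_snvs(cds, variants, haplotype):
    """Return the CDS of one haplotype (0 or 1).

    Each variant is (pos, ref, alt, genotype); pos is a 0-based CDS
    coordinate (an assumption of this sketch) and genotype is a phased
    string such as "1|0".
    """
    seq = list(cds)
    for pos, ref, alt, genotype in variants:
        if genotype.split("|")[haplotype] == "1":
            if seq[pos] != ref:
                raise ValueError(f"reference mismatch at position {pos}")
            seq[pos] = alt
    return "".join(seq)


def haplotype_proteins(cds, variants):
    """Build one protein sequence per haplotype."""
    return {
        f"hap{h + 1}": translate(apply_phased_snvs(cds, variants, h))
        for h in (0, 1)
    }


# Toy example: two linked variants on haplotype 1, one on haplotype 2.
cds = "ATGGCTGAAAAATGA"  # reference protein: MAEK
variants = [
    (4, "C", "T", "1|0"),   # GCT -> GTT (A -> V), haplotype 1
    (7, "A", "G", "1|0"),   # GAA -> GGA (E -> G), haplotype 1
    (10, "A", "C", "0|1"),  # AAA -> ACA (K -> T), haplotype 2
]
proteins = haplotype_proteins(cds, variants)
print(proteins)  # {'hap1': 'MVGK', 'hap2': 'MAET'}
```

Because the two haplotype 1 variants are phased together, the resulting sequence carries both substitutions at once; a database built from unphased variants would not indicate which combinations of alleles occur on the same molecule. A real workflow would additionally need to handle indels, transcript-to-genome coordinate mapping, and multiple isoforms per gene.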

Files

Files (28 Bytes)

md5:f660a5f797bf42a4242953a71854ef42
28 Bytes