Sample-specific haplotype-resolved isoform characterization via long-read RNA-seq-based proteogenomics

Wissel, David; Sheynkman, Gloria; Robinson, Mark

doi:10.5281/zenodo.17674330

Published November 21, 2025 | Version v1

Journal article Open

1. University of Zurich
2. University of Virginia
3. SIB Swiss Institute of Bioinformatics

Bottom-up mass spectrometry (MS) supports disease research through the detection of protein isoforms. Recently, personalized proteomes have enabled more sensitive MS searches. In proteogenomics approaches, long-read RNA-seq (lrRNA-seq) data can be used to build a sample-specific proteome that integrates genomic variation and alternative splicing events. We benchmark several algorithms for variant phasing on PacBio lrRNA-seq data and show that incorporating lrRNA-seq-based phased variants can increase peptide and protein isoform detection in MS-based searches. For this purpose, we develop a pipeline that constructs haplotype-resolved sample-specific proteomes, followed by MS search and annotation. Our workflow can be applied to samples with matched MS and lrRNA-seq data. We apply it to a WTC11 sample and to a ten-day osteoblast differentiation, highlighting its applicability to both single samples and more complex experimental designs. We show that searching against sample-specific haplotype-resolved proteomes enables better detection and characterization of protein isoforms and supports the detection of linked variants. In line with previous work, genetic variation was consistently a much greater contributor to proteomic complexity than alternative splicing in the WTC11 sample we considered. Our open-source Snakemake pipeline aims to support research on, and applications of, haplotype-resolved MS searching based on lrRNA-seq data.
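To make the idea of a haplotype-resolved proteome concrete, the following toy sketch applies phased single-nucleotide variants to one coding sequence and translates each haplotype separately. It is an illustration under stated assumptions, not the authors' pipeline: the function names `apply_phased_snvs` and `translate`, the tuple layout of the variants, and the example sequence are all hypothetical, and only substitutions within an already-spliced coding sequence are handled.

```python
from itertools import product

# Standard genetic code, built from the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_phased_snvs(cds, variants):
    """Return the two haplotype sequences for a diploid sample.

    Hypothetical input layout: each variant is (pos0, ref, alt, genotype),
    where pos0 is a 0-based offset into the coding sequence and genotype is
    a phased VCF-style string such as "0|1".
    """
    haplotypes = [list(cds), list(cds)]
    for pos, ref, alt, genotype in variants:
        if cds[pos] != ref:
            raise ValueError(f"reference mismatch at offset {pos}")
        for hap_index, allele in enumerate(genotype.split("|")):
            if allele == "1":
                haplotypes[hap_index][pos] = alt
    return ["".join(h) for h in haplotypes]


# Made-up example: two heterozygous SNVs phased onto the same haplotype.
cds = "ATGGCTGAAAAGTAA"
variants = [(4, "C", "T", "0|1"), (9, "A", "C", "0|1")]
hap_proteins = [translate(h) for h in apply_phased_snvs(cds, variants)]
print(hap_proteins)
```

Because both variants sit on the second haplotype, only two protein sequences enter the search database. Without phasing, the mixed combinations carrying one variant each would also have to be considered, which is one way to see why phased variants help with detecting linked variants.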

Files (28 Bytes)

Name	Size
lr-haplotype-proteogenomics (md5:f660a5f797bf42a4242953a71854ef42)	28 Bytes