Published February 24, 2026 | Version v4
Journal article Open

Sample-specific haplotype-resolved isoform characterization via long-read RNA-seq-based proteogenomics

  • 1. University of Zurich
  • 2. University of Virginia
  • 3. SIB Swiss Institute of Bioinformatics

Description

Bottom-up mass spectrometry (MS) supports disease research through the detection of protein isoforms. Recently, personalized proteomes have enabled more sensitive MS searches. In proteogenomics approaches, long-read RNA-seq (lrRNA-seq) data can be used to build a sample-specific proteome that integrates genomic variation and alternative splicing events. We benchmark several algorithms for variant phasing on PacBio lrRNA-seq data and show that incorporating lrRNA-seq-based phased variants can increase peptide and protein isoform detection in MS-based searches. For this purpose, we develop a pipeline that constructs haplotype-resolved sample-specific proteomes, followed by MS search and annotation. Our workflow can be applied to samples with matched MS and lrRNA-seq data. We apply it to a WTC11 sample and to a ten-day osteoblast differentiation, highlighting its applicability to both single samples and more complicated experimental designs. We show that searching against sample-specific haplotype-resolved proteomes enables better detection and characterization of protein isoforms and supports the detection of linked variants. Consistent with previous work, genetic variation was a much greater contributor to proteomic complexity than alternative splicing in the WTC11 sample we considered. Our open-source Snakemake pipeline is intended to support research and applications of haplotype-resolved MS searching based on lrRNA-seq data.
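To make the idea of a haplotype-resolved proteome concrete, the following is a minimal toy sketch, not the pipeline's actual implementation. It assumes a hypothetical coding sequence and two hypothetical phased single-nucleotide variants written in VCF-style genotype notation (`0|1`), and shows how two variants phased onto the same haplotype yield one reference protein and one protein carrying both amino-acid changes together. The function names `apply_phased_snvs` and `translate` are illustrative only.

```python
from itertools import product

# Standard genetic code, enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def apply_phased_snvs(cds, variants):
    """Return the two haplotype sequences for a coding sequence.

    variants: iterable of (0-based position, ref base, alt base, genotype),
    where genotype is a phased string such as "0|1" or "1|1".
    """
    haplotypes = [list(cds), list(cds)]
    for pos, ref, alt, genotype in variants:
        if cds[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        for hap_index, allele in enumerate(genotype.split("|")):
            if allele == "1":
                haplotypes[hap_index][pos] = alt
    return "".join(haplotypes[0]), "".join(haplotypes[1])


def translate(cds):
    """Translate a coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical example data (not taken from the study).
reference_cds = "ATGGCTGAAAAG"
phased_variants = [
    (4, "C", "T", "0|1"),  # GCT -> GTT on haplotype 2
    (6, "G", "A", "0|1"),  # GAA -> AAA on haplotype 2
]

hap1, hap2 = apply_phased_snvs(reference_cds, phased_variants)
protein_hap1 = translate(hap1)
protein_hap2 = translate(hap2)
print(protein_hap1, protein_hap2)
```

Without phasing information, a search database would have to either omit the combined sequence or enumerate every combination of variants; with phasing, only the sequences expected to exist in the sample need to be included.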

Files (4.8 GB)

Checksum | Size
md5:51118e4d427f297dea5fa0c5542a1227 | 875.1 MB
md5:8157eb1056df6c5fa432deafbad73ed4 | 526.7 MB
md5:eb28a5d4cc264ef5dac8c8f82b842031 | 71.1 MB
md5:8cfba4d82630c41923b10226e780831d | 665.8 MB
md5:fc35b400821a5520f908ad14763e492b | 2.6 GB
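
Because each file is listed with an MD5 checksum, a downloaded copy can be checked for corruption before use. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of` and `matches_checksum` are illustrative, and the file path in the usage comment is a hypothetical placeholder, since the file names are not shown in this listing.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte files do not need to fit in memory."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed checksum such as 'md5:5111...'."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of(path) == expected


# Usage (hypothetical local file name; substitute the downloaded file):
# matches_checksum("downloaded_file", "md5:eb28a5d4cc264ef5dac8c8f82b842031")
```

`str.removeprefix` requires Python 3.9 or later. MD5 is adequate here for detecting accidental corruption during transfer, which is what repository checksums are meant for.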