Sample-specific haplotype-resolved isoform characterization via long-read RNA-seq-based proteogenomics
Authors/Creators
Description
Bottom-up mass spectrometry (MS) supports disease research through the detection of protein isoforms. Recently, personalized proteomes have enabled more sensitive MS searches. In proteogenomics approaches, long-read RNA-seq (lrRNA-seq) data can be leveraged to create a sample-specific proteome that integrates genomic variation and alternative splicing events. We benchmark several algorithms for variant phasing on PacBio lrRNA-seq data and show that incorporating lrRNA-seq-based phased variants can increase peptide and protein isoform detection in MS-based searches. For this purpose, we develop a pipeline that constructs haplotype-resolved sample-specific proteomes, followed by MS search and annotation. Our workflow can be applied to samples with matched MS and lrRNA-seq data. We apply our workflow to a WTC11 sample and a ten-day osteoblast differentiation, highlighting the applicability of our work to both single samples and more complicated experimental designs. We show that searching against sample-specific haplotype-resolved proteomes enables better detection and characterization of protein isoforms and supports the detection of linked variants. Consistent with previous work, genetic variation contributed much more to proteomic complexity than alternative splicing in the WTC11 sample we considered. Our open-source Snakemake pipeline aims to support research and applications of haplotype-resolved MS searching based on lrRNA-seq data.
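To illustrate what "haplotype-resolved" means at the protein level, the following is a minimal sketch and not the pipeline's actual implementation. It assumes a toy coding sequence, phased single-nucleotide variants given as 0-based positions with a `(haplotype 0, haplotype 1)` genotype, and the standard genetic code. All names (`apply_phased_snvs`, `translate`, `CDS`, `VARIANTS`) are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_phased_snvs(cds, variants, haplotype):
    """Return the coding sequence of one haplotype (0 or 1).

    Each variant is (position, ref, alt, genotype), where position is
    0-based and genotype is a phased pair such as (0, 1).
    """
    seq = list(cds)
    for pos, ref, alt, genotype in variants:
        if cds[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        if genotype[haplotype] == 1:
            seq[pos] = alt
    return "".join(seq)


def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy data: the first two variants are linked on haplotype 1,
# the third sits alone on haplotype 0.
CDS = "ATGGCTGAATGGAAATAA"  # reference protein: MAEWK
VARIANTS = [
    (4, "C", "T", (0, 1)),   # GCT -> GTT (A -> V)
    (6, "G", "A", (0, 1)),   # GAA -> AAA (E -> K)
    (12, "A", "G", (1, 0)),  # AAA -> GAA (K -> E)
]

hap_proteins = [translate(apply_phased_snvs(CDS, VARIANTS, h)) for h in (0, 1)]
print(hap_proteins)  # ['MAEWE', 'MVKWK']
```

Without phasing, a search database could not tell whether the A-to-V and E-to-K substitutions occur on the same protein molecule. With phased genotypes, each haplotype yields one concrete sequence, so a peptide spanning both positions can support the linked variants.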
Files
| Name | Size |
|---|---|
| md5:f660a5f797bf42a4242953a71854ef42 | 28 Bytes |