Published December 11, 2024 | Version v1
Dataset Open

Neoantigen-2-Non-Reference-Database-Generation

  • 1. University of Minnesota

Description

Proteogenomics combines mass spectrometry (MS)-based proteomics data with genomics and transcriptomics data to identify neoantigens, unique peptide sequences arising from tumor-specific mutations. In the first section of this tutorial, we construct a customized protein database (FASTA) from RNA-sequencing files (FASTQ) derived from tumor samples. We then search the MS data against the resulting FASTA file to identify peptides corresponding to novel proteoforms, focusing on potential neoantigens. Finally, we assign genomic coordinates and annotations to the identified peptides and visualize the data to assess both spectral quality and genomic localization. In this framework, RNA-Seq data are used to generate tailored protein sequence databases, which enables the identification of protein sequence variants, including neoantigens, through mass spectrometry analysis.

Files (12.7 GB)

Checksum                                Size
md5:a74384ae158f05e60a02eae5dd536eae    9.9 MB
md5:b537c06a615019071e49d6e9168919f4    1.4 GB
md5:3f83752652d089948f8f0457eba846e2    38.4 MB
md5:46b3756d961fefb8c60e43222be4e327    384.8 MB
md5:5bd0a386252788ac0f5d0422e57f5aae    5.4 GB
md5:453f6fd01cdb05f73c6df0b3effef6ba    5.5 GB
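
The MD5 checksums listed above can be used to confirm that a download completed without corruption, which is worth doing for the multi-gigabyte files. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `matches_checksum` are illustrative and not part of this record; because the file names are not shown here, any path passed to them is the user's own local file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value.

    `expected` may be given with or without the 'md5:' prefix
    used in the file listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("my_download.bin", "md5:a74384ae158f05e60a02eae5dd536eae")` would return `True` only if the local file (a hypothetical name here) is byte-for-byte identical to the 9.9 MB file in this record.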