Neoantigen-1b-Non-Reference-Database-Generation

Do, Katherine

doi:10.5281/zenodo.14372521

Published December 11, 2024 | Version v1

Dataset Open

Do, Katherine (Data manager)¹

1. University of Minnesota

Proteogenomics combines mass spectrometry (MS)-based proteomics data with genomics and transcriptomics data to identify neoantigens, peptide sequences that arise from tumor-specific mutations. In the first section of this tutorial, we construct a customized protein database (FASTA) from RNA-sequencing files (FASTQ) derived from tumor samples. We then search the MS data against the resulting FASTA file to identify peptides that correspond to novel proteoforms, focusing on potential neoantigens. Finally, we assign genomic coordinates and annotations to the identified peptides and visualize the data to assess both spectral quality and genomic localization. In this framework, RNA-Seq data supplies the tailored protein sequence database that allows protein sequence variants, including neoantigens, to be identified by mass spectrometry.

Files (12.7 GB)

Name	MD5	Size
GffCompare_Annotated_GTF_to_BED.bed	a74384ae158f05e60a02eae5dd536eae	9.9 MB
Homo_sapiens.GRCh38_canon.106.gtf	b537c06a615019071e49d6e9168919f4	1.4 GB
HUMAN_CRAP.fasta	3f83752652d089948f8f0457eba846e2	38.4 MB
Human_cRAP_Non_normal_transcripts_dB_generation.fasta	46b3756d961fefb8c60e43222be4e327	384.8 MB
RNA-Seq_Reads_1.fastqsanger.gz	5bd0a386252788ac0f5d0422e57f5aae	5.4 GB
RNA-Seq_Reads_2.fastqsanger.gz	453f6fd01cdb05f73c6df0b3effef6ba	5.5 GB
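
Because the two RNA-Seq archives are over 5 GB each, it may be worth checking downloads against the MD5 digests in the file listing above before starting the workflow. The following is a minimal sketch, not part of the dataset itself: the function names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "GffCompare_Annotated_GTF_to_BED.bed": "a74384ae158f05e60a02eae5dd536eae",
    "Homo_sapiens.GRCh38_canon.106.gtf": "b537c06a615019071e49d6e9168919f4",
    "HUMAN_CRAP.fasta": "3f83752652d089948f8f0457eba846e2",
    "Human_cRAP_Non_normal_transcripts_dB_generation.fasta": "46b3756d961fefb8c60e43222be4e327",
    "RNA-Seq_Reads_1.fastqsanger.gz": "5bd0a386252788ac0f5d0422e57f5aae",
    "RNA-Seq_Reads_2.fastqsanger.gz": "453f6fd01cdb05f73c6df0b3effef6ba",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to 'ok', 'mismatch', or 'missing'.

    Assumes (hypothetically) that all files sit directly in `directory`
    under the names shown in the listing.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        print(f"{status:9s}{name}")
```

Reading in fixed-size chunks keeps memory use flat for the multi-gigabyte FASTQ archives. A "mismatch" result usually points to an interrupted or corrupted download, so re-downloading that file is the simplest remedy.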

	All versions	This version
Views	630	630
Downloads	747	747
Data volume	1.6 TB	1.6 TB
