Re-analysis of the data from synthetic peptides run on an Orbitrap Fusion (Thermo Scientific) with different acquisition methods (Ferries et al. 2017). The aim is to compare the localisation scores from different bioinformatic pipelines on a low-complexity mix of phospho-peptides with known sequences and known phosphorylation sites.

The synthetic peptides were spiked into phospho-enriched peptides (Ferries et al. 2017):

Synthetic phosphopeptide standards (∼10 pmol, split into five pools to separate phosphoisomers and thus ensure confidence in phosphosite localization) and 6 μL of enriched phosphopeptides (equivalent to 100 μg of digested cell lysate) were loaded onto the trapping column (PepMap100, C18, 300 μm × 5 mm)

Description of the synthetic peptides used for the analysis

I find 179 unique phospho-peptides in the pools. This is not in line with what is described in Ferries et al. (2017).

All related information can be found in the document “01_syntheticPeptides_Description.html”.

Search pipelines and parameters

MaxQuant

Searches were run with Andromeda as integrated in MaxQuant (v1.6.1.0), using default settings unless otherwise specified. The database is Uniprot_human_20170912, including the contaminants database. The search is performed on each pool separately.

Proteome Discoverer 2.3

Searches were run with Mascot as integrated in Proteome Discoverer 2.3 (PhosphoRS 3.0), using default settings unless otherwise specified.

The databases are:

  • Database 1 : contaminants 20150320 (245 sequences; 127304 residues)
  • Database 2 : Uniprot_human human_20170912 (160299 sequences; 50118170 residues)

A search is performed for each raw file separately. We use either Mascot, SequestHT or MSAmanda to compare the impact of different search engines on the identification of phosphorylation sites.

Percolator validation could not be performed because the target and decoy searches did not yield enough PSMs. Validation was therefore performed with the Target Decoy method only.

.mgf peaklists were exported to serve as input for PeptideShaker.

Loading of the search results

The search results were parsed with the markdown scripts in “ParseInput/”. All information on filters and other settings is in the corresponding .html documents.

Combine the tables:
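A minimal sketch of such a combination step, assuming each parsed result is a data frame with the same columns. The column names `SpectrumID`, `Score` and `Pipeline`, and the toy values, are illustrative and not taken from the parsing scripts:

```r
# Hypothetical parsed outputs: one data frame per pipeline, same columns.
parsed <- list(
  MQ_Andromeda = data.frame(SpectrumID = c("s1", "s2"), Score = c(0.99, 0.80)),
  PD_Mascot    = data.frame(SpectrumID = c("s1", "s3"), Score = c(0.95, 0.60))
)

# Tag each table with the name of its pipeline (assumed column "Pipeline").
tagged <- Map(function(tab, name) cbind(Pipeline = name, tab),
              parsed, names(parsed))

# Stack all tables into one long table, one row per PSM and pipeline.
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

A long format like this keeps one row per PSM and pipeline, which makes later per-strategy counts and filters straightforward.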

Analysis and figures

Total number of identifications

Number of PSMs per strategy (all pools combined):

These results are not entirely in line with what was published previously (Ferries et al. 2017):

Number of PSMs per strategy (all pools combined) that correspond to a synthetic sequence (without taking into account the localisation):

Number of sequence IDs per strategy (all pools combined) that correspond to a synthetic sequence (without taking into account the localisation):

| | Method | NumPSMs | Replicate | Software | SearchEngine | Scoretype | Validation | colour |
|---|---|---|---|---|---|---|---|---|
| rep1_HCDOT_MQ_Andromeda_SearchEngine_TargetDecoy | HCDOT | 112 | rep1 | MQ | Andromeda | SearchEngine | TargetDecoy | MQ Andromeda |
| rep1_HCDOT_PD_Mascot_ptmRS_TargetDecoy | HCDOT | 100 | rep1 | PD | Mascot | ptmRS | TargetDecoy | PD Mascot |
| rep1_HCDOT_PD_Mascot_SearchEngine_TargetDecoy | HCDOT | 100 | rep1 | PD | Mascot | SearchEngine | TargetDecoy | PD Mascot |
| rep1_HCDOT_PD_MSAmanda_ptmRS_TargetDecoy | HCDOT | 107 | rep1 | PD | MSAmanda | ptmRS | TargetDecoy | PD MSAmanda |
| rep1_HCDOT_PD_MSAmanda_SearchEngine_TargetDecoy | HCDOT | 107 | rep1 | PD | MSAmanda | SearchEngine | TargetDecoy | PD MSAmanda |
| rep1_HCDOT_PD_SequestHT_ptmRS_TargetDecoy | HCDOT | 101 | rep1 | PD | SequestHT | ptmRS | TargetDecoy | PD SequestHT |
| rep1_HCDOT_PD_SequestHT_SearchEngine_TargetDecoy | HCDOT | 101 | rep1 | PD | SequestHT | SearchEngine | TargetDecoy | PD SequestHT |
| rep1_HCDOT_PS_Comet_Ascore_TargetDecoy | HCDOT | 103 | rep1 | PS | Comet | Ascore | TargetDecoy | PS Comet |
| rep1_HCDOT_PS_Comet_ptmRS_TargetDecoy | HCDOT | 103 | rep1 | PS | Comet | ptmRS | TargetDecoy | PS Comet |
| rep1_HCDOT_PS_Comet_SearchEngine_TargetDecoy | HCDOT | 103 | rep1 | PS | Comet | SearchEngine | TargetDecoy | PS Comet |
| rep1_HCDOT_PS_MSAmanda_Ascore_TargetDecoy | HCDOT | 108 | rep1 | PS | MSAmanda | Ascore | TargetDecoy | PS MSAmanda |
| rep1_HCDOT_PS_MSAmanda_ptmRS_TargetDecoy | HCDOT | 108 | rep1 | PS | MSAmanda | ptmRS | TargetDecoy | PS MSAmanda |
| rep1_HCDOT_PS_MSAmanda_SearchEngine_TargetDecoy | HCDOT | 108 | rep1 | PS | MSAmanda | SearchEngine | TargetDecoy | PS MSAmanda |
| rep1_HCDOT_PS_X!Tandem_Ascore_TargetDecoy | HCDOT | 80 | rep1 | PS | X!Tandem | Ascore | TargetDecoy | PS X!Tandem |
| rep1_HCDOT_PS_X!Tandem_ptmRS_TargetDecoy | HCDOT | 80 | rep1 | PS | X!Tandem | ptmRS | TargetDecoy | PS X!Tandem |
| rep1_HCDOT_PS_X!Tandem_SearchEngine_TargetDecoy | HCDOT | 80 | rep1 | PS | X!Tandem | SearchEngine | TargetDecoy | PS X!Tandem |
| rep2_HCDOT_MQ_Andromeda_SearchEngine_TargetDecoy | HCDOT | 114 | rep2 | MQ | Andromeda | SearchEngine | TargetDecoy | MQ Andromeda |
| rep2_HCDOT_PD_Mascot_ptmRS_TargetDecoy | HCDOT | 99 | rep2 | PD | Mascot | ptmRS | TargetDecoy | PD Mascot |
| rep2_HCDOT_PD_Mascot_SearchEngine_TargetDecoy | HCDOT | 99 | rep2 | PD | Mascot | SearchEngine | TargetDecoy | PD Mascot |
| rep2_HCDOT_PD_MSAmanda_ptmRS_TargetDecoy | HCDOT | 106 | rep2 | PD | MSAmanda | ptmRS | TargetDecoy | PD MSAmanda |
| rep2_HCDOT_PD_MSAmanda_SearchEngine_TargetDecoy | HCDOT | 106 | rep2 | PD | MSAmanda | SearchEngine | TargetDecoy | PD MSAmanda |
| rep2_HCDOT_PD_SequestHT_ptmRS_TargetDecoy | HCDOT | 106 | rep2 | PD | SequestHT | ptmRS | TargetDecoy | PD SequestHT |
| rep2_HCDOT_PD_SequestHT_SearchEngine_TargetDecoy | HCDOT | 106 | rep2 | PD | SequestHT | SearchEngine | TargetDecoy | PD SequestHT |
| rep2_HCDOT_PS_Comet_Ascore_TargetDecoy | HCDOT | 85 | rep2 | PS | Comet | Ascore | TargetDecoy | PS Comet |
| rep2_HCDOT_PS_Comet_ptmRS_TargetDecoy | HCDOT | 99 | rep2 | PS | Comet | ptmRS | TargetDecoy | PS Comet |
| rep2_HCDOT_PS_Comet_SearchEngine_TargetDecoy | HCDOT | 99 | rep2 | PS | Comet | SearchEngine | TargetDecoy | PS Comet |
| rep2_HCDOT_PS_MSAmanda_Ascore_TargetDecoy | HCDOT | 106 | rep2 | PS | MSAmanda | Ascore | TargetDecoy | PS MSAmanda |
| rep2_HCDOT_PS_MSAmanda_ptmRS_TargetDecoy | HCDOT | 106 | rep2 | PS | MSAmanda | ptmRS | TargetDecoy | PS MSAmanda |
| rep2_HCDOT_PS_MSAmanda_SearchEngine_TargetDecoy | HCDOT | 106 | rep2 | PS | MSAmanda | SearchEngine | TargetDecoy | PS MSAmanda |
| rep2_HCDOT_PS_X!Tandem_Ascore_TargetDecoy | HCDOT | 96 | rep2 | PS | X!Tandem | Ascore | TargetDecoy | PS X!Tandem |
| rep2_HCDOT_PS_X!Tandem_ptmRS_TargetDecoy | HCDOT | 96 | rep2 | PS | X!Tandem | ptmRS | TargetDecoy | PS X!Tandem |
| rep2_HCDOT_PS_X!Tandem_SearchEngine_TargetDecoy | HCDOT | 96 | rep2 | PS | X!Tandem | SearchEngine | TargetDecoy | PS X!Tandem |

Number of unique synthetic phospho-sequences: 133

Analysis of the set of synthetic phospho-peptides

In these data, we find 219 unique sequences (without taking into account modifications: Sequence field) that do not correspond to any of the synthetic peptides expected.

Number of unique synthetic phospho-peptides per strategy (all pools combined):

I work on the basis of the 179 synthetic phospho-peptides used for the analysis. In the data, 157 of these 179 synthetic phospho-peptides are identified.

The 22 synthetic peptides that are not identified by any of the methods are the following:

  • Pool 1: pool_1_ADENYYK_5, pool_1_ESKSSPRPTAEK_4, pool_1_SQSTSEQEK_1
  • Pool 2: pool_2_ADENYYK_6, pool_2_EDAANNYAR_7, pool_2_ETTTSPKKYYLAEK_2, pool_2_NIDQSEFEGFSFVNSEFLKPEVK_11
  • Pool 3: pool_3_AGGKPSQSPSQEAAGEAVLGAK_10, pool_3_ETTTSPKKYYLAEK_3, pool_3_MPSHEAR_3, pool_3_SRTPPSAPSQSR_6, pool_3_TAPTPPKR_4, pool_3_VYELMR_2, pool_3_VYHYR_2
  • Pool 4: pool_4_ILSDVTHSAVFGVPASK_8, pool_4_SFNGSLKNVAVDELSR_1, pool_4_SQSDIFSR_3, pool_4_STVASMMHR_5, pool_4_YELTGLPEQDR_1
  • Pool 5: pool_5_ILSDVTHSAVFGVPASK_6, pool_5_TIYVRDPTSNK_3, pool_5_TSSFAEPGGGGGGGGGGPGGSASGPGGTGGGK_2&3
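A list of this kind can be derived as a set difference between the expected synthetic peptides and the union of the identifications over all strategies. The sketch below uses made-up peptide IDs and strategy names purely for illustration. It does not reproduce the actual data:

```r
# Hypothetical expected synthetic peptide IDs (not the real ones).
expected <- c("pool_1_PEPA_1", "pool_1_PEPB_2", "pool_2_PEPC_1", "pool_2_PEPD_3")

# Hypothetical identifications per strategy.
identified <- list(
  MQ_Andromeda = c("pool_1_PEPA_1", "pool_2_PEPC_1"),
  PD_Mascot    = c("pool_1_PEPA_1"),
  PS_Comet     = c("pool_2_PEPC_1", "pool_1_PEPA_1")
)

# Union of everything identified by at least one strategy.
found_anywhere <- unique(unlist(identified, use.names = FALSE))

# Synthetic peptides never identified by any strategy.
never_found <- setdiff(expected, found_anywhere)
n_found <- length(intersect(expected, found_anywhere))
```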

In the following density plots, the SyntheticPeptideID field indicates whether a PSM is attributed to a synthetic peptide with the correct phosphorylation localisation (TRUE) or with an incorrect one (FALSE).

I work on mono-phosphorylated peptides only.

I select the spectra that are matched to an identification in all the pipelines.

I plot the histograms of the localisation scores for the PSMs matched in all the pipelines (325 PSMs).
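The selection of spectra identified by every pipeline, described above, could be sketched as an intersection of spectrum identifiers. The table layout and the column names `Pipeline` and `SpectrumID` are assumptions made for this example:

```r
# Hypothetical long-format table: one row per PSM and pipeline.
psm <- data.frame(
  Pipeline   = c("MQ", "MQ", "MQ", "PD_Mascot", "PD_Mascot",
                 "PS_Comet", "PS_Comet", "PS_Comet"),
  SpectrumID = c("s1", "s2", "s3", "s1", "s3", "s1", "s3", "s4"),
  stringsAsFactors = FALSE
)

# Spectrum IDs per pipeline, then the intersection across all pipelines.
ids_by_pipeline <- split(psm$SpectrumID, psm$Pipeline)
common_ids <- Reduce(intersect, ids_by_pipeline)

# Keep only the PSMs whose spectrum is identified by every pipeline.
psm_common <- psm[psm$SpectrumID %in% common_ids, ]
```

Restricting the comparison to shared spectra means that differences between the score distributions reflect the scoring itself rather than which spectra each search engine happened to identify.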

I keep only the mono-phosphorylated spectra with a localisation score > 50% for ptmRS, and ≥ 0.75 for MQ.
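As an illustration of this filter, the sketch below assumes that both scores are expressed on a 0 to 1 scale (so that 50% corresponds to 0.5) and that the number of phosphorylations per PSM is available. The column names `Scoretype`, `Score` and `NumPhospho` and the values are hypothetical:

```r
# Hypothetical PSM table with localisation scores on a 0-1 scale (assumption).
psm <- data.frame(
  Scoretype  = c("ptmRS", "ptmRS", "ptmRS", "MQ", "MQ", "MQ"),
  Score      = c(0.50, 0.90, 0.99, 0.75, 0.60, 0.95),
  NumPhospho = c(1, 1, 2, 1, 1, 1)
)

# Mono-phosphorylated PSMs only.
is_mono <- psm$NumPhospho == 1

# Strictly above 0.5 for ptmRS, at least 0.75 for MQ.
passes <- ifelse(psm$Scoretype == "ptmRS", psm$Score > 0.5, psm$Score >= 0.75)

kept <- psm[is_mono & passes, ]
```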

I rank the PSMs by localisation score, from highest to lowest, and go through the ranked list. At each step, I count the number of localisation errors among the PSMs seen so far.
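A minimal sketch of this ranking procedure, assuming that the true false localisation rate (FLR) at a given rank is the cumulative number of incorrect localisations divided by the number of PSMs considered so far. The column names `Score` and `Correct` and the values are illustrative:

```r
# Hypothetical PSMs: localisation score and whether the site is correct.
psm <- data.frame(
  Score   = c(0.99, 0.60, 0.95, 0.80, 0.75, 0.55),
  Correct = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Rank from the highest to the lowest localisation score.
ranked <- psm[order(psm$Score, decreasing = TRUE), ]

# Go through the ranking: cumulative errors and resulting FLR at each step.
ranked$NumErrors <- cumsum(!ranked$Correct)
ranked$FLR <- ranked$NumErrors / seq_len(nrow(ranked))
```

Reading the `FLR` column at the row where `Score` crosses a chosen threshold gives the true error rate that this threshold would produce on the synthetic peptides.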

The black dots correspond to a localisation threshold of 0.75. In the PhosphoRS scoring scheme, there is no value between 0.5 and 0.75, which explains the step at this threshold value.

The “x” marks correspond to a localisation score threshold of 0.2.

I save the table with the true FLR and corresponding localisation scores:

write.table(plotval, "FLR.txt", sep = "\t", row.names = FALSE)

Specific comparison of the localisation scoring

Here, I use all the mono-phosphorylated PSMs, including those with scores ≤ 0.5. There are 325 PSMs common to all the pipelines, and I use them for the following figures:

The figure is saved in “Figures/Score_realign.pdf”.


References

Ferries, Samantha, Simon Perkins, Philip J. Brownridge, Amy Campbell, Patrick A. Eyers, Andrew R. Jones, and Claire E. Eyers. 2017. “Evaluation of Parameters for Confident Phosphorylation Site Localization Using an Orbitrap Fusion Tribrid Mass Spectrometer.” Journal of Proteome Research 16 (9): 3448–59. https://doi.org/10.1021/acs.jproteome.7b00337.