Re-analysis of the data from phospho-enriched samples run on Thermo Q-Exactive HF (Searle et al. 2018):

DDA Acquisition and Processing: The Thermo Q-Exactive HF was set to positive mode in a top 12 configuration. Full MS scans of mass range 400-1600 were collected at 60,000 resolution to hit an AGC target of 3e6. The maximum inject time was set to 100 ms. MS/MS scans were collected at 30,000 resolution, AGC target of 1e6, and maximum inject time of 55 ms. The isolation width was set to 1.5 m/z with a normalized collision energy of 27. Only precursors charged between +2 and +4 that achieved a minimum AGC of 1e4 were acquired. Dynamic exclusion was set to “auto” and to exclude all isotopes in a cluster.

The aim is to compare the localisation scores from different bioinformatic pipelines on a relatively high-complexity mix of phospho-peptides that was also analysed with a DIA pipeline (for reference).

Description of the input files

Description of the input samples:

Chorus ID  File                                           Size (GB)  Experiment                    Condition  Replicate
185068     20170430_HeLa_phosp_DDA_B_01_170506194651.raw  1.24       HeLa DIA/DDA reproducibility  DDA        1
185066     20170430_HeLa_phosp_DDA_B_02_170507002341.raw  1.23       HeLa DIA/DDA reproducibility  DDA        2
185064     20170430_HeLa_phosp_DDA_B_03_170507050032.raw  1.20       HeLa DIA/DDA reproducibility  DDA        3
185062     20170430_HeLa_phosp_DDA_B_04.raw               1.25       HeLa DIA/DDA reproducibility  DDA        4
185067     20170430_HeLa_phosp_DIA_B_01_170506220515.raw  1.08       HeLa DIA/DDA reproducibility  DIA        1
185065     20170430_HeLa_phosp_DIA_B_02_170507024206.raw  1.07       HeLa DIA/DDA reproducibility  DIA        2
185063     20170430_HeLa_phosp_DIA_B_03_170507071858.raw  1.09       HeLa DIA/DDA reproducibility  DIA        3
185061     20170430_HeLa_phosp_DIA_B_04.raw               1.10       HeLa DIA/DDA reproducibility  DIA        4

Search pipelines and parameters

MaxQuant

Searches were run with Andromeda integrated in MaxQuant (v1.6.1.0), using default settings unless otherwise specified. The database is Uniprot_human_20170912, including the contaminants database.

Fill up with the search parameters

Proteome Discoverer 2.3

Searches were run with Mascot integrated in Proteome Discoverer 2.3 (PhosphoRS 3.0), using default settings unless otherwise specified:

Fill up with the search parameters

PeptideShaker

TO BE FILLED UP

Parsing of the search results

For all the different inputs, I create IDs of the phospho-peptides:

When a pipeline reports several scores for the phosphorylation localisation, I create one ID for each scoring. I label the scoring “ptmRS” when it comes from the phosphoRS algorithm, and “SearchEngine” when it is the default localisation score of the pipeline.

The filters associated with the analysis are described in the documents under ParseInput/. I kept only the phosphorylated PSMs.

Combine the tables:

Analysis and figures

Total number of identifications

I define a spectrumID as the concatenation of the file name and the scan number, so that each ID corresponds to one spectrum. I look at the spectra that are identified in all the pipelines.
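The spectrumID construction and the restriction to spectra identified in every pipeline can be sketched as follows. This is a minimal illustration in Python, not the code used for this report: the function names, the `::` separator, the short pipeline labels and the scan numbers are all assumptions made for the example.

```python
def spectrum_id(file_name: str, scan_number: int, sep: str = "::") -> str:
    """Concatenate the raw file name and the scan number into one ID.

    The separator is an assumption; any string that cannot occur in a
    file name would do.
    """
    return f"{file_name}{sep}{scan_number}"


def shared_spectra(ids_by_pipeline: dict) -> set:
    """Return the spectrum IDs that occur in every pipeline."""
    if not ids_by_pipeline:
        return set()
    return set.intersection(*ids_by_pipeline.values())


# Toy input: made-up scan numbers for one of the DDA files.
raw = "20170430_HeLa_phosp_DDA_B_04.raw"
ids = {
    "MaxQuant": {spectrum_id(raw, s) for s in (101, 102, 103)},
    "PeptideShaker": {spectrum_id(raw, s) for s in (102, 103, 104)},
    "Proteome.Discoverer": {spectrum_id(raw, s) for s in (100, 102, 103)},
}
common = shared_spectra(ids)
print(sorted(common))
```

Including the file name in the ID matters because scan numbers restart in every raw file, so the scan number alone would collide across replicates.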

How many peptide sequences are identified across the pipelines?

## [1] "Mean values:"
##            MaxQuant Andromeda TargetDecoy 
##                                   7463.50 
##          PeptideShacker Comet TargetDecoy 
##                                   6785.00 
##       PeptideShacker MSAmanda TargetDecoy 
##                                   6789.00 
##       PeptideShacker X!Tandem TargetDecoy 
##                                   6248.25 
##     Proteome.Discoverer Mascot Percolator 
##                                   7565.75 
##    Proteome.Discoverer Mascot TargetDecoy 
##                                   6159.75 
##   Proteome.Discoverer MSAmanda Percolator 
##                                   7612.50 
##  Proteome.Discoverer MSAmanda TargetDecoy 
##                                   7212.50 
##  Proteome.Discoverer SequestHT Percolator 
##                                   7968.50 
## Proteome.Discoverer SequestHT TargetDecoy 
##                                   6909.75
## [1] "all values:"
##      MaxQuant Andromeda TargetDecoy PeptideShacker Comet TargetDecoy
## [1,]                           7393                             6919
## [2,]                           7358                             6606
## [3,]                           7523                             6733
## [4,]                           7580                             6882
##      PeptideShacker MSAmanda TargetDecoy
## [1,]                                6790
## [2,]                                6603
## [3,]                                6708
## [4,]                                7055
##      PeptideShacker X!Tandem TargetDecoy
## [1,]                                6221
## [2,]                                6144
## [3,]                                6239
## [4,]                                6389
##      Proteome.Discoverer Mascot Percolator
## [1,]                                  7503
## [2,]                                  7448
## [3,]                                  7582
## [4,]                                  7730
##      Proteome.Discoverer Mascot TargetDecoy
## [1,]                                   6078
## [2,]                                   6047
## [3,]                                   6254
## [4,]                                   6260
##      Proteome.Discoverer MSAmanda Percolator
## [1,]                                    7573
## [2,]                                    7502
## [3,]                                    7667
## [4,]                                    7708
##      Proteome.Discoverer MSAmanda TargetDecoy
## [1,]                                     7074
## [2,]                                     7113
## [3,]                                     7253
## [4,]                                     7410
##      Proteome.Discoverer SequestHT Percolator
## [1,]                                     7940
## [2,]                                     7811
## [3,]                                     8000
## [4,]                                     8123
##      Proteome.Discoverer SequestHT TargetDecoy
## [1,]                                      6845
## [2,]                                      6757
## [3,]                                      6947
## [4,]                                      7090
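As a rough sketch of how such counts can be obtained, the snippet below counts distinct peptide sequences per pipeline and raw file, then averages over the replicates. The input layout (one tuple per PSM) and the function names are assumptions, and the toy rows are invented. The last lines only re-derive the MaxQuant mean from the four replicate values printed above.

```python
from collections import defaultdict
from statistics import mean


def sequences_per_run(psms):
    """Count distinct peptide sequences per (pipeline, raw file).

    `psms` is an iterable of (pipeline, raw_file, sequence) tuples;
    this layout is hypothetical.
    """
    seen = defaultdict(set)
    for pipeline, raw_file, sequence in psms:
        seen[(pipeline, raw_file)].add(sequence)
    return {key: len(seqs) for key, seqs in seen.items()}


def mean_per_pipeline(counts):
    """Average the per-run counts over the runs of each pipeline."""
    by_pipeline = defaultdict(list)
    for (pipeline, _raw_file), n in counts.items():
        by_pipeline[pipeline].append(n)
    return {p: mean(v) for p, v in by_pipeline.items()}


# Invented PSM rows: pipeline "A" sees one sequence twice in run r1.
toy_psms = [
    ("A", "r1", "PEPTIDEK"),
    ("A", "r1", "PEPTIDEK"),
    ("A", "r1", "SAMPLER"),
    ("A", "r2", "PEPTIDEK"),
    ("B", "r1", "SAMPLER"),
]
toy_counts = sequences_per_run(toy_psms)
toy_means = mean_per_pipeline(toy_counts)

# Replicate values reported above for MaxQuant Andromeda TargetDecoy.
maxquant_mean = mean([7393, 7358, 7523, 7580])
print(toy_means, maxquant_mean)
```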

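The same counts after filtering at a 75% localisation score (or ≥ 20 for Ascore). Pipelines that report several localisation scores contribute one value per score and replicate, so their lists below hold 8 or 12 values instead of 4, and the reported means pool all the scorings of a pipeline: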

How many peptide sequences are identified across the pipelines?

## [1] "Mean values:"
##            MaxQuant Andromeda TargetDecoy 
##                                  5993.500 
##          PeptideShacker Comet TargetDecoy 
##                                  2314.750 
##       PeptideShacker MSAmanda TargetDecoy 
##                                  2298.333 
##       PeptideShacker X!Tandem TargetDecoy 
##                                  3726.667 
##     Proteome.Discoverer Mascot Percolator 
##                                  4082.375 
##    Proteome.Discoverer Mascot TargetDecoy 
##                                  3257.625 
##   Proteome.Discoverer MSAmanda Percolator 
##                                  3368.750 
##  Proteome.Discoverer MSAmanda TargetDecoy 
##                                  3207.500 
##  Proteome.Discoverer SequestHT Percolator 
##                                  3237.625 
## Proteome.Discoverer SequestHT TargetDecoy 
##                                  2854.625
## [1] "all values:"
## $`MaxQuant Andromeda TargetDecoy`
## [1] 5909 5902 6044 6119
## 
## $`PeptideShacker Comet TargetDecoy`
##  [1] 1661 1665 1667 1741 3772 3570 3674 3749 1573 1574 1503 1628
## 
## $`PeptideShacker MSAmanda TargetDecoy`
##  [1] 1623 1656 1635 1701 3650 3542 3635 3844 1545 1539 1560 1650
## 
## $`PeptideShacker X!Tandem TargetDecoy`
##  [1] 1525 1552 1547 1595 3367 3301 3400 3447 6221 6142 6235 6388
## 
## $`Proteome.Discoverer Mascot Percolator`
## [1] 5931 5850 6008 6062 2160 2174 2198 2276
## 
## $`Proteome.Discoverer Mascot TargetDecoy`
## [1] 4974 4898 5123 5109 1443 1493 1494 1527
## 
## $`Proteome.Discoverer MSAmanda Percolator`
## [1] 5970 5895 6060 6081  736  722  742  744
## 
## $`Proteome.Discoverer MSAmanda TargetDecoy`
## [1] 5629 5624 5773 5867  686  690  683  708
## 
## $`Proteome.Discoverer SequestHT Percolator`
## [1] 6213 6103 6273 6350  246  224  246  246
## 
## $`Proteome.Discoverer SequestHT TargetDecoy`
## [1] 5424 5336 5512 5623  243  216  242  241
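Several of the lists above hold 8 or 12 values rather than 4, and the printed means average over all of them. The sketch below splits two of those lists into blocks of four and averages each block separately. It assumes that each consecutive block of four values holds replicates 1 to 4 of one localisation scoring; the output does not say which scoring each block belongs to, so the blocks are left unlabelled. The function name is hypothetical.

```python
from statistics import mean


def block_means(values, n_replicates=4):
    """Mean of each consecutive block of `n_replicates` values."""
    if len(values) % n_replicates != 0:
        raise ValueError("length is not a multiple of the replicate count")
    return [
        mean(values[i:i + n_replicates])
        for i in range(0, len(values), n_replicates)
    ]


# Values copied from the filtered output above.
comet = [1661, 1665, 1667, 1741, 3772, 3570, 3674, 3749,
         1573, 1574, 1503, 1628]
mascot_percolator = [5931, 5850, 6008, 6062, 2160, 2174, 2198, 2276]

comet_blocks = block_means(comet)
mascot_blocks = block_means(mascot_percolator)

# The pooled means reproduce the "Mean values" printed above.
comet_pooled = mean(comet)
mascot_pooled = mean(mascot_percolator)
print(comet_blocks, comet_pooled)
print(mascot_blocks, mascot_pooled)
```

Under that assumption, the per-block means differ by a factor of two or more within one pipeline, so the pooled means are best read alongside the individual values.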

I plot the localisation score distributions for the monophosphorylated spectra that are identified across all the pipelines (16920 spectra). Ascores are divided by their maximum value.

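Dividing Ascores by their maximum can be sketched as below. Whether the maximum is taken over all spectra or per pipeline is not stated here, so the sketch simply scales whatever list it is given; the function name and the example scores are invented.

```python
def scale_by_max(scores):
    """Divide non-negative scores by their maximum, giving values in [0, 1].

    Returns zeros when the input is empty or its maximum is not positive.
    """
    top = max(scores, default=0)
    if top <= 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


# Invented Ascores; 20 is the Ascore cut-off used in this report.
ascores = [0.0, 20.0, 50.0, 100.0]
scaled = scale_by_max(ascores)
print(scaled)
```

Note that the scaled position of the Ascore cut-off depends on the maximum: a cut-off of 20 lands at 20 / max on the scaled axis, which is why it does not line up with the 75% threshold of the probability-like scores.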

Now without scaling:


I calculate the Jaccard index (size of the intersection divided by size of the union) for each pair of searches, based on the identified spectra, to get a feel for how similar the searches are. First with all the spectra:

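For reference, a pairwise Jaccard computation can be sketched as below. The set contents and the search names are invented, the helper names are hypothetical, and returning 0.0 for two empty sets is a convention chosen for this example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index: |a ∩ b| / |a ∪ b|."""
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty sets
    return len(a & b) / len(union)


def pairwise_jaccard(sets_by_search: dict) -> dict:
    """Jaccard index for every unordered pair of searches."""
    return {
        (x, y): jaccard(sets_by_search[x], sets_by_search[y])
        for x, y in combinations(sorted(sets_by_search), 2)
    }


# Invented spectrum IDs for three hypothetical searches.
toy_sets = {
    "search_A": {"s1", "s2", "s3"},
    "search_B": {"s2", "s3", "s4"},
    "search_C": {"s1"},
}
toy_index = pairwise_jaccard(toy_sets)
print(toy_index)
```

The same function applies unchanged to sets of peptide IDs instead of spectrum IDs, since only set membership is compared.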

The same pairwise Jaccard index, this time computed only on the spectra identified in all the pipelines:


Now, I calculate the Jaccard index for each pair of searches in terms of peptide IDs rather than spectra, to compare the searches at the peptide level.

Again, this is first done only with the spectra identified in all the pipelines:


And with all spectra:


References

Searle, Brian C, Robert T Lawrence, Michael J MacCoss, and Judit Villén. 2018. “Thesaurus: quantifying phosphoprotein positional isomers.” bioRxiv. https://doi.org/10.1101/421214.