Re-analysis of the data from phospho-enriched samples run on Thermo Q-Exactive HF (Searle et al. 2018):
DDA Acquisition and Processing: The Thermo Q-Exactive HF was set to positive mode in a top 12 configuration. Full MS scans of mass range 400-1600 were collected at 60,000 resolution to hit an AGC target of 3e6. The maximum inject time was set to 100 ms. MS/MS scans were collected at 30,000 resolution, AGC target of 1e6, and maximum inject time of 55 ms. The isolation width was set to 1.5 m/z with a normalized collision energy of 27. Only precursors charged between +2 and +4 that achieved a minimum AGC of 1e4 were acquired. Dynamic exclusion was set to “auto” and to exclude all isotopes in a cluster.
The aim is to compare the localisation scores from different bioinformatic pipelines on a relatively high-complexity mix of phospho-peptides that was also analysed with a DIA pipeline (for reference).
Description of the input samples:
| Chorus ID | File | Size (GB) | Experiment | Condition | Replicate |
|---|---|---|---|---|---|
| 185068 | 20170430_HeLa_phosp_DDA_B_01_170506194651.raw | 1.24 | HeLa DIA/DDA reproducibility | DDA | 1 |
| 185066 | 20170430_HeLa_phosp_DDA_B_02_170507002341.raw | 1.23 | HeLa DIA/DDA reproducibility | DDA | 2 |
| 185064 | 20170430_HeLa_phosp_DDA_B_03_170507050032.raw | 1.20 | HeLa DIA/DDA reproducibility | DDA | 3 |
| 185062 | 20170430_HeLa_phosp_DDA_B_04.raw | 1.25 | HeLa DIA/DDA reproducibility | DDA | 4 |
| 185067 | 20170430_HeLa_phosp_DIA_B_01_170506220515.raw | 1.08 | HeLa DIA/DDA reproducibility | DIA | 1 |
| 185065 | 20170430_HeLa_phosp_DIA_B_02_170507024206.raw | 1.07 | HeLa DIA/DDA reproducibility | DIA | 2 |
| 185063 | 20170430_HeLa_phosp_DIA_B_03_170507071858.raw | 1.09 | HeLa DIA/DDA reproducibility | DIA | 3 |
| 185061 | 20170430_HeLa_phosp_DIA_B_04.raw | 1.10 | HeLa DIA/DDA reproducibility | DIA | 4 |
Search parameters for Andromeda integrated in MaxQuant (v1.6.1.0), using default settings unless otherwise specified. The database is Uniprot_human_20170912, including the contaminants database.
Fill up with the search parameters
Search parameters for Mascot integrated in Proteome Discoverer 2.3 (PhosphoRS 3.0), using default settings unless otherwise specified:
Fill up with the search parameters
TO BE FILLED UP
For all the different inputs, I create IDs of the phospho-peptides:
PhosphopeptideID: concatenation of the sequence and the localisation of the phosphorylation (separated with "_").
PhosphosequenceID: concatenation of the sequence and the number of phosphorylations on the peptide (separated with "_").
When there are several scores for the phosphorylation localisations, I create one ID for each scoring. I define the scoring as “ptmRS” when it comes from the phosphoRS algorithm, or “SearchEngine” when it is the default localisation score of the pipeline.
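The ID construction described above can be sketched as follows. This is only an illustration, not the code used for the analysis: the function names are hypothetical, positions are assumed to be 1-based residue indices within the peptide, and joining several sites with "_" is an assumption for multiply phosphorylated peptides.

```python
def phosphopeptide_id(sequence: str, positions: list[int]) -> str:
    """Sequence plus the localised phosphosite position(s), joined with "_".

    Assumption: positions are 1-based indices within the peptide sequence.
    """
    return "_".join([sequence] + [str(p) for p in sorted(positions)])


def phosphosequence_id(sequence: str, positions: list[int]) -> str:
    """Sequence plus the number of phosphorylations, joined with "_"."""
    return f"{sequence}_{len(positions)}"


def ids_per_scoring(sequence: str, positions_by_scoring: dict[str, list[int]]) -> dict[str, str]:
    """One PhosphopeptideID per scoring (e.g. "ptmRS" and "SearchEngine")."""
    return {
        scoring: phosphopeptide_id(sequence, positions)
        for scoring, positions in positions_by_scoring.items()
    }


if __name__ == "__main__":
    # Hypothetical peptide where the two scorings disagree on the site.
    print(ids_per_scoring("AASPSPEK", {"ptmRS": [3], "SearchEngine": [5]}))
    print(phosphosequence_id("AASPSPEK", [3]))
```

With this layout, two scorings that place the phosphate on different residues give different PhosphopeptideIDs but the same PhosphosequenceID.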
The filters applied in the analysis are described in the documents of ParseInput/. I keep only the phosphorylated PSMs.
I define a spectrumID as the concatenation of the file name and the scan number, so that each spectrumID corresponds to one spectrum. I then look at the spectra that are identified in all the pipelines.
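A minimal sketch of the spectrumID and of the selection of spectra shared by all pipelines. The function names and the "_" separator are assumptions (the separator is not specified here), and the scan numbers in the example are made up.

```python
def spectrum_id(file_name: str, scan_number: int) -> str:
    """File name and scan number joined into one key.

    Assumption: "_" is used as the separator.
    """
    return f"{file_name}_{scan_number}"


def shared_spectra(spectra_by_pipeline: dict[str, set[str]]) -> set[str]:
    """Spectrum IDs that every pipeline identified (set intersection)."""
    sets = list(spectra_by_pipeline.values())
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    raw = "20170430_HeLa_phosp_DDA_B_04.raw"
    # Hypothetical scan numbers, for illustration only.
    identified = {
        "MaxQuant": {spectrum_id(raw, n) for n in (101, 102, 103)},
        "Proteome.Discoverer": {spectrum_id(raw, n) for n in (102, 103, 104)},
    }
    print(sorted(shared_spectra(identified)))
```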
How many peptide sequences are identified across the pipelines?
Mean and all values (columns 1 to 4 are the rows of the printed matrix):

| Pipeline | 1 | 2 | 3 | 4 | Mean |
|---|---|---|---|---|---|
| MaxQuant Andromeda TargetDecoy | 7393 | 7358 | 7523 | 7580 | 7463.50 |
| PeptideShacker Comet TargetDecoy | 6919 | 6606 | 6733 | 6882 | 6785.00 |
| PeptideShacker MSAmanda TargetDecoy | 6790 | 6603 | 6708 | 7055 | 6789.00 |
| PeptideShacker X!Tandem TargetDecoy | 6221 | 6144 | 6239 | 6389 | 6248.25 |
| Proteome.Discoverer Mascot Percolator | 7503 | 7448 | 7582 | 7730 | 7565.75 |
| Proteome.Discoverer Mascot TargetDecoy | 6078 | 6047 | 6254 | 6260 | 6159.75 |
| Proteome.Discoverer MSAmanda Percolator | 7573 | 7502 | 7667 | 7708 | 7612.50 |
| Proteome.Discoverer MSAmanda TargetDecoy | 7074 | 7113 | 7253 | 7410 | 7212.50 |
| Proteome.Discoverer SequestHT Percolator | 7940 | 7811 | 8000 | 8123 | 7968.50 |
| Proteome.Discoverer SequestHT TargetDecoy | 6845 | 6757 | 6947 | 7090 | 6909.75 |
How many peptide sequences are identified across the pipelines?
## [1] "Mean values:"
## MaxQuant Andromeda TargetDecoy
## 5993.500
## PeptideShacker Comet TargetDecoy
## 2314.750
## PeptideShacker MSAmanda TargetDecoy
## 2298.333
## PeptideShacker X!Tandem TargetDecoy
## 3726.667
## Proteome.Discoverer Mascot Percolator
## 4082.375
## Proteome.Discoverer Mascot TargetDecoy
## 3257.625
## Proteome.Discoverer MSAmanda Percolator
## 3368.750
## Proteome.Discoverer MSAmanda TargetDecoy
## 3207.500
## Proteome.Discoverer SequestHT Percolator
## 3237.625
## Proteome.Discoverer SequestHT TargetDecoy
## 2854.625
## [1] "all values:"
## $`MaxQuant Andromeda TargetDecoy`
## [1] 5909 5902 6044 6119
##
## $`PeptideShacker Comet TargetDecoy`
## [1] 1661 1665 1667 1741 3772 3570 3674 3749 1573 1574 1503 1628
##
## $`PeptideShacker MSAmanda TargetDecoy`
## [1] 1623 1656 1635 1701 3650 3542 3635 3844 1545 1539 1560 1650
##
## $`PeptideShacker X!Tandem TargetDecoy`
## [1] 1525 1552 1547 1595 3367 3301 3400 3447 6221 6142 6235 6388
##
## $`Proteome.Discoverer Mascot Percolator`
## [1] 5931 5850 6008 6062 2160 2174 2198 2276
##
## $`Proteome.Discoverer Mascot TargetDecoy`
## [1] 4974 4898 5123 5109 1443 1493 1494 1527
##
## $`Proteome.Discoverer MSAmanda Percolator`
## [1] 5970 5895 6060 6081 736 722 742 744
##
## $`Proteome.Discoverer MSAmanda TargetDecoy`
## [1] 5629 5624 5773 5867 686 690 683 708
##
## $`Proteome.Discoverer SequestHT Percolator`
## [1] 6213 6103 6273 6350 246 224 246 246
##
## $`Proteome.Discoverer SequestHT TargetDecoy`
## [1] 5424 5336 5512 5623 243 216 242 241
I plot the localisation score distributions for the monophosphorylated spectra that are identified by all the pipelines (16920 spectra). Ascores are divided by their maximum value.
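The scaling step can be sketched as below. The function name is hypothetical, and the guard for an empty or all-zero input is an assumption added so that the sketch never divides by zero.

```python
def scale_by_max(scores: list[float]) -> list[float]:
    """Divide every score by the largest score so the maximum becomes 1."""
    top = max(scores, default=0.0)
    if top <= 0:
        # Assumption: nothing to scale, return the scores unchanged.
        return list(scores)
    return [s / top for s in scores]


if __name__ == "__main__":
    print(scale_by_max([0.0, 50.0, 100.0]))  # [0.0, 0.5, 1.0]
```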
Now without scaling:
I calculate the Jaccard index of each pair of searches (on the identified spectra) to get a feel for how similar the searches are. First with all the spectra:
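For two sets of identified spectra A and B, the Jaccard index is |A ∩ B| / |A ∪ B|. A minimal sketch of the pairwise computation, with hypothetical function names and made-up spectrum IDs; returning 0 for two empty sets is a convention chosen here, not something taken from the analysis.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    # Assumption: two empty sets are given an index of 0.
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(sets_by_search: dict[str, set]) -> dict[tuple[str, str], float]:
    """Jaccard index for every unordered pair of searches."""
    return {
        (x, y): jaccard(sets_by_search[x], sets_by_search[y])
        for x, y in combinations(sorted(sets_by_search), 2)
    }


if __name__ == "__main__":
    # Hypothetical spectrum IDs, for illustration only.
    spectra = {
        "MaxQuant": {"f_1", "f_2", "f_3"},
        "Mascot": {"f_2", "f_3", "f_4"},
    }
    print(pairwise_jaccard(spectra))  # {('Mascot', 'MaxQuant'): 0.5}
```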
The same Jaccard index between each pair of searches, this time computed only with the spectra identified in all the pipelines:
Now, I calculate the Jaccard index of each pair of searches in terms of peptide ID rather than in terms of spectra, again to get a feel for how similar the searches are.
Again, this is done only with the spectra identified in all the pipelines:
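One possible reading of the peptide-level comparison, as a sketch only: each search is represented as a mapping from spectrumID to the peptide ID it assigned, the mapping is restricted to the shared spectra, and the Jaccard index is taken on the resulting sets of peptide IDs. The function name, the data layout and the example IDs are all assumptions.

```python
def peptide_jaccard(psms_a: dict[str, str], psms_b: dict[str, str], shared: set[str]) -> float:
    """Jaccard index on peptide IDs, using only spectra found in `shared`.

    psms_a and psms_b map a spectrumID to the peptide ID assigned by a search.
    """
    peptides_a = {pep for spec, pep in psms_a.items() if spec in shared}
    peptides_b = {pep for spec, pep in psms_b.items() if spec in shared}
    union = peptides_a | peptides_b
    # Assumption: two empty sets are given an index of 0.
    return len(peptides_a & peptides_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical assignments: both searches agree on f_1, disagree on f_2.
    search_a = {"f_1": "AASPEK_3", "f_2": "LSTPR_2", "f_3": "GGSYK_3"}
    search_b = {"f_1": "AASPEK_3", "f_2": "LSTPR_3"}
    print(peptide_jaccard(search_a, search_b, shared={"f_1", "f_2"}))
```

In this example the two searches share one peptide ID out of three distinct ones, so the index is 1/3 even though both identified the same two spectra.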
And with all spectra:
Searle, Brian C, Robert T Lawrence, Michael J MacCoss, and Judit Villén. 2018. “Thesaurus: quantifying phosphoprotein positional isomers.” bioRxiv. https://doi.org/10.1101/421214.