Select my path

The path to my input files is ../../InputData/ThesaurusData/SearchResults/PD.

Select localisation filter

Load data

Parse the method (search engine and validation strategy) from folder names:
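The folder naming scheme is not shown here, so the following is only a minimal sketch. It assumes each result folder is named `<engine>_<validation>`; that convention, the example folder name, and the function name `parse_method` are all hypothetical.

```python
from pathlib import PurePosixPath

def parse_method(folder):
    """Split a folder name of the ASSUMED form '<engine>_<validation>'."""
    name = PurePosixPath(folder).name            # last path component only
    engine, _, validation = name.partition("_")  # split on the first underscore
    return {"SearchEngine": engine, "Validation": validation}

# Hypothetical folder name, used only to illustrate the idea.
method = parse_method(
    "../../InputData/ThesaurusData/SearchResults/PD/Mascot_Percolator"
)
```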

I keep only the tables with ptmRS data.

To homogenise the outputs of the different pipelines, I rename “Amanda.Score” and “XCorr” to “Ions.Score”, and “Apex.RT.in.min” to “RT.in.min”. I also remove the columns “Identity.Strict”, “Identity.Relaxed”, “Expectation.Value”, “Homology.Threshold”, “Peptides.Matched”, “Percolator.SVMScore”, “Percolator.q.Value”, “Search.Space”, “MS2.Errorin.ppm” and “MS.Amanda.Rank” (MS Amanda searches report two rankings; I keep “Search.Engine.Rank”).
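As an illustration of this harmonisation step, here is a small sketch that treats each table row as a plain dictionary. The column names come from the text above; the row representation, the example values, and the function name `harmonise` are assumptions made for the example.

```python
# Columns renamed so that all pipelines share the same names.
RENAME = {
    "Amanda.Score": "Ions.Score",
    "XCorr": "Ions.Score",
    "Apex.RT.in.min": "RT.in.min",
}

# Columns that exist only for some pipelines and are dropped.
DROP = {
    "Identity.Strict", "Identity.Relaxed", "Expectation.Value",
    "Homology.Threshold", "Peptides.Matched", "Percolator.SVMScore",
    "Percolator.q.Value", "Search.Space", "MS2.Errorin.ppm",
    "MS.Amanda.Rank",
}

def harmonise(row):
    """Return a copy of the row with unified names and without dropped columns."""
    return {RENAME.get(key, key): value
            for key, value in row.items() if key not in DROP}

# Hypothetical MS Amanda row with made-up values.
amanda_row = {"Amanda.Score": 152.3, "Apex.RT.in.min": 41.2,
              "MS.Amanda.Rank": 1, "Search.Engine.Rank": 1}
harmonised = harmonise(amanda_row)
```

A single table never contains both “Amanda.Score” and “XCorr”, so mapping both to “Ions.Score” should not create a name clash within one row.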

Add information to the table

I manually record which search engine was used for each search.

Filters

I keep only the high-confidence PSMs.

I keep only the rank-1 PSMs.

I keep only the PSMs with a phosphorylation.

Some sequences from protein fragments are reported with an “x” at the N-terminus, which indicates that the sequence does not start at the beginning of the protein. This leads to spurious phosphorylations assigned to the non-existent “x” amino acid, which I remove.
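The four filters above can be sketched as a single predicate. Only “Search.Engine.Rank” is named in the text; the column names “Confidence” and “Modifications”, the value "High", and the modification format `S5(Phospho)` are assumptions made for this example.

```python
import re

# Matches a phosphorylation reported on the placeholder residue "x",
# e.g. "x1(Phospho)" (assumed notation).
X_PHOSPHO = re.compile(r"\bx\d*\(Phospho\)")

def keep_psm(row):
    """True if the PSM passes all four filters described above."""
    mods = row["Modifications"]
    return (
        row["Confidence"] == "High"              # high-confidence PSMs only
        and row["Search.Engine.Rank"] == 1       # rank-1 PSMs only
        and "Phospho" in mods                    # at least one phosphorylation
        and X_PHOSPHO.search(mods) is None       # none on the fake "x" residue
    )

# Made-up rows, one per filter outcome.
psms = [
    {"Confidence": "High", "Search.Engine.Rank": 1, "Modifications": "S5(Phospho)"},
    {"Confidence": "Medium", "Search.Engine.Rank": 1, "Modifications": "S5(Phospho)"},
    {"Confidence": "High", "Search.Engine.Rank": 2, "Modifications": "T3(Phospho)"},
    {"Confidence": "High", "Search.Engine.Rank": 1, "Modifications": "M8(Oxidation)"},
    {"Confidence": "High", "Search.Engine.Rank": 1, "Modifications": "x1(Phospho)"},
]
filtered = [row for row in psms if keep_psm(row)]
```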

Parse the localisation scores:
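A possible way to parse such scores, assuming the site probabilities arrive as a string like `S3(Phospho): 99.5; T5(Phospho): 0.5` with values in percent. Both the string format and the rescaling to the 0 to 1 range are assumptions, and `parse_site_probabilities` is a hypothetical helper.

```python
import re

# One site entry: residue letter, position, "(Phospho):", then a percentage.
SITE_RE = re.compile(r"([A-Z])(\d+)\(Phospho\):\s*([\d.]+)")

def parse_site_probabilities(text):
    """Return (residue, position, probability in [0, 1]) for each phospho site."""
    return [(residue, int(position), float(percent) / 100)
            for residue, position, percent in SITE_RE.findall(text)]

sites = parse_site_probabilities("S3(Phospho): 99.5; T5(Phospho): 0.5")
```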

Distribution of the ptmRS scores:

Distribution of the delta scores:

I duplicate the rows so that each PSM has one row with the ptmRS localisation and another with the delta score.
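A minimal sketch of this duplication, with rows as dictionaries. The input column names “ptmRS.Score” and “Delta.Score” and the output columns “Scoring” and “LocalisationScore” are hypothetical; the labels “ptmRS” and “SearchEngine” are the ones used for the phospho-peptide IDs.

```python
SCORE_COLUMNS = ("ptmRS.Score", "Delta.Score")  # assumed column names

def to_long(psm):
    """Return two rows for one PSM, one per localisation scoring."""
    shared = {k: v for k, v in psm.items() if k not in SCORE_COLUMNS}
    return [
        {**shared, "Scoring": "ptmRS", "LocalisationScore": psm["ptmRS.Score"]},
        {**shared, "Scoring": "SearchEngine", "LocalisationScore": psm["Delta.Score"]},
    ]

# Made-up PSM used only to show the reshaping.
psm = {"Sequence": "AASPEK", "ptmRS.Score": 0.99, "Delta.Score": 0.60}
long_rows = to_long(psm)
```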

I apply the localisation score threshold: the score must be above 0.75. The data are not filtered yet; I only record whether the localisation score passes the threshold in the field LocalisationsFilter.
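The flagging step might look like the sketch below. It assumes the scores are on a 0 to 1 scale and stored in a hypothetical “LocalisationScore” column; “above” is read here as a strict inequality.

```python
THRESHOLD = 0.75

def flag_localisation(row, threshold=THRESHOLD):
    """Add LocalisationsFilter without removing any row."""
    passes = row["LocalisationScore"] > threshold  # strictly above the threshold
    return {**row, "LocalisationsFilter": passes}

# Made-up scores, including one exactly at the threshold.
rows = [{"LocalisationScore": 0.99},
        {"LocalisationScore": 0.75},
        {"LocalisationScore": 0.40}]
flagged = [flag_localisation(row) for row in rows]
```

Keeping the flag as a column rather than filtering immediately makes it possible to compare confidently and ambiguously localised PSMs later.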

Create a phospho-peptide ID in the PD table

For all the different inputs, I create IDs of the phospho-peptides:

When there are several scores for the phosphorylation localisations, I create one ID per scoring. I label the scoring “ptmRS” when it comes from the phosphoRS algorithm, or “SearchEngine” when it is the default localisation score of the pipeline.
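The exact composition of the ID is not specified here, so the following is purely illustrative. It assumes an ID built from the peptide sequence, the sorted phosphorylation sites and the scoring label, joined with `|`; the format and the function name `phosphopeptide_id` are hypothetical.

```python
def phosphopeptide_id(sequence, sites, scoring):
    """Build an ID from the sequence, phospho sites and scoring label.

    sites is a list of (position, residue) pairs; the ID layout is an
    assumption made for this example.
    """
    site_part = "-".join(f"{residue}{position}"
                         for position, residue in sorted(sites))
    return f"{sequence}|{site_part}|{scoring}"

# One ID per scoring for the same made-up peptide.
ids = [phosphopeptide_id("AASPTK", [(5, "T"), (3, "S")], scoring)
       for scoring in ("ptmRS", "SearchEngine")]
```

Sorting the sites first means the same phospho-peptide gets the same ID regardless of the order in which its sites were reported.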

Save data


References