Select my path

The path to my input files is ../../InputData/ThesaurusData/SearchResults/PS.

Select localisation filter

Load data

Parse the method (search engine and validation strategy) from folder names:
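As a minimal sketch of this parsing step, the snippet below splits a folder name into its method components. The underscore separator, the four-part layout and the `Method` and `parse_method` names are assumptions made for illustration; the actual folder naming is not shown here.

```python
from typing import NamedTuple


class Method(NamedTuple):
    """Components of a pipeline, as encoded in a (hypothetical) folder name."""
    tool: str
    search_engine: str
    localisation: str
    validation: str


def parse_method(folder_name: str, sep: str = "_") -> Method:
    # Assumed layout: <tool><sep><search engine><sep><localisation><sep><validation>
    parts = folder_name.split(sep)
    if len(parts) != 4:
        raise ValueError(f"Unexpected folder name: {folder_name!r}")
    return Method(*parts)


# Hypothetical folder name, for illustration only.
method = parse_method("PeptideShacker_Comet_Ascore_TargetDecoy")
print(method.search_engine, method.validation)
```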

Add information to the table

I manually add which search engine was used for each search.

Filters

I keep only the PSMs with a phosphorylation.
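A filter of this kind could look like the sketch below. The `Modifications` field name and the convention that a phosphorylation appears as text containing "phospho" are assumptions, not taken from the input files.

```python
def has_phosphorylation(psm: dict, field: str = "Modifications") -> bool:
    # Assumed convention: the modification field mentions "Phospho" for phosphorylated PSMs.
    return "phospho" in (psm.get(field) or "").lower()


# Toy PSMs, for illustration only.
psms = [
    {"Sequence": "ASPEKTR", "Modifications": "Phospho (S2)"},
    {"Sequence": "MLKDER", "Modifications": "Oxidation (M1)"},
    {"Sequence": "GGTLEK", "Modifications": None},
]
phospho_psms = [psm for psm in psms if has_phosphorylation(psm)]
```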

Parse the localisation scores:

For the localisation scores of the search engines, I use the column “D score”. There is one score per PSM so if the PSM has several phosphorylations, I consider that they all have the same score. To find the positions of the phosphorylations I use the field “Annotated sequence”.
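The sketch below illustrates how phosphorylation positions might be read from an annotated sequence and how the single per-PSM score is then copied to every site. The annotation format (`NH2-` and `-COOH` termini, a `<p>` tag after each phosphorylated residue) and the function names are assumptions for illustration; the real field may be encoded differently.

```python
def phospho_positions(annotated: str) -> list[int]:
    """Return 1-based positions of residues followed by a '<p>' tag."""
    core = annotated.removeprefix("NH2-").removesuffix("-COOH")
    positions = []
    residue_index = 0
    i = 0
    while i < len(core):
        if core.startswith("<p>", i):
            # The tag refers to the residue just read.
            positions.append(residue_index)
            i += 3
        elif core[i] == "<":
            # Any other modification tag: skip to its closing '>'.
            i = core.index(">", i) + 1
        else:
            residue_index += 1
            i += 1
    return positions


def scores_per_site(annotated: str, d_score: float) -> dict[int, float]:
    # One score per PSM: every phosphorylation site receives the same value.
    return {pos: d_score for pos in phospho_positions(annotated)}


# Hypothetical annotated sequence and score.
sites = scores_per_site("NH2-AS<p>PEKT<p>R-COOH", 87.5)
```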

I also parse the localisation scores from the ptmRS and Ascore algorithms. Both are returned in the column “Probabilistic.PTM.score”.

Comment: Some Localisation confidences are “Not Scored”, or “Random”. I don’t take these into consideration.

The “Probabilistic.PTM.score” field is empty for some entries scored with the Ascore: 8425, 8463 and 7340 entries for the pipelines PeptideShacker Comet Ascore TargetDecoy, PeptideShacker MSAmanda Ascore TargetDecoy and PeptideShacker X!Tandem Ascore TargetDecoy, respectively. For comparison, the totals are 54392 entries for each of the two Comet pipelines (Ascore and PhosphoRS), 48269 for each of the two MSAmanda pipelines, and 47734 for each of the two X!Tandem pipelines.
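Counts of this kind can be obtained by tallying empty score fields per pipeline, as in the following sketch. The `Pipeline` field name and the toy rows are assumptions for illustration; they do not reproduce the numbers above.

```python
from collections import Counter


def count_empty_scores(rows, score_field="Probabilistic.PTM.score",
                       pipeline_field="Pipeline"):
    """Return {pipeline: (number of empty scores, total number of rows)}."""
    empty = Counter()
    total = Counter()
    for row in rows:
        pipeline = row[pipeline_field]
        total[pipeline] += 1
        value = row.get(score_field)
        if value is None or str(value).strip() == "":
            empty[pipeline] += 1
    return {pipeline: (empty[pipeline], total[pipeline]) for pipeline in total}


# Toy rows, for illustration only.
rows = [
    {"Pipeline": "Comet Ascore", "Probabilistic.PTM.score": "45.2"},
    {"Pipeline": "Comet Ascore", "Probabilistic.PTM.score": ""},
    {"Pipeline": "Comet PhosphoRS", "Probabilistic.PTM.score": "99.1"},
]
counts = count_empty_scores(rows)
```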

Distribution of the ptmRS scores:

Distribution of the Ascores:

*The Ascore is the only scoring scheme tested in this study whose values are not bounded between 0 and 1 or between 0 and 100.

Distribution of the search engine scores:

Apply the localisation score threshold:
* above 0.75. The data are not filtered yet; I only indicate whether the localisation score passes the threshold in the field LocalisationsFilter.
* above 20 for the Ascore. The data are not filtered yet.
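The thresholding step can be sketched as follows: each row is flagged in `LocalisationsFilter` and no row is removed. The `Scoring` and `LocalisationScore` field names, and the assumption that the non-Ascore scores are already on a 0 to 1 scale, are illustrative choices rather than details taken from the tables.

```python
ASCORE_THRESHOLD = 20.0        # applied to Ascore values
PROBABILITY_THRESHOLD = 0.75   # applied to scores assumed to lie between 0 and 1


def flag_localisation(rows):
    """Add a LocalisationsFilter flag to each row without dropping any row."""
    flagged = []
    for row in rows:
        if row["Scoring"] == "Ascore":
            threshold = ASCORE_THRESHOLD
        else:
            threshold = PROBABILITY_THRESHOLD
        passes = row["LocalisationScore"] > threshold  # strictly above the threshold
        flagged.append({**row, "LocalisationsFilter": passes})
    return flagged


# Toy rows, for illustration only.
example = flag_localisation([
    {"Scoring": "ptmRS", "LocalisationScore": 0.91},
    {"Scoring": "Ascore", "LocalisationScore": 13.2},
    {"Scoring": "SearchEngine", "LocalisationScore": 0.75},
])
```

Keeping the flag as a separate field means the actual filtering can be applied later, or skipped, without reloading the data.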

Create a phospho-peptide ID

For all the different inputs, I create IDs of the phospho-peptides:

When there are several scores for the phosphorylation localisations, I create one ID per scoring scheme. I label the scoring “ptmRS” when it comes from the phosphoRS algorithm, or “SearchEngine” when it is the default localisation score of the pipeline.

So in the end, the tables contain two rows per PSM, one for each localisation scoring scheme.
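A minimal sketch of the ID construction and of the resulting two-rows-per-PSM layout is given below. The ID format (sequence followed by `p<position>` suffixes), the field names and the helper functions are hypothetical and only illustrate the idea.

```python
def phospho_peptide_id(sequence: str, positions) -> str:
    # Hypothetical ID format: sequence plus the sorted phosphorylation positions.
    sites = "_".join(f"p{pos}" for pos in sorted(positions))
    return f"{sequence}_{sites}"


def expand_psm(psm: dict) -> list[dict]:
    """Return one row per localisation scoring scheme for a single PSM."""
    rows = []
    for scoring, positions in psm["Localisations"].items():
        rows.append({
            "PSM": psm["PSM"],
            "Scoring": scoring,
            "PhosphoPeptideID": phospho_peptide_id(psm["Sequence"], positions),
        })
    return rows


# Toy PSM where the two scoring schemes disagree on the site.
psm = {
    "PSM": 1,
    "Sequence": "ASPEKTR",
    "Localisations": {"ptmRS": [2], "SearchEngine": [6]},
}
expanded = expand_psm(psm)
```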

I remove the PSMs with empty Ascores.

Save data

