Path to the search results parsed with this script: ../../InputData/FerriesEtAl/SearchResults/PS/.
Parse the method (search engine and validation strategy) from folder names:
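A minimal sketch of this step, assuming each folder name has the form `<SearchEngine>_<ValidationStrategy>`. The real naming scheme is not shown here, so the separator, the function name `parse_method` and the example folder `EngineA_StrategyB` are all hypothetical:

```python
from pathlib import PurePosixPath


def parse_method(folder_name):
    """Split a folder name into search engine and validation strategy.

    Assumption: the two parts are joined by the first underscore.
    """
    engine, _, validation = folder_name.partition("_")
    return {"SearchEngine": engine, "Validation": validation}


# Hypothetical sub-folder of the search results directory, for illustration only
example = PurePosixPath("../../InputData/FerriesEtAl/SearchResults/PS/EngineA_StrategyB")
method = parse_method(example.name)
print(method)
```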
I combine the tables into one.
I manually add the information of which search engine has been used for the search.
I keep only the phosphorylated ions.
I keep only the files corresponding to the following acquisition method: HCDOT.
For the localisation scores of the search engines, I use the column “D score”. There is one score per PSM so if the PSM has several phosphorylations, I consider that they all have the same score. To find the positions of the phosphorylations I use the field “Annotated sequence”.
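The position lookup could be sketched as follows. This assumes that the annotated sequence marks modified residues in lowercase and may carry flanking residues in brackets (for example `[R].sDGSLEDGDDVHR.[A]`). That format is an assumption, as are the function names. Only lowercase s, t and y are treated as phosphorylation sites, and the single PSM-level score is copied to every site:

```python
import re


def phospho_positions(annotated_sequence):
    """Return 1-based positions of lowercase s/t/y in the peptide sequence."""
    # Drop flanking residues such as "[R]." and ".[A]" if they are present
    core = re.sub(r"^\[.*?\]\.|\.\[.*?\]$", "", annotated_sequence)
    return [i for i, aa in enumerate(core, start=1) if aa in "sty"]


def site_scores(annotated_sequence, psm_score):
    """Assign the single PSM-level score to every phosphorylation site."""
    return {pos: psm_score for pos in phospho_positions(annotated_sequence)}


print(phospho_positions("[R].sDGSLEDGDDVHR.[A]"))
print(site_scores("AAsPEtK", 0.9))
```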
I also parse the localisation scores from the ptmRS and Ascore algorithms. Both are returned in the column “Probabilistic.PTM.score”.
Comment: Some localisation confidences are “Not Scored” or “Random”. I don’t take these into consideration.
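Excluding these entries can be sketched as below. The rows are represented as plain dictionaries, and the field name `LocalisationConfidence` is a hypothetical stand-in for wherever the confidence label is stored:

```python
# Labels that carry no usable localisation score (taken from the comment above)
EXCLUDED = {"Not Scored", "Random"}


def keep_scored(rows):
    """Keep only rows whose localisation confidence is an actual score."""
    return [r for r in rows if r["LocalisationConfidence"] not in EXCLUDED]


# Made-up rows, for illustration only
rows = [
    {"PSM": 1, "LocalisationConfidence": "0.98"},
    {"PSM": 2, "LocalisationConfidence": "Not Scored"},
    {"PSM": 3, "LocalisationConfidence": "Random"},
]
scored = keep_scored(rows)
print(scored)
```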
Here, I work only with one acquisition method: HCDOT.
Distribution of the ptmRS scores:
Distribution of the Ascores:
*The Ascore is the only scoring scheme tested in this study that does not range from 0 to 1 or from 0 to 100.
Distribution of the search engine scores:
I apply the localisation score threshold: a score passes if it is above 0.75. The data are not filtered yet; I only indicate whether the localisation score passes the threshold in the field LocalisationsFilter.
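A minimal sketch of the flagging step, assuming the scores are already on a 0 to 1 scale and stored in a field called `Score` (a hypothetical name). Rows are annotated, not removed:

```python
THRESHOLD = 0.75


def add_localisation_filter(rows, threshold=THRESHOLD):
    """Add a boolean LocalisationsFilter field without dropping any row."""
    return [{**r, "LocalisationsFilter": r["Score"] > threshold} for r in rows]


# Made-up scores, for illustration only
rows = [{"Score": 0.75}, {"Score": 0.9}, {"Score": 0.2}]
flagged = add_localisation_filter(rows)
print(flagged)
```

The comparison is strict, so a score of exactly 0.75 does not pass.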
For all the different inputs, I create IDs of the phospho-peptides:
PhosphopeptideID: concatenation of pool, sequence and localisation of the phosphorylation (separated with "_").
PhosphosequenceID: concatenation of pool, sequence and number of phosphorylations on the peptide (separated with "_").
When there are several scores for the phosphorylation localisations, I create one ID for each scoring. I define the scoring as “ptmRS” when it is the phosphoRS algorithm, or “SearchEngine” when it is the default localisation score of the pipeline.
So in the end, the table contains two rows per PSM, one for each localisation scoring scheme.
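The two IDs could be built as in the sketch below. How several positions on one peptide are written inside the ID is not stated, so joining them with ";" is an assumption, as are the function names and the example values:

```python
def phosphopeptide_id(pool, sequence, positions):
    """Pool, sequence and phosphorylation position(s), joined with '_'."""
    # Assumption: multiple positions are sorted and joined with ';'
    localisation = ";".join(str(p) for p in sorted(positions))
    return "_".join([pool, sequence, localisation])


def phosphosequence_id(pool, sequence, positions):
    """Pool, sequence and number of phosphorylations, joined with '_'."""
    return "_".join([pool, sequence, str(len(positions))])


# Made-up peptide, for illustration only
print(phosphopeptide_id("Pool1", "AASPETK", [6, 3]))
print(phosphosequence_id("Pool1", "AASPETK", [6, 3]))
```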
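The resulting long layout can be sketched like this, assuming each PSM starts as one record holding both scores under the keys `ptmRS` and `SearchEngine`. The output field names `Scoring` and `Score` are hypothetical:

```python
SCORINGS = ("ptmRS", "SearchEngine")


def expand_by_scoring(psm):
    """Turn one PSM record into one row per localisation scoring scheme."""
    base = {k: v for k, v in psm.items() if k not in SCORINGS}
    return [{**base, "Scoring": name, "Score": psm[name]} for name in SCORINGS]


# Made-up PSM, for illustration only
psm = {"PSM": 1, "Sequence": "AASPEK", "ptmRS": 0.99, "SearchEngine": 0.80}
long_rows = expand_by_scoring(psm)
print(long_rows)
```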