# Comparison of phospho-localisation pipelines for bottom-up DDA        

-------------

# Marie Locard-Paulet NNF-CPR February 2019

This project is a re-analysis of published data:

  + Ferries et al. 2017 (JPR): analysis of synthetic phospho-peptides spiked into
  a small amount of background made of human cellular extract. The same samples 
  were run on the same Orbitrap instrument with different methods (we only work with the OTHCD results here). The localisations of the phosphorylations are known. 
  + Searle et al. 2018 (bioRxiv): analysis of 5 runs of the same sample with a
  DDA and a DIA setup. These samples are phospho-enriched cellular extracts.

-------------

# Input data:

The data associated with the publications are all located in the folder `InputData/`:

- `FerriesEtAl/PeptidePools/`: synthetic peptides used in the paper Ferries et 
al., 2017 (JPR). These are organised by pools to separate phospho-isomers.
- `FerriesEtAl/SearchResults/`: search results of the synthetic phospho-peptide samples used as input for this analysis. Note: these files are too large to be kept on GitHub. They are available as supplementary information in the associated publication.
- `ThesaurusData/SupTable2.txt`: Supplementary table 2 of the paper Searle et al. 2018 (bioRxiv).
- `ThesaurusData/SearchResults/`: search results of the complex phospho-enriched samples used as input for this analysis. Note: these files are too large to be kept on GitHub. They are available as supplementary information in the associated publication.
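
Because the search results are not stored in the repository, it can help to check that the input data are in place before running the analysis. The following is a minimal sketch, assuming it is run from the repository root; the helper `check_inputs()` is hypothetical and not part of the project's `Functions/` folder.

```r
# Input locations as listed above, relative to the repository root (assumption).
expected_inputs <- c(
  "InputData/FerriesEtAl/PeptidePools",
  "InputData/FerriesEtAl/SearchResults",
  "InputData/ThesaurusData/SupTable2.txt",
  "InputData/ThesaurusData/SearchResults"
)

# Hypothetical helper: returns a named logical vector,
# TRUE where the file or folder exists under `root`.
check_inputs <- function(paths, root = ".") {
  present <- file.exists(file.path(root, paths))
  setNames(present, paths)
}

status <- check_inputs(expected_inputs)
missing_inputs <- names(status)[!status]
if (length(missing_inputs) > 0) {
  message("Missing input data:\n",
          paste0("  - ", missing_inputs, collapse = "\n"))
}
```

Any entry reported as missing has to be downloaded from the supplementary information of the corresponding publication.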

-------------

# Analysis:

## General folder organisation:

- Functions are in the folder `Functions/`.
- R scripts used for the analysis are in the folder `RScripts/`.
- The folder `Pictures/` contains the pictures used for the HTML reports.
- The folders `complexemixture_Searle/` and `syntheticpeptides_Ferries/` contain the scripts and associated HTML reports of the analysis of the search results of the complex phospho-enriched samples and synthetic phospho-peptides, respectively.
- The initial parsing of the search results is performed with the scripts in the folders `complexemixture_Searle/ParseInput` and `syntheticpeptides_Ferries/ParseInput`. 
- The data were saved at each step of the data analysis in the folder `RData/`.
- For each analysis, the folder `Figures/` contains the output figures.

## Data from Ferries et al. (mix of synthetic peptides):

Files in the folder `syntheticpeptides_Ferries/`:

- Folder `ParseInput/`: contains the scripts used for loading the input tables 
(search results) before analysis. *Set this directory as working directory to run the scripts.*
- `01_syntheticPeptides_SelectedMethods_Threshold.Rmd`: Analysis of the data 
from Ferries et al. with only the methods of interest (OT). The generated 
table is saved in `/RData/Output/Ferriesandal_Globaltable.Rdata`.
- `02_syntheticPeptides_MatchedSpectra`: Comparison of the IDs and localisations
attributed to the same spectrum.
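
To reuse the table generated by the first script, the saved `.Rdata` file can be loaded into a separate environment so that it does not overwrite objects in the current session. This is only a sketch: the helper `load_rdata()` is hypothetical, the path is assumed to be relative to the repository root, and the names of the objects stored in the file are not documented here, so the code lists them instead of assuming them.

```r
# Hypothetical helper: load an .Rdata file into its own environment
# and return the stored objects as a named list.
load_rdata <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object names
  mget(object_names, envir = env)
}

# Assumed location relative to the repository root.
global_table_path <- "RData/Output/Ferriesandal_Globaltable.Rdata"

if (file.exists(global_table_path)) {
  objs <- load_rdata(global_table_path)
  print(names(objs))  # inspect which objects the file contains
}
```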

## Data from the Thesaurus paper (complex phospho-enriched sample):

Files in the folder `complexmixture_Searle/`:

- Folder `ParseInput/`: contains the scripts used for loading the input tables 
(search results) before analysis. *Set this directory as working directory to run the scripts.*
- `00_ComplexPhosphoSample_ThesaurusData`: Comparison of the bioinformatics 
pipelines in terms of the number of PSMs identified.
- `01_ComplexPhosphoSample_ThesaurusData`: Analysis of the data from the
Thesaurus paper. Comparison of the bioinformatics pipelines on a complex sample.
- `02_.._MatchedSpectra`: comparison of the identification and phosphorylation 
localisation results obtained for the same spectrum.
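
The instruction to set `ParseInput/` as the working directory can be scripted so that the previous working directory is restored afterwards. The sketch below makes several assumptions: the helper `run_parse_scripts()` is hypothetical, the parsing scripts are assumed to be plain `.R` files, and running them in alphabetical order is assumed to be acceptable.

```r
# Hypothetical helper: run every script of a ParseInput folder with that
# folder as working directory, then restore the previous working directory.
run_parse_scripts <- function(parse_dir, pattern = "\\.R$") {
  stopifnot(dir.exists(parse_dir))
  old_wd <- setwd(parse_dir)          # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE)  # restored even if a script fails
  scripts <- sort(list.files(pattern = pattern))
  for (s in scripts) {
    message("Running ", s)
    source(s, local = new.env())      # keep each script's objects isolated
  }
  invisible(scripts)
}

# Example call from the repository root (left commented out on purpose):
# run_parse_scripts("complexmixture_Searle/ParseInput")
```

For a single script, base R's `source(file, chdir = TRUE)` has a similar effect, since it temporarily changes the working directory to the folder containing the file.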