Quantitative phosphoproteomics to characterize signaling networks

Reversible protein phosphorylation is involved in the regulation of most, if not all, major cellular processes via dynamic signal transduction pathways. During the last decade quantitative phosphoproteomics has evolved from a highly specialized area into a powerful and versatile platform for analyzing protein phosphorylation at a system-wide scale and has become the intuitive strategy for comprehensive characterization of signaling networks. Contemporary phosphoproteomics uses highly optimized procedures for sample preparation, mass spectrometry and data analysis to identify and quantify thousands of phosphorylation sites, thus providing extensive overviews of cellular signaling networks. As a result of these developments quantitative phosphoproteomics has been applied to study processes as diverse as immunology, stem cell biology and DNA damage. Here we review the developments in phosphoproteomics technology that have facilitated its application to signaling networks and introduce examples of recent system-wide applications of quantitative phosphoproteomics. Despite the great advances in phosphoproteomics technology there are still several outstanding issues, and we provide our outlook on the current limitations and challenges in the field. © 2012 Elsevier Ltd. All rights reserved.


Introduction
Most, if not all, signal transduction pathways depend on protein phosphorylation to relay information through signaling cascades or to regulate effector proteins, such as kinases, transcription factors or ubiquitin ligases, to elicit the end result of pathway activation [1]. During the last decade it became apparent that analysis of signaling networks at a system-wide level is required for understanding the dynamic and complex mechanisms of cellular signaling. This aspiration to study signaling pathways on a global scale has been among the principal motivations for developing and improving strategies in mass spectrometry (MS)-based phosphoproteomics [2]. Thus, the aims of phosphoproteomics and cell signaling studies converge on the need for an efficient, reliable and reproducible platform for quantitative phosphorylation analysis [3]. Numerous developments in enrichment procedures, instrumentation, quantitation strategies and software tools have been essential to enable routine application of phosphoproteomics [4], and as a consequence phosphoproteomics has matured from an exotic approach applied by only a few labs to the method of choice for studying global phosphorylation in signal transduction. In this review we focus on the improvements in experimental strategies for studying signal transduction pathways, provide an overview of the application of this technology and finally address some outstanding issues in this field.

Experimental strategies in mass spectrometry-based phosphorylation analysis of signaling networks
A fundamental challenge in analyzing protein phosphorylation by mass spectrometry is the low stoichiometry of phosphorylated proteins, arising from the fact that usually only a small fraction of the complete complement of a given protein will exist in a particular phosphorylated form [5,6]. This constitutes a large obstacle for detection of phosphorylation sites by MS because this technology is biased toward highly abundant sample components [7]. In the context of signal transduction this challenge is exacerbated by the generally low copy number of many proteins with pivotal roles in signaling cascades [8].
A major breakthrough in the detection of phosphorylated tyrosine residues came with the development of phospho-tyrosine (pTyr) specific antibodies, which proved very suitable for immunoprecipitation of both intact pTyr-containing proteins [9][10][11] and pTyr peptides obtained from endopeptidic digestion of proteins (see Fig. 1A) [12]. Furthermore, if the enrichment is performed under native buffer conditions it is possible to enrich not only pTyr-containing proteins but also additional secondary interactors (see Fig. 1B) [13][14][15]. Although powerful, the antibody-based strategies are inherently directed toward pTyr, while generic antibodies targeting phosphorylated serine and threonine residues in a sequence-independent manner have proven unsatisfactory over the years. Therefore complementary techniques have been developed that also allow enrichment of proteins and peptides containing phosphorylated serine and threonine. For this task the use of immobilized metal affinity chromatography (IMAC) [16][17][18] or metal oxides [19][20][21], performed either off-line or in an automated setup [22][23][24], has proven very successful in providing near-complete enrichment of phosphorylated peptides (see Fig. 1C). These enrichment techniques are often used in combination with additional chromatographic approaches for sample fractionation in order to reduce sample complexity and thereby further increase the coverage of the phosphoproteome [25][26][27][28][29].
In addition to the developments in phosphopeptide enrichment and fractionation, numerous improvements in mass spectrometry have greatly facilitated identification of phosphopeptides. In particular, the emergence of instrumentation utilizing the Orbitrap analyzer [30], either combined with a linear ion trap [31][32][33] or by itself [34,35], has been very beneficial owing to its high sensitivity, sequencing speed and excellent accuracy. Furthermore, several innovations in peptide fragmentation technology [36][37][38][39][40] have greatly aided identification of phosphopeptides by overcoming the poor peptide backbone cleavage of phosphopeptides that impedes peptide sequencing by conventional techniques [41].
Fig. 1. Enrichment of phosphorylated proteins or peptides. To compensate for the low levels of most phosphorylated proteins in complex biological samples two general principles based on antibodies or metal affinity have been applied for enrichment of phosphorylated proteins or peptides. (A) Antibodies to phosphotyrosine residues can be used to enrich for either intact proteins or proteolytic peptides containing phosphorylated tyrosine. (B) By using phosphotyrosine specific antibodies for enrichment of intact proteins not only phosphotyrosine containing proteins (red) but also additional proteins physically interacting with the primary bait protein can be enriched (blue), providing information about protein-protein interactions. (C) To enrich for peptides phosphorylated on serine, threonine or tyrosine a number of procedures based on the affinity of different metals for phosphate groups have been developed.

Fig. 2. To allow quantitative characterization of protein phosphorylation stable isotope labeling has been widely applied. (A) The basic concept of stable isotope labeling for MS based quantitation is that differentially labeled peptides can be discriminated based on the mass shift introduced by the labeling and the intensity of the observed peaks corresponds directly to the abundance in the sample. (B) Stable isotopes can be introduced by culturing cells in growth medium containing different versions of one or more amino acids; the cells will then take them up from the medium and incorporate them into the proteome. Subsequently, the labeled cells or cell lysates can be mixed and used for phosphopeptide enrichment. (C) An alternative to the metabolic labeling is to use chemical reagents with different isotope composition to label peptides. (D) The result of quantitative phosphoproteomics experiments is usually a collection of data tables. To extract biologically relevant information from these, various bioinformatics analyses are usually performed, resulting in the identification of a subset of phosphorylation sites subjected to further functional follow-up.

Another challenge in phosphoproteomics relates to the highly dynamic nature of protein phosphorylation involved in signal transduction [42,43]. Although establishing whether a given protein or specific amino acid residue is phosphorylated is highly informative by itself, the main goal when applying phosphoproteomics to study signal transduction pathways is usually also to quantitate the changes in phosphorylation associated with a given stimulus or cellular process. To this end the development of a range of different strategies utilizing stable isotopes has been highly influential, as these enable proteomics experiments to be performed in a quantitative manner. The basic concept of stable isotope labeling for MS-based quantitation is that isotopically different peptides behave virtually identically during mass spectrometry analysis but are distinguishable due to the mass shift conveyed by the different isotopic composition, and hence the ratio of observed intensities is directly proportional to the relative quantities of the peptides in the sample [44,45] (see Fig. 2A). Two different approaches have been predominantly applied for introducing isotope labeling to the protein or peptide sample. The first approach exploits the metabolism of cells grown in culture to incorporate isotopes by culturing cells in growth medium containing labeled amino acids [46,47] (see Fig. 2B). The second approach relies on conjugating isotopically different variants of a chemical reagent with sample peptides to introduce the labeling [48,49] (see Fig. 2C).
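The ratio-based principle of stable isotope quantitation can be illustrated with a minimal Python sketch. It assumes that peak intensities for the light and heavy form of each peptide have already been extracted; the peptide names, the intensity values and the function name `log2_ratio` are hypothetical and chosen only for illustration.

```python
import math

def log2_ratio(heavy_intensity, light_intensity):
    """Return log2(heavy / light) for one differentially labeled peptide pair."""
    if heavy_intensity <= 0 or light_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(heavy_intensity / light_intensity)

# Hypothetical peak intensities (arbitrary units), given as (light, heavy).
peptide_pairs = {
    "phosphopeptide_1": (2.0e6, 4.0e6),  # heavy twice as abundant as light
    "phosphopeptide_2": (8.0e5, 8.0e5),  # unchanged
    "phosphopeptide_3": (4.0e6, 1.0e6),  # heavy four-fold lower than light
}

# Because both forms behave alike in the instrument, the intensity ratio
# is taken as the relative abundance between the two labeled samples.
log2_ratios = {
    name: log2_ratio(heavy, light)
    for name, (light, heavy) in peptide_pairs.items()
}

for name, value in log2_ratios.items():
    print(f"{name}: log2(H/L) = {value:+.2f}")
```

Working on a log2 scale is a common convention because up- and down-regulation of equal magnitude become symmetric around zero.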
The initial result of quantitative phosphoproteomics experiments analyzing signaling pathways will be a collection of data about the dynamics of phosphosites upon, e.g. receptor activation. The data in this form, although rich in information, do not readily provide direct biological conclusions. Thus a significant amount of data analysis is required to extract the critical features of the data in order to put the experimental observations into the context of existing knowledge and build new hypotheses (see Fig. 2D) [50]. To approach this task numerous different analysis strategies have been employed, but a small collection of these stand out as the most extensively used due to their robustness and versatility [51]. One popular approach is to subject quantitative data from, e.g. time-course studies to unsupervised clustering, thereby partitioning the potentially thousands of identified phosphorylation sites into a collection of usually less than a dozen clusters. The power of this analysis resides in the fact that, although most identified phosphorylation sites will have a unique dynamic profile, the observed profiles will fall within groups corresponding to, e.g. early responders such as membrane receptors and late responders such as transcription factors [6,52,53]. Highly beneficial to the analysis of phosphoproteomics data on signaling pathways is also the large collection of curated databases of signal transduction pathways. In particular the publicly available Kyoto Encyclopedia of Genes and Genomes (KEGG) database [54] and commercial solutions such as Ingenuity Pathway Analysis (Ingenuity Systems) and MetaCore (GeneGO) have gained increasing popularity due to their comprehensive databases about signaling pathways and user-friendly mode of operation.
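The idea of partitioning time-course profiles by unsupervised clustering can be sketched as follows. This is a deliberately small k-means implementation used as a stand-in; the cited studies may rely on other clustering algorithms. The time points, the log2 fold changes and the choice of the first k profiles as starting centroids are assumptions made for illustration.

```python
def kmeans(profiles, k, iterations=20):
    """Minimal k-means: return one cluster index per profile."""
    # Deterministic start (a simplification): first k profiles are the centroids.
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        for i, profile in enumerate(profiles):
            distances = [
                sum((a - b) ** 2 for a, b in zip(profile, centroid))
                for centroid in centroids
            ]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical log2 fold changes at 0, 5, 10 and 30 min after stimulation.
profiles = [
    [0.0, 2.0, 1.0, 0.2],  # early, transient response
    [0.0, 0.1, 0.5, 2.0],  # late response
    [0.0, 1.8, 1.1, 0.1],  # early, transient response
    [0.0, 0.2, 0.4, 1.9],  # late response
]

labels = kmeans(profiles, k=2)
print(labels)
```

In this toy input the two early responders end up in one cluster and the two late responders in the other, which mirrors the grouping of sites into temporal classes described in the text.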
To identify potential pathway members not previously associated with a given pathway, the information about binary protein-protein interactions stored in databases such as String [55], IntAct [56] or MINT [57] can be used to construct complete interaction networks. A final example of commonly employed analysis strategies is kinase motif analysis. Most serine/threonine kinases require the presence of specific amino acid residues in the proximity of the substrate residue to target the site for phosphorylation; these sequence requirements of kinases are often referred to as linear kinase motifs [58]. Using this information, in combination with prediction algorithms to obtain putative kinases for a phosphorylation site or motif, the experimentally identified phosphorylation sites can be associated with the likely kinases responsible for their phosphorylation [59,60].
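Linear motif matching can be sketched with regular expressions. The example below is a simplified illustration and not one of the published prediction algorithms: it only checks residues C-terminal to the phosphorylated residue, the three patterns are reduced versions of motifs discussed in this review, and the sequence windows are hypothetical.

```python
import re

# Simplified patterns anchored at the phosphorylated residue.
# Illustrative assumptions, not a complete or validated motif set.
MOTIFS = {
    "proline-directed ([ST]P)": re.compile(r"^[ST]P"),
    "ATM/ATR-like ([ST]Q)": re.compile(r"^[ST]Q"),
    "acidic (SxxE)": re.compile(r"^S..E"),
}

def match_motifs(window, site_index):
    """Return the names of all motifs matching at window[site_index]."""
    tail = window[site_index:]  # phosphorylated residue and what follows it
    return [name for name, pattern in MOTIFS.items() if pattern.match(tail)]

# Hypothetical sequence windows with the index of the phosphorylated residue.
examples = [
    ("AAGLSPEKR", 4),  # serine followed by proline
    ("LLESQEEV", 3),   # serine followed by glutamine, glutamate at +3
    ("GGKTAAGG", 3),   # threonine without any of the listed motifs
]

for window, index in examples:
    print(window, index, match_motifs(window, index))
```

A site can match several motifs at once, which is one reason why motif matches are usually treated as candidate kinase assignments that need further support.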

Applications of phosphoproteomics in eukaryotic cell signaling studies
Due to the versatility of phosphoproteomics strategies, these have been applied to study a wide range of biological processes in many different organisms. The area attracting most initial interest in the field of phosphoproteomics has been the signaling downstream of receptor tyrosine kinases (RTKs), with the epidermal growth factor receptor (EGFR) pathway serving as the prototypical example. Among the reasons for this initial focus are (i) the vast importance of RTK signaling for normal cellular functions and its association, when deregulated, with the development of numerous human diseases [61], and (ii) that these signaling pathways can be well described by focusing on Tyr phosphorylation, with quantitation of phosphorylation providing a comprehensive overview of the RTK network. There are many proteomics-centered studies of RTK signaling so far, and these have covered diverse experimental approaches and biological insights, ranging from temporal characterization of signaling at the level of pTyr and global phosphorylation to protein-protein interactions and signaling cross-talk. Due to the widespread application of phosphoproteomics to study RTK signaling, this topic has been extensively covered in several excellent recent reviews and we refer the reader to these for a detailed description of the current status of this area, see for example [3,43,62]. The remainder of this section will introduce examples of some of the many other applications of phosphoproteomics in primarily human and mouse cellular systems; for phosphoproteomics in bacteria we refer the reader to a separate chapter in this issue.

G-protein coupled receptor signaling
The canonical view on signaling from G-protein coupled receptors (GPCRs) is that they signal by activating intracellular G-proteins, ultimately resulting in the generation of second messengers such as diacylglycerol, cyclic AMP (cAMP) and inositol phosphates. However, a growing body of evidence reveals parallel signaling from GPCRs via G-protein independent, phosphorylation-mediated events. Supporting this view is a study by Hoffert et al., which used quantitative phosphoproteomics to study the signaling from the GPCR vasopressin 2 receptor and demonstrated that the signaling downstream of this receptor not only affects the cAMP-PKA signaling cascade but also results in inactivation of MAPK signaling pathways [63]. The lipid lysophosphatidic acid (LPA) is a ligand for GPCRs and stimulation with LPA results in modulation of a range of biological processes [64]. A comparison of the signaling elicited by LPA stimulation with that resulting from heparin-binding epidermal growth factor (HB-EGF) identified stronger transactivation of EGFR by LPA, resulting in stronger induction of most HB-EGF triggered events than stimulation with HB-EGF itself [65].
The C-X-C chemokine receptor type 4 (CXCR4) also signals through G-proteins, and the signaling from this receptor was the topic of two recent studies which stimulated the receptor with the chemokine CXCL12 in either B cells [66] or T cells [67]. Both studies identified novel CXCL12-responsive targets of the kinase AKT, validated several previously proposed phosphorylation targets and suggested cross-talk between several pathways.
Another system utilizing GPCR signaling is the activation of the angiotensin II type 1A receptor (AT1R) that also transmits the signal via G-protein independent pathways. Two studies characterized the G-protein independent signaling from AT1R by stimulating cells with the ligand [Sar1,Ile4,Ile8]-angiotensin II (SII), which selectively inhibits G-protein signaling while eliciting G-protein independent events [68,69]. These studies demonstrated that several kinases take part in the G-protein independent signaling from AT1R and proposed critical roles for protein kinase D [68] and beta-arrestin signaling in the AT1R signaling network.

Immunology and infection
Activation of the T cell receptor (TCR) is critical to the adaptive immune response and the signaling from this receptor is known to occur largely via phosphorylation of tyrosine residues. Based on this knowledge Kim and White applied a quantitative pTyr-directed strategy to characterize the early signaling events upon TCR activation [70]. The resulting data provided quantitation of the tyrosine phosphorylation of critical TCR pathway proteins such as Zap70, ITAMs and ERK1/2, and proposed a potential mechanism by which enhanced tyrosine phosphorylation of PLC-γ/Shc/Grap2/Vav1 upon TCR activation causes an increased MAPK activation resulting in increased expression of the cytokine IL-2, an established T cell marker protein.
Mayya et al. applied global phosphoproteomics to also cover Ser and Thr phosphorylation events in Jurkat cells, resulting in the identification of 696 TCR-responsive phosphorylation sites on 453 proteins. These covered a range of proteins with established function in TCR signaling, as well as proteins not previously implicated in TCR signaling with functions in, e.g. integrin activation and endocytosis [71]. Motif analysis revealed over-representation of MAPK substrates among the TCR-responsive phosphorylation sites, and inhibitor experiments were performed to identify novel ERK substrates in TCR-activated Jurkat cells. Furthermore, the authors extracted information from their data to propose that the unifying theme for the observed serine and threonine phosphorylations is to modulate protein-protein interactions, and provided evidence of phosphorylation-modulated regulation of microtubule assembly.
Navarro et al. applied phosphoproteomics to study TCR signaling in cytotoxic T cells and identified more than 2000 phosphorylation sites, of which 450 displayed changes upon TCR stimulation [72]. This study quantified phosphorylation on known TCR signaling components but also identified a number of regulators of chromatin acetylation, such as several histone deacetylases, and proceeded to demonstrate phosphorylation-dependent constitutive cytosolic localization of HDAC7 in cytotoxic T cells, resulting in sustained high expression of CD25 accompanied by a greater ability to produce interferon-γ upon TCR activation.
To characterize the cellular signaling induced by infection with Salmonella enterica, Rogers et al. infected a HeLa cell culture with Salmonella and quantified the response of the phosphoproteome [73]. This resulted in the identification of 9508 phosphorylation sites, of which 24% showed significant dynamics during the infection period. Based on a comparison with a global phosphoproteomics analysis of EGFR signaling [52] the authors could confirm that Salmonella infection induces signaling events similar to those resulting from EGF stimulation, such as activation of the canonical MAPK signaling pathway. However, compared to stimulation with EGF the Salmonella infection resulted in a delayed induction of pTyr signaling. In addition, by infecting cells with a mutant carrying a deletion of the sopB gene, which codes for a phosphoinositide phosphatase that Salmonella utilizes to activate downstream signaling, the authors identified SopB to be important for Akt-mediated phosphorylation of several substrates such as BAD and Rac1.

Cell cycle regulation
A fundamental property of living systems is the ability to proliferate, in eukaryotes by means of the cell cycle. The cell cycle is a highly regulated machinery which ensures that the integrity of daughter cells is maintained. The bulk of regulatory events in the cell cycle depend on reversible protein phosphorylation and thus kinases are of critical importance. To expand the understanding of kinase regulation during the human cell cycle, Daub et al. [74] applied a combination of SILAC with kinase enrichment based on immobilized kinase inhibitors, resulting in identification of more than 1000 phosphorylation sites on 219 protein kinases, including important cell cycle regulators such as Wee1, Plk1, Aurora kinases and CDK1. The quantitative data revealed that more than 50% of all phosphopeptides derived from protein kinases demonstrated a more than two-fold increase in mitotic cells, including several previously unknown sites on critical cell-cycle kinases. To expand further the understanding of phosphorylation-mediated regulation of the human cell cycle, Olsen et al. applied a global phosphoproteomics strategy to quantitate the phosphoproteome across six cell cycle stages, resulting in the identification of 24,714 phosphorylation sites [6]. From the quantitative profiles across the cell cycle the authors could assign regulation of particular biological processes and signaling pathways to the different stages of the cell cycle, providing an extensive map of human cell cycle regulation. Based on kinase predictions for the identified phosphorylation sites, a map of kinase activities across the cell cycle could be created, demonstrating the expected activation of CDKs in M-phase but also an over-representation of substrates of the DNA damage kinases ATM/ATR in S-phase, attributable to a universal replication stress response.
To estimate the occupancy of the observed phosphorylation sites the authors devised an algorithm to calculate the fraction of a given protein phosphorylated at a given site and discovered that most substrates of CDK1 and other mitotic kinases had almost complete occupancy in mitotic cells. Complementing the aforementioned study is work by Dephoure
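How site occupancy can in principle be derived from ratio measurements is sketched below. This is a simplified two-state derivation under stated assumptions, not necessarily the exact algorithm devised by the authors: the peptide is assumed to exist only in a phosphorylated and an unmodified form, and three ratios between condition 2 and condition 1 are assumed to be available, namely for the phosphopeptide (x), the unmodified peptide (y) and the protein (z). Conservation of the total peptide amount then gives occupancy in condition 1 as (z - y) / (x - y) and in condition 2 as that value multiplied by x / z. The function name and the example ratios are hypothetical.

```python
def occupancy(phospho_ratio, unmod_ratio, protein_ratio):
    """Estimate site occupancy in two conditions from three ratios.

    Assumes only two forms (phosphorylated, unmodified) of the peptide:
        P2 = x * P1,  U2 = y * U1,  P2 + U2 = z * (P1 + U1)
    which rearranges to P1 / (P1 + U1) = (z - y) / (x - y).
    """
    x, y, z = phospho_ratio, unmod_ratio, protein_ratio
    if x == y:
        raise ValueError("occupancy is undefined when both peptide ratios are equal")
    occupancy_1 = (z - y) / (x - y)   # fraction phosphorylated, condition 1
    occupancy_2 = occupancy_1 * x / z  # fraction phosphorylated, condition 2
    return occupancy_1, occupancy_2

# Hypothetical ratios: the phosphopeptide rises four-fold, the unmodified
# peptide halves and the protein level stays constant.
occ_1, occ_2 = occupancy(phospho_ratio=4.0, unmod_ratio=0.5, protein_ratio=1.0)
print(f"condition 1: {occ_1:.1%}, condition 2: {occ_2:.1%}")
```

In practice measurement noise can push such estimates outside the range 0 to 1, so results of this kind are usually filtered before interpretation.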

DNA damage response
The early cellular response to DNA damage induced by, e.g. chemical agents or radiation is dependent on protein phosphorylation, in conjunction with other PTMs, to relay information about loss of DNA integrity [76]. Central to the DNA damage response (DDR) are the kinases ATM and ATR, which become activated upon DNA damage [76]. To expand the knowledge about members of the DDR signaling network, a strategy based on immunoprecipitation of peptides containing the SQ/TQ sequence recognition motif in ATM/ATR substrates from cells subjected to ionizing radiation was used [77]. Using this strategy 905 potential ATM/ATR substrates were identified, including several sites on members of the DDR machinery such as KAP1, BRCA1 and FANCD2. To validate some of the new findings, several functional readouts for implications in the DDR were performed and demonstrated that siRNA-based knockdown of some of the identified ATM/ATR substrates resulted in an altered DDR. To further broaden the knowledge about phosphorylation-dependent signaling after DNA damage, Bennetzen et al. [78] and Bensimon et al. [79] applied quantitative phosphoproteomics to nuclear fractions from cells treated with either ionizing radiation or neocarzinostatin, both inducing DNA double strand breaks. From these endeavors Bennetzen et al. found 594 sites out of the 7043 identified to be regulated and Bensimon et al. found 753 sites out of 2871 to show significant dynamics during DDR. In both studies linear motif analysis was performed, identifying occurrence of the ATM/ATR SQ/TQ motif, and Bensimon et al. also found over-representation of SP, TP and SxxE motifs. In concordance, network analyses in the two studies found processes relating to RNA and DNA processing to be among the main targets of phosphorylation-based regulation.
While the studies introduced above used cellular treatments that induced DNA breaks, another study subjected mouse embryonic stem cells to cisplatin, a widely used cancer chemotherapy drug [80]. The mode of action of cisplatin is to bind the DNA creating adducts that interfere with transcription and replication [81]. In this study 9137 phosphorylation sites were identified, of which 377 showed more than two-fold regulation after 4 h of cisplatin treatment. Network analysis revealed that processes relating to DNA repair were over-represented among proteins with up-regulated sites whereas processes relating to mitotic control and cytoskeleton were enriched within proteins containing down-regulated phosphorylations. Furthermore, it was found that cisplatin treatment induced activation of ATM and ATR as well as additional known DDR proteins. In addition to ATM and ATR, phosphorylation in the activation loop of 11 kinases was identified and siRNA-based follow-up demonstrated a protective role for three of these, namely CDK7, Plk1 and KPCD1.
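Statements about over-representation, whether of a sequence motif or of a biological process among regulated sites, rest on an enrichment test. A minimal sketch using the hypergeometric distribution is given below; the cited studies may have used other statistics or software, and the counts in the example are hypothetical.

```python
from math import comb

def hypergeom_pvalue(total, annotated, selected, hits):
    """Probability of at least `hits` annotated items among `selected` draws.

    total     -- all quantified sites (or proteins)
    annotated -- how many of these carry the motif or annotation
    selected  -- size of the regulated subset
    hits      -- annotated items observed within the regulated subset
    """
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, i) * comb(total - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(total, selected)

# Hypothetical numbers: 20 sites quantified, 5 carry the motif,
# 4 sites are regulated and 3 of those carry the motif.
p_value = hypergeom_pvalue(total=20, annotated=5, selected=4, hits=3)
print(f"enrichment p-value: {p_value:.4f}")
```

When many motifs or processes are tested in parallel, the resulting p-values would normally also be corrected for multiple testing.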

Stem cell differentiation and reprogramming
The biology of stem cells has been intensely studied, motivated by their potential for characterizing development of multicellular organisms and, not least, their promise in clinical applications. It is well established that phosphorylation-dependent signal transduction is of crucial importance for maintaining stem cells in their resting undifferentiated state, as well as for inducing and directing stem cells to differentiate to particular derivative cell types [82,83]. Thus quantitative phosphoproteomics presents an attractive platform for studying stem cell biology. Indeed, various phosphoproteomic approaches have already been applied to study unique processes taking place in both adult and embryonic stem cells [84][85][86][87]. In one pioneering study, for example, Kratchmarova et al. used a pTyr-focused approach and identified PI3K-dependent mechanisms by which growth factors regulate the differentiation of adult mesenchymal stem cells to bone-forming osteoblasts [88]. In this section we will only summarize several recent studies focused on human embryonic and induced pluripotent stem cells.
Embryonic stem cells (ESCs) constitute the source of all cells found in the adult individual and thus represent an ideal system for studying embryonic development and cellular differentiation [89,90]. Van Hoof et al. used global phosphoproteomics to characterize phosphorylation dynamics following BMP4 stimulation, a treatment known to induce trophoblast differentiation of human ESCs [91], while Brill et al. combined label-free quantitation with global phosphoproteomics of cells undergoing non-specific differentiation induced by retinoic acid treatment [92]. Both studies obtained a similar coverage of the phosphoproteome to a depth of ∼3000 sites and both found widespread regulation of protein phosphorylation of as much as 50% of the quantified sites, underlining the importance of phosphorylation-mediated signaling in ESC differentiation. In addition to identifying phosphorylation sites on several human ESC marker proteins, Van Hoof et al. performed a kinase motif analysis and identified the critical cell-cycle regulators CDK1/2 as playing a prominent role. They could also demonstrate cross-talk between phosphorylation and SUMOylation, since the SUMOylation of the critical ESC transcription factor SOX2 was dependent on phosphorylation of sites adjacent to the site of SUMOylation. From a pathway analysis Van Hoof et al. found phosphorylation of SMAD5 and SMAD8, serving as a confirmation of activation of the BMP4-SMAD signaling pathway by the stimulation. This analysis also identified the PI3K/AKT pathway to be perturbed, as increasing phosphorylation was found on critical kinases in the cascade such as PDK1 and mTOR. Similarly, Brill et al. applied a pathway analysis approach and found a number of RTK pathways to be of potential importance. By performing inhibitor-based experiments, Brill et al. could extend their finding to propose that PDGF and VEGF pathways may promote the undifferentiated state of human ESCs.
A more recent study applied a strategy similar to that used by Van Hoof et al. to quantitatively characterize the dynamics in the proteome and phosphoproteome in human ESCs following induction of differentiation with either an activator of protein kinase C or treatment of the ESCs with medium not conditioned on feeder cells (non-conditioned medium, NCM) and therefore not able to sustain the undifferentiated pluripotent growth of the ESCs [53]. Common to the two treatments is that both induce ESCs to undergo spontaneous (undirected) differentiation. In this study a total of 23,522 phosphorylation sites were identified and, similar to the two reports mentioned above, a high proportion of the identified phosphorylation sites showed significant dynamics. Linear kinase motif analysis indicated that groups of kinases appeared to be co-regulated since, e.g. sites with a proline C-terminal to the phosphorylated residue, predicted to be phosphorylated by members of the CDK and MAPK families, were down-regulated. Conversely, charged motifs predicted to be targets of the CLK and CSK families were up-regulated. Since two distinct treatments were used to induce differentiation in this study, discrimination between generic and treatment-specific events was possible by systematic comparison of the profiles of phosphorylation sites between the treatments. This identified the most prominent treatment-specific phosphorylation events on proteins involved in processes associated with cell adhesion, while processes relating to cell cycle control displayed highly similar phosphorylation dynamics in both differentiation paradigms. Furthermore, a combination of unsupervised clustering and Gene Ontology enrichment was used to extract the biological processes affected by one or both of the treatments. Among the processes affected by both treatments was DNA methylation, which is a critical epigenetic mechanism involved in the regulation of hESC differentiation.
This finding was further explored using co-immunoprecipitation experiments which resulted in identification of interaction between DNA methyltransferases and the RNA polymerase II-associated factor 1 (PAF1) complex, a complex involved in transcriptional regulation of critical human ESC pluripotency genes. The identification of this interaction in the early stages of ESC differentiation might therefore provide a connection between regulation at the level of protein phosphorylation and the epigenetic regulation.
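One simple way to compare the profile of a phosphorylation site between two treatments is to correlate the two time courses. The sketch below is an assumption made for illustration and not the procedure of the cited study: the use of the Pearson correlation, the threshold of 0.8, the labels and the example values are all hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equally long profiles (0.0 if one is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def classify(profile_a, profile_b, threshold=0.8):
    """Label a site 'generic' if its profiles agree across both treatments."""
    if pearson(profile_a, profile_b) >= threshold:
        return "generic"
    return "treatment-specific"

# Hypothetical log2 fold changes of two sites over four time points,
# measured under treatment A and treatment B.
site_1 = ([0.0, 1.0, 2.0, 3.0], [0.0, 0.9, 2.1, 3.2])     # similar trends
site_2 = ([0.0, 1.0, 2.0, 3.0], [0.0, -1.0, -2.0, -2.5])  # opposite trends

print(classify(*site_1))
print(classify(*site_2))
```

Correlation captures only the shape of a profile, so a real comparison would likely also consider the magnitude of regulation.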
The recent demonstration that somatic cells can be manipulated to attain characteristics highly similar to ESCs, yielding so-called induced pluripotent stem cells (iPSCs), has attracted much attention as these cells hold great therapeutic potential [93]. Phanstiel et al. applied a phosphoproteomics approach, combined with both proteomics and transcriptomics techniques, to evaluate the similarity between human ESCs, iPSCs and fibroblasts [94]. As expected, widespread differences were seen between ESCs and fibroblasts but only subtle differences were found between ESCs and iPSCs. However, these subtle differences did result in functional enrichment of several processes required for somatic cellular function. Based on bioinformatics analyses the authors were able to attribute the main differences between ESCs and iPSCs to incomplete silencing of the somatic cell programs during reprogramming and generation of the iPSCs.

Concluding remarks
In the preceding sections we have attempted to give an overview of the diverse applications of quantitative phosphoproteomics for characterization of various signaling networks. However, despite the many technological advances, many successful applications and invaluable contribution of phosphoproteomics to cell signaling research, several unresolved issues still remain. While some of these are likely to be solved in the coming years, others are of a more fundamental character and are inherent to the applied strategies in phosphoproteomics. The majority of outstanding issues apply to phosphoproteomics technology in general, but the consequence of these is often particularly adverse when studying signal transduction.
As the number of published phosphoproteomics datasets increases, the number of known phosphorylation sites keeps accumulating as well, making efficient data sharing and presentation a growing challenge. To address this issue several databases have been initiated which focus on collecting the knowledge of phosphorylation sites and providing a clear reference to the studies identifying each site [95][96][97]. This is an important initiative since a usual proteomics experiment will be fixed at a false discovery rate of 1% and therefore the number of erroneously reported phosphorylation sites will also accumulate. Therefore, allowing users to easily inspect whether a given site was identified ambiguously by a single study only or is repeatedly being reported provides a good basis for evaluating the certainty of the identification before proceeding to, e.g. functional follow-up studies.
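The value of tracking which studies report a site can be illustrated with a short sketch. The site identifiers, study names and dataset sizes below are hypothetical, and the error estimate is a rough upper bound that assumes the 1% rate applies per reported site and that false identifications from different studies never coincide.

```python
from collections import Counter

def site_support(studies):
    """Count in how many studies each phosphorylation site is reported."""
    counts = Counter()
    for sites in studies.values():
        counts.update(set(sites))  # count each site once per study
    return counts

def expected_false_sites(site_counts, fdr=0.01):
    """Rough upper bound on accumulated false sites across datasets."""
    return sum(n * fdr for n in site_counts)

# Hypothetical site lists from three hypothetical studies.
studies = {
    "study_A": ["P1:S10", "P1:T25", "P2:Y7"],
    "study_B": ["P1:S10", "P3:S99"],
    "study_C": ["P1:S10", "P2:Y7"],
}

support = site_support(studies)
single_study_sites = sorted(site for site, n in support.items() if n == 1)

print(support.most_common())
print("reported once only:", single_study_sites)
print("expected false sites:", expected_false_sites([10000, 20000]))
```

Sites reported by a single study only are the natural candidates for closer inspection before functional follow-up.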
Despite the rapid developments in phosphoproteomics, it is still not possible to identify all phosphorylation sites from a given proteome. The bias of MS-based technologies towards highly abundant sample constituents represents one of the obstacles to detecting all phosphorylation sites, especially on proteins with regulatory functions, which are usually of low abundance. It is therefore uncommon to identify phosphorylations on all the members of a signaling cascade. Furthermore, a typical quantitative phosphoproteomics dataset may not provide complete information about the phosphorylation of a protein, e.g. only identifying some of several known phosphorylation sites for that protein. However, to alleviate this issue there is a constant push from the community for establishing new and improved methodologies for sample preparation. In parallel, instrument vendors have a clear financial incentive for developing faster and more sensitive equipment.
Close to all large-scale quantitative phosphoproteomics experiments currently use a bottom-up strategy where proteins are enzymatically digested to peptides which are then analyzed by MS. However, in doing so critical quantitative information about the context of the phosphorylation may be lost. For example, it is not possible to directly relate the regulation of one phosphorylation to that of another, distant phosphorylation (or other PTM) on the same polypeptide molecule. It is also not possible to directly determine whether two sites on the same protein are mutually exclusive or, e.g., positively dependent on each other, as is the case for priming phosphorylation sites. Another pitfall of the bottom-up strategy is that a specific site cannot be quantified on its own if it is only identified on a peptide that also harbors another site, since the regulation of these two sites will be inseparable. The information for the individual site could however be readily extrapolated if all different phosphorylation states of the peptide are detected in the MS measurements. Finally, phosphorylation sites may be located on identical peptide sequences but derived from distinct splice isoforms of the protein, as well as on peptides that are not observable by MS. This issue can be minimized by the use of different proteolytic enzymes, albeit at the expense of multiplying the number of samples and the analysis time accordingly.
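The extrapolation of site-specific information from the different phosphorylation states of a peptide can be sketched as follows. This is a minimal illustration, not an established workflow: the site names (S10, S14), the amounts and the function name are hypothetical, and it assumes that the measured amounts of all peptide forms are proportional to their molar abundance, which raw MS intensities generally are not without further correction.

```python
def site_occupancy(forms, site):
    """Fraction of peptide molecules phosphorylated at `site`.

    `forms` maps a frozenset of phosphorylated sites (one peptide form)
    to its amount; all forms of the peptide are assumed to be detected.
    """
    total = sum(forms.values())
    modified = sum(amount for sites, amount in forms.items() if site in sites)
    return modified / total


# Hypothetical amounts for the four forms of a peptide with two sites.
control = {
    frozenset(): 60.0,                # unphosphorylated
    frozenset({"S10"}): 20.0,         # singly phosphorylated at S10
    frozenset({"S14"}): 15.0,         # singly phosphorylated at S14
    frozenset({"S10", "S14"}): 5.0,   # doubly phosphorylated
}
stimulated = {
    frozenset(): 30.0,
    frozenset({"S10"}): 20.0,
    frozenset({"S14"}): 10.0,
    frozenset({"S10", "S14"}): 40.0,
}

for site in ("S10", "S14"):
    before = site_occupancy(control, site)
    after = site_occupancy(stimulated, site)
    print(f"{site}: occupancy {before:.2f} -> {after:.2f}")
```

In this constructed example the singly phosphorylated S10 form is unchanged between conditions, yet the occupancy of S10 increases because the doubly phosphorylated form accumulates. The regulation of the site only becomes visible once all forms of the peptide are taken into account.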
The improvements in phosphoproteomics technology have enabled researchers to identify and quantify thousands of phosphorylation sites. However, functional characterization of single phosphorylation sites is lagging far behind our ability to identify them and is becoming one of the major bottlenecks. As a result of this grave discrepancy, information regarding the role and function of specific phosphorylation sites exists for only a fraction of the sites in a phosphoproteomics experiment. Moreover, phosphoproteomics has not yet reached the limits of its potential, and there is therefore an urgent demand for large-scale procedures for functional characterization of phosphorylation sites to match the speed at which the technology for identifying and quantifying them is progressing.
Finally, another challenge arises from the increasingly appreciated prevalence and importance of cross-talk between various PTMs. Despite the many new insights gained from quantitative phosphoproteomics, several recent studies have highlighted the need to characterize additional PTMs for a fully comprehensive description of proteomes. It is well established that ubiquitination plays a pivotal role in signaling networks [98,99], but recent studies have demonstrated that ubiquitination is more widespread than previously anticipated [100,101]. Adding even further complexity is the observation that lysine acetylation appears to be another widespread PTM involved in a range of cellular processes [33]. Because of this dramatic prevalence of PTMs, the ultimate goal of characterizing the full protein complement of living systems, including PTM isoforms, remains rather distant.