Dataset Open Access

The Archaeal Proteome Project advances knowledge about archaeal cell biology through comprehensive proteomics

Schulze, Stefan

Data collector(s)
Adams, Zachary; Cerletti, Micaela; De Castro, Rosana; Ferreira-Cerca, Sébastien; Garcia, Ben A.; Giménez, María Inés; Hippler, Michael; Jevtic, Zivojin; Knüppel, Robert; Legerme, Georgio; Lenz, Christof; Marchfelder, Anita; Maupin-Furlow, Julie; Paggi, Roberto A.; Pfeiffer, Friedhelm; Poetsch, Ansgar; Urlaub, Henning
Data curator(s)
Fufezan, Christian
Project manager(s)
Pohlschroder, Mechthild

Modern proteomics approaches can explore whole proteomes within a single mass spectrometry (MS) run. However, the enormous amount of MS data generated often remains incompletely analyzed due to a lack of sophisticated bioinformatic tools and expertise needed from a diverse array of fields. In particular, in the field of microbiology, efforts to combine large-scale proteomic datasets have so far largely been missing. Thus, despite their relatively small genomes, the proteomes of most archaea remain incompletely characterized. This in turn undermines our ability to gain a greater understanding of archaeal cell biology.

Therefore, we have initiated the Archaeal Proteome Project (ArcPP), a community effort that works towards a comprehensive analysis of archaeal proteomes. Starting with the model archaeon Haloferax volcanii, using state-of-the-art bioinformatic tools, we have:

  • reanalyzed more than 26 million spectra
  • optimized the analysis using parameter sweeps, multiple search engines implemented in Ursgal, and the combination of results through the combined PEP approach
  • thoroughly controlled false discovery rates for high confidence protein identifications using the picked protein FDR approach and limiting FDR to 0.5%
  • identified more than 45,000 peptides, corresponding to 3069 proteins (>75% of the proteome), with a median sequence coverage of 55%
  • analyzed N-terminal protein processing, including N-terminal acetylation and signal peptide cleavage
  • performed a detailed glycoproteomic analysis, identifying >230 glycopeptides corresponding to 45 glycoproteins
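The picked protein FDR step mentioned in the list above can be illustrated with a minimal sketch. This is not the ArcPP or Ursgal implementation (the actual scripts are in the GitHub repository linked further down); it only shows the general idea under simplifying assumptions: every target protein has exactly one decoy counterpart sharing its identifier, higher scores are better, and the FDR at a given rank is estimated as decoys divided by targets. The names `ProteinScore` and `picked_protein_fdr` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinScore:
    protein_id: str  # identifier shared by a target and its decoy (assumption)
    is_decoy: bool
    score: float     # higher is better (hypothetical scoring scale)


def picked_protein_fdr(entries, max_fdr=0.005):
    """Return target protein IDs accepted at the given FDR (0.005 = 0.5%)."""
    # Step 1: "pick" the better-scoring member of each target/decoy pair.
    # On an exact tie, the entry seen first is kept.
    best = {}
    for entry in entries:
        current = best.get(entry.protein_id)
        if current is None or entry.score > current.score:
            best[entry.protein_id] = entry

    # Step 2: rank the picked entries from best to worst score.
    ranked = sorted(best.values(), key=lambda e: e.score, reverse=True)

    # Step 3: walk down the ranking and remember the longest prefix whose
    # estimated FDR (decoys / targets) stays within the limit.
    targets = decoys = cutoff = 0
    for rank, entry in enumerate(ranked, start=1):
        if entry.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = rank

    return [e.protein_id for e in ranked[:cutoff] if not e.is_decoy]


if __name__ == "__main__":
    # Toy data, not taken from the ArcPP results.
    toy = [
        ProteinScore("A", False, 10.0), ProteinScore("A", True, 2.0),
        ProteinScore("B", False, 8.0), ProteinScore("B", True, 9.0),
        ProteinScore("C", False, 7.0), ProteinScore("C", True, 1.0),
    ]
    print(picked_protein_fdr(toy, max_fdr=0.5))
```

Picking one member per pair before counting keeps a decoy from being counted when its target counterpart scored better, which is the main difference from a classical target-decoy count over all entries.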

Benefiting from the established bioinformatic infrastructure, we will follow up on this analysis focusing on H. volcanii proteogenomics as well as the characterization of additional post-translational modifications. Furthermore, ArcPP will integrate quantitative results obtained from the individual datasets in order to identify common regulatory mechanisms. These studies on the H. volcanii proteome can serve as a blueprint for comprehensive proteomic analyses performed on a diverse range of archaea and bacteria.

 

For further details, please refer to the following publications. Please also cite this work if you use these results for further analyses:

Schulze, S., Adams, Z., Cerletti, M. et al. The Archaeal Proteome Project advances knowledge about archaeal cell biology through comprehensive proteomics. Nat Commun 11, 3145 (2020). https://doi.org/10.1038/s41467-020-16784-7

Schulze, S., Pfeiffer, F., Garcia, B.A., Pohlschroder, M. Comprehensive glycoproteomics shines new light on the complexity and extent of glycosylation in archaea. PLOS Biol (2021). https://doi.org/10.1371/journal.pbio.3001277

 

An interactive website to explore the combined results can be found at https://archaealproteomeproject.org/

Scripts and metadata used for the analysis can be found at https://github.com/arcpp/ArcPP

 

Updates version 1.3.0:

- Includes dataset PXD021827

Updates version 1.2.0:

- Includes dataset PXD021874
- Includes results from a comprehensive glycoproteomic analysis of ArcPP datasets

Updates version 1.1.0:

- Natrialba magadii results are included in PXD009116.zip

Files (4.6 GB)
Name | Size | MD5 checksum
ArcPP_FDR_summary.pkl | 4.4 MB | md5:8b0b9d7ce8804653e9463622c5327828
ArcPP_results_2021-06-25_peptides_incl_glyco.csv | 62.4 MB | md5:b3cca2f24e72cdce9de4f0615f4e2554
ArcPP_results_2021-06-25_proteins_incl_glyco.csv | 3.8 MB | md5:d59bc50daf5221f6613e4552ca532703
ArcPP_results_2021-06-25_PSMs_incl_glyco.csv | 679.6 MB | md5:9bb1e3dbff80922f5e62f6ea9e712ca9
ArcPP_results_predicted_CS.csv | 36.1 kB | md5:0079710b440edba5491b92cd172a52ae
ArcPP_results_PSMs_N-glyco_v1_3.csv | 2.4 MB | md5:f08e22afbcc3755f59b78cc76b4143ca
ArcPP_results_PSMs_non-canonical-N-glyco_v1_3.csv | 17.6 kB | md5:05eb49d8aa12354c936ae81f52c4e487
ArcPP_results_PSMs_O-glyco_v1_3.csv | 33.8 kB | md5:c7ec5e0455ee8c44c5415627551967c5
ArcPP_summarized_results.pkl | 827.0 MB | md5:a52f801e323d9b87d49b212ff69b62e6
Haloferax_volcanii_ArcPP_20190606_uniprot.fasta | 1.4 MB | md5:3a1bdf65fd02e0ab14cdc18a15b9cfa4
Haloferax_volcanii_ArcPP_20190606_uniprot_cRAP_target_decoy_gluc.fasta | 3.0 MB | md5:bc0663da2ec8f3df84ba31fafd6c01b7
Haloferax_volcanii_ArcPP_20190606_uniprot_cRAP_target_decoy_trypsin.fasta | 3.0 MB | md5:5439b904da6e4baeb07f1848d453f974
PXD000202.zip | 12.9 MB | md5:b1035b1fa4d845d5814876ebadd45602
PXD006877.zip | 353.8 MB | md5:896d1851482eb8d7a1abf9af390c0e08
PXD007061.zip | 485.4 MB | md5:2102c3e3f9c100881eac1b75c00b5e0c
PXD009116.zip | 182.7 MB | md5:8a59472dfc1f6dd9a83a1e949c4adf95
PXD010824.zip | 41.3 MB | md5:30592a242d73f646a120825fa5c0ef3f
PXD011012.zip | 694.1 MB | md5:895a893f85ee1dfc1acedb280d979ba1
PXD011015.zip | 14.8 MB | md5:ad852f45499116cf9d46a5959572f057
PXD011050.zip | 57.7 MB | md5:4cbd909e8be3bb8324c40426b997f780
PXD011056.zip | 135.4 MB | md5:a0b57098467a73a1d1eea7acd5ddeace
PXD011218.zip | 165.8 MB | md5:adbc0b81fed380131fc64ae3469ee10e
PXD013046.zip | 525.0 MB | md5:b2d4cd26cd780e2cfe6514affac3cc34
PXD014974.zip | 17.2 MB | md5:890f0a82a1ed3f099e18a0b4014b7738
PXD021827.zip | 148.5 MB | md5:7b988961f5d1850b6f661450c256f868
PXD021874.zip | 183.1 MB | md5:448b2166c89082637c2b4489094e992c