Published March 16, 2024 | Version 1.0.0
Dataset Open

Data for "A learned score function improves the power of mass spectrometry database search"

  • 1. University of Washington

Contributors

Project leader:

  • 1. University of Washington

Description

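These data files are associated with the publication "A learned score function improves the power of mass spectrometry database search" (preprint DOI: 10.1101/2024.01.26.577425).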
For the benchmarking data, we used a dataset that is publicly available on ProteomeXchange (PXD028735). The paper that introduced this dataset is:
  • Van Puyvelde, B., Daled, S., Willems, S., Gabriels, R., Gonzalez de Peredo, A., Chaoui, K., Mouton-Barbosa, E., Bouyssié, D., Boonen, K., Hughes, C. J., Gethings, L. A., Perez-Riverol, Y., Bloomfield, N., Tate, S., Schiltz, O., Martens, L., Deforce, D., & Dhaenens, M. (2022). A comprehensive LFQ benchmark dataset on modern day acquisition strategies in proteomics. In Scientific Data (Vol. 9, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1038/s41597-022-01216-6
More specifically, the following `.raw` files were downloaded:
  • LFQ_Orbitrap_DDA_Ecoli_01.raw
  • LFQ_Orbitrap_DDA_Human_01.raw
  • LFQ_Orbitrap_DDA_Yeast_01.raw
Those files can be accessed via FTP here.
We upload here the annotated .mgf files created from these .raw files, as described in our paper.
The human, yeast, and E. coli .fasta files used in all database searches were downloaded from UniProt on 11/6/23, 4:30 PM.
  • Bateman, A., Martin, M.-J., Orchard, S., Magrane, M., Ahmad, S., Alpi, E., Bowler-Barnett, E. H., Britto, R., Bye-A-Jee, H., Cukura, A., Denny, P., Dogan, T., Ebenezer, T., Fan, J., Garmiri, P., da Costa Gonzales, L. J., Hatton-Ellis, E., Hussein, A., … Zhang, J. (2022). UniProt: the Universal Protein Knowledgebase in 2023. In Nucleic Acids Research (Vol. 51, Issue D1, pp. D523–D531). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkac1052
We include these files here, with only minor modifications to replace U amino acids with X so that all amino acids fall into Casanovo-DB's vocabulary.
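The U-to-X substitution described above could be sketched as follows. This is a minimal illustration, not the script used to prepare the uploaded files: the function name `replace_residue` and the example FASTA entry are hypothetical, and it assumes standard FASTA layout in which header lines start with `>` and every other line holds sequence. Header lines are left untouched so that a U in a protein name or accession is not altered.

```python
def replace_residue(fasta_text: str, old: str = "U", new: str = "X") -> str:
    """Replace residue `old` with `new` on sequence lines of a FASTA string.

    Hypothetical helper for illustration only. Header lines (starting
    with '>') are copied through unchanged.
    """
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep as-is, even if it contains the letter U.
            out_lines.append(line)
        else:
            # Sequence line: swap the out-of-vocabulary residue.
            out_lines.append(line.replace(old, new))
    # Re-terminate every line with a newline.
    return "".join(line + "\n" for line in out_lines)


# Made-up entry; the accession and sequence are not from the uploaded files.
example = ">sp|P00000|EXAMPLE_HUMAN Hypothetical protein\nMKUAGL\nUCT\n"
cleaned = replace_residue(example)
print(cleaned, end="")
```

To process a file on disk, the same function can be applied to the text read from the .fasta file and the result written to a new file, keeping the original download for reference.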

Files

Files (6.7 GB)

Name Size
md5:ef8af0867671b043972933dd5caad34e 379.4 MB
md5:972ccf93924314b2aa77fc97c81cc026 1.9 MB
md5:ce05fb66e576ca7a60d32954b9609e7a 5.0 GB
md5:c3b872b099e04a72f0acd3c818932a2d 13.7 MB
md5:1ebbc14a45e352896db9faad2212b7ca 1.4 GB
md5:586268e8d5afe18773626c332385656e 4.0 MB
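
The md5 checksums listed above can be used to confirm that a download completed without corruption. The sketch below uses Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_record` are hypothetical, and the file path in the usage comment is a placeholder, since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (placeholder path; substitute the file actually downloaded):
# matches_record("path/to/downloaded_file", "md5:ef8af0867671b043972933dd5caad34e")
```

A `False` result suggests either an incomplete download or a comparison against the checksum of a different file in the list.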

Additional details

Related works

Is supplement to
Preprint: 10.1101/2024.01.26.577425 (DOI)

Software

Repository URL
https://github.com/Noble-Lab/casanovo/tree/db_search
Programming language
Python
Development Status
Work in progress