Published March 16, 2024 | Version 1.0.0 | Dataset | Open
Data for "A learned score function improves the power of mass spectrometry database search"
Description
These data files are associated with the following publication:
- Varun Ananth, Justin Sanders, Melih Yilmaz, Bo Wen, Sewoong Oh and William Stafford Noble. "A learned score function improves the power of mass spectrometry database search". Bioinformatics (Proceedings of the ISMB). 2024.
For the benchmarking data, we used a dataset that is publicly available on ProteomeXchange (PXD028735). The paper that introduced this dataset is:
- Van Puyvelde, B., Daled, S., Willems, S., Gabriels, R., Gonzalez de Peredo, A., Chaoui, K., Mouton-Barbosa, E., Bouyssié, D., Boonen, K., Hughes, C. J., Gethings, L. A., Perez-Riverol, Y., Bloomfield, N., Tate, S., Schiltz, O., Martens, L., Deforce, D., & Dhaenens, M. (2022). A comprehensive LFQ benchmark dataset on modern day acquisition strategies in proteomics. In Scientific Data (Vol. 9, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1038/s41597-022-01216-6
More specifically, the following `.raw` files were downloaded:
- `LFQ_Orbitrap_DDA_Ecoli_01.raw`
- `LFQ_Orbitrap_DDA_Human_01.raw`
- `LFQ_Orbitrap_DDA_Yeast_01.raw`
Those files can be accessed via FTP here.
We upload here the annotated `.mgf` files created from these `.raw` files, as described in our paper.
The human, yeast, and E. coli `.fasta` files used in all database searches were downloaded from UniProt on 11/6/23, 4:30 PM.
- Bateman, A., Martin, M.-J., Orchard, S., Magrane, M., Ahmad, S., Alpi, E., Bowler-Barnett, E. H., Britto, R., Bye-A-Jee, H., Cukura, A., Denny, P., Dogan, T., Ebenezer, T., Fan, J., Garmiri, P., da Costa Gonzales, L. J., Hatton-Ellis, E., Hussein, A., … Zhang, J. (2022). UniProt: the Universal Protein Knowledgebase in 2023. In Nucleic Acids Research (Vol. 51, Issue D1, pp. D523–D531). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkac1052
We include these files here, with only minor modifications to replace `U` amino acids with `X` so that all amino acids fall into Casanovo-DB's vocabulary.
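The `U`-to-`X` substitution described above can be sketched as follows. This is a minimal illustration, not the authors' actual preprocessing script: the function name `replace_residue` and the example FASTA record (including its accession) are hypothetical. It assumes the substitution should apply only to sequence lines, leaving header lines (those starting with `>`) untouched, since headers often contain the letter `U` in ordinary words.

```python
def replace_residue(fasta_text: str, old: str = "U", new: str = "X") -> str:
    """Replace `old` with `new` on sequence lines of a FASTA string.

    Header lines (starting with '>') are copied through unchanged.
    """
    cleaned_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned_lines.append(line)  # keep the header exactly as it is
        else:
            cleaned_lines.append(line.replace(old, new))
    return "\n".join(cleaned_lines) + "\n"


# Hypothetical record, made up for illustration only.
example = ">sp|P00001|TEST_HUMAN Uncharacterized protein\nMKUAC\nUUG\n"
cleaned = replace_residue(example)
print(cleaned)
```

Restricting the replacement to sequence lines keeps protein identifiers and descriptions intact, so results can still be mapped back to the original UniProt entries.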
Files
(6.7 GB)
| File checksum | Size |
|---|---|
| md5:ef8af0867671b043972933dd5caad34e | 379.4 MB |
| md5:972ccf93924314b2aa77fc97c81cc026 | 1.9 MB |
| md5:ce05fb66e576ca7a60d32954b9609e7a | 5.0 GB |
| md5:c3b872b099e04a72f0acd3c818932a2d | 13.7 MB |
| md5:1ebbc14a45e352896db9faad2212b7ca | 1.4 GB |
| md5:586268e8d5afe18773626c332385656e | 4.0 MB |
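Because some of these files are several gigabytes, it may be worth checking a download against the `md5:` values listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the demo uses a small temporary file rather than one of the real data files.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Demo on a throwaway file; substitute a downloaded file and its listed checksum.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello world")
    expected = "md5:" + hashlib.md5(b"hello world").hexdigest()
    ok = matches_checksum(demo_path, expected)

print(ok)
```

Reading in fixed-size chunks avoids loading a multi-gigabyte file into memory at once.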
Additional details
Related works
- Is supplement to
- Preprint: 10.1101/2024.01.26.577425 (DOI)
Software
- Repository URL
- https://github.com/Noble-Lab/casanovo/tree/db_search
- Programming language
- Python
- Development Status
- WIP