Data for RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks
Creators
- 1. McGill University, Mila
- 2. McGill University, Mila, Goodman Cancer Research Institute
Description
Data for RAPPPID, a method for the Regularised Automatic Prediction of Protein-Protein Interactions using Deep Learning.
These datasets are provided in a format that RAPPPID can read directly.
Comparatives Dataset
These datasets were derived from the STRING v11 H. sapiens dataset (Szklarczyk et al., 2019), following the C1, C2, and C3 procedures outlined by Park and Marcotte (2012). Negative samples are drawn at random from the set of protein pairs not known to interact. See Szymborski & Emad (2021) for details.
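As a rough illustration of the two ideas above, the sketch below labels a test pair with its Park and Marcotte class (C1: both proteins also appear in the training set, C2: exactly one does, C3: neither does) and draws negative pairs at random from pairs not listed as interacting. The function names `park_marcotte_class` and `sample_negatives` and the toy protein identifiers are hypothetical; this is not the code used to build the datasets, and the exact procedure is described in Szymborski & Emad.

```python
import random


def park_marcotte_class(pair, train_proteins):
    """Label a test pair by how many of its proteins occur in the training set."""
    seen = sum(protein in train_proteins for protein in pair)
    return {2: "C1", 1: "C2", 0: "C3"}[seen]


def sample_negatives(positives, n, seed=0):
    """Draw n distinct protein pairs that are not among the known interactions.

    Assumption: the candidate proteins are exactly those seen in `positives`.
    """
    rng = random.Random(seed)
    known = {frozenset(pair) for pair in positives}
    proteins = sorted({protein for pair in positives for protein in pair})

    # Guard against asking for more negatives than non-interacting pairs exist.
    total_pairs = len(proteins) * (len(proteins) - 1) // 2
    known_distinct = sum(1 for pair in known if len(pair) == 2)
    if n > total_pairs - known_distinct:
        raise ValueError("not enough non-interacting pairs to sample from")

    negatives = set()
    while len(negatives) < n:
        a, b = rng.sample(proteins, 2)  # two different proteins
        candidate = frozenset((a, b))
        if candidate not in known:
            negatives.add(candidate)
    return sorted(tuple(sorted(pair)) for pair in negatives)


# Toy example with made-up identifiers.
train_proteins = {"A", "B"}
print(park_marcotte_class(("A", "B"), train_proteins))
print(park_marcotte_class(("A", "X"), train_proteins))
print(park_marcotte_class(("X", "Y"), train_proteins))
```

Rejection sampling is used here only because it is short; for very dense interaction sets, enumerating the non-interacting pairs first would be a more predictable choice.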
Repeatability Datasets
The repeatability datasets are all derived from STRING in the same manner as the comparatives dataset, but with three different random seeds used for drawing proteins.
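To make the role of the seed concrete, here is a minimal sketch of a seeded protein draw. The seeds `1`, `2`, `3`, the 20% test fraction, the identifiers, and the helper `split_proteins` are all assumptions for illustration; the actual seeds and split sizes are not stated in this record.

```python
import random


def split_proteins(proteins, test_fraction, seed):
    """Partition proteins into (train, test) sets, reproducibly for a given seed."""
    rng = random.Random(seed)
    shuffled = sorted(proteins)  # sort first so the result depends only on the seed
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return set(shuffled[n_test:]), set(shuffled[:n_test])


# Hypothetical protein identifiers and seeds.
proteins = [f"P{i}" for i in range(10)]
splits = {seed: split_proteins(proteins, 0.2, seed) for seed in (1, 2, 3)}

for seed, (train, test) in splits.items():
    print(seed, sorted(test))
```

Re-running with the same seed reproduces the same partition, which is what allows results on each repeatability dataset to be compared across runs.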
References
Park,Y. and Marcotte,E.M. (2012) Flaws in evaluation schemes for pair-input computational predictions. Nat Methods, 9, 1134–1136.
Szklarczyk, D., Gable, A. L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., Simonovic, M., Doncheva, N. T., Morris, J. H., Bork, P., Jensen, L. J., and von Mering, C. (2019). STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1), D607–D613.
Szymborski,J. and Emad,A. (2021) RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks. bioRxiv https://doi.org/10.1101/2021.08.13.456309
Files
Name | Size | Checksum |
---|---|---|
rapppid_data.zip | 62.7 MB | md5:dbde8ac07a195285aae9f0e0d34522de |
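The MD5 checksum listed for rapppid_data.zip can be used to confirm that a download completed intact. The sketch below uses only the Python standard library; the helper names `md5_of_file` and `verify_download` and the local path in the usage line are assumptions.

```python
import hashlib

# Checksum published on this record for rapppid_data.zip.
EXPECTED_MD5 = "dbde8ac07a195285aae9f0e0d34522de"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

For example, `verify_download("rapppid_data.zip")` should return `True` for an uncorrupted copy saved under that name in the working directory.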
Additional details
Related works
- Is derived from
- Journal article: 10.1093/nar/gky1131 (DOI)
- References
- Journal article: 10.1038/nmeth.2259 (DOI)