Published June 23, 2022 | Version 1
Dataset Open

Data for RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks

  • 1. McGill University, Mila
  • 2. McGill University, Mila, Goodman Cancer Research Institute

Description

Data for RAPPPID, a method for the Regularised Automative Prediction of Protein-Protein Interactions using Deep Learning.

These datasets are provided in a format that RAPPPID can read directly.

Comparatives Dataset
These datasets were derived from the STRING v11 H. sapiens dataset (Szklarczyk et al., 2019), following the C1, C2, and C3 procedures outlined by Park and Marcotte (2012). Negative samples are drawn at random from the set of protein pairs not known to interact. See Szymborski and Emad (2021) for details.
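As a rough illustration of the two ideas in this paragraph: in Park and Marcotte's scheme, a test pair is C1 when both of its proteins also occur in the training set, C2 when exactly one does, and C3 when neither does. The sketch below classifies a pair accordingly and draws random negatives from pairs absent from the known positives. The function names, the toy protein identifiers, and the exclusion of self-pairs are assumptions made for illustration; this is not the code used to build these datasets.

```python
import random
from itertools import combinations


def park_marcotte_class(pair, train_proteins):
    """Return 'C1', 'C2' or 'C3' for a test pair, given the training proteins."""
    a, b = pair
    n_seen = int(a in train_proteins) + int(b in train_proteins)
    return {2: "C1", 1: "C2", 0: "C3"}[n_seen]


def sample_negatives(proteins, positives, n, seed=0):
    """Draw n unordered pairs that are not among the known positive pairs.

    Enumerating every candidate pair is only practical for small toy inputs;
    a proteome-scale dataset would need rejection sampling instead.
    Self-pairs (a, a) are excluded here, which is an assumption.
    """
    rng = random.Random(seed)
    known = {frozenset(p) for p in positives}
    candidates = [
        p for p in combinations(sorted(proteins), 2)
        if frozenset(p) not in known
    ]
    return rng.sample(candidates, n)


# Toy example with hypothetical protein identifiers.
train_proteins = {"P1", "P2", "P3"}
print(park_marcotte_class(("P1", "P2"), train_proteins))  # C1
print(park_marcotte_class(("P1", "P9"), train_proteins))  # C2
print(park_marcotte_class(("P8", "P9"), train_proteins))  # C3

positives = [("P1", "P2"), ("P2", "P3")]
print(sample_negatives(["P1", "P2", "P3", "P4"], positives, n=2, seed=1))
```

Note that pairs "not known to interact" are not guaranteed to be true non-interactions, so randomly drawn negatives may include some unobserved positives.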

Repeatability Datasets
The following datasets are all derived from STRING in the same manner as the comparatives dataset, but each uses one of three different random seeds for drawing proteins.
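The role of the seed can be sketched as follows: the same seed always reproduces the same draw of proteins, while different seeds generally give different draws. The function name, the test fraction, and the seed values 1, 2, and 3 are hypothetical; the record does not state which seeds or split sizes were used.

```python
import random


def seeded_protein_split(proteins, test_fraction=0.2, seed=0):
    """Split proteins into (train, test) sets using a seeded shuffle."""
    rng = random.Random(seed)
    shuffled = sorted(proteins)  # sort first so the input order is irrelevant
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return set(shuffled[n_test:]), set(shuffled[:n_test])


# Toy example with hypothetical protein identifiers and seeds.
proteins = [f"P{i}" for i in range(10)]
for seed in (1, 2, 3):
    train, test = seeded_protein_split(proteins, test_fraction=0.3, seed=seed)
    print(seed, sorted(test))
```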

References
Park, Y. and Marcotte, E. M. (2012). Flaws in evaluation schemes for pair-input computational predictions. Nat Methods, 9, 1134–1136.

Szklarczyk, D., Gable, A. L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., Simonovic, M., Doncheva, N. T., Morris, J. H., Bork, P., Jensen, L. J., and von Mering, C. (2019). STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1), D607–D613.

Szymborski, J. and Emad, A. (2021). RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks. bioRxiv. https://doi.org/10.1101/2021.08.13.456309

Files

Name: rapppid_data.zip
Size: 62.7 MB
md5:dbde8ac07a195285aae9f0e0d34522de
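The MD5 checksum listed above can be used to check that a downloaded copy of the archive is intact. A minimal sketch using Python's standard library follows; the function name and the local path are assumptions (the path presumes the archive was saved under its listed name in the current directory).

```python
import hashlib

EXPECTED_MD5 = "dbde8ac07a195285aae9f0e0d34522de"  # checksum listed on this record


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return file_md5(path) == expected


# Usage (assumes the archive sits in the current directory):
# verify_download("rapppid_data.zip")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.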

Additional details

Related works

Is derived from
Journal article: 10.1093/nar/gky1131 (DOI)
References
Journal article: 10.1038/nmeth.2259 (DOI)