There is a newer version of the record available.

Published June 7, 2021 | Version 1.0.0
Dataset Open

DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction

  • 1. University of Missouri
  • 2. Oak Ridge National Laboratory

Description

This dataset contains replication data for the paper "DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction". It consists of pickled Pandas DataFrame files that can be used to train and validate protein interface prediction models. For users' convenience, it also includes the externally generated residue-level PSAIA and HH-suite3 features (e.g., raw MSAs and profile HMMs for each protein complex). Our GitHub repository, linked in the "Additional notes" metadata section below, gives more detail on how we parsed these files to create training and validation datasets. The repository also includes scripts to impute missing feature values and to convert the final "raw" complexes into DGL-compatible graph objects.
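As a rough illustration of what working with pickled files involves, the sketch below unpickles a single file and summarizes the object it contains. It is not taken from the DIPS-Plus repository: the helper names `load_pickled_object` and `describe` are hypothetical, and the sketch assumes the file was written with Python's standard pickle protocol and that pandas is installed whenever the file holds a DataFrame. If the files were serialized with a different library, the repository's own loading scripts should be used instead.

```python
import pickle
from pathlib import Path


def load_pickled_object(path):
    """Unpickle one file and return whatever object it contains.

    Only use this on files you trust: unpickling can execute arbitrary code.
    Hypothetical helper, not part of the DIPS-Plus code base.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Return a short, human-readable summary of an unpickled object."""
    # Objects such as DataFrames expose a `shape` attribute (rows, columns).
    shape = getattr(obj, "shape", None)
    if shape is not None:
        return f"{type(obj).__name__} with shape {shape}"
    return type(obj).__name__
```

Calling `describe(load_pickled_object(some_path))` on an extracted file is a quick way to check what kind of object it stores before writing any training code against it.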

Notes

This dataset can be updated periodically using the instructions contained in our GitHub repository for DIPS-Plus (https://github.com/amorehead/DIPS-Plus). For data provenance, the complexes curated for DIPS-Plus originate from the RCSB's bound protein complex repository (https://ftp.wwpdb.org/pub/pdb/data/biounit/coordinates/divided/).

Files

Files (44.6 GB)

Checksum (MD5)                        Size
md5:eba04682c64fae66938742e282960e38  42.8 MB
md5:6ed737072d0f075f036751ff7a2d0d27  7.7 GB
md5:a61aea4af023abd17b6ddd19863c0ffc  6.4 kB
md5:cb1283cf1fb91a586d786a8e2c53053c  2.3 MB
md5:893fa1d932bbb0738f093ba634155d09  291.8 MB
md5:04088a0afca2107c0418868bb4380fb0  4.3 GB
md5:f7f14525ea07aabbadc52af25917e82b  4.3 GB
md5:afe62360640af90b4fc52c4044c84b4c  4.3 GB
md5:f132e558ebebf2d2d2a0765022d4c3f3  4.3 GB
md5:259ceccd4e2397e17712606f5e43f3e0  4.3 GB
md5:a4d8493d22652781225a3af3ef2ae724  4.3 GB
md5:0547b5b72b3912c22f6036f843a05f2a  4.3 GB
md5:072be5754b4c27241e761878a42647dd  3.7 GB
md5:fd17825eafd0bee22daddf1475336929  15.4 MB
md5:2925bba15a1f04b70f437fde982e4717  2.8 GB
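
Because several of these files are multiple gigabytes in size, it can be worth checking a download against its listed MD5 value before extracting it. The following is a minimal sketch using only Python's standard `hashlib` module. The function names and the chunk size are illustrative choices rather than part of the dataset's tooling, and the file path passed in is whatever local name the download was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:eba0...'.

    The optional 'md5:' prefix is dropped and case is ignored.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum(local_path, "md5:eba04682c64fae66938742e282960e38")` should return `True` only if `local_path` holds an intact copy of the first file in the listing.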

Additional details

Funding

National Science Foundation, award 1763246: III: Medium: Collaborative Research: Guiding Exploration of Protein Structure Spaces with Deep Learning
National Science Foundation, award 1759934: ABI Innovation: Deep learning methods for protein bioinformatics