DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction
- 1. University of Missouri
- 2. Oak Ridge National Laboratory
Description
This dataset contains replication data for the paper "DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction". It consists of pickled Pandas DataFrame files, together with training and validation filename lists for cross-validation (plus a test filename list for DB5-Plus), which can be used to develop and evaluate protein interface prediction models. For users' convenience, the dataset also contains the externally generated residue-level PSAIA and HH-suite3 features (e.g., the raw MSAs and profile HMMs for each protein complex). Our GitHub repository, linked in the "Additional notes" metadata section below, describes in more detail how we parsed these files to create our cross-validation datasets. The repository also includes scripts to impute missing feature values and to convert the final "raw" complexes into DGL-compatible graph objects. Because the final DGL graph for each complex builds its residue embeddings from PyTorch tensors, this representation is easy to adapt to other models (e.g., feeding a complex's 2D residue feature tensors into a convolutional neural network).
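Each complex is distributed as a pickled Pandas DataFrame, and the cross-validation splits are given as filename lists. The sketch below shows one way to read a split list, unpickle a complex, fill missing numeric features, and stack them into a 2D feature matrix. It rests on hypothetical assumptions: that each list holds one filename per line and that a pickle deserializes directly to a DataFrame (the actual files may wrap their DataFrames in another structure). Mean imputation is a simple stand-in, not the imputation performed by the DIPS-Plus repository's scripts, and all function, file, and column names are illustrative.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def read_filename_list(list_path):
    """Read a split list, assumed (hypothetically) to hold one filename per line."""
    with open(list_path) as fh:
        # Skip blank lines and strip surrounding whitespace.
        return [line.strip() for line in fh if line.strip()]


def load_complex(pkl_path):
    """Unpickle one complex, assuming the pickle holds a DataFrame directly.

    Only unpickle files you trust: pickle.load can execute arbitrary code.
    """
    with open(pkl_path, "rb") as fh:
        obj = pickle.load(fh)
    if not isinstance(obj, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(obj).__name__}")
    return obj


def impute_numeric_means(df):
    """Return a copy with NaNs in numeric columns replaced by each column's mean.

    Mean imputation is an illustrative stand-in, not the DIPS-Plus scripts' method.
    """
    out = df.copy()
    numeric_cols = out.select_dtypes(include=[np.number]).columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    return out


def to_feature_matrix(df):
    """Stack numeric columns into an (n_residues, n_features) float32 array."""
    return df.select_dtypes(include=[np.number]).to_numpy(dtype=np.float32)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp = Path(tmp_dir)
        # Toy stand-in for one complex's residue table (hypothetical columns).
        toy = pd.DataFrame({
            "resname": ["ALA", "GLY", "SER"],
            "feat_a": [1.0, np.nan, 3.0],
            "feat_b": [0.5, 0.5, np.nan],
        })
        toy.to_pickle(tmp / "toy_complex.pkl")
        # Hypothetical split list naming the toy complex.
        (tmp / "train_list.txt").write_text("toy_complex.pkl\n")

        for name in read_filename_list(tmp / "train_list.txt"):
            features = to_feature_matrix(impute_numeric_means(load_complex(tmp / name)))
            print(name, features.shape)  # toy_complex.pkl (3, 2)
```

If PyTorch is installed, `torch.from_numpy(to_feature_matrix(df))` turns the matrix into a tensor that shares memory with the NumPy array, which is one route to the CNN-style use mentioned above.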
Notes
Files
(44.6 GB)
MD5 checksum | Size |
---|---|
9cbc07672e705f9ba8549168b06e1e06 | 42.8 MB |
775576dbb27cbe0127419dbdaa6c36d1 | 7.7 GB |
a61aea4af023abd17b6ddd19863c0ffc | 6.4 kB |
cb1283cf1fb91a586d786a8e2c53053c | 2.3 MB |
893fa1d932bbb0738f093ba634155d09 | 291.8 MB |
04088a0afca2107c0418868bb4380fb0 | 4.3 GB |
f7f14525ea07aabbadc52af25917e82b | 4.3 GB |
afe62360640af90b4fc52c4044c84b4c | 4.3 GB |
f132e558ebebf2d2d2a0765022d4c3f3 | 4.3 GB |
259ceccd4e2397e17712606f5e43f3e0 | 4.3 GB |
a4d8493d22652781225a3af3ef2ae724 | 4.3 GB |
0547b5b72b3912c22f6036f843a05f2a | 4.3 GB |
072be5754b4c27241e761878a42647dd | 3.7 GB |
fd17825eafd0bee22daddf1475336929 | 15.4 MB |
2925bba15a1f04b70f437fde982e4717 | 2.8 GB |
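Each file is published with an MD5 checksum, and several parts are 4.3 GB, so it is worth checking a download before unpacking it. The following is a minimal sketch using only Python's standard `hashlib`; it reads the file in chunks so a multi-gigabyte archive never has to fit in memory. The helper names `md5sum` and `verify` and the script name `verify_md5.py` are hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks (1 MiB by default)
    so multi-gigabyte archives never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Compare a file's digest with a published checksum.

    Accepts the checksum with or without the "md5:" prefix, in any letter case.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python verify_md5.py FILE md5:<checksum>
    ok = verify(Path(sys.argv[1]), sys.argv[2])
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

For example, `python verify_md5.py <downloaded file> md5:9cbc07672e705f9ba8549168b06e1e06` prints `OK` and exits with status 0 only when the digests match. On Linux, GNU coreutils' `md5sum <file>` prints the same digest for a manual comparison.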
Additional details
Related works
- Cites: 10.7910/DVN/H93ZKK (DOI)
Funding
- U.S. National Science Foundation: III: Medium: Collaborative Research: Guiding Exploration of Protein Structure Spaces with Deep Learning (award 1763246)
- U.S. National Science Foundation: ABI Innovation: Deep learning methods for protein bioinformatics (award 1759934)