DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction
- 1. University of Missouri
- 2. Oak Ridge National Laboratory
Description
This dataset contains replication data for the paper "DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction". It consists of pickled pandas DataFrame files that can be used to train and validate protein interface prediction models. For users' convenience, it also includes the externally generated residue-level PSAIA and HH-suite3 features (e.g., raw MSAs and profile HMMs for each protein complex). Our GitHub repository, linked in the "Additional notes" metadata section below, describes in more detail how we parsed these files to create the training and validation datasets. The repository also includes scripts for imputing missing feature values and converting the final "raw" complexes into DGL-compatible graph objects.
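Because the files are described as pickled pandas DataFrames, a first inspection could look something like the sketch below. This is only an illustration under stated assumptions: the file name `example_complex.pkl`, the helper names, and the column names `chain`, `resname`, and `rsa` are hypothetical stand-ins, not the dataset's actual layout. The sketch writes a tiny demo file so that it runs without the real archives. For the real files, the GitHub repository remains the reference for how they are parsed.

```python
import os
import tempfile

import pandas as pd


def load_pickled_frame(path):
    """Load one pickled object and check that it is a DataFrame (hypothetical helper)."""
    # Unpickling can execute arbitrary code, so only load files from a trusted source.
    obj = pd.read_pickle(path)
    if not isinstance(obj, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(obj).__name__}")
    return obj


def summarize(df):
    """Return row count, column names, and the total number of missing values."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing": int(df.isna().sum().sum()),
    }


# Stand-in data: these columns are assumptions for illustration only.
demo = pd.DataFrame(
    {
        "chain": ["A", "A", "B"],
        "resname": ["ALA", "GLY", "SER"],
        "rsa": [0.31, None, 0.78],  # one missing feature value
    }
)
demo_path = os.path.join(tempfile.mkdtemp(), "example_complex.pkl")
demo.to_pickle(demo_path)

loaded = load_pickled_frame(demo_path)
summary = summarize(loaded)
print(summary)
```

Counting missing values up front is a cheap way to see whether a file still needs the imputation step mentioned above before it is used for training.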
Notes
Files
(44.6 GB)
Checksum | Size |
---|---|
md5:eba04682c64fae66938742e282960e38 | 42.8 MB |
md5:6ed737072d0f075f036751ff7a2d0d27 | 7.7 GB |
md5:a61aea4af023abd17b6ddd19863c0ffc | 6.4 kB |
md5:cb1283cf1fb91a586d786a8e2c53053c | 2.3 MB |
md5:893fa1d932bbb0738f093ba634155d09 | 291.8 MB |
md5:04088a0afca2107c0418868bb4380fb0 | 4.3 GB |
md5:f7f14525ea07aabbadc52af25917e82b | 4.3 GB |
md5:afe62360640af90b4fc52c4044c84b4c | 4.3 GB |
md5:f132e558ebebf2d2d2a0765022d4c3f3 | 4.3 GB |
md5:259ceccd4e2397e17712606f5e43f3e0 | 4.3 GB |
md5:a4d8493d22652781225a3af3ef2ae724 | 4.3 GB |
md5:0547b5b72b3912c22f6036f843a05f2a | 4.3 GB |
md5:072be5754b4c27241e761878a42647dd | 3.7 GB |
md5:fd17825eafd0bee22daddf1475336929 | 15.4 MB |
md5:2925bba15a1f04b70f437fde982e4717 | 2.8 GB |
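The md5 values listed above can be used to confirm that a large download arrived intact. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` and the demo file are hypothetical. The file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:eba0...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demo on a small temporary file standing in for a downloaded archive.
payload = b"stand-in bytes for a downloaded archive"
demo_file = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo_file, "wb") as handle:
    handle.write(payload)

listed_value = "md5:" + hashlib.md5(payload).hexdigest()
ok = matches_listing(demo_file, listed_value)
bad = matches_listing(demo_file, "md5:" + "0" * 32)
print(ok, bad)
```

In practice, `listed_value` would be copied from the row of the file that was downloaded. A mismatch usually indicates a truncated or corrupted transfer, and the file should be downloaded again.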
Additional details
Related works
- Cites: 10.7910/DVN/H93ZKK (DOI)
Funding
- National Science Foundation, award 1763246. III: Medium: Collaborative Research: Guiding Exploration of Protein Structure Spaces with Deep Learning
- National Science Foundation, award 1759934. ABI Innovation: Deep learning methods for protein bioinformatics