DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction (Supplementary Data)
- 1. University of Missouri
- 2. Oak Ridge National Laboratory
Description
This dataset contains supplementary replication data for the paper "DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction". In particular, it contains a new version of our `final_raw_dips.tar.gz` protein pair representations, which now include (1) residue-level annotations for intrinsically disordered regions (IDRs) and (2) a copy of each protein pair representation in the HDF5 file format, so that the data can be read from any programming language. This record also contains (3) raw MSAs (in HDF5 file format) generated for each protein pair using Jackhmmer and AlphaFold's small version of the Big Fantastic Database (BFD). Lastly, it contains (4) PDB metadata derived for each DIPS-Plus complex using Graphein's PDBManager API and (5) structure-based (i.e., FoldSeek-based) training and validation splits of the dataset's complexes, provided as text files that list the file paths of the complexes assigned to each split.
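Because the training and validation splits are described as plain text files of complex file paths, they can be read with a few lines of standard-library Python. The sketch below assumes one path per line; the helper names `load_split` and `split_overlap` are hypothetical and not part of the DIPS-Plus codebase, and the split file names are not given in this record.

```python
from pathlib import Path


def load_split(split_file):
    """Return the non-empty, whitespace-stripped lines of a split file.

    Assumption: each line of the file holds one complex file path.
    """
    text = Path(split_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_overlap(train_paths, val_paths):
    """Return the sorted complex paths that appear in both splits."""
    return sorted(set(train_paths) & set(val_paths))
```

A quick sanity check after loading both files is that `split_overlap(train, val)` returns an empty list, since a complex listed in both splits would leak between training and validation.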
Files
Total size: 27.7 GB

MD5 checksum | Size |
---|---|
md5:a4af4e14162aa88a59c05fef5d562088 | 15.7 GB |
md5:518c7f452f64ce138704771a0dac5724 | 12.0 GB |
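Given the size of these files, it may be worth checking a download against the MD5 checksum listed for it before extracting anything. The following is a minimal sketch using Python's standard `hashlib` module; the function names are hypothetical, and the local file name is left to the caller because the archive names are not preserved in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a value such as 'md5:<hex>'."""
    # Accept the checksum with or without the 'md5:' prefix.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:a4af4e14162aa88a59c05fef5d562088")` should return `True` for an intact copy of the 15.7 GB file, where `local_path` is wherever that file was saved.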
Additional details
Funding
- U.S. National Science Foundation
  - III: Medium: Collaborative Research: Guiding Exploration of Protein Structure Spaces with Deep Learning (1763246)
- U.S. National Science Foundation
  - ABI Innovation: Deep learning methods for protein bioinformatics (1759934)