Published June 22, 2023 | Version 1.3.0
Dataset Open

DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction (Supplementary Data)

  • 1. University of Missouri
  • 2. Oak Ridge National Laboratory

Description

This record contains supplementary replication data for the paper "DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction". It provides a new version of our `final_raw_dips.tar.gz` protein pair representations, which now include (1) residue-level annotations for intrinsically disordered regions (IDRs) and (2) a copy of each protein pair representation in the HDF5 file format, so that the data can be read from any programming language. The record also contains (3) raw multiple sequence alignments (MSAs, in HDF5 file format) generated for each protein pair using Jackhmmer and AlphaFold's small version of the Big Fantastic Database (BFD), (4) PDB metadata derived for each DIPS-Plus complex using Graphein's PDBManager API, and (5) structure-based (i.e., FoldSeek-based) training and validation splits of the dataset's complexes, provided as text files that list the file paths of the complexes assigned to each split.
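The split files in item (5) are described as text files that list complex file paths. A minimal sketch of loading them in Python follows. It assumes one path per line, which the record does not state explicitly. The helper names `read_split` and `split_overlap` are illustrative and are not part of the DIPS-Plus codebase.

```python
def read_split(path):
    """Return the non-empty lines of a split file as a list of complex paths.

    Assumes one file path per line (an assumption, not confirmed by the record).
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def split_overlap(train_paths, val_paths):
    """Return the sorted paths that appear in both splits (ideally an empty list)."""
    return sorted(set(train_paths) & set(val_paths))
```

Calling `split_overlap(read_split(train_file), read_split(val_file))` on the two downloaded split files, whatever they are named in the archive, gives a quick sanity check that no complex is assigned to both training and validation.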

Notes

The primary DIPS-Plus dataset can be updated periodically using the instructions contained in our GitHub repository for DIPS-Plus (https://github.com/BioinfoMachineLearning/DIPS-Plus). For data provenance, the complexes curated for DIPS-Plus originate from the RCSB's bound protein complex repository (https://ftp.wwpdb.org/pub/pdb/data/biounit/coordinates/divided/).

Files

Files (27.7 GB)

Checksum  Size
md5:a4af4e14162aa88a59c05fef5d562088  15.7 GB
md5:518c7f452f64ce138704771a0dac5724  12.0 GB
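The MD5 checksums listed for the two files can be used to confirm that a multi-gigabyte download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The function names are illustrative, and the local path you pass in is whatever name you saved the download under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:a4af4e14162aa88a59c05fef5d562088")` should return `True` for an intact copy of the 15.7 GB file.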

Additional details

Funding

U.S. National Science Foundation
III: Medium: Collaborative Research: Guiding Exploration of Protein Structure Spaces with Deep Learning 1763246
U.S. National Science Foundation
ABI Innovation: Deep learning methods for protein bioinformatics 1759934