There is a newer version of the record available.

Published June 22, 2023 | Version 1.2.0
Dataset | Open access

DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction (Supplementary Data)

  • 1. University of Missouri
  • 2. Oak Ridge National Laboratory

Description

This dataset contains supplementary replication data for the paper "DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction". In particular, it contains a new version of our `final_raw_dips.tar.gz` protein pair representations. These now include (1) residue-level annotations of intrinsically disordered regions (IDRs) and (2) a copy of each protein pair representation in the HDF5 file format, so the data can be read from any programming language with an HDF5 library. This record also contains raw multiple sequence alignments (MSAs), likewise in HDF5 format, generated for each protein pair by running Jackhmmer against the small version of the Big Fantastic Database (BFD) used by AlphaFold.
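Because the protein pair representations and MSAs are distributed as HDF5, they can be explored with any HDF5 reader before committing to a particular schema. The sketch below assumes the third-party Python package `h5py` (any HDF5 library would do). It lists every dataset in a file with its shape and dtype, without assuming any group or dataset names, since this record does not document the internal layout. The function name `summarize_hdf5` is hypothetical.

```python
import h5py


def summarize_hdf5(path):
    """Map each dataset path in an HDF5 file to a (shape, dtype) pair.

    Makes no assumptions about the file's internal layout; it simply
    walks every group and dataset below the root.
    """
    summary = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root;
        # returning None (implicitly) lets the traversal continue.
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary
```

For example, `for name, (shape, dtype) in summarize_hdf5("path/to/pair.h5").items(): print(name, shape, dtype)` prints one line per stored array; the path is a placeholder for wherever the extracted files live.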

Notes

The primary DIPS-Plus dataset can be updated periodically by following the instructions in our GitHub repository for DIPS-Plus (https://github.com/BioinfoMachineLearning/DIPS-Plus). For data provenance, the complexes curated for DIPS-Plus originate from the RCSB's repository of bound protein complexes (https://ftp.wwpdb.org/pub/pdb/data/biounit/coordinates/divided/).

Files

Total size: 27.6 GB

  • 15.6 GB, md5:9a4d7d53faadd79b246cb4b2ebdca8c6
  • 12.0 GB, md5:518c7f452f64ce138704771a0dac5724
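Given the size of these archives, it is worth confirming each download against its listed MD5 checksum. Below is a minimal sketch using only Python's standard library. It reads the file in 1 MiB chunks so a multi-gigabyte archive never has to fit in memory. The function names `md5_hexdigest` and `verify_md5` are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib


def md5_hexdigest(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's MD5 to a checksum written as 'md5:<hex>' or bare '<hex>'."""
    expected_hex = expected.split(":")[-1].strip().lower()
    return md5_hexdigest(path) == expected_hex
```

For example, `verify_md5("downloaded_archive.tar.gz", "md5:9a4d7d53faadd79b246cb4b2ebdca8c6")` returns `True` only if the local copy of the 15.6 GB file is intact; the file name here is a placeholder.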

Additional details

Funding

  • U.S. National Science Foundation, III: Medium: Collaborative Research: Guiding Exploration of Protein Structure Spaces with Deep Learning (award 1763246)
  • U.S. National Science Foundation, ABI Innovation: Deep learning methods for protein bioinformatics (award 1759934)