Published April 11, 2023 | Version v1.0.0
Dataset Open

Multiscale Interactome Data -- Revised

Authors/Creators

  • 1. Columbia University Irving Medical Center

Description

Original GitHub Repository: https://github.com/snap-stanford/multiscale-interactome

Forked GitHub Repository: https://github.com/callahantiff/multiscale-interactome/tree/development

This repository stores a revised version of the original data used in the publication "Identification of disease treatment mechanisms through the multiscale interactome". As described in the original GitHub repository's README, the original data can be downloaded directly from: http://snap.stanford.edu/multiscale-interactome/data/data.tar.gz

Description of Original Data

Drug-Protein (n=8,568):

  • Source(s):
  • Processing: Map UniProt ids to Entrez gene ids using HUGO (October 2018), and map drug ids to DrugBank ids
  • Filtering: Filter proteins to only keep those that appear in the Protein-Protein edge set.
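The filtering step above (keep only proteins that appear in the Protein-Protein edge set) could be sketched as follows. This is a minimal illustration, not the original pipeline: the function names, the tuple-based edge representation, and the toy identifiers are all assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def proteins_in_ppi(ppi_edges: Iterable[Edge]) -> Set[str]:
    """Collect every protein that appears at either end of a protein-protein edge."""
    nodes: Set[str] = set()
    for protein_a, protein_b in ppi_edges:
        nodes.add(protein_a)
        nodes.add(protein_b)
    return nodes


def filter_to_ppi(edges: Iterable[Edge], ppi_nodes: Set[str]) -> List[Edge]:
    """Keep (source, protein) edges whose protein is a node in the PPI edge set."""
    return [(source, protein) for source, protein in edges if protein in ppi_nodes]


# Toy data with hypothetical identifiers (not taken from the dataset).
ppi = [("gene_1", "gene_2"), ("gene_2", "gene_3")]
drug_protein = [("drug_A", "gene_1"), ("drug_B", "gene_9")]

kept = filter_to_ppi(drug_protein, proteins_in_ppi(ppi))
# "drug_B" is dropped because "gene_9" has no protein-protein edge.
```

The same helper would apply to any edge list whose second element is a protein, under the same assumptions.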

Disease-Protein (n=25,212):

  • Source(s): DisGeNET (March 2018)
  • Filtering: Keep only expert-curated gene-disease associations: (1) exclude disease-gene relationships that are inferred, based on orthology, animal models, or literature mining; (2) remove therapeutic disease-gene associations; and (3) remove disease-gene relationships that do not appear in the Protein-Protein edge set.

Protein-Protein (n=387,626):

  • Source(s):
  • Processing: Map protein ids to Entrez gene ids using HUGO (sources 1-2 only)
  • Filtering: only human proteins with physical interactions and direct experimental evidence (no genetic/indirect)

Protein-Biological Process (n=34,777):

  • Source(s): Gene Ontology (human; February 2018)
  • Processing: Use master ids provided by GOATOOLS (v0.8.4)
  • Filtering: Keep only annotations with the evidence codes EXP, IDA, IMP, IGI, HTP, HDA, HMP, and HGI. Exclude any protein-biological functions inferred from physical interactions, gene expression patterns, phylogenetically inferred annotations or computational analyses, automatic annotations (i.e., based on author statements, curator inference, electronic annotation), and those with no biological data
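The evidence-code filter above could be expressed as a simple allow-list. The sketch below assumes each annotation is a dictionary with an "evidence" key; that record layout and the toy identifiers are hypothetical and are not the format used by the original code.

```python
# Evidence codes kept by the filter described above.
ALLOWED_EVIDENCE_CODES = frozenset(
    {"EXP", "IDA", "IMP", "IGI", "HTP", "HDA", "HMP", "HGI"}
)


def keep_experimental(annotations):
    """Return only the annotations whose evidence code is on the allow-list."""
    return [a for a in annotations if a["evidence"] in ALLOWED_EVIDENCE_CODES]


# Toy records with hypothetical identifiers.
annotations = [
    {"protein": "gene_1", "process": "bp_a", "evidence": "IDA"},  # kept
    {"protein": "gene_2", "process": "bp_b", "evidence": "IPI"},  # physical interaction
    {"protein": "gene_3", "process": "bp_c", "evidence": "IEA"},  # electronic annotation
    {"protein": "gene_4", "process": "bp_d", "evidence": "HDA"},  # kept
]

experimental = keep_experimental(annotations)
```

An allow-list is used here rather than a block-list so that any evidence code not explicitly named is excluded by default.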

Biological Process-Biological Process (n=22,545):

  • Source(s): Gene Ontology (human; February 2018) + Gene Ontology Plus (human version; July 2020)
  • Filtering: Allow the following relationship types: regulates, positively regulates, negatively regulates, part of, is a. Only consider BPs associated with at least one drug target or disease protein (directly or through children)
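One possible reading of the filter above is sketched here. It assumes edges are (child, relation, parent) triples, that every allowed relation is treated as pointing from child to parent, and that "seed" terms are the BPs directly associated with a drug target or disease protein. These are assumptions for illustration; the original implementation may differ.

```python
from collections import defaultdict, deque

ALLOWED_RELATIONS = frozenset(
    {"regulates", "positively regulates", "negatively regulates", "part of", "is a"}
)


def filter_bp_edges(edges, seed_terms):
    """Keep allowed-type edges between BPs that are seeds or ancestors of seeds."""
    typed = [edge for edge in edges if edge[1] in ALLOWED_RELATIONS]

    # Map each child term to its parents (assumption: edges point child -> parent).
    parents = defaultdict(set)
    for child, _, parent in typed:
        parents[child].add(parent)

    # Walk upward from the seeds so a term counts as relevant "through children".
    relevant = set(seed_terms)
    queue = deque(seed_terms)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, ()):
            if parent not in relevant:
                relevant.add(parent)
                queue.append(parent)

    return [e for e in typed if e[0] in relevant and e[2] in relevant]


# Toy ontology with hypothetical term names.
edges = [
    ("bp_c", "is a", "bp_b"),
    ("bp_b", "part of", "bp_a"),
    ("bp_x", "is a", "bp_y"),       # not connected to any seed
    ("bp_c", "occurs in", "bp_z"),  # relation type not allowed
]
kept_edges = filter_bp_edges(edges, {"bp_c"})
```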

⚠️ Updates to Original Implementation ⚠️

Modifications to Original Data and Code

  • Ensured every entry had a valid identifier and label
  • Reconciled duplicate gene entries (i.e., gene identifiers that had been merged)
  • Changed genes are listed in: resources/data/updated_gene_identifiers.xlsx
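Reconciling merged gene identifiers could look roughly like the sketch below. The layout of resources/data/updated_gene_identifiers.xlsx is not described here, so the old-to-current mapping is a hypothetical in-memory dictionary, and the function name and toy identifiers are likewise assumptions.

```python
def reconcile_gene_ids(edges, merged_into):
    """Rewrite retired gene ids to their current id and drop resulting duplicates.

    edges: iterable of (source, gene) pairs.
    merged_into: dict mapping a retired gene id to the id it was merged into
                 (hypothetical structure for this example).
    """
    seen = set()
    reconciled = []
    for source, gene in edges:
        current = merged_into.get(gene, gene)  # unchanged ids pass through
        edge = (source, current)
        if edge not in seen:  # merging can make two edges identical
            seen.add(edge)
            reconciled.append(edge)
    return reconciled


# Toy data with hypothetical identifiers.
disease_gene = [
    ("disease_A", "gene_old"),
    ("disease_A", "gene_new"),
    ("disease_B", "gene_2"),
]
merged = {"gene_old": "gene_new"}

clean = reconcile_gene_ids(disease_gene, merged)
```

The first occurrence of each edge is kept so that the output order follows the input order.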

Files

data.zip (565.3 MB)
md5:ec79418a9c4c0c5fe740f71e7cc3af14
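After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard hashlib module; the function names and the local file path are assumptions for the example.

```python
import hashlib

# Checksum published for data.zip in this record.
EXPECTED_MD5 = "ec79418a9c4c0c5fe740f71e7cc3af14"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Example call, assuming the archive was saved in the working directory:
# is_intact("data.zip")
```

Reading in chunks keeps memory use low for a 565.3 MB file.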

Additional details