Multiscale Interactome Data -- Revised
Description
Original GitHub Repository: https://github.com/snap-stanford/multiscale-interactome
Forked GitHub Repository: https://github.com/callahantiff/multiscale-interactome/tree/development
This repository stores a revised version of the data used in the publication "Identification of disease treatment mechanisms through the multiscale interactome". As described in the README of the original GitHub repository, the original data can be downloaded directly from: http://snap.stanford.edu/multiscale-interactome/data/data.tar.gz.
Description of Original Data
Drug-Protein (n=8,568):
- Source(s):
  - DrugBank (v5.1.1; 2018; drugbank_approved_target_uniprot_links.csv)
  - Drug Repurposing Hub (September 2018)
- Processing: map UniProt ids to Entrez gene ids using HUGO (October 2018) and drug ids to DrugBank ids
- Filtering: keep only proteins that appear in the Protein-Protein edge set.
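The filtering step above (also used for the Disease-Protein edges) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original pipeline: the edge representation as `(source, protein)` tuples, the function names, and the toy identifiers are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def ppi_proteins(ppi_edges: Iterable[Edge]) -> Set[str]:
    """Collect every protein id that appears in a protein-protein edge."""
    proteins: Set[str] = set()
    for protein_a, protein_b in ppi_edges:
        proteins.add(protein_a)
        proteins.add(protein_b)
    return proteins


def filter_to_ppi(edges: Iterable[Edge], proteins: Set[str]) -> List[Edge]:
    """Keep (source, protein) edges whose protein is in the PPI protein set."""
    return [(source, protein) for source, protein in edges if protein in proteins]


# Toy data with made-up identifiers (not taken from the dataset).
toy_ppi = [("P1", "P2"), ("P2", "P3")]
toy_drug_protein = [("drug_A", "P1"), ("drug_B", "P9")]

kept = filter_to_ppi(toy_drug_protein, ppi_proteins(toy_ppi))
print(kept)  # [('drug_A', 'P1')]
```

Building the protein set once and testing membership against it keeps the filter linear in the number of edges.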
Disease-Protein (n=25,212):
- Source(s): DisGeNet (March 2018)
- Filtering: keep only expert-curated gene-disease associations. (1) exclude disease-gene relationships that are inferred, based on orthology, animal models, or literature mining; (2) remove therapeutic disease-gene associations; and (3) remove disease-gene relationships that do not appear in the Protein-Protein edge set.
Protein-Protein (n=387,626):
- Source(s):
  - BioGRID (v3.5.178; November 2019; BIOGRID-ORGANISM-Homo_sapiens-3.5.178.tab)
  - Database of Interacting Proteins (February 2017; Hsapi20170205.txt). Include all experimental methods
  - Human Reference Protein Interactome Mapping Project. Four networks derived from high-throughput yeast two hybrid assays.
  - Menche 2015 (PMID:25700523). Compiles different types of physical protein-protein interactions.
- Processing: Map protein ids to Entrez gene ids using HUGO (sources 1-2 only)
- Filtering: only human proteins with physical interactions and direct experimental evidence (no genetic/indirect)
Protein-Biological Process (n=34,777):
- Source(s): Gene Ontology (human; February 2018)
- Processing: use master ids provided by GOATOOLS (v0.8.4)
- Filtering: only allow: EXP, IDA, IMP, IGI, HTP, HDA, HMP, HGI. Exclude any protein-biological functions inferred from: physical interactions, gene expression patterns, phylogenetically inferred annotations or computational analyses, automatic annotations (i.e., based on author statements, curator inference, electronic annotation), and those with no biological data
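The evidence-code filter above amounts to an allow-list check. The sketch below shows one way to express it. The allowed codes come from the list above, while the record layout (dictionaries with `gene`, `go_id`, and `evidence` keys) and the toy values are assumptions made for illustration only.

```python
# Evidence codes retained by the filter described above.
ALLOWED_EVIDENCE = frozenset(
    {"EXP", "IDA", "IMP", "IGI", "HTP", "HDA", "HMP", "HGI"}
)


def keep_experimental(annotations):
    """Return only annotations whose evidence code is on the allow-list."""
    return [a for a in annotations if a["evidence"] in ALLOWED_EVIDENCE]


# Toy annotations with made-up gene and term ids (hypothetical record layout).
toy_annotations = [
    {"gene": "gene_1", "go_id": "bp_1", "evidence": "IDA"},  # kept
    {"gene": "gene_2", "go_id": "bp_2", "evidence": "IEA"},  # electronic, dropped
    {"gene": "gene_3", "go_id": "bp_3", "evidence": "IPI"},  # physical interaction, dropped
    {"gene": "gene_4", "go_id": "bp_4", "evidence": "HMP"},  # kept
]

experimental = keep_experimental(toy_annotations)
print([a["gene"] for a in experimental])  # ['gene_1', 'gene_4']
```

An allow-list is used rather than a block-list so that any evidence code not named above is excluded by default.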
Biological Process-Biological Process (n=22,545):
- Source(s): Gene Ontology (human; February 2018) + Gene Ontology Plus (human version; July 2020)
- Filtering: Allow following relationship types: regulates, positively regulates, negatively regulates, part of, is a. Only consider BPs associated with at least one drug target or disease protein (directly or through children)
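The two filters above (allowed relationship types, then relevance through children) can be sketched as a small graph traversal. This is an illustrative reading, not the original code. It assumes edges are `(child, relation, parent)` triples, that relation names are spelled with underscores, and that "through children" means following any allowed relation upward from a directly annotated process. All identifiers in the toy data are made up.

```python
from collections import defaultdict

# Relationship types retained by the filter (underscore spelling is an assumption).
ALLOWED_RELATIONS = frozenset(
    {"is_a", "part_of", "regulates", "positively_regulates", "negatively_regulates"}
)


def relevant_processes(edges, annotated):
    """Return annotated processes plus every ancestor reachable via allowed relations."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        if relation in ALLOWED_RELATIONS:
            parents[child].add(parent)

    relevant = set(annotated)
    stack = list(annotated)
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            if parent not in relevant:
                relevant.add(parent)
                stack.append(parent)
    return relevant


def filter_bp_edges(edges, annotated):
    """Keep allowed-relation edges whose endpoints are both relevant processes."""
    relevant = relevant_processes(edges, annotated)
    return [
        (child, relation, parent)
        for child, relation, parent in edges
        if relation in ALLOWED_RELATIONS and child in relevant and parent in relevant
    ]


# Toy ontology: only bp_leaf is directly linked to a drug target or disease protein.
toy_edges = [
    ("bp_leaf", "is_a", "bp_mid"),
    ("bp_mid", "part_of", "bp_root"),
    ("bp_other", "is_a", "bp_root"),   # bp_other has no annotated descendant
    ("bp_leaf", "has_part", "bp_mid"),  # relation type not on the allow-list
]
kept_bp_edges = filter_bp_edges(toy_edges, annotated={"bp_leaf"})
print(kept_bp_edges)
```

Here `bp_mid` and `bp_root` are retained because an annotated child reaches them, while the `bp_other` edge and the `has_part` edge are dropped.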
⚠️ Updates to Original Implementation ⚠️
Modifications to Original Data and Code
- Ensured every entry had a valid identifier and label
- Reconciled duplicate gene entries (i.e., gene identifiers that had been merged)
- Changed genes are listed in: resources/data/updated_gene_identifiers.xlsx
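As a rough illustration of what reconciling merged gene identifiers involves, the sketch below rewrites retired ids to their current id and drops edges that become duplicates. It is not the code used for this revision. The layout of the spreadsheet is not described here, so the mapping is passed in as a plain dictionary, and every name and identifier shown is hypothetical.

```python
def reconcile_edges(edges, merged_into):
    """Rewrite retired gene ids to their current id and drop duplicate edges.

    edges: iterable of (gene_a, gene_b) tuples.
    merged_into: dict mapping a retired gene id to the id it was merged into.
    """
    seen = set()
    reconciled = []
    for gene_a, gene_b in edges:
        edge = (merged_into.get(gene_a, gene_a), merged_into.get(gene_b, gene_b))
        if edge not in seen:  # the same edge may appear under old and new ids
            seen.add(edge)
            reconciled.append(edge)
    return reconciled


# Toy example: gene_old has been merged into gene_new (made-up ids).
toy_mapping = {"gene_old": "gene_new"}
toy_gene_edges = [
    ("gene_old", "gene_x"),
    ("gene_new", "gene_x"),
    ("gene_y", "gene_x"),
]
reconciled_edges = reconcile_edges(toy_gene_edges, toy_mapping)
print(reconciled_edges)  # [('gene_new', 'gene_x'), ('gene_y', 'gene_x')]
```

The first occurrence of each edge is kept so that the original ordering is preserved.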
Files
data.zip (565.3 MB)
md5:ec79418a9c4c0c5fe740f71e7cc3af14
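After downloading, the archive can be checked against the md5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module. The function names and the local file path are assumptions, and the file is read in chunks so the 565.3 MB archive does not have to fit in memory.

```python
import hashlib

# Checksum published for data.zip in the file listing above.
EXPECTED_MD5 = "ec79418a9c4c0c5fe740f71e7cc3af14"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected md5 digest."""
    return md5_of_file(path) == expected
```

Assuming the archive was saved as `data.zip` in the working directory, `verify_download("data.zip")` returns `True` when the download is intact.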
Additional details
Related works
- Is referenced by
- Software: https://github.com/callahantiff/multiscale-interactome/tree/development (URL)