OncoEnrichR report - cMYC_BioID

Project background

Project owner/contact: Raught et al.
Project description

Query verification

A total of n = 134 target identifiers were provided (type: symbol, option ignore_id_err = TRUE)
All query identifiers have been mapped to identifiers for known human genes (including non-ambiguous aliases). Valid and invalid entries in the query set are counted as follows:
- Invalid identifier : n = 0
- Valid identifier (mapped as alias) : n = 4
- Valid identifier : n = 130

Disease associations

Each protein in the query set is annotated with:
- Known associations to cancer phenotypes (ontology terms) established from multiple sources through the Open Targets Platform
- Status with respect to roles as tumor suppressors/oncogenes, fetched from the CancerMine literature mining resource and the Network of Cancer Genes

Query set - cancer association rank

Query set genes are ranked according to their overall strength of association to cancer phenotype ontology terms, visualized in varying shades of blue. Specifically, the ranking is based on the sum of mean association scores per tumor type/tissue, scaled as percent rank within the query set (column targetset_cancer_prank).
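
The ranking described above can be sketched as follows. This is a minimal illustration, not the code used to produce the report: the function name, the input layout, the gene names, and the scores are all made up, and the percent rank convention (number of genes with a strictly lower total, divided by n - 1) is an assumption.

```python
def cancer_percent_rank(scores):
    """Rank genes by the sum of their mean association scores per tumor type.

    scores maps gene -> {tumor_type: [association scores]}.
    Returns gene -> percent rank in [0, 1] within the query set.
    """
    # Sum of per-tumor-type mean scores; a gene without scores gets a total of 0
    totals = {
        gene: sum(sum(vals) / len(vals) for vals in per_type.values() if vals)
        for gene, per_type in scores.items()
    }
    n = len(totals)
    pranks = {}
    for gene, total in totals.items():
        # Assumed convention: (genes with a strictly lower total) / (n - 1)
        lower = sum(1 for other in totals.values() if other < total)
        pranks[gene] = lower / (n - 1) if n > 1 else 0.0
    return pranks


# Made-up scores for three placeholder genes (not taken from this report)
example_scores = {
    "GENE_A": {"breast": [0.8, 0.6], "lung": [0.9]},
    "GENE_B": {"breast": [0.2]},
    "GENE_C": {},
}
example_pranks = cancer_percent_rank(example_scores)
print(example_pranks)
```

In this toy input the gene with the highest summed score receives a percent rank of 1 and the gene without any association receives 0.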

Table filters: Target, Tumor suppressor (FALSE/TRUE), Proto-oncogene (FALSE/TRUE), Associated cancer types, Associated diseases (non-cancer)

Query set - association strength per tumor type

Top cancer-associated genes (maximum 100) in the query set are shown with their specific tumor-type association strengths (percent rank)

Cancer hallmark evidence

Each gene in the query set is annotated with cancer hallmark evidence (Hanahan & Weinberg, Cell, 2011), indicating genes associated with essential alterations in cell physiology that can dictate malignant growth.
Data have been collected from the Open Targets Platform, and we list evidence for each hallmark per gene, with each hallmark indicated as either promoted or suppressed.

Cancer hallmark

Poorly characterized genes

The aim of this section is to highlight poorly characterized genes, or genes with unknown function, in the query set.
A set of uncharacterized/poorly characterized human protein-coding genes (n = 1128) has been established based on the following criteria:
1. Genes specifically designated as uncharacterized or as open reading frames
2. Missing gene function summary in NCBI Gene AND missing function summary in UniProt Knowledgebase
3. Missing or limited (<= 2) gene ontology (GO) annotations with respect to molecular function (MF) and biological process (BP)
  - Ontology annotations attributed with an electronic annotation evidence code (IEA) are not considered in this calculation (less reliable due to lack of manual review)
Query genes found within the set of poorly characterized genes are listed below, colored in varying shades of red according to the level of missing characterization (from unknown function to poorly defined function).
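
The third criterion above (GO annotation count, ignoring IEA) can be sketched as below. This is only an illustration under stated assumptions: the function names and the tuple layout are hypothetical, the GO identifiers are placeholders, and we assume that the threshold of 2 applies to the combined number of distinct MF and BP terms, which the report text does not specify.

```python
def count_curated_go_terms(annotations):
    """Count distinct MF/BP GO terms that are not electronically inferred (IEA).

    annotations is a list of (go_id, aspect, evidence_code) tuples,
    where aspect is one of "MF", "BP" or "CC".
    """
    curated = {
        go_id
        for go_id, aspect, evidence in annotations
        if aspect in ("MF", "BP") and evidence != "IEA"
    }
    return len(curated)


def has_limited_go_annotation(annotations, max_terms=2):
    """True when the gene has at most max_terms curated MF/BP terms (assumed rule)."""
    return count_curated_go_terms(annotations) <= max_terms


# Placeholder annotations for a single hypothetical gene
example_annotations = [
    ("GO:0000001", "BP", "IEA"),  # ignored: electronic annotation
    ("GO:0000002", "BP", "IDA"),  # counted
    ("GO:0000003", "MF", "IEA"),  # ignored: electronic annotation
    ("GO:0000004", "CC", "IDA"),  # ignored: cellular component
]
example_count = count_curated_go_terms(example_annotations)
example_limited = has_limited_go_annotation(example_annotations)
```

How the three criteria are combined into the final gene set is not stated here, so the sketch covers the GO-based criterion only.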

Table filters: Target, Has gene summary (FALSE), Number of annotated GO terms (non-IEA)

Drug associations

Each gene/protein in the query set is annotated with:
- Targeted cancer drugs (inhibitors/antagonists), as found through the Open Targets Platform
- We distinguish between drugs in early clinical development/phase (ep), and drugs already in late clinical development/phase (lp)

Target tractabilities

Each gene/protein in the query set is annotated with target tractability information (aka druggability) for small molecules/compounds and antibodies.
Query genes are colored in varying shades of purple (from unknown tractability to clinical precedence).

Small molecules/compounds

Antibodies

Protein complexes

Here we show which members of the query set are involved in known protein complexes, using two different collections of protein complex annotations:
1. OmniPath - a meta-database of molecular biology prior knowledge, containing protein complex annotations predominantly from CORUM, ComplexPortal, Compleat, and PDB.
  - We limit complex annotations to those that are supported by references to the scientific literature (i.e. manually curated)
2. Human Protein Complex Map - hu.MAP v2.0 - created through an integration of > 15,000 proteomics experiments (biochemical fractionation data, proximity labeling data, and RNA hairpin pulldown data)
  - Each complex comes with a confidence score from clustering (1=Extremely High, 2=Very High, 3=High, 4=Medium High, 5=Medium)
The protein complexes that overlap with members of the query set are ranked according to the total number of participating members in the query set
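
The ranking of complexes by the number of participating query members can be sketched as below. The function name, the complex names, and the gene names are hypothetical placeholders, and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```python
def rank_complexes(complexes, query_set):
    """Rank protein complexes by how many of their members are in the query set.

    complexes maps complex name -> list of member gene symbols.
    Returns (name, number of query members, sorted query members) tuples,
    highest overlap first; complexes without any query member are dropped.
    """
    query = set(query_set)
    overlaps = []
    for name, members in complexes.items():
        hits = sorted(query & set(members))
        if hits:
            overlaps.append((name, len(hits), hits))
    # Sort by descending overlap, then by name (assumed tie-break)
    overlaps.sort(key=lambda row: (-row[1], row[0]))
    return overlaps


# Placeholder complexes and query genes (not taken from OmniPath or hu.MAP)
example_complexes = {
    "complex_1": ["GENE_A", "GENE_B", "GENE_X"],
    "complex_2": ["GENE_B", "GENE_Y"],
    "complex_3": ["GENE_Z"],
}
example_ranking = rank_complexes(example_complexes, ["GENE_A", "GENE_B"])
```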

OmniPath

hu.MAP v2.0

Function and pathway enrichment

The query set is analyzed with clusterProfiler for functional enrichment/overrepresentation with respect to:
- Gene Ontology terms. All three subontologies: Molecular Function (GO_MF), Cellular Component (GO_CC) & Biological Process (GO_BP)
- Molecular signalling networks from KEGG
- Cellular pathways from Reactome, and other curated gene signature sets from the Molecular Signatures Database (MSigDB)
- WikiPathways
- Manually curated signal transduction pathways from NetPath

Enrichment/overrepresentation test settings (clusterProfiler)
- P-value cutoff: 0.05
- Q-value cutoff: 0.2
- Correction for multiple testing: BH
- Minimal size of genes annotated by term for testing: 10
- Maximal size of genes annotated by term for testing: 500
- Background gene set description: All protein-coding genes
- Background gene set size: 19680
- Remove redundancy of enriched GO terms: TRUE
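
To make the settings above concrete, the following is a minimal sketch of an overrepresentation test: a one-sided hypergeometric test per term, term size limits of 10 and 500, a background of 19680 genes, and Benjamini-Hochberg (BH) correction. It is not clusterProfiler's implementation. The function names and the toy gene sets are hypothetical, the q-value cutoff and GO redundancy removal are left out, and applying the 0.05 cutoff to the adjusted p-value is a simplifying assumption.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n are annotated to the term."""
    upper = min(n, N)
    favorable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favorable / comb(M, N)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(query, term_to_genes, background_size=19680,
           min_size=10, max_size=500, p_cutoff=0.05):
    """Return (term, overlap, p-value, adjusted p-value) for enriched terms."""
    query = set(query)
    tested = []
    for term, genes in term_to_genes.items():
        genes = set(genes)
        if not (min_size <= len(genes) <= max_size):
            continue  # term falls outside the allowed size range
        k = len(query & genes)
        p = hypergeom_upper_tail(k, background_size, len(genes), len(query))
        tested.append((term, k, p))
    adjusted = benjamini_hochberg([p for _, _, p in tested])
    return [(t, k, p, a) for (t, k, p), a in zip(tested, adjusted) if a <= p_cutoff]


# Toy input: placeholder gene and term names only
example_query = [f"G{i}" for i in range(10)]
example_terms = {
    "term_hit": [f"G{i}" for i in range(5)] + [f"B{i}" for i in range(15)],
    "term_small": ["G0", "G1"],  # skipped: fewer than 10 annotated genes
}
example_enriched = enrich(example_query, example_terms)
```

In the toy input, half of the ten query genes fall within a 20-gene term out of 19680 background genes, so that term passes the cutoff, while the two-gene term is never tested.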

Enrichment tables

Gene Ontology

Enrichment

Ontology

Molecular Signatures Database (MSigDB)

Enrichment

Signature collection

KEGG pathways

Enrichment

WikiPathways

No pathway signatures from WikiPathways were enriched in the query set.

NetPath

Enrichment

GO enrichment plots

All subontologies

Molecular function

Biological Process

Cellular Component

Regulatory interactions

Using data from the OmniPath/DoRothEA gene set resource, we interrogate previously established transcription factor (TF) - target interactions for members of the query set. TF-target interactions in DoRothEA have been established according to four lines of evidence:
1. literature-curated resources
2. ChIP-seq peaks
3. TF binding site motifs
4. gene expression-inferred interactions
In DoRothEA, each interaction is assigned a confidence level based on the amount of supporting evidence, ranging from A (highest confidence) to E (lowest confidence):
- A - Highly reliable interactions: supported by all four lines of evidence, manually curated by experts in specific reviews, or supported by at least two curated resources
- B-D - Curated and/or ChIP-seq interactions with different levels of additional evidence
- E - Used for interactions that are uniquely supported by computational predictions (not included in oncoEnrichR)
Here, we show regulatory interactions related to the query set along three different axes:
1. interactions for which both regulatory gene and regulatory target are found in the query set
2. interactions for which only the regulatory gene is found in the query set
3. interactions for which only the regulatory target is found in the query set
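
The three axes above can be sketched as a simple partition of TF-target interactions. The function name, the tuple layout, and all gene and TF names are hypothetical placeholders, not DoRothEA records. Interactions outside confidence levels A-D are dropped, mirroring the exclusion of level E described above.

```python
def classify_interactions(interactions, query_set):
    """Partition TF-target interactions by which partner is in the query set.

    interactions is a list of (regulator, target, confidence) tuples.
    Returns a dict with keys "both", "regulator_only" and "target_only".
    """
    query = set(query_set)
    groups = {"both": [], "regulator_only": [], "target_only": []}
    for regulator, target, confidence in interactions:
        if confidence not in {"A", "B", "C", "D"}:
            continue  # level E (computational predictions only) is not shown
        regulator_in = regulator in query
        target_in = target in query
        if regulator_in and target_in:
            groups["both"].append((regulator, target, confidence))
        elif regulator_in:
            groups["regulator_only"].append((regulator, target, confidence))
        elif target_in:
            groups["target_only"].append((regulator, target, confidence))
        # interactions with neither partner in the query set are ignored
    return groups


# Placeholder interactions (made up for illustration)
example_interactions = [
    ("TF_1", "GENE_A", "A"),
    ("TF_1", "GENE_X", "B"),
    ("TF_2", "GENE_A", "C"),
    ("TF_2", "GENE_X", "A"),
    ("TF_1", "GENE_A", "E"),
]
example_groups = classify_interactions(example_interactions, ["TF_1", "GENE_A"])
```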

We interrogate interactions in the query set for two separate collections of regulatory interactions in DoRothEA:

- Regulatory interactions inferred with gene expression from GTEx (global set)
- Regulatory interactions inferred with gene expression from TCGA (cancer-focused set)