Metascape Gene List Analysis Report

metascape.org [1]

Bar Graph Summary

Figure 1. Bar graph of enriched terms across input gene lists, colored by p-values.
Metascape visualizes only the top 20 clusters. Up to 100 enriched clusters, as well as the top-level Gene Ontology biological processes, can be viewed in the original interactive report.

Gene Lists

User-provided gene identifiers are first converted into their corresponding H. sapiens Entrez gene IDs using the latest version of the database (last updated on 2024-09-01). If multiple identifiers correspond to the same Entrez gene ID, they will be considered as a single Entrez gene ID in downstream analyses. The gene lists are summarized in Table 1.

Table 1. Statistics of input gene lists.
Name | Total | Unique
MyList | 154 | 152
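
The collapsing of multiple identifiers onto one Entrez gene ID can be sketched in Python as follows. This is a minimal illustration, not Metascape's conversion code: the function name `to_unique_entrez` and the small lookup table are hypothetical, and a real conversion would draw on the full annotation database.

```python
def to_unique_entrez(identifiers, id_to_entrez):
    """Group input identifiers by Entrez gene ID; collect those that do not map."""
    by_entrez = {}
    unmapped = []
    for ident in identifiers:
        entrez = id_to_entrez.get(ident)
        if entrez is None:
            unmapped.append(ident)
        else:
            # Identifiers sharing an Entrez ID end up in the same bucket.
            by_entrez.setdefault(entrez, []).append(ident)
    return by_entrez, unmapped


# Hypothetical toy lookup table (illustration only).
toy_lookup = {"RHOA": 387, "ARHA": 387, "ACTB": 60, "CDC42": 998}
inputs = ["RHOA", "ARHA", "ACTB", "CDC42", "FOO1"]

by_entrez, unmapped = to_unique_entrez(inputs, toy_lookup)
total = len(inputs)          # analogous to the "Total" column
unique = len(by_entrez)      # analogous to the "Unique" column
print(total, unique, unmapped)
```

In this toy input, two aliases resolve to the same Entrez ID, so the unique count is lower than the total, which is the same kind of difference reported in Table 1.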

Gene Annotation

The following annotations were retrieved from the latest version of the database (last updated on 2024-09-01) (Table 2).

Table 2. Gene annotations extracted
Name | Type | Description
Gene Symbol | Description | Primary HUGO gene symbol.
Description | Description | Short description.
Biological Process (GO) | Function/Location | Descriptions summarized based on gene ontology database, where up to three most informative GO terms are kept.
Kinase Class (UniProt) | Function/Location | Detailed kinase classes.
Protein Function (Protein Atlas) | Function/Location | Protein Function (Protein Atlas)
Subcellular Location (Protein Atlas) | Function/Location | Subcellular Location (Protein Atlas)
Drug (DrugBank) | Genotype/Phenotype/Disease | Drug information for the given gene as target.
Protein Functions (ChatGPT) | Description | Uncurated gene functions described by ChatGPT.
Disease & Drugs (ChatGPT) | Genotype/Phenotype/Disease | Uncurated disease and drug associations described by ChatGPT.
Canonical Pathways | Ontology | Canonical Pathways
Hallmark Gene Sets | Ontology | Hallmark Gene Sets

Pathway and Process Enrichment Analysis

For each given gene list, pathway and process enrichment analysis has been carried out with the following ontology sources: KEGG Pathway, GO Biological Processes, Reactome Gene Sets, Canonical Pathways, CORUM, and WikiPathways. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. More specifically, p-values are calculated based on the cumulative hypergeometric distribution [2], and q-values are calculated using the Benjamini-Hochberg procedure to account for multiple testing [3]. Kappa scores [4] are used as the similarity metric when performing hierarchical clustering on the enriched terms, and sub-trees with a similarity > 0.3 are considered a cluster. The most statistically significant term within a cluster is chosen to represent the cluster.
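
As a rough illustration of the statistics named above, the following Python sketch computes a cumulative hypergeometric p-value, the enrichment factor, and Benjamini-Hochberg q-values, and then applies the three thresholds. The function names, argument order, and example numbers are assumptions made for illustration; this is not Metascape's implementation.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for k observed hits, background size N, term size K, list size n."""
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrichment_factor(k, N, K, n):
    """Observed count divided by the count expected by chance (n * K / N)."""
    return k / (n * K / N)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def passes_filters(k, N, K, n, p_cut=0.01, min_count=3, ef_cut=1.5):
    """Apply the thresholds stated in the text: p < 0.01, count >= 3, factor > 1.5."""
    return (
        k >= min_count
        and enrichment_factor(k, N, K, n) > ef_cut
        and hypergeom_sf(k, N, K, n) < p_cut
    )


# Hypothetical numbers: 18 hits from a 152-gene list in a 200-gene term,
# against an assumed background of 20000 genes.
print(enrichment_factor(18, 20000, 200, 152), passes_filters(18, 20000, 200, 152))
```

Exact integer binomials from `math.comb` keep the sketch dependency-free; for routine use, a statistics library would be the more practical choice.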

Table 3. Top 20 clusters with their representative enriched terms (one per cluster). "Count" is the number of genes in the user-provided lists with membership in the given ontology term. "%" is the percentage of all of the user-provided genes that are found in the given ontology term (only input genes with at least one ontology term annotation are included in the calculation). "Log10(P)" is the base-10 logarithm of the p-value. "Log10(q)" is the base-10 logarithm of the multi-test adjusted p-value.
GO | Category | Description | Count | % | Log10(P) | Log10(q)
GO:0030036 | GO Biological Processes | actin cytoskeleton organization | 18 | 11.84 | -9.41 | -5.06
R-HSA-9012999 | Reactome Gene Sets | RHO GTPase cycle | 16 | 10.53 | -8.96 | -4.91
GO:0030865 | GO Biological Processes | cortical cytoskeleton organization | 6 | 3.95 | -6.67 | -3.23
GO:0051056 | GO Biological Processes | regulation of small GTPase mediated signal transduction | 11 | 7.24 | -6.37 | -3.03
GO:1990049 | GO Biological Processes | retrograde neuronal dense core vesicle transport | 3 | 1.97 | -6.31 | -3.02
GO:0030903 | GO Biological Processes | notochord development | 4 | 2.63 | -5.65 | -2.49
GO:0035239 | GO Biological Processes | tube morphogenesis | 14 | 9.21 | -4.99 | -1.95
GO:0031175 | GO Biological Processes | neuron projection development | 14 | 9.21 | -4.95 | -1.95
GO:0051493 | GO Biological Processes | regulation of cytoskeleton organization | 12 | 7.89 | -4.85 | -1.87
GO:0021782 | GO Biological Processes | glial cell development | 6 | 3.95 | -4.33 | -1.43
GO:0008285 | GO Biological Processes | negative regulation of cell population proliferation | 14 | 9.21 | -4.33 | -1.43
GO:0098657 | GO Biological Processes | import into cell | 13 | 8.55 | -4.32 | -1.43
GO:0001667 | GO Biological Processes | ameboidal-type cell migration | 7 | 4.61 | -4.17 | -1.31
GO:2000147 | GO Biological Processes | positive regulation of cell motility | 12 | 7.89 | -4.12 | -1.30
R-HSA-1483249 | Reactome Gene Sets | Inositol phosphate metabolism | 4 | 2.63 | -3.93 | -1.19
GO:0003333 | GO Biological Processes | amino acid transmembrane transport | 5 | 3.29 | -3.88 | -1.17
GO:0003158 | GO Biological Processes | endothelium development | 5 | 3.29 | -3.78 | -1.11
GO:0042982 | GO Biological Processes | amyloid precursor protein metabolic process | 3 | 1.97 | -3.69 | -1.07
GO:0048660 | GO Biological Processes | regulation of smooth muscle cell proliferation | 6 | 3.95 | -3.61 | -1.03
GO:1903076 | GO Biological Processes | regulation of protein localization to plasma membrane | 5 | 3.29 | -3.61 | -1.03
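
The "%" and "Log10(P)" columns of Table 3 can be checked with a few lines of arithmetic. The sketch below assumes the denominator is the 152 unique genes from Table 1; that is an inference, made because it reproduces the reported percentages for the rows copied here, and the helper names are hypothetical.

```python
UNIQUE_GENES = 152  # "Unique" count from Table 1; assumed denominator for "%"

# (description, Count, reported %, reported Log10(P)) copied from Table 3
rows = [
    ("actin cytoskeleton organization", 18, 11.84, -9.41),
    ("RHO GTPase cycle", 16, 10.53, -8.96),
    ("cortical cytoskeleton organization", 6, 3.95, -6.67),
    ("notochord development", 4, 2.63, -5.65),
]


def percent_of_list(count, total=UNIQUE_GENES):
    """Percentage of input genes annotated to a term, rounded as in the table."""
    return round(100.0 * count / total, 2)


def p_from_log10(log10_p):
    """Convert a Log10(P) entry back to a plain p-value."""
    return 10.0 ** log10_p


for description, count, reported_pct, log10_p in rows:
    pct = percent_of_list(count)
    print(f"{description}: {pct:.2f}% (reported {reported_pct:.2f}%), "
          f"p = {p_from_log10(log10_p):.2e}")
```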

To further capture the relationships between the terms, a subset of enriched terms has been selected and rendered as a network plot, where terms with a similarity > 0.3 are connected by edges. We select the terms with the best p-values from each of the 20 clusters, with the constraint that there are no more than 15 terms per cluster and no more than 250 terms in total. The network is visualized using Cytoscape [5], where each node represents an enriched term and is colored first by its cluster ID (Figure 2.a) and then by its p-value (Figure 2.b). These networks can be interactively viewed in Cytoscape through the .cys files (contained in the Zip package, which also contains a publication-quality version as a PDF) or within a browser by clicking on the web icon. For clarity, term labels are only shown for one term per cluster, so it is recommended to use Cytoscape or a browser to visualize the network in order to inspect all node labels. The network can also be exported to a PDF file from within Cytoscape, and the labels can then be edited in Adobe Illustrator for publication purposes. To switch off all labels, delete the "Label" mapping under the "Style" tab within Cytoscape, and then export the network view.
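
The edge rule described above (connect two terms when their similarity exceeds 0.3) can be sketched in Python using Cohen's kappa on gene membership. The gene universe, the term names, and the helper functions are hypothetical, and the choice of universe in particular is an assumption rather than something stated in this report.

```python
from itertools import combinations


def kappa(genes_a, genes_b, universe):
    """Cohen's kappa between two terms, treating membership of each gene as a binary rating."""
    n = len(universe)
    a = genes_a & universe
    b = genes_b & universe
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    expected = (
        (both + only_a) * (both + only_b)
        + (neither + only_b) * (neither + only_a)
    ) / (n * n)
    if expected == 1.0:
        return 1.0 if observed == 1.0 else 0.0
    return (observed - expected) / (1.0 - expected)


def similarity_edges(terms, universe, cutoff=0.3):
    """Return sorted term pairs whose kappa exceeds the cutoff."""
    edges = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(terms.items()), 2):
        if kappa(genes_a, genes_b, universe) > cutoff:
            edges.append((name_a, name_b))
    return edges


# Hypothetical toy data: ten genes and three terms.
universe = {f"g{i}" for i in range(10)}
terms = {
    "term1": {"g0", "g1", "g2", "g3"},
    "term2": {"g0", "g1", "g2", "g4"},
    "term3": {"g7", "g8", "g9"},
}
print(similarity_edges(terms, universe))
```

Here term1 and term2 share three of their four genes and are connected, while term3 shares none and stays isolated.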

Figure 2. Network of enriched terms: (a) colored by cluster ID, where nodes that share the same cluster ID are typically close to each other; (b) colored by p-value, where terms containing more genes tend to have a more significant p-value.

Protein-protein Interaction Enrichment Analysis

For each given gene list, protein-protein interaction enrichment analysis has been carried out with the following databases: STRING [6], BioGRID [7], OmniPath [8], and InWeb_IM [9]. Only physical interactions in STRING (physical score > 0.132) and BioGRID are used. The resultant network contains the subset of proteins that form physical interactions with at least one other member in the list. If the network contains between 3 and 500 proteins, the Molecular Complex Detection (MCODE) algorithm [10] has been applied to identify densely connected network components. The MCODE networks identified for individual gene lists have been gathered and are shown in Figure 3.
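
The network construction step described above can be sketched as follows. This covers only the filtering and the size check that precede MCODE, not MCODE itself. The gene symbols are used as placeholders, the scores are invented, the function names are hypothetical, and treating "between 3 and 500" as inclusive is an assumption.

```python
STRING_PHYSICAL_CUTOFF = 0.132  # threshold quoted in the text for STRING


def induced_physical_network(genes, interactions):
    """Keep interactions whose two partners are both in the gene list."""
    gene_set = set(genes)
    edges = set()
    for a, b in interactions:
        if a != b and a in gene_set and b in gene_set:
            edges.add(frozenset((a, b)))
    # Only proteins with at least one retained interaction stay in the network.
    nodes = set()
    for edge in edges:
        nodes |= edge
    return nodes, edges


def eligible_for_mcode(nodes, min_size=3, max_size=500):
    """Size check applied before component detection (bounds assumed inclusive)."""
    return min_size <= len(nodes) <= max_size


# Hypothetical records: (protein A, protein B, physical score).
string_records = [
    ("RHOA", "ROCK1", 0.90),
    ("RHOA", "CDC42", 0.10),   # below the cutoff, dropped
    ("CDC42", "PAK1", 0.80),
    ("ACTB", "XYZ9", 0.95),    # partner not in the gene list, dropped
]
biogrid_pairs = [("ROCK1", "LIMK1")]  # unscored physical pairs

pairs = [(a, b) for a, b, score in string_records if score > STRING_PHYSICAL_CUTOFF]
pairs += biogrid_pairs
genes = ["RHOA", "ROCK1", "CDC42", "PAK1", "LIMK1", "ACTB", "NOTCH1"]

nodes, edges = induced_physical_network(genes, pairs)
print(sorted(nodes), len(edges), eligible_for_mcode(nodes))
```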

Pathway and process enrichment analysis has been applied to each MCODE component independently, and the three best-scoring terms by p-value have been retained as the functional description of the corresponding components, shown in the tables underneath corresponding network plots within Figure 3.

Figure 3. Protein-protein interaction network and MCODE components identified in the gene lists.
GO | Description | Log10(P)
R-HSA-194315 | Signaling by Rho GTPases | -10.8
R-HSA-9716542 | Signaling by Rho GTPases, Miro GTPases and RHOBTB3 | -10.7
R-HSA-9012999 | RHO GTPase cycle | -10.4

MCODE | GO | Description | Log10(P)
MCODE_2 | GO:1990049 | retrograde neuronal dense core vesicle transport | -11.1
MCODE_2 | GO:0030705 | cytoskeleton-dependent intracellular transport | -11.0
MCODE_2 | GO:0047496 | vesicle transport along microtubule | -10.8
MCODE_3 | R-HSA-9013106 | RHOC GTPase cycle | -13.1
MCODE_3 | R-HSA-8980692 | RHOA GTPase cycle | -11.6
MCODE_3 | R-HSA-9013026 | RHOB GTPase cycle | -9.9

Quality Control and Association Analysis

Gene list enrichments are identified in the following ontology categories: COVID, Cell_Type_Signatures, DisGeNET, PaGenBase, and Transcription_Factor_Targets. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. The top few enriched clusters (one term per cluster) are shown in Figures 4-8. The algorithm used here is the same as that used for pathway and process enrichment analysis.

Figure 4. Summary of enrichment analysis in COVID [11].


GO | Description | Count | % | Log10(P) | Log10(q)
COVID018 | RNA_Blanco-Melo_Lung_Up | 8 | 5.30 | -8.00 | -4.60
COVID059 | Phosphoproteome_Bouhaddou_Vero_E6_24h_Down | 12 | 7.90 | -7.40 | -4.10
COVID065 | Phosphoproteome_Bouhaddou_Vero_E6_8h_Down | 12 | 7.90 | -7.40 | -4.10
COVID243 | RNA_Riva_Vero-E6_24h_Up | 11 | 7.20 | -6.40 | -3.30
COVID036 | RNA_Sun_Calu-3_0h_Up | 10 | 6.60 | -5.50 | -2.40
COVID061 | Phosphoproteome_Bouhaddou_Vero_E6_2h_Down | 10 | 6.60 | -5.50 | -2.40
COVID063 | Phosphoproteome_Bouhaddou_Vero_E6_4h_Down | 8 | 5.30 | -5.10 | -2.10
COVID009 | RNA_Blanco-Melo_A549-ACE2_Down | 8 | 5.30 | -5.00 | -2.00
COVID005 | RNA_Appelberg_Huh-7_72h_Down | 9 | 5.90 | -4.70 | -1.80
COVID034 | RNA_Liao_BALF-severe-vs-mild_Up | 9 | 5.90 | -4.70 | -1.80
COVID338 | RNA_Wilk_B-cells_patient-C1B-severe_Down | 5 | 3.30 | -4.40 | -1.50
COVID007 | RNA_Blanco-Melo_A549_Down | 8 | 5.30 | -3.80 | -1.20
COVID057 | Phosphoproteome_Bouhaddou_Vero_E6_12h_Down | 8 | 5.30 | -3.80 | -1.20
COVID373 | Interactome_Laurent_HEK293_24h_NSP2 | 7 | 4.60 | -3.80 | -1.20
COVID202 | RNA_Vanderheiden_pHAE_48h_Down | 6 | 3.90 | -3.40 | -0.92
COVID055 | Phosphoproteome_Bouhaddou_Vero_E6_0h_Down | 7 | 4.60 | -3.30 | -0.87
COVID023 | RNA_Keller_B-cells-Infected_CD21pos_Down | 7 | 4.60 | -3.30 | -0.86
COVID058 | Phosphoproteome_Bouhaddou_Vero_E6_12h_Up | 7 | 4.60 | -3.10 | -0.76
COVID062 | Phosphoproteome_Bouhaddou_Vero_E6_2h_Up | 7 | 4.60 | -3.10 | -0.76
COVID234 | Phosphoproteome_Klann_Caco-2_24h_Down | 7 | 4.60 | -3.10 | -0.76

Figure 5. Summary of enrichment analysis in Cell Type Signatures [12].


GO | Description | Count | % | Log10(P) | Log10(q)
M39037 | FAN EMBRYONIC CTX OLIG | 28 | 18.00 | -16.00 | -11.00
M39175 | MURARO PANCREAS MESENCHYMAL STROMAL CELL | 26 | 17.00 | -15.00 | -11.00
M39050 | MANNO MIDBRAIN NEUROTYPES HPERIC | 27 | 18.00 | -14.00 | -10.00
M39054 | MANNO MIDBRAIN NEUROTYPES HRGL2B | 20 | 13.00 | -13.00 | -9.10
M39264 | HU FETAL RETINA FIBROBLAST | 18 | 12.00 | -12.00 | -8.20
M39055 | MANNO MIDBRAIN NEUROTYPES HRGL2A | 21 | 14.00 | -11.00 | -7.80
M40158 | DESCARTES FETAL CEREBELLUM VASCULAR ENDOTHELIAL CELLS | 20 | 13.00 | -10.00 | -6.50
M39040 | FAN EMBRYONIC CTX BRAIN ENDOTHELIAL 2 | 14 | 9.20 | -9.30 | -5.70
M39018 | FAN EMBRYONIC CTX BIG GROUPS BRAIN ENDOTHELIAL | 14 | 9.20 | -8.30 | -4.70
M39167 | GAO LARGE INTESTINE ADULT CJ IMMUNE CELLS | 16 | 11.00 | -8.20 | -4.70
M39128 | AIZARANI LIVER C29 MVECS 2 | 13 | 8.60 | -8.10 | -4.70
M39279 | DURANTE ADULT OLFACTORY NEUROEPITHELIUM VASCULAR SMOOTH MUSCLE CELLS | 8 | 5.30 | -7.60 | -4.20
M39056 | MANNO MIDBRAIN NEUROTYPES HRGL3 | 16 | 11.00 | -7.50 | -4.10
M39039 | FAN EMBRYONIC CTX BRAIN ENDOTHELIAL 1 | 14 | 9.20 | -7.30 | -4.00
M39053 | MANNO MIDBRAIN NEUROTYPES HRGL2C | 12 | 7.90 | -7.00 | -3.70
M41666 | TRAVAGLINI LUNG CAPILLARY INTERMEDIATE 1 CELL | 9 | 5.90 | -7.00 | -3.70
M40167 | DESCARTES FETAL CEREBRUM VASCULAR ENDOTHELIAL CELLS | 15 | 9.90 | -6.60 | -3.30
M39176 | MURARO PANCREAS ENDOTHELIAL CELL | 12 | 7.90 | -6.50 | -3.30
M41716 | FAN OVARY CL14 MATURE SMOOTH MUSCLE CELL | 10 | 6.60 | -5.40 | -2.40
M39074 | ZHONG PFC MAJOR TYPES ASTROCYTES | 10 | 6.60 | -5.20 | -2.20

Figure 6. Summary of enrichment analysis in DisGeNET [13].


GO | Description | Count | % | Log10(P) | Log10(q)
C0278488 | Carcinoma breast stage IV | 12 | 7.90 | -4.50 | -1.60
C0004158 | Athetosis | 4 | 2.60 | -4.40 | -1.50
C0038525 | Subarachnoid Hemorrhage | 11 | 7.20 | -4.30 | -1.50
C0280324 | Laryngeal Squamous Cell Carcinoma | 11 | 7.20 | -4.30 | -1.50
C4551686 | Malignant neoplasm of soft tissue | 13 | 8.60 | -4.30 | -1.50
C0027809 | Neurilemmoma | 7 | 4.60 | -4.30 | -1.50
C1837279 | Hypoplastic toenails | 4 | 2.60 | -4.20 | -1.50
C1858712 | Spastic paraplegia 10, autosomal dominant | 3 | 2.00 | -4.20 | -1.40
C0555198 | Malignant Glioma | 13 | 8.60 | -4.10 | -1.40
C2720436 | Fibrosis of pleura | 3 | 2.00 | -3.90 | -1.20
C0020608 | Hypodontia | 7 | 4.60 | -3.90 | -1.20
C0039101 | synovial sarcoma | 8 | 5.30 | -3.90 | -1.20
C0015300 | Exophthalmos | 7 | 4.60 | -3.80 | -1.20
C0025995 | Micromelia | 5 | 3.30 | -3.70 | -1.10
C3642347 | Basal-Like Breast Carcinoma | 7 | 4.60 | -3.60 | -1.00
C0264545 | Thickening of pleura | 3 | 2.00 | -3.60 | -1.00
C0854917 | Rhabdoid Tumor of the Kidney | 4 | 2.60 | -3.60 | -1.00
C0553580 | Ewings sarcoma | 10 | 6.60 | -3.50 | -0.99
C0241181 | Fragile skin | 3 | 2.00 | -3.50 | -0.99
C0685938 | Malignant neoplasm of gastrointestinal tract | 9 | 5.90 | -3.50 | -0.99

Figure 7. Summary of enrichment analysis in PaGenBase [14].


GO | Description | Count | % | Log10(P) | Log10(q)
PGB:00021 | Tissue-specific: cortex | 4 | 2.60 | -3.70 | -1.10
PGB:00094 | Cell-specific: Bronchial Epithelial Cells | 4 | 2.60 | -2.30 | -0.24
PGB:00032 | Tissue-specific: Cerebellum | 4 | 2.60 | -2.30 | -0.21

Figure 8. Summary of enrichment analysis in Transcription Factor Targets.


GO | Description | Count | % | Log10(P) | Log10(q)
M19265 | SREBP1 Q6 | 10 | 6.60 | -6.30 | -3.10
M2459 | EGR Q6 | 10 | 6.60 | -5.80 | -2.70
M30192 | TAZ TARGET GENES | 12 | 7.90 | -5.20 | -2.20
M17000 | CP2 01 | 9 | 5.90 | -5.20 | -2.10
M13482 | ZIC1 01 | 9 | 5.90 | -5.10 | -2.10
M1460 | TGGNNNNNNKCCAR UNKNOWN | 11 | 7.20 | -4.90 | -1.90
M16022 | CTAWWWATA RSRFC4 Q2 | 10 | 6.60 | -4.80 | -1.80
M551 | TEF1 Q6 | 8 | 5.30 | -4.70 | -1.80
M4831 | SP1 01 | 8 | 5.30 | -4.50 | -1.70
M15929 | MEF2 Q6 01 | 8 | 5.30 | -4.40 | -1.50
M19851 | FOXO3 01 | 8 | 5.30 | -4.40 | -1.50
M10112 | RNGTGGGC UNKNOWN | 14 | 9.20 | -4.40 | -1.50
M2389 | E47 02 | 8 | 5.30 | -4.30 | -1.50
M9300 | SP1 Q4 01 | 8 | 5.30 | -4.30 | -1.50
M30410 | ZSCAN5C TARGET GENES | 6 | 3.90 | -4.00 | -1.30
M11345 | AP4 Q6 | 7 | 4.60 | -3.80 | -1.20
M11934 | SRF Q5 01 | 7 | 4.60 | -3.80 | -1.20
M6568 | TAAWWATAG RSRFC4 Q2 | 6 | 3.90 | -3.60 | -1.00
M9557 | SP1 Q6 01 | 7 | 4.60 | -3.60 | -1.00
M5320 | HIF1 Q5 | 7 | 4.60 | -3.60 | -1.00

References

  1. Zhou et al., Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications (2019) 10(1):1523.
  2. Zar, J.H. Biostatistical Analysis 1999 4th edn., NJ Prentice Hall, pp. 523
  3. Hochberg Y., Benjamini Y. More powerful procedures for multiple significance testing. Statistics in Medicine (1990) 9:811-818.
  4. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. (1960) 20:27-46.
  5. Shannon P. et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res (2003) 13(11):2498-2504.
  6. Szklarczyk D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. (2019) 47:D607-613.
  7. Stark C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. (2006) 34:D535-539.
  8. Türei D. et al. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods. (2016) 13:966-967.
  9. Li T. et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat. Methods. (2017) 14:61-64.
  10. Bader, G.D. et al. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics (2003) 4:2.
  11. https://metascape.org/COVID.
  12. Subramanian A, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550 (2005).
  13. Pinero J, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic acids research 45, D833-D839 (2017).
  14. Pan JB, et al. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One 8, e80747 (2013).