Metascape Gene List Analysis Report

metascape.org [1]

Bar Graph Summary

Figure 1. Bar graph of enriched terms across input gene lists, colored by p-values.
Metascape visualizes only the top 20 clusters. Up to 100 enriched clusters, as well as the top-level Gene Ontology biological processes, can be viewed in the online report.

Gene Lists

User-provided gene identifiers are first converted into their corresponding H. sapiens Entrez gene IDs using the latest version of the database (last updated on 2024-09-01). If multiple identifiers correspond to the same Entrez gene ID, they will be considered as a single Entrez gene ID in downstream analyses. The gene lists are summarized in Table 1.

Table 1. Statistics of input gene lists.
Name Total Unique
MyList 418 414
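
The difference between "Total" and "Unique" in Table 1 follows from the collapsing rule described above. A minimal sketch, assuming "Total" counts the mapped input identifiers and "Unique" counts distinct Entrez gene IDs; the identifiers, the mapping, and the function name are hypothetical and are not taken from the Metascape database:

```python
def count_total_and_unique(identifiers, to_entrez):
    """Return (total, unique) for one gene list.

    total  -- number of input identifiers that map to an Entrez gene ID
    unique -- number of distinct Entrez gene IDs among them
    """
    mapped = [to_entrez[i] for i in identifiers if i in to_entrez]
    return len(mapped), len(set(mapped))


# Hypothetical mapping: two identifiers resolve to the same Entrez gene ID.
toy_map = {"GENE_A": 101, "GENE_A_ALIAS": 101, "GENE_B": 202, "GENE_C": 303}
total, unique = count_total_and_unique(
    ["GENE_A", "GENE_A_ALIAS", "GENE_B", "GENE_C"], toy_map
)
print(total, unique)  # 4 3
```

In this toy list four identifiers collapse to three genes, in the same way that the 418 identifiers of MyList collapse to 414 unique genes.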

Gene Annotation

The following annotations were retrieved from the latest version of the database (last updated on 2024-09-01) (Table 2).

Table 2. Gene annotations extracted
Name | Type | Description
Gene Symbol | Description | Primary HUGO gene symbol.
Description | Description | Short description.
Biological Process (GO) | Function/Location | Descriptions summarized based on gene ontology database, where up to three most informative GO terms are kept.
Kinase Class (UniProt) | Function/Location | Detailed kinase classes.
Protein Function (Protein Atlas) | Function/Location | Protein Function (Protein Atlas)
Subcellular Location (Protein Atlas) | Function/Location | Subcellular Location (Protein Atlas)
Drug (DrugBank) | Genotype/Phenotype/Disease | Drug information for the given gene as target.
Protein Functions (ChatGPT) | Description | Uncurated gene functions described by ChatGPT.
Disease & Drugs (ChatGPT) | Genotype/Phenotype/Disease | Uncurated disease and drug associations described by ChatGPT.
Canonical Pathways | Ontology | Canonical Pathways
Hallmark Gene Sets | Ontology | Hallmark Gene Sets

Pathway and Process Enrichment Analysis

For each given gene list, pathway and process enrichment analysis has been carried out with the following ontology sources: KEGG Pathway, GO Biological Processes, Reactome Gene Sets, Canonical Pathways, CORUM, WikiPathways, and PANTHER Pathway. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. More specifically, p-values are calculated based on the cumulative hypergeometric distribution [2], and q-values are calculated using the Benjamini-Hochberg procedure to account for multiple testing [3]. Kappa scores [4] are used as the similarity metric when performing hierarchical clustering on the enriched terms, and sub-trees with a similarity of > 0.3 are considered a cluster. The most statistically significant term within a cluster is chosen to represent the cluster.
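
The statistics named in this paragraph can be illustrated with a short standard-library sketch. It is not Metascape's implementation: the function names are hypothetical, the toy numbers are invented for illustration, and the step-up form of the Benjamini-Hochberg adjustment shown here is one common formulation.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k term genes in a list of n genes,
    when K of the N background genes belong to the term."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def enrichment_factor(k, n, K, N):
    """Observed count divided by the count expected by chance (n * K / N)."""
    return k / (n * K / N)


def benjamini_hochberg(pvalues):
    """Adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def passes_filter(p, count, factor):
    """Thresholds quoted in the text: p < 0.01, count >= 3, factor > 1.5."""
    return p < 0.01 and count >= 3 and factor > 1.5


# Toy numbers (not from the report): 10 background genes, 4 in the term,
# 3 genes in the input list, 2 of which belong to the term.
p = hypergeom_pvalue(k=2, n=3, K=4, N=10)    # 40/120, about 0.333
ef = enrichment_factor(k=2, n=3, K=4, N=10)  # 2 / 1.2, about 1.667
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p, ef, q, passes_filter(p, 2, ef))
```

The toy term fails the filter on both the p-value and the minimum count, which shows that the three thresholds are applied jointly rather than individually.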

Table 3. Top 20 clusters with their representative enriched terms (one per cluster). "Count" is the number of genes in the user-provided lists with membership in the given ontology term. "%" is the percentage of all of the user-provided genes that are found in the given ontology term (only input genes with at least one ontology term annotation are included in the calculation). "Log10(P)" is the p-value in log base 10. "Log10(q)" is the multi-test adjusted p-value in log base 10.
GO | Category | Description | Count | % | Log10(P) | Log10(q)
GO:0035239 | GO Biological Processes | tube morphogenesis | 52 | 12.59 | -22.96 | -18.62
GO:0001501 | GO Biological Processes | skeletal system development | 41 | 9.93 | -19.34 | -15.84
GO:0007507 | GO Biological Processes | heart development | 42 | 10.17 | -18.04 | -14.65
M5884 | Canonical Pathways | NABA CORE MATRISOME | 29 | 7.02 | -16.74 | -13.47
GO:0072001 | GO Biological Processes | renal system development | 29 | 7.02 | -15.19 | -12.08
GO:0060485 | GO Biological Processes | mesenchyme development | 26 | 6.30 | -15.04 | -11.95
GO:0009725 | GO Biological Processes | response to hormone | 44 | 10.65 | -14.40 | -11.38
GO:0098609 | GO Biological Processes | cell-cell adhesion | 35 | 8.47 | -13.30 | -10.38
GO:0061061 | GO Biological Processes | muscle structure development | 34 | 8.23 | -13.15 | -10.25
R-HSA-1474244 | Reactome Gene Sets | Extracellular matrix organization | 26 | 6.30 | -13.02 | -10.15
GO:0030198 | GO Biological Processes | extracellular matrix organization | 24 | 5.81 | -12.13 | -9.35
GO:0007423 | GO Biological Processes | sensory organ development | 34 | 8.23 | -11.86 | -9.12
GO:0090100 | GO Biological Processes | positive regulation of transmembrane receptor protein serine/threonine kinase signaling pathway | 16 | 3.87 | -11.84 | -9.10
GO:0003013 | GO Biological Processes | circulatory system process | 31 | 7.51 | -11.31 | -8.63
GO:0048732 | GO Biological Processes | gland development | 28 | 6.78 | -11.07 | -8.40
hsa04350 | KEGG Pathway | TGF-beta signaling pathway | 15 | 3.63 | -10.66 | -8.02
GO:0035107 | GO Biological Processes | appendage morphogenesis | 17 | 4.12 | -10.55 | -7.94
GO:0050678 | GO Biological Processes | regulation of epithelial cell proliferation | 26 | 6.30 | -10.00 | -7.45
GO:0035050 | GO Biological Processes | embryonic heart tube development | 13 | 3.15 | -9.86 | -7.36
GO:0030855 | GO Biological Processes | epithelial cell differentiation | 32 | 7.75 | -9.62 | -7.12
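
The "%" and "Log10(P)" columns of Table 3 can be read back with simple arithmetic. A minimal sketch, assuming that 413 input genes carry at least one ontology annotation: this denominator is not stated in the report and is only inferred here because it reproduces the reported percentages. The function names are hypothetical.

```python
import math

# Assumption: inferred from the Count and % columns of Table 3, not stated in the report.
ANNOTATED_GENES = 413


def percent_of_list(count, annotated_genes=ANNOTATED_GENES):
    """Percentage of annotated input genes that belong to a term."""
    return round(100.0 * count / annotated_genes, 2)


def pvalue_from_log10(log10_p):
    """Convert a Log10(P) entry back to a plain p-value."""
    return 10.0 ** log10_p


# (description, Count, reported %) copied from three rows of Table 3.
rows = [
    ("tube morphogenesis", 52, 12.59),
    ("skeletal system development", 41, 9.93),
    ("TGF-beta signaling pathway", 15, 3.63),
]
recomputed = [percent_of_list(count) for _, count, _ in rows]
print(recomputed)

# Log10(P) = -22.96 corresponds to a p-value of roughly 1.1e-23.
print(pvalue_from_log10(-22.96))
```

Under this assumption, the remaining rows of Table 3 can be checked in the same way.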

To further capture the relationships between the terms, a subset of enriched terms has been selected and rendered as a network plot, where terms with a similarity > 0.3 are connected by edges. We select the terms with the best p-values from each of the 20 clusters, with the constraint that there are no more than 15 terms per cluster and no more than 250 terms in total. The network is visualized using Cytoscape [5], where each node represents an enriched term and is colored first by its cluster ID (Figure 2.a) and then by its p-value (Figure 2.b). These networks can be interactively viewed in Cytoscape through the .cys files (contained in the Zip package, which also contains a publication-quality version as a PDF) or within a browser by clicking on the web icon. For clarity, term labels are only shown for one term per cluster, so it is recommended to use Cytoscape or a browser to visualize the network in order to inspect all node labels. The network can also be exported to a PDF file from within Cytoscape, and the labels can then be edited in Adobe Illustrator for publication purposes. To switch off all labels, delete the "Label" mapping under the "Style" tab within Cytoscape, and then export the network view.

Figure 2. Network of enriched terms: (a) colored by cluster ID, where nodes that share the same cluster ID are typically close to each other; (b) colored by p-value, where terms containing more genes tend to have a more significant p-value.
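
The edge rule behind the network (terms with a similarity > 0.3 are connected) can be sketched with Cohen's kappa computed on gene membership. This is an illustration under stated assumptions, not Metascape's code: the gene universe over which agreement is counted, the toy terms, and the function names are all hypothetical.

```python
from itertools import combinations


def kappa_score(genes_a, genes_b, universe):
    """Cohen's kappa between two terms, treating each gene in `universe`
    as one observation labelled member / non-member by each term."""
    a_set, b_set = genes_a & universe, genes_b & universe
    n = len(universe)
    both = len(a_set & b_set)
    only_a = len(a_set - b_set)
    only_b = len(b_set - a_set)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    expected = ((both + only_a) * (both + only_b)
                + (neither + only_b) * (neither + only_a)) / (n * n)
    if expected == 1.0:
        return 1.0  # both terms contain every gene, or both contain none
    return (observed - expected) / (1.0 - expected)


def similarity_edges(terms, universe, threshold=0.3):
    """Pairs of term names whose kappa score exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(terms, 2)
        if kappa_score(terms[a], terms[b], universe) > threshold
    ]


# Toy universe of 10 genes and three toy terms (not from the report).
universe = {f"g{i}" for i in range(10)}
terms = {
    "X": {"g0", "g1", "g2", "g3"},
    "Y": {"g0", "g1", "g2", "g4"},
    "Z": {"g7", "g8"},
}
edges = similarity_edges(terms, universe)
print(edges)  # [('X', 'Y')]
```

Terms X and Y share three of their four genes (kappa of about 0.58) and are linked, while Z shares no genes with either, has a negative kappa against both, and stays unconnected.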

Protein-protein Interaction Enrichment Analysis

For each given gene list, protein-protein interaction enrichment analysis has been carried out with the following databases: STRING [6], BioGrid [7], OmniPath [8], and InWeb_IM [9]. Only physical interactions in STRING (physical score > 0.132) and BioGrid are used. The resultant network contains the subset of proteins that form physical interactions with at least one other member in the list. If the network contains between 3 and 500 proteins, the Molecular Complex Detection (MCODE) algorithm [10] has been applied to identify densely connected network components. The MCODE networks identified for individual gene lists have been gathered and are shown in Figure 3.
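
The network construction described above (keep physical interactions above the score cutoff, keep proteins with at least one partner in the list, then check the 3 to 500 size window) can be sketched as follows. The sketch assumes every interaction carries a STRING-style score, which is a simplification since the text applies the cutoff to STRING only. The function names and toy data are hypothetical, and MCODE itself is not implemented.

```python
def physical_subnetwork(gene_list, interactions, min_score=0.132):
    """Return (nodes, edges) restricted to the gene list.

    interactions -- iterable of (protein_a, protein_b, score) tuples
    An edge is kept only if both partners are in the list and the score
    exceeds min_score; nodes are the proteins left with at least one edge.
    """
    genes = set(gene_list)
    edges = {
        frozenset((a, b))
        for a, b, score in interactions
        if a in genes and b in genes and a != b and score > min_score
    }
    nodes = set().union(*edges) if edges else set()
    return nodes, edges


def eligible_for_mcode(nodes, low=3, high=500):
    """Size window quoted in the text for running MCODE."""
    return low <= len(nodes) <= high


# Toy data (not from the report).
gene_list = ["A", "B", "C", "D", "E"]
interactions = [
    ("A", "B", 0.90),
    ("B", "C", 0.20),
    ("C", "D", 0.10),   # below the cutoff, dropped
    ("A", "Z", 0.95),   # partner Z is not in the list, dropped
]
nodes, edges = physical_subnetwork(gene_list, interactions)
print(sorted(nodes), len(edges), eligible_for_mcode(nodes))
```

Here D and E drop out because neither retains an interaction partner within the list, leaving a three-protein network that just meets the lower size bound.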

Pathway and process enrichment analysis has been applied to each MCODE component independently, and the three best-scoring terms by p-value have been retained as the functional description of the corresponding components, shown in the tables underneath corresponding network plots within Figure 3.

Figure 3. Protein-protein interaction network and MCODE components identified in the gene lists.
GO Description Log10(P)
GO:0007167 enzyme-linked receptor protein signaling pathway -22.0
GO:0007507 heart development -21.2
GO:0001568 blood vessel development -20.4
MCODE GO Description Log10(P)
MCODE_1 GO:0007507 heart development -13.3
MCODE_1 GO:0048738 cardiac muscle tissue development -11.8
MCODE_1 GO:0014706 striated muscle tissue development -11.7
MCODE_2 R-HSA-216083 Integrin cell surface interactions -20.6
MCODE_2 hsa04820 Cytoskeleton in muscle cells -17.0
MCODE_2 hsa04512 ECM-receptor interaction -16.9
MCODE_3 WP4659 Gastrin signaling -7.9
MCODE_3 WP3414 Initiation of transcription and translation elongation at the HIV 1 LTR -7.2
MCODE_3 WP4754 IL18 signaling -6.5
MCODE_4 R-HSA-381426 Regulation of Insulin-like Growth Factor (IGF) transport and uptake by Insulin-like Growth Factor Binding Proteins (IGFBPs) -8.9
MCODE_4 R-HSA-8957275 Post-translational protein phosphorylation -6.4
MCODE_4 GO:0051897 positive regulation of phosphatidylinositol 3-kinase/protein kinase B signal transduction -5.6
MCODE_5 hsa04630 JAK-STAT signaling pathway -11.3
MCODE_5 M50 PID PTP1B PATHWAY -7.3
MCODE_5 GO:1904892 regulation of receptor signaling pathway via STAT -6.6

Quality Control and Association Analysis

Gene list enrichments are identified in the following ontology categories: COVID, Cell_Type_Signatures, DisGeNET, PaGenBase, TRRUST, Transcription_Factor_Targets. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. The top few enriched clusters (one term per cluster) are shown in Figures 4-9. The algorithm used here is the same as that used for pathway and process enrichment analysis.

Figure 4. Summary of enrichment analysis in COVID [11].

GO Description Count % Log10(P) Log10(q)
COVID202 RNA_Vanderheiden_pHAE_48h_Down 21 5.10 -13.00 -9.80
COVID049 RNA_Wyler_Calu-3_24h_Down 19 4.60 -7.40 -5.10
COVID047 RNA_Wyler_Calu-3_12h_Down 15 3.60 -7.30 -5.00
COVID205 Proteome_Li_Urine-severe-vs-healthy_Down 14 3.40 -7.00 -4.70
COVID124 Interactome_Stukalov_A549_72h_ORF3 17 4.10 -6.00 -3.90
COVID243 RNA_Riva_Vero-E6_24h_Up 17 4.10 -6.00 -3.90
COVID009 RNA_Blanco-Melo_A549-ACE2_Down 14 3.40 -6.00 -3.90
COVID008 RNA_Blanco-Melo_A549_Up 16 3.90 -5.40 -3.40
COVID052 RNA_Xiong_BALF_Up 16 3.90 -5.40 -3.40
COVID011 RNA_Blanco-Melo_A549-ACE2-ruxolitinib_Down 14 3.40 -5.40 -3.40
COVID209 Proteome_Li_Urine-recovery-vs-healthy_Down 10 2.40 -5.00 -3.10
COVID032 RNA_Liao_BALF-severe_Up 15 3.60 -4.70 -2.90
COVID059 Phosphoproteome_Bouhaddou_Vero_E6_24h_Down 15 3.60 -4.70 -2.90
COVID127 Interactome_Stukalov_A549_72h_ORF8 5 1.20 -4.40 -2.60
COVID040 RNA_Sun_Calu-3_24h_Up 14 3.40 -4.10 -2.40
COVID048 RNA_Wyler_Calu-3_12h_Up 14 3.40 -4.10 -2.40
COVID057 Phosphoproteome_Bouhaddou_Vero_E6_12h_Down 14 3.40 -4.10 -2.40
COVID027 RNA_Lamers_intestinal-organoid_expansion_Down 13 3.10 -4.10 -2.40
COVID373 Interactome_Laurent_HEK293_24h_NSP2 12 2.90 -4.10 -2.40
COVID135 Proteome_Stukalov_A549-ACE2_24h_Up 9 2.20 -4.10 -2.30

Figure 5. Summary of enrichment analysis in Cell Type Signatures [12].

GO Description Count % Log10(P) Log10(q)
M39209 HAY BONE MARROW STROMAL 92 22.00 -59.00 -54.00
M39050 MANNO MIDBRAIN NEUROTYPES HPERIC 75 18.00 -40.00 -36.00
M39175 MURARO PANCREAS MESENCHYMAL STROMAL CELL 70 17.00 -40.00 -36.00
M39264 HU FETAL RETINA FIBROBLAST 50 12.00 -33.00 -29.00
M39056 MANNO MIDBRAIN NEUROTYPES HRGL3 58 14.00 -32.00 -28.00
M39018 FAN EMBRYONIC CTX BIG GROUPS BRAIN ENDOTHELIAL 47 11.00 -31.00 -27.00
M39039 FAN EMBRYONIC CTX BRAIN ENDOTHELIAL 1 49 12.00 -29.00 -26.00
M39128 AIZARANI LIVER C29 MVECS 2 42 10.00 -28.00 -25.00
M39167 GAO LARGE INTESTINE ADULT CJ IMMUNE CELLS 50 12.00 -27.00 -24.00
M39114 AIZARANI LIVER C10 MVECS 1 38 9.20 -26.00 -23.00
M40167 DESCARTES FETAL CEREBRUM VASCULAR ENDOTHELIAL CELLS 52 13.00 -26.00 -23.00
M39057 MANNO MIDBRAIN NEUROTYPES HRGL1 41 9.90 -25.00 -21.00
M39122 AIZARANI LIVER C21 STELLATE CELLS 1 32 7.70 -24.00 -21.00
M40158 DESCARTES FETAL CEREBELLUM VASCULAR ENDOTHELIAL CELLS 51 12.00 -24.00 -20.00
M39176 MURARO PANCREAS ENDOTHELIAL CELL 40 9.70 -24.00 -20.00
M39055 MANNO MIDBRAIN NEUROTYPES HRGL2A 49 12.00 -23.00 -20.00
M39040 FAN EMBRYONIC CTX BRAIN ENDOTHELIAL 2 36 8.70 -22.00 -19.00
M39054 MANNO MIDBRAIN NEUROTYPES HRGL2B 40 9.70 -21.00 -17.00
M39274 DURANTE ADULT OLFACTORY NEUROEPITHELIUM FIBROBLASTS STROMAL CELLS 21 5.10 -20.00 -17.00
M41750 RUBENSTEIN SKELETAL MUSCLE FBN1 FAP CELLS 33 8.00 -20.00 -17.00

Figure 6. Summary of enrichment analysis in DisGeNET [13].

GO Description Count % Log10(P) Log10(q)
C0002793 Anaplasia 40 9.70 -17.00 -14.00
C0153690 Secondary malignant neoplasm of bone 41 9.90 -15.00 -12.00
C0042373 Vascular Diseases 40 9.70 -14.00 -11.00
C0555198 Malignant Glioma 41 9.90 -14.00 -11.00
C0024796 Marfan Syndrome 19 4.60 -14.00 -11.00
C0151744 Myocardial Ischemia 41 9.90 -13.00 -10.00
C0025286 Meningioma 37 9.00 -13.00 -9.90
C0085207 Gestational Diabetes 37 9.00 -13.00 -9.60
C0018798 Congenital Heart Defects 29 7.00 -12.00 -9.40
C0598935 Tumor Initiation 33 8.00 -12.00 -9.30
C1449563 Cardiomyopathy, Familial Idiopathic 40 9.70 -12.00 -9.30
C0151650 Renal fibrosis 34 8.20 -12.00 -9.30
C4086152 Childhood Astrocytoma 35 8.50 -12.00 -9.00
C0007193 Cardiomyopathy, Dilated 31 7.50 -11.00 -8.50
C0010278 Craniosynostosis 30 7.30 -11.00 -8.30
C0017601 Glaucoma 38 9.20 -11.00 -8.20
C0025500 Mesothelioma 32 7.70 -11.00 -8.20
C3203102 Idiopathic pulmonary arterial hypertension 38 9.20 -11.00 -8.10
C0042133 Uterine Fibroids 32 7.70 -11.00 -8.00
C0023267 Fibroid Tumor 27 6.50 -11.00 -7.90

Figure 7. Summary of enrichment analysis in PaGenBase [14].

GO Description Count % Log10(P) Log10(q)
PGB:00080 Cell-specific: Adipocyte 18 4.40 -12.00 -9.00
PGB:00029 Tissue-specific: placenta 18 4.40 -6.60 -4.40
PGB:00081 Cell-specific: Brain cell 7 1.70 -4.50 -2.60
PGB:00002 Tissue-specific: kidney 18 4.40 -3.90 -2.20
PGB:00023 Tissue-specific: heart 14 3.40 -3.80 -2.20
PGB:00035 Tissue-specific: ovary 12 2.90 -3.80 -2.10
PGB:00065 Cell-specific: DRG 17 4.10 -3.70 -2.10
PGB:00071 Cell-specific: HUVEC 15 3.60 -3.70 -2.00
PGB:00060 Cell-specific: liver cell 7 1.70 -3.60 -2.00
PGB:00058 Cell-specific: HEPG2 16 3.90 -3.60 -1.90
PGB:00033 Tissue-specific: thyroid 11 2.70 -3.40 -1.80
PGB:00094 Cell-specific: Bronchial Epithelial Cells 8 1.90 -3.20 -1.70
PGB:00106 Cell-specific: Breast cell 5 1.20 -3.00 -1.50
PGB:00067 Cell-specific: Testis Germ Cell 6 1.50 -2.90 -1.40
PGB:00016 Tissue-specific: ovary pool 3 0.73 -2.20 -0.88
PGB:00133 Cell-specific: Cardiac Myocytes 4 0.97 -2.00 -0.76

Figure 8. Summary of enrichment analysis in TRRUST.

GO Description Count % Log10(P) Log10(q)
TRR01256 Regulated by: SP1 21 5.10 -4.90 -3.00
TRR00270 Regulated by: EP300 6 1.50 -3.70 -2.00
TRR01379 Regulated by: TAL1 3 0.73 -3.40 -1.80
TRR00602 Regulated by: IRF1 5 1.20 -3.00 -1.50
TRR00469 Regulated by: HDAC2 4 0.97 -2.90 -1.50
TRR00075 Regulated by: BRCA1 5 1.20 -2.90 -1.40
TRR01071 Regulated by: PTTG1 3 0.73 -2.80 -1.40
TRR00342 Regulated by: FOS 5 1.20 -2.70 -1.30
TRR01158 Regulated by: RELA 12 2.90 -2.60 -1.20
TRR00412 Regulated by: GATA3 4 0.97 -2.60 -1.20
TRR01172 Regulated by: RUNX1 4 0.97 -2.60 -1.20
TRR00284 Regulated by: ETV4 3 0.73 -2.50 -1.10
TRR01521 Regulated by: VHL 3 0.73 -2.40 -1.10
TRR00466 Regulated by: HDAC1 5 1.20 -2.30 -0.94
TRR01012 Regulated by: PARP1 3 0.73 -2.20 -0.92
TRR00908 Regulated by: NR3C1 4 0.97 -2.20 -0.89
TRR00872 Regulated by: NFIC 3 0.73 -2.10 -0.81
TRR00645 Regulated by: JUN 7 1.70 -2.00 -0.75

Figure 9. Summary of enrichment analysis in Transcription Factor Targets.

GO Description Count % Log10(P) Log10(q)
M13849 TGACATY UNKNOWN 30 7.30 -7.70 -5.40
M572 TGCCAAR NF1 Q6 31 7.50 -7.60 -5.20
M11934 SRF Q5 01 16 3.90 -7.00 -4.70
M12443 SRF C 14 3.40 -5.80 -3.70
M14012 GATA1 04 15 3.60 -5.80 -3.70
M946 GGATTA PITX2 Q2 24 5.80 -5.60 -3.50
M7691 CATTGTYY SOX9 B1 18 4.40 -5.40 -3.40
M1031 FREAC2 01 15 3.60 -5.40 -3.40
M15183 E12 Q6 15 3.60 -5.40 -3.40
M13052 GATA6 01 15 3.60 -5.30 -3.30
M4238 OCT C 15 3.60 -5.30 -3.30
M10220 AP1 01 15 3.60 -5.30 -3.30
M14276 LYF1 01 15 3.60 -5.30 -3.30
M10416 WWTAAGGC UNKNOWN 11 2.70 -5.30 -3.30
M17420 WGGAATGY TEF1 Q6 18 4.40 -5.10 -3.20
M12379 NKX62 Q2 14 3.40 -5.10 -3.10
M9364 SRF Q6 14 3.40 -5.10 -3.10
M18169 YATGNWAAT OCT C 17 4.10 -4.90 -3.00
M18963 HSF2 01 14 3.40 -4.80 -2.90
M9431 AP1 Q6 14 3.40 -4.80 -2.90

References

  1. Zhou et al., Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications (2019) 10(1):1523.
  2. Zar, J.H. Biostatistical Analysis 1999 4th edn., NJ Prentice Hall, pp. 523
  3. Hochberg Y., Benjamini Y. More powerful procedures for multiple significance testing. Statistics in Medicine (1990) 9:811-818.
  4. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. (1960) 20:27-46.
  5. Shannon P. et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res (2003) 13:2498-2504.
  6. Szklarczyk D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. (2019) 47:D607-613.
  7. Stark C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. (2006) 34:D535-539.
  8. Türei D. et al. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods. (2016) 13:966-967.
  9. Li T. et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat. Methods. (2017) 14:61-64.
  10. Bader, G.D. et al. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics (2003) 4:2.
  11. https://metascape.org/COVID.
  12. Subramanian A, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550 (2005).
  13. Pinero J, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic acids research 45, D833-D839 (2017).
  14. Pan JB, et al. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One 8, e80747 (2013).