Metascape Gene List Analysis Report

metascape.org [1]

Bar Graph Summary

Figure 1. Bar graph of enriched terms across input gene lists, colored by p-values.
The top-level Gene Ontology biological processes can be viewed here.

Gene Lists

User-provided gene identifiers are first converted into their corresponding H. sapiens Entrez gene IDs using the latest version of the database (last updated on 2024-09-01). If multiple identifiers correspond to the same Entrez gene ID, they will be considered as a single Entrez gene ID in downstream analyses. The gene lists are summarized in Table 1.

Table 1. Statistics of input gene lists.
Name Total Unique
MyList 153 147
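
The "Total" and "Unique" columns can be illustrated with a minimal sketch. The helper name `summarize_gene_list` and the toy mapping are hypothetical, and the sketch assumes that "Total" counts the identifiers that map successfully; the report does not state how unmapped identifiers are counted.

```python
def summarize_gene_list(identifiers, id_to_entrez):
    """Return (total, unique) for a list of user-provided identifiers.

    total  = identifiers that map to an Entrez gene ID (assumption)
    unique = distinct Entrez gene IDs after collapsing duplicates
    """
    entrez_ids = [id_to_entrez[i] for i in identifiers if i in id_to_entrez]
    return len(entrez_ids), len(set(entrez_ids))


# Toy mapping for illustration only: a symbol, an alias, and an Ensembl ID
# that all point to the same Entrez gene ID (7157, TP53).
toy_map = {"TP53": 7157, "P53": 7157, "ENSG00000141510": 7157, "BRCA1": 672}

total, unique = summarize_gene_list(["TP53", "P53", "BRCA1", "NOT_A_GENE"], toy_map)
print(total, unique)
```

In Table 1, the difference between 153 and 147 would correspond to six identifiers that collapse onto an Entrez gene ID already present in the list.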

Gene Annotation

The following is the list of annotations retrieved from the latest version of the database (last updated on 2024-09-01) (Table 2).

Table 2. Gene annotations extracted
Name Type Description
Gene Symbol Description Primary HUGO gene symbol.
Description Description Short description.
Biological Process (GO) Function/Location Descriptions summarized based on gene ontology database, where up to three most informative GO terms are kept.
Kinase Class (UniProt) Function/Location Detailed kinase classes.
Protein Function (Protein Atlas) Function/Location Protein Function (Protein Atlas)
Subcellular Location (Protein Atlas) Function/Location Subcellular Location (Protein Atlas)
Drug (DrugBank) Genotype/Phenotype/Disease Drug information for the given gene as target.
Protein Functions (ChatGPT) Description Uncurated gene functions described by ChatGPT.
Disease & Drugs (ChatGPT) Genotype/Phenotype/Disease Uncurated disease and drug associations described by ChatGPT.
Canonical Pathways Ontology Canonical Pathways
Hallmark Gene Sets Ontology Hallmark Gene Sets

Pathway and Process Enrichment Analysis

For each given gene list, pathway and process enrichment analysis has been carried out with the following ontology sources: KEGG Pathway, GO Biological Processes, Reactome Gene Sets, Canonical Pathways, CORUM, WikiPathways, and PANTHER Pathway. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. More specifically, p-values are calculated based on the cumulative hypergeometric distribution [2], and q-values are calculated using the Benjamini-Hochberg procedure to account for multiple testing [3]. Kappa scores [4] are used as the similarity metric when performing hierarchical clustering on the enriched terms, and sub-trees with a similarity of > 0.3 are considered a cluster. The most statistically significant term within a cluster is chosen to represent the cluster.
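
The term-level statistics described above can be sketched as follows. This is an illustration under stated assumptions, not Metascape's implementation: the function names are hypothetical, and the background size (20000) and term size (300) in the example are made-up values chosen only to exercise the code.

```python
from math import comb


def hypergeom_pvalue(k, N, K, n):
    """Upper-tail cumulative hypergeometric probability P(X >= k).

    N = genes in the background, K = background genes in the term,
    n = genes in the input list, k = input genes in the term.
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def enrichment_factor(k, N, K, n):
    """Observed count divided by the count expected by chance."""
    expected = n * K / N
    return k / expected


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def passes_filter(k, p, ef, max_p=0.01, min_count=3, min_ef=1.5):
    """Apply the three thresholds quoted in the text."""
    return p < max_p and k >= min_count and ef > min_ef


# Hypothetical numbers: 13 of 146 input genes fall in a 300-gene term,
# against an assumed 20000-gene background.
N, K, n, k = 20000, 300, 146, 13
p = hypergeom_pvalue(k, N, K, n)
ef = enrichment_factor(k, N, K, n)
print(f"p = {p:.3g}, enrichment factor = {ef:.2f}, kept = {passes_filter(k, p, ef)}")
```

The q-values depend on how many terms are tested together, so `benjamini_hochberg` has to be applied to the full vector of p-values rather than to one term at a time.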

Table 3. Top 17 clusters with their representative enriched terms (one per cluster). "Count" is the number of genes in the user-provided lists with membership in the given ontology term. "%" is the percentage of all of the user-provided genes that are found in the given ontology term (only input genes with at least one ontology term annotation are included in the calculation). "Log10(P)" is the p-value in log base 10. "Log10(q)" is the multi-test adjusted p-value in log base 10.
GO Category Description Count % Log10(P) Log10(q)
hsa04080 KEGG Pathway Neuroactive ligand-receptor interaction 13 8.90 -7.53 -3.18
R-HSA-156590 Reactome Gene Sets Glutathione conjugation 3 2.05 -3.16 0.00
GO:0006400 GO Biological Processes tRNA modification 4 2.74 -2.98 0.00
GO:0009206 GO Biological Processes purine ribonucleoside triphosphate biosynthetic process 4 2.74 -2.93 0.00
GO:0009409 GO Biological Processes response to cold 3 2.05 -2.93 0.00
GO:0007528 GO Biological Processes neuromuscular junction development 3 2.05 -2.90 0.00
GO:0007005 GO Biological Processes mitochondrion organization 8 5.48 -2.84 0.00
GO:0032271 GO Biological Processes regulation of protein polymerization 5 3.42 -2.51 0.00
WP411 WikiPathways mRNA processing 4 2.74 -2.49 0.00
GO:0099072 GO Biological Processes regulation of postsynaptic membrane neurotransmitter receptor levels 3 2.05 -2.45 0.00
GO:0061024 GO Biological Processes membrane organization 10 6.85 -2.45 0.00
GO:0030072 GO Biological Processes peptide hormone secretion 3 2.05 -2.31 0.00
GO:0048663 GO Biological Processes neuron fate commitment 3 2.05 -2.27 0.00
GO:0060271 GO Biological Processes cilium assembly 6 4.11 -2.20 0.00
hsa01240 KEGG Pathway Biosynthesis of cofactors 4 2.74 -2.19 0.00
GO:0016050 GO Biological Processes vesicle organization 6 4.11 -2.18 0.00
hsa00983 KEGG Pathway Drug metabolism - other enzymes 3 2.05 -2.15 0.00
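
As a worked check of the column definitions, the "%" values in Table 3 are reproduced by dividing each count by 146 rather than by the 147 unique genes in Table 1. The denominator 146 is an inference from the table (consistent with the note that only genes with at least one ontology annotation are counted), not a figure stated in the report. The sketch also converts the log-scale columns of the first row back to ordinary p- and q-values.

```python
ANNOTATED_GENES = 146  # assumption inferred from Table 3, not stated in the report


def percent_of_annotated(count, annotated_genes=ANNOTATED_GENES):
    """Percentage of annotated input genes that belong to a term."""
    return round(100.0 * count / annotated_genes, 2)


# (Count, reported %) pairs copied from Table 3.
rows = [(13, 8.90), (3, 2.05), (4, 2.74), (8, 5.48), (10, 6.85), (6, 4.11), (5, 3.42)]
for count, reported in rows:
    print(count, percent_of_annotated(count), reported)

# Back-transforming the first row: Log10(P) = -7.53, Log10(q) = -3.18.
p_value = 10 ** -7.53
q_value = 10 ** -3.18
print(f"p = {p_value:.2e}, q = {q_value:.2e}")
```

A Log10(q) of 0.00 corresponds to a q-value of 1, so only the first cluster in Table 3 remains significant after multiple-testing adjustment.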

To further capture the relationships between the terms, a subset of enriched terms has been selected and rendered as a network plot, where terms with a similarity > 0.3 are connected by edges. We select the terms with the best p-values from each of the 20 clusters, with the constraint that there are no more than 15 terms per cluster and no more than 250 terms in total. The network is visualized using Cytoscape [5], where each node represents an enriched term and is colored first by its cluster ID (Figure 2.a) and then by its p-value (Figure 2.b). These networks can be interactively viewed in Cytoscape through the .cys files (contained in the Zip package, which also contains a publication-quality version as a PDF) or within a browser by clicking on the web icon. For clarity, term labels are only shown for one term per cluster, so it is recommended to use Cytoscape or a browser to visualize the network in order to inspect all node labels. We can also export the network into a PDF file within Cytoscape, and then edit the labels using Adobe Illustrator for publication purposes. To switch off all labels, delete the "Label" mapping under the "Style" tab within Cytoscape, and then export the network view.
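
The similarity edges described above can be sketched with Cohen's kappa computed on gene membership. This is a minimal illustration: the function names and toy gene sets are hypothetical, and the choice of gene universe over which agreement is measured is an assumption, since the report does not specify it.

```python
from itertools import combinations


def kappa(term_a, term_b, universe):
    """Cohen's kappa between two terms' binary gene-membership vectors."""
    n = len(universe)
    a = term_a & universe
    b = term_b & universe
    both = len(a & b)
    neither = n - len(a | b)
    observed = (both + neither) / n
    pa, pb = len(a) / n, len(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        return 0.0  # degenerate case: both terms contain all or none of the genes
    return (observed - expected) / (1 - expected)


def similarity_edges(terms, universe, threshold=0.3):
    """Return sorted term pairs whose kappa exceeds the threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(terms), 2)
        if kappa(terms[x], terms[y], universe) > threshold
    ]


# Toy data for illustration only.
universe = {f"g{i}" for i in range(1, 11)}
terms = {
    "term_A": {"g1", "g2", "g3", "g4"},
    "term_B": {"g1", "g2", "g3", "g5"},
    "term_C": {"g8", "g9"},
}
print(similarity_edges(terms, universe))
```

Here term_A and term_B share three of their four genes and are connected, while term_C overlaps with neither and stays isolated.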

Figure 2. Network of enriched terms: (a) colored by cluster ID, where nodes that share the same cluster ID are typically close to each other; (b) colored by p-value, where terms containing more genes tend to have a more significant p-value.

Protein-protein Interaction Enrichment Analysis

For each given gene list, protein-protein interaction enrichment analysis has been carried out with the following databases: STRING [6], BioGrid [7], OmniPath [8], and InWeb_IM [9]. Only physical interactions in STRING (physical score > 0.132) and BioGrid are used. The resultant network contains the subset of proteins that form physical interactions with at least one other member in the list. If the network contains between 3 and 500 proteins, the Molecular Complex Detection (MCODE) algorithm [10] has been applied to identify densely connected network components. The MCODE networks identified for individual gene lists have been gathered and are shown in Figure 3.
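
The network construction step (before MCODE is applied) can be sketched as below. The function names and toy interactions are hypothetical, the size bounds are assumed to be inclusive, and the MCODE algorithm itself is not implemented here.

```python
def physical_subnetwork(genes, interactions):
    """Keep interactions whose two endpoints are both in the gene list.

    Returns (nodes, edges); genes without any partner in the list are dropped.
    """
    genes = set(genes)
    edges = {
        frozenset((a, b))
        for a, b in interactions
        if a in genes and b in genes and a != b
    }
    nodes = set().union(*edges) if edges else set()
    return nodes, edges


def eligible_for_mcode(nodes, min_size=3, max_size=500):
    """Size criterion quoted in the text (bounds assumed inclusive)."""
    return min_size <= len(nodes) <= max_size


# Toy data for illustration only.
gene_list = ["G1", "G2", "G3", "G4", "G5"]
toy_interactions = [("G1", "G2"), ("G2", "G3"), ("G3", "X9"), ("G2", "G1")]

nodes, edges = physical_subnetwork(gene_list, toy_interactions)
print(sorted(nodes), len(edges), eligible_for_mcode(nodes))
```

G4 and G5 have no interaction partner within the list and are therefore absent from the resulting network, and the duplicate G1-G2 interaction is counted once.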

Pathway and process enrichment analysis has been applied to each MCODE component independently, and the three best-scoring terms by p-value have been retained as the functional description of the corresponding components, shown in the tables underneath corresponding network plots within Figure 3.

Figure 3. Protein-protein interaction network and MCODE components identified in the gene lists.
GO Description Log10(P)
hsa04080 Neuroactive ligand-receptor interaction -7.4
R-HSA-500792 GPCR ligand binding -6.5
GO:0007218 neuropeptide signaling pathway -5.6
Color MCODE GO Description Log10(P)
MCODE_1 R-HSA-375276 Peptide ligand-binding receptors -10.9
MCODE_1 R-HSA-418594 G alpha (i) signalling events -9.9
MCODE_1 R-HSA-373076 Class A/1 (Rhodopsin-like receptors) -9.8
MCODE_3 R-HSA-373080 Class B/2 (Secretin family receptors) -7.5
MCODE_3 R-HSA-418555 G alpha (s) signalling events -6.9
MCODE_3 hsa04024 cAMP signaling pathway -6.4

Quality Control and Association Analysis

Gene list enrichments are identified in the following ontology categories: COVID, Cell_Type_Signatures, DisGeNET, PaGenBase, TRRUST, Transcription_Factor_Targets. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. The top few enriched clusters (one term per cluster) are shown in Figures 4-9. The algorithm used here is the same as that used for pathway and process enrichment analysis.

Figure 4. Summary of enrichment analysis in COVID [11].


GO Description Count % Log10(P) Log10(q)
COVID182 Proteome_Stukalov_A549_72h_ORF9B_Down 6 4.10 -3.20 -0.42
COVID015 RNA_Blanco-Melo_Calu-3_Down 6 4.10 -2.50 -0.08
Figure 5. Summary of enrichment analysis in Cell Type Signatures [12].


GO Description Count % Log10(P) Log10(q)
M39066 MANNO MIDBRAIN NEUROTYPES HNBML5 13 8.90 -6.40 -2.30
M39070 MANNO MIDBRAIN NEUROTYPES HNBGABA 13 8.90 -4.40 -1.20
M39076 ZHONG PFC MAJOR TYPES INTERNEURON 3 2.10 -4.10 -0.99
M40270 DESCARTES FETAL PANCREAS ENS NEURONS 6 4.10 -3.80 -0.83
M39026 FAN EMBRYONIC CTX EX 4 EXCITATORY NEURON 6 4.10 -3.80 -0.83
M40083 DESCARTES MAIN FETAL UNIPOLAR BRUSH CELLS 3 2.10 -3.30 -0.51
M40249 DESCARTES FETAL LUNG VISCERAL NEURONS 6 4.10 -3.20 -0.46
M40214 DESCARTES FETAL INTESTINE ENS NEURONS 5 3.40 -3.00 -0.29
M40114 DESCARTES MAIN FETAL ENS NEURONS 3 2.10 -3.00 -0.29
M39195 HAY BONE MARROW EARLY ERYTHROBLAST 4 2.70 -2.90 -0.29
M41710 FAN OVARY CL8 MATURE CUMULUS GRANULOSA CELL 2 10 6.80 -2.90 -0.29
M41705 FAN OVARY CL3 MATURE CUMULUS GRANULOSA CELL 1 6 4.10 -2.80 -0.29
M39072 MANNO MIDBRAIN NEUROTYPES HSERT 8 5.50 -2.80 -0.29
M39068 MANNO MIDBRAIN NEUROTYPES HDA1 9 6.20 -2.70 -0.20
M39257 MENON FETAL KIDNEY 7 LOOPOF HENLE CELLS DISTAL 6 4.10 -2.60 -0.19
M40149 DESCARTES FETAL ADRENAL SYMPATHOBLASTS 4 2.70 -2.60 -0.16
M39073 MANNO MIDBRAIN NEUROTYPES HOMTN 7 4.80 -2.50 -0.13
M39063 MANNO MIDBRAIN NEUROTYPES HNBM 6 4.10 -2.50 -0.10
M39069 MANNO MIDBRAIN NEUROTYPES HDA2 8 5.50 -2.50 -0.07
M39161 GAO LARGE INTESTINE ADULT CA ENTEROENDOCRINE CELLS 6 4.10 -2.40 -0.04
Figure 6. Summary of enrichment analysis in DisGeNET [13].


GO Description Count % Log10(P) Log10(q)
C0086768 Pancreatic Cholera 4 2.70 -7.20 -2.70
C0234230 Pain, Burning 6 4.10 -6.50 -2.30
C0234238 Ache 5 3.40 -5.40 -1.90
C0458257 Pain, Splitting 5 3.40 -5.40 -1.90
C0458259 Pain, Crushing 5 3.40 -5.40 -1.90
C0751407 Pain, Migratory 5 3.40 -5.40 -1.90
C0751408 Suffering, Physical 5 3.40 -5.40 -1.90
C0234254 Radiating pain 5 3.40 -5.30 -1.90
C0424139 Anxiety and fear 5 3.40 -5.30 -1.90
C0032002 Pituitary Diseases 7 4.80 -5.00 -1.60
C0525045 Mood Disorders 12 8.20 -4.60 -1.20
C0009088 Cluster Headache 4 2.70 -4.50 -1.20
C0563625 Agnosia for Pain 7 4.80 -4.20 -1.10
C0600467 Neurogenic Inflammation 3 2.10 -4.10 -0.99
C0006012 Borderline Personality Disorder 7 4.80 -4.00 -0.92
C0001973 Alcoholic Intoxication, Chronic 11 7.50 -3.90 -0.90
C3160917 Bladder pain syndrome 3 2.10 -3.90 -0.90
C0206718 Ganglioneuroblastoma 4 2.70 -3.80 -0.82
C4317109 Epileptic Seizures 7 4.80 -3.70 -0.72
C4551584 Brain atrophy 6 4.10 -3.60 -0.66
Figure 7. Summary of enrichment analysis in PaGenBase [14].


GO Description Count % Log10(P) Log10(q)
PGB:00041 Tissue-specific: Pancreatic Islet 3 2.10 -4.30 -1.10
PGB:00032 Tissue-specific: Cerebellum 6 4.10 -4.20 -1.10
PGB:00091 Cell-specific: A204 4 2.70 -3.80 -0.83
PGB:00005 Tissue-specific: colon 6 4.10 -2.20 0.00
PGB:00065 Cell-specific: DRG 7 4.80 -2.20 0.00
Figure 8. Summary of enrichment analysis in TRRUST.


GO Description Count % Log10(P) Log10(q)
TRR01160 Regulated by: REST 4 2.70 -4.30 -1.10
Figure 9. Summary of enrichment analysis in Transcription Factor Targets.


GO Description Count % Log10(P) Log10(q)
M30065 MIER1 TARGET GENES 11 7.50 -5.70 -1.90
M10046 ATF B 8 5.50 -5.40 -1.90
M17997 TGACGTCA ATF3 Q6 8 5.50 -4.70 -1.30
M15675 CCCNNGGGAR OLF1 01 9 6.20 -4.50 -1.20
M16699 CREBP1CJUN 01 8 5.50 -4.40 -1.10
M16213 ATF 01 8 5.50 -4.30 -1.10
M17180 CREB 01 8 5.50 -4.30 -1.10
M16822 NRSF 01 5 3.40 -4.00 -0.91
M12826 ATF1 Q6 7 4.80 -3.80 -0.83
M7173 TCCATTKW UNKNOWN 7 4.80 -3.80 -0.81
M10704 NKX25 02 7 4.80 -3.50 -0.59
M9955 TGAYRTCA ATF3 Q6 10 6.80 -3.50 -0.59
M6446 CACCCBINDINGFACTOR Q6 7 4.80 -3.50 -0.59
M19642 CDPCR1 01 5 3.40 -3.40 -0.55
M6101 ER Q6 7 4.80 -3.30 -0.52
M30051 LMTK3 TARGET GENES 10 6.80 -3.20 -0.44
M3410 CREB Q4 01 6 4.10 -3.20 -0.42
M2209 CACBINDINGPROTEIN Q6 6 4.10 -2.90 -0.29
M30170 SNIP1 TARGET GENES 11 7.50 -2.90 -0.29
M574 RSRFC4 01 6 4.10 -2.90 -0.29

References

  1. Zhou et al., Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications (2019) 10(1):1523.
  2. Zar, J.H. Biostatistical Analysis, 4th edn. Prentice Hall, NJ (1999), pp. 523.
  3. Hochberg Y., Benjamini Y. More powerful procedures for multiple significance testing. Statistics in Medicine (1990) 9:811-818.
  4. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. (1960) 20:27-46.
  5. Shannon P. et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res (2003) 13:2498-2504.
  6. Szklarczyk D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. (2019) 47:D607-613.
  7. Stark C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. (2006) 34:D535-539.
  8. Turei D. et al. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods. (2016) 13:966-967.
  9. Li T. et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat. Methods. (2017) 14:61-64.
  10. Bader, G.D. et al. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics (2003) 4:2.
  11. https://metascape.org/COVID.
  12. Subramanian A, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550 (2005).
  13. Pinero J, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic acids research 45, D833-D839 (2017).
  14. Pan JB, et al. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One 8, e80747 (2013).