0. Working directory

Ensure the ‘workshop’ directory is your current working directory:

getwd()
## [1] "/home/user3/workshop"

WebGestaltR outputs results to a folder containing multiple files. Make a parent directory for the results of this tool:

dir.create("WebGestaltR_results")
## Warning in dir.create("WebGestaltR_results"): 'WebGestaltR_results' already
## exists
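
The warning above simply means the folder was already present from an earlier run. A minimal sketch, assuming you would rather skip creation quietly when the folder exists (the variable name results_dir is only illustrative):

```r
# Create the results folder only if it is not already there
results_dir <- "WebGestaltR_results"  # illustrative variable name

if (!dir.exists(results_dir)) {
  dir.create(results_dir)
}

# Confirm the folder is now available
dir.exists(results_dir)
```

An alternative with the same effect is dir.create(results_dir, showWarnings = FALSE), which attempts the creation but silences the 'already exists' warning.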

1. Explore natively supported organisms, namespaces and databases

WebGestaltR supports 12 species directly. However, you can import your own database files to perform ORA and GSEA for novel species :-) We will do that in the next session.

Let’s view the list of supported organisms. We don’t have to specify any arguments, but we do need the empty brackets. Without them, R will print the function source code instead of running the function.

listOrganism()
##  [1] "athaliana"     "btaurus"       "celegans"      "cfamiliaris"  
##  [5] "drerio"        "sscrofa"       "dmelanogaster" "ggallus"      
##  [9] "hsapiens"      "mmusculus"     "rnorvegicus"   "scerevisiae"

The next two commands have a default setting of organism = "hsapiens", so running without any argument will show the genesets (databases) and ID types (namespaces) that are supported for human.

View databases for human. Use the black arrow on the right of the table to view the other 2 columns, and use the numbers below the table (or ‘next’) to view the next 10 rows.

listGeneSet()

And the supported human namespaces:

listIdType()
##  [1] "The_Genotype-Tissue_ExpressionProjectGTEx"  
##  [2] "affy_Axiom_BioBank1"                        
##  [3] "affy_Axiom_PMRA"                            
##  [4] "affy_Axiom_tx_v1"                           
##  [5] "affy_GenomeWideSNP_5"                       
##  [6] "affy_GenomeWideSNP_6"                       
##  [7] "affy_Mapping10K_Xba142"                     
##  [8] "affy_Mapping250K_Nsp"                       
##  [9] "affy_Mapping250K_Sty"                       
## [10] "affy_Mapping50K_Hind240"                    
## [11] "affy_Mapping50K_Xba240"                     
## [12] "affy_OncoScan"                              
## [13] "affy_RosettaMerck_Human_RSTA"               
## [14] "affy_hc_g110"                               
## [15] "affy_hg_focus"                              
## [16] "affy_hg_u133_plus_2"                        
## [17] "affy_hg_u133a"                              
## [18] "affy_hg_u133a_2"                            
## [19] "affy_hg_u133b"                              
## [20] "affy_hg_u95a"                               
## [21] "affy_hg_u95av2"                             
## [22] "affy_hg_u95b"                               
## [23] "affy_hg_u95c"                               
## [24] "affy_hg_u95d"                               
## [25] "affy_hg_u95e"                               
## [26] "affy_hta_2_0"                               
## [27] "affy_huex_1_0_st_v2"                        
## [28] "affy_hugene_1_0_st_v1"                      
## [29] "affy_hugene_2_0_st_v1"                      
## [30] "affy_hugenefl"                              
## [31] "affy_primeview"                             
## [32] "affy_u133_x3p"                              
## [33] "agilent_cgh_44b"                            
## [34] "agilent_custom_SAGE_Bionetworks_GPL564"     
## [35] "agilent_gpl6848"                            
## [36] "agilent_human_custom_GPL564"                
## [37] "agilent_sureprint_g3_ge_8x60k"              
## [38] "agilent_sureprint_g3_ge_8x60k_v2"           
## [39] "agilent_wholegenome"                        
## [40] "agilent_wholegenome_4x44k_v1"               
## [41] "agilent_wholegenome_4x44k_v2"               
## [42] "codelink_codelink"                          
## [43] "dbSNP"                                      
## [44] "embl"                                       
## [45] "ensembl_gene_id"                            
## [46] "ensembl_peptide_id"                         
## [47] "entrezgene"                                 
## [48] "entrezgene_protein-coding"                  
## [49] "genename"                                   
## [50] "genesymbol"                                 
## [51] "illumina_Infinium_HumanMethylation_beadchip"
## [52] "illumina_Sentrix_HumanRef-8v2_GPL2700"      
## [53] "illumina_human-6v3"                         
## [54] "illumina_humanRef-8v2"                      
## [55] "illumina_human_methylation_27"              
## [56] "illumina_human_methylation_450"             
## [57] "illumina_humanht_12_v3"                     
## [58] "illumina_humanht_12_v4"                     
## [59] "illumina_humanref_8_v3"                     
## [60] "illumina_humanwg_6_v1"                      
## [61] "illumina_humanwg_6_v2"                      
## [62] "illumina_humanwg_6_v3"                      
## [63] "interpro"                                   
## [64] "phalanx_onearray"                           
## [65] "phosphositeEnsembl"                         
## [66] "phosphositeRefseq"                          
## [67] "phosphositeSeq"                             
## [68] "phosphositeUniprot"                         
## [69] "protein_id"                                 
## [70] "refseq_mrna"                                
## [71] "refseq_peptide"                             
## [72] "unigene"                                    
## [73] "uniprotswissprot"

Pick your favourite species from the list of 12, using the same spelling as shown in the listOrganism() function, and investigate which namespaces and databases are available:

fave <- "cfamiliaris"
listIdType(organism = fave)
##  [1] "affy_CanGene-1_0-st-v1"    "affy_CanGene-1_1-st-v1"   
##  [3] "affy_Canine"               "affy_canine_2"            
##  [5] "dbSNP"                     "embl"                     
##  [7] "ensembl_gene_id"           "ensembl_peptide_id"       
##  [9] "entrezgene"                "entrezgene_protein-coding"
## [11] "genename"                  "genesymbol"               
## [13] "interpro"                  "protein_id"               
## [15] "refseq_mrna"               "refseq_peptide"           
## [17] "unigene"                   "uniprotswissprot"
listGeneSet(organism = fave)

2. Load input data and extract gene lists

We will use the same Pezzini RNAseq dataset as earlier. Since we previously saved our ranked list, DEGs and background genes to the workshop folder, we could import those. However, re-creating the gene lists within the notebook keeps a clear record of how they were made, which enhances reproducibility. Gene lists are quick and simple to extract from the input data. If the process were slow and compute-intensive, we would instead document the source and methods behind the gene lists in the notebook comments rather than re-creating them.

Load and check the input dataset:

# Full dataset
data <- read_tsv("Pezzini_DE.txt", col_names = TRUE, show_col_types = FALSE)
head(data)

Before we extract the gene lists, we need to understand what class of object is required by the enrichment function. For this package, the single enrichment function shares the package name.

Bring up the help menu for the WebGestaltR function and spend a few minutes reviewing the parameters.

?WebGestaltR

There are quite a few! For many of them (eg gene set size filters, multiple testing correction method, P value cutoff) the default settings are suitable.

In particular, look for the parameters that control:

Hopefully you’ve discovered that the WebGestaltR function can take gene lists EITHER from files (as long as the right column format and file suffix are provided) OR from R objects.

Since we have decided to extract the gene lists from the DE matrix to R objects, we need to provide the gene list object to the interestGene parameter (and the background object to referenceGene for ORA).

For ORA, the gene lists need to be vectors, and for GSEA, a 2-column data frame of gene IDs and ranking scores (unlike clusterProfiler, which requires a named vector for GSEA).
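
To make the two input shapes concrete, here is a minimal sketch on a toy table. The gene IDs and values are made up for illustration; only the column names (Gene.ID, Log2FC, FDR) follow the workshop dataset.

```r
# Toy differential expression table (made-up IDs and values, illustration only)
toy <- data.frame(
  Gene.ID = c("geneA", "geneB", "geneC", "geneD"),
  Log2FC  = c(3.1, -0.4, -2.6, 0.2),
  FDR     = c(0.001, 0.60, 0.004, 0.90)
)

# ORA inputs: plain character vectors of gene IDs
toy_degs       <- toy$Gene.ID[toy$FDR < 0.01 & abs(toy$Log2FC) > 2]
toy_background <- toy$Gene.ID

# GSEA input: 2-column data frame (gene ID, score), sorted by score
toy_ranked <- toy[order(toy$Log2FC, decreasing = TRUE), c("Gene.ID", "Log2FC")]

str(toy_degs)    # character vector
str(toy_ranked)  # data frame with 2 columns
```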

Our input matrix contains ENSEMBL IDs as well as official gene symbols, so we could use “ensembl_gene_id” or “genesymbol” for the parameter interestGeneType. Let’s extract the ENSEMBL IDs since they are more specific than symbols.

# Filter genes with adjusted p-value < 0.01 and absolute log2 fold change > 2, and save as the 'DEGs' vector
DEGs <- data %>%
  filter(FDR < 0.01, abs(Log2FC) > 2) %>%
  pull(Gene.ID)

# Extract all gene IDs as the 'background' vector
background <- data %>%
  pull(Gene.ID)

# Check: 
cat("Number of DEGS:", length(DEGs), "\n")
## Number of DEGS: 792
cat("Number of background genes:", length(background), "\n")
## Number of background genes: 14420
cat("First 6 DEGs:", head(DEGs), "\n")
## First 6 DEGs: ENSG00000000971 ENSG00000001617 ENSG00000002586 ENSG00000002746 ENSG00000003137 ENSG00000005243
cat("First 6 background genes:", head(background), "\n")
## First 6 background genes: ENSG00000000003 ENSG00000000419 ENSG00000000457 ENSG00000000460 ENSG00000000971 ENSG00000001036
# extract ranked dataframe, saved as 'ranked' object 
ranked <- data %>%
  arrange(desc(Log2FC)) %>%
  dplyr::select(Gene.ID, Log2FC)

# check
head(ranked)
tail(ranked)
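
Before passing these objects to the enrichment function, a few quick sanity checks can catch shape problems early. The helper below is a hypothetical sketch, not part of WebGestaltR, and the checks reflect assumptions about what a sensible input looks like (for example, that background IDs are unique).

```r
# Hypothetical helper: stops with an error if any check fails
check_gene_lists <- function(degs, background, ranked) {
  stopifnot(
    is.character(degs),                 # ORA query is a character vector
    is.character(background),           # ORA background is a character vector
    anyDuplicated(background) == 0,     # no repeated background IDs
    all(degs %in% background),          # every DEG is in the background
    is.data.frame(ranked),              # GSEA input is a data frame
    ncol(ranked) == 2,                  # with exactly two columns
    !anyNA(ranked[[2]]),                # no missing scores
    !is.unsorted(rev(ranked[[2]]))      # scores sorted in descending order
  )
  invisible(TRUE)
}

# Toy usage with made-up IDs and scores
check_gene_lists(
  degs       = c("g1"),
  background = c("g1", "g2", "g3"),
  ranked     = data.frame(id = c("g1", "g3", "g2"), score = c(2.4, 0.1, -1.7))
)
```

With the workshop objects, the equivalent call would be check_gene_lists(DEGs, background, ranked).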

3. Run ORA over multiple databases

WebGestaltR makes it simple to enrich over many databases in a single run. To do this, we provide a vector of database names to the enrichDatabase parameter instead of a single database name.

For this task, let’s focus on the pathway gene sets. From skimming the output of listGeneSet() there were a few. We could manually locate these and copy them in to our list, or take advantage of the fact that the WebGestaltR developers have been systematic in the gene set naming, ensuring all database names are prefixed with their type, ie geneontology_, pathway_, network_, plus a few others.

# Save the databases for human
databases <- listGeneSet()

# Extract the rows whose 'name' column starts with 'pathway'
pathway_dbs <- subset(databases, grepl("^pathway", name))

# Save the pathway 'name' column to a character vector
pathway_names <- pathway_dbs$name

# Check the vector
print(pathway_names)
## [1] "pathway_KEGG"               "pathway_Panther"           
## [3] "pathway_Reactome"           "pathway_Wikipathway"       
## [5] "pathway_Wikipathway_cancer"

This gives us the same set of databases as pathway_names <- c("pathway_Reactome", "pathway_KEGG", "pathway_Panther", "pathway_Wikipathway", "pathway_Wikipathway_cancer") but with less manual effort, and less room for typographic errors :-)
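
The same prefix trick works for any database type. The sketch below wraps it in a small helper; select_by_prefix is a hypothetical name, and toy_databases is a hand-made stand-in for the table returned by listGeneSet(), containing only a few names seen in this notebook.

```r
# Toy stand-in for the listGeneSet() table (only the 'name' column is needed)
toy_databases <- data.frame(
  name = c("geneontology_Molecular_Function",
           "geneontology_Molecular_Function_noRedundant",
           "pathway_KEGG", "pathway_Reactome", "pathway_Panther"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the database names that start with a given prefix
select_by_prefix <- function(db_table, prefix) {
  db_table$name[startsWith(db_table$name, prefix)]
}

select_by_prefix(toy_databases, "pathway_")
select_by_prefix(toy_databases, "geneontology_")
```

startsWith() does a literal match on the start of the string, so there is no need to worry about regular expression special characters in the prefix.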

Since these VMs have 8 cores, let’s set nThreads to 6. For ORA, this will make very little difference, as the analysis is very fast, yet for GSEA, it can speed up the analysis a lot.

In testing, running GSEA over these 5 pathway databases with our ranked query and the default of 1,000 GSEA permutations required 24.1 minutes without threading, and 6.8 minutes with nThreads = 6.
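
As a quick worked calculation from those two timings, the threaded run was roughly 3.5 times faster, which corresponds to a parallel efficiency of about 59% across 6 threads:

```r
t_serial   <- 24.1  # minutes without threading (from the test run above)
t_threaded <- 6.8   # minutes with nThreads = 6
n_threads  <- 6

speedup    <- t_serial / t_threaded  # how many times faster
efficiency <- speedup / n_threads    # fraction of the ideal 6-fold speedup

round(c(speedup = speedup, efficiency = efficiency), 2)
```

Less-than-ideal efficiency is normal, since some steps of any analysis run on a single thread regardless of the nThreads setting.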

There is also a function, WebGestaltRBatch, for processing multiple input query lists. With this function, you can set isParallel = TRUE along with nThreads = N to run your batch of query lists in parallel rather than sequentially, for much faster run times, assuming you have the compute resources to do so.
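
A hedged sketch of what a batch ORA call could look like is shown below. It is wrapped in a hypothetical helper (run_ora_batch) and is not run in this workshop. It assumes that WebGestaltRBatch reads the query gene lists from files in a folder given by interestGeneFolder and passes the remaining arguments through to WebGestaltR; check ?WebGestaltRBatch for the expected file format and suffix before relying on it.

```r
# Hypothetical wrapper around WebGestaltRBatch (sketch only, not run here)
run_ora_batch <- function(gene_list_folder, background, databases, outdir) {
  WebGestaltR::WebGestaltRBatch(
    interestGeneFolder = gene_list_folder,   # folder of query gene list files
    enrichMethod = "ORA",
    isParallel = TRUE,                       # run the query lists in parallel
    nThreads = 6,                            # number of threads to use
    # Remaining arguments are passed on to WebGestaltR
    organism = "hsapiens",
    interestGeneType = "ensembl_gene_id",
    referenceGene = background,
    referenceGeneType = "ensembl_gene_id",
    enrichDatabase = databases,
    outputDirectory = outdir
  )
}

# Example call (the folder name is a placeholder):
# run_ora_batch("my_gene_lists", background, pathway_names, "WebGestaltR_results")
```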

Since we don’t want to wait 7 minutes for a result, let’s run this multi-database query with ORA instead of GSEA.

# Specify output directory (must exist) 
outdir <- "WebGestaltR_results"

# Specify project name
project <- "ORA_pathways"

WebGestaltR(
    organism = "hsapiens",                 # Species
    enrichMethod = "ORA",                  # Perform ORA, GSEA or NTA
    interestGene = DEGs,                   # Query gene list
    interestGeneType = "ensembl_gene_id",  # Gene ID type for gene list
    referenceGene = background,            # Background gene list
    referenceGeneType = "ensembl_gene_id", # Gene ID type for background
    enrichDatabase = pathway_names,        # Database name or vector of database names to enrich over
    isOutput = TRUE,                       # Save report files to disk
    fdrMethod = "BH",                      # Multiple testing correction method (BH = Benjamini-Hochberg)
    sigMethod = "fdr",                     # Significance method ('fdr' = false discovery rate)
    fdrThr = 0.05,                         # FDR significance threshold
    minNum = 10,                           # Minimum number of genes in a gene set to include
    maxNum = 500,                          # Maximum number of genes in a gene set to include
    outputDirectory = outdir,              # Parent folder for the results
    projectName = project,                 # Name used for the results subfolder and files
    nThreads = 6                           # Use 6 threads for a faster run
)
## Loading the functional categories...
## Warning in loadGeneSet(organism = organism, enrichDatabase = enrichDatabase, :
## Duplicate gene set names in pathway_Wikipathway_cancer have been ignored.
## Loading the ID list...
## Loading the reference list...
## Warning in dir.create(projectDir): 'WebGestaltR_results/Project_ORA_pathways'
## already exists
## Summarizing the input ID list by GO Slim data...
## Performing the enrichment analysis...
## Begin affinity propagation...
## End affinity propagation...
## Begin weighted set cover...
## End weighted set cover...
## Generate the final report...
## Results can be found in the WebGestaltR_results/Project_ORA_pathways!

The results are saved in a new subfolder, Project_ORA_pathways, inside our WebGestaltR_results folder. There are a number of results files; the one we will focus on is the interactive HTML summary file.
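
If you prefer to check the output from the console rather than the Files pane, a minimal sketch is to list the contents of the project folder. The path is built from the folder names reported in the run log above.

```r
# Path to the ORA project folder, as reported at the end of the run log
project_dir <- file.path("WebGestaltR_results", "Project_ORA_pathways")

if (dir.exists(project_dir)) {
  # Show every file WebGestaltR wrote for this project
  print(list.files(project_dir))
} else {
  message("Folder not found: ", project_dir, " (run the ORA chunk first)")
}
```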

STOP: to save time for GSEA compute, skip ahead, run the code chunk labelled GSEA GO MF with redundant (it takes several minutes) then return here where we will explore the ORA HTML while the GSEA runs!!!

In the Files pane, open the folder WebGestaltR_results/Project_ORA_pathways then click on the Report_ORA_pathways.html file. Select View in Web Browser.

Some things to note:

4. Explore the ‘noRedundant’ gene ontology databases

It’s well known that the GO hierarchy, by definition, includes redundancy. When performing enrichment, higher-order terms in the hierarchy are often significant yet not particularly informative.

Tools such as topGO and REVIGO are dedicated to removing redundancy from the Gene Ontology. Add to that list WebGestaltR (and the WebGestalt web tool, of course!)

This tool runs its own redundancy filter over the GO databases to produce refined database versions:

We can read about their approach in the ‘description’ column of the database:

databases$description[databases$name == "geneontology_Molecular_Function_noRedundant"]
## [1] "The gene ontology molecular function database was downloaded from http://www.geneontology.org/. Then, we only contain the non-redundant categories by selecting the most general categories in each branch of the GO DAG structure from all categories with the number of annotated genes from 20 to 500."

Let’s run enrichment over the full and the non-redundant version of the GO MF databases, and compare the results. We expect to see fewer and more specific terms in the “noRedundant” results than the full GO MF results.

Let’s use GSEA since we have already tried ORA with this package. GSEA is slower and GO is large, so even with 7 threads these commands will take a few minutes (longer for redundant than non-redundant, of course). Feel free to use the compute time to ask questions on Slack or explore the ORA pathways output some more!

There is no seed parameter for WebGestaltR GSEA as there is for clusterProfiler. We can set it in R instead with set.seed().

set.seed(123)

This is an advantage of using WebGestaltR over the web counterpart :-)

However, a note from testing: without setting the seed in R, slightly different numbers of enriched terms were returned for the GSEA below over 3 replicate runs. With the seed set, the same number and IDs of terms were significant across the replicate runs, BUT the NES and FDR values were slightly different! The unadjusted ES and P values were the same.
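
To see what set.seed() does in general, here is a small base R sketch with made-up scores. It only illustrates how resetting the seed makes random shuffles repeatable; it does not reproduce how WebGestaltR performs its permutations internally.

```r
# Shuffle a score vector n_perm times (a toy stand-in for permutation testing)
permute_scores <- function(scores, n_perm) {
  replicate(n_perm, sample(scores))
}

scores <- c(2.5, 1.1, -0.3, -1.8)  # made-up values

set.seed(123)
run1 <- permute_scores(scores, 3)

set.seed(123)                      # reset to the same seed
run2 <- permute_scores(scores, 3)

run3 <- permute_scores(scores, 3)  # seed NOT reset before this run

identical(run1, run2)  # TRUE: same seed, same shuffles
identical(run1, run3)  # compare with a run that continued the random stream
```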

outputDirectory <- "WebGestaltR_results" 
project <- "GSEA_GO-MF_with-redundant"
database  <- "geneontology_Molecular_Function"

suppressWarnings({ gomf <- WebGestaltR(
    organism = "hsapiens",                   # Use your species (e.g., "hsapiens" for humans)
    enrichMethod = "GSEA",                    # Perform ORA, GSEA or NTA
    interestGene = ranked,                # Your gene list
    interestGeneType = "ensembl_gene_id",         # Specify the gene ID type
    enrichDatabase = database,  # The database for enrichment analysis
    isOutput = TRUE,                        # Set to FALSE if you don't want files saved to disk
    fdrMethod = "BH",                        # Correction method (e.g., Benjamini-Hochberg)
    sigMethod = "fdr",                       # Significance method ('fdr' or 'top')
    fdrThr = 0.05,                           # FDR significance threshold
    minNum = 10,                              # Minimum number of genes per category
    maxNum = 500,                             # Maximum number of genes per category
    boxplot = TRUE,
    outputDirectory = outputDirectory,
    projectName = project,
    nThreads = 7
)
})
## Loading the functional categories...
## Loading the ID list...
## Summarizing the uploaded ID list by GO Slim data...
## Performing the enrichment analysis...
## 1000 permutations of score complete...
## Begin affinity propagation...
## End affinity propagation...
## Begin weighted set cover...
## End weighted set cover...
## Generate the final report...
## Results can be found in the WebGestaltR_results/Project_GSEA_GO_MF_with_redundant!
outputDirectory <- "WebGestaltR_results" 
project <- "GSEA_GO-MF_non-redundant"
database  <- "geneontology_Molecular_Function_noRedundant"

suppressWarnings({ gomf_nr <- WebGestaltR(
    organism = "hsapiens",                   # Use your species (e.g., "hsapiens" for humans)
    enrichMethod = "GSEA",                    # Perform ORA, GSEA or NTA
    interestGene = ranked,                # Your gene list
    interestGeneType = "ensembl_gene_id",         # Specify the gene ID type
    enrichDatabase = database,  # The database for enrichment analysis
    isOutput = TRUE,                        # Set to FALSE if you don't want files saved to disk
    fdrMethod = "BH",                        # Correction method (e.g., Benjamini-Hochberg)
    sigMethod = "fdr",                       # Significance method ('fdr' or 'top')
    fdrThr = 0.05,                           # FDR significance threshold
    minNum = 10,                              # Minimum number of genes per category
    maxNum = 500,                             # Maximum number of genes per category
    boxplot = TRUE,
    outputDirectory = outputDirectory,
    projectName = project,
    nThreads = 7
)
})
## Loading the functional categories...
## Loading the ID list...
## Summarizing the uploaded ID list by GO Slim data...
## Performing the enrichment analysis...
## 1000 permutations of score complete...
## Begin affinity propagation...
## End affinity propagation...
## Begin weighted set cover...
## End weighted set cover...
## Generate the final report...
## Results can be found in the WebGestaltR_results/Project_GSEA_GO_MF_non_redundant!

Notice that in the GSEA code chunks above, the R function suppressWarnings has been applied. This prevents a warning from being printed for every term that fails the term size filters we set!
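
The sketch below shows the effect of suppressWarnings() on a toy function that warns once per undersized gene set. The function noisy_filter is made up for illustration and is not how WebGestaltR is implemented. The second half shows an alternative, assuming you would like to hide the warnings but still know how many there were.

```r
# Toy function: warns for each set size below the minimum, returns the rest
noisy_filter <- function(sizes, min_size = 10) {
  for (s in sizes[sizes < min_size]) {
    warning("gene set of size ", s, " is below the minimum")
  }
  sizes[sizes >= min_size]
}

# Option 1: hide all warnings
kept <- suppressWarnings(noisy_filter(c(3, 12, 7, 40)))
kept

# Option 2: hide the warnings but count them
n_warn <- 0
kept2 <- withCallingHandlers(
  noisy_filter(c(3, 12, 7, 40)),
  warning = function(w) {
    n_warn <<- n_warn + 1           # record the warning
    invokeRestart("muffleWarning")  # then silence it
  }
)
n_warn
```

Keep in mind that suppressWarnings() hides every warning raised inside the call, including ones you might want to see, so it is worth running once without it when setting up a new analysis.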

Now that we have both results saved in R objects, we can compare the enriched terms.

How many significant terms from each DB?

nr_terms <- gomf_nr$description
r_terms <- gomf$description

cat("Number of significant 'nonredundant' GO MF terms:", length(nr_terms), "\n")
## Number of significant 'nonredundant' GO MF terms: 29
cat("Number of significant 'with redundant' GO MF terms:", length(r_terms), "\n")
## Number of significant 'with redundant' GO MF terms: 53

Clearly we have refined the results using the WebGestaltR reduced GO MF database.

Open the 2 HTML report files for these analyses:

Note the differences to the ORA reports we have seen.

A quick view of the redundant vs non-redundant bar charts shows similarities and differences between the ‘noRedundant’ and ‘with redundant’ analyses.

Let’s compare shared terms:

# Create unique and shared description lists
unique_nr_terms <- setdiff(nr_terms, r_terms)
unique_r_terms <- setdiff(r_terms, nr_terms)
shared_terms <- intersect(nr_terms, r_terms)
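
A quick way to confirm the set operations behaved as expected is to check that the counts add up. The sketch below uses a small subset of the terms from this analysis, for illustration only.

```r
# Small subset of terms from this analysis (illustration only)
toy_nr <- c("histone binding", "collagen binding", "hormone binding")
toy_r  <- c("histone binding", "chemokine activity")

toy_shared    <- intersect(toy_nr, toy_r)
toy_unique_nr <- setdiff(toy_nr, toy_r)
toy_unique_r  <- setdiff(toy_r, toy_nr)

# Every term is either shared or unique, so the counts must add up.
# unique() is used because intersect() and setdiff() drop duplicates.
stopifnot(
  length(toy_shared) + length(toy_unique_nr) == length(unique(toy_nr)),
  length(toy_shared) + length(toy_unique_r)  == length(unique(toy_r))
)
```

For the real results, the same check holds: 22 shared plus 7 unique terms gives the 29 non-redundant terms, and 22 shared plus 31 unique terms gives the 53 terms from the full database.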

Print shared terms from both DBs:

print(shared_terms)
##  [1] "extracellular matrix structural constituent"      
##  [2] "growth factor binding"                            
##  [3] "catalytic activity, acting on DNA"                
##  [4] "DNA secondary structure binding"                  
##  [5] "histone binding"                                  
##  [6] "cargo receptor activity"                          
##  [7] "helicase activity"                                
##  [8] "phosphatidylinositol bisphosphate kinase activity"
##  [9] "antigen binding"                                  
## [10] "phosphatidylinositol 3-kinase activity"           
## [11] "cytokine binding"                                 
## [12] "isoprenoid binding"                               
## [13] "damaged DNA binding"                              
## [14] "semaphorin receptor binding"                      
## [15] "structural constituent of nuclear pore"           
## [16] "extracellular matrix binding"                     
## [17] "proteoglycan binding"                             
## [18] "Ran GTPase binding"                               
## [19] "copper ion binding"                               
## [20] "monooxygenase activity"                           
## [21] "glycosaminoglycan binding"                        
## [22] "G protein-coupled amine receptor activity"

Print terms only in non-redundant:

print(unique_nr_terms)
## [1] "hormone binding"                               
## [2] "transmembrane receptor protein kinase activity"
## [3] "catalytic activity, acting on RNA"             
## [4] "single-stranded DNA binding"                   
## [5] "metal cluster binding"                         
## [6] "exopeptidase activity"                         
## [7] "collagen binding"

Print terms only in redundant:

print(unique_r_terms)
##  [1] "chemokine activity"                                                                                         
##  [2] "DNA helicase activity"                                                                                      
##  [3] "chemokine receptor binding"                                                                                 
##  [4] "single-stranded DNA-dependent ATPase activity"                                                              
##  [5] "DNA-dependent ATPase activity"                                                                              
##  [6] "ATP-dependent helicase activity"                                                                            
##  [7] "purine NTP-dependent helicase activity"                                                                     
##  [8] "ATP-dependent DNA helicase activity"                                                                        
##  [9] "CCR chemokine receptor binding"                                                                             
## [10] "serine-type endopeptidase inhibitor activity"                                                               
## [11] "phosphatidylinositol-4,5-bisphosphate 3-kinase activity"                                                    
## [12] "growth factor activity"                                                                                     
## [13] "extracellular matrix structural constituent conferring compression resistance"                              
## [14] "scavenger receptor activity"                                                                                
## [15] "exonuclease activity"                                                                                       
## [16] "DNA replication origin binding"                                                                             
## [17] "fibroblast growth factor receptor binding"                                                                  
## [18] "retinoid binding"                                                                                           
## [19] "transforming growth factor beta binding"                                                                    
## [20] "chemorepellent activity"                                                                                    
## [21] "DNA polymerase binding"                                                                                     
## [22] "four-way junction DNA binding"                                                                              
## [23] "monocarboxylic acid binding"                                                                                
## [24] "heparan sulfate proteoglycan binding"                                                                       
## [25] "exonuclease activity, active with either ribo- or deoxyribonucleic acids and producing 5'-phosphomonoesters"
## [26] "5'-3' exonuclease activity"                                                                                 
## [27] "3'-5' DNA helicase activity"                                                                                
## [28] "DNA polymerase activity"                                                                                    
## [29] "transforming growth factor beta receptor binding"                                                           
## [30] "DNA-directed DNA polymerase activity"                                                                       
## [31] "RNA helicase activity"

Scanning the list of terms found only in the full GO MF analysis (including redundant terms), we see many terms to do with DNA activity and binding.

In the ‘non-redundant’ analysis, far fewer DNA-related functions are significant: “catalytic activity, acting on DNA”, “DNA secondary structure binding” and “damaged DNA binding” (significant in both analyses), and “single-stranded DNA binding” (unique to GO MF NR).

By grouping so many similar terms, the non-redundant analysis returns fewer, more targeted enrichments, providing a more concise overview of the biology in your results.

For your own research, you could explore the relationships between these terms by viewing the neighborhood of GO terms on AmiGO: https://amigo.geneontology.org/amigo, or using NaviGO https://kiharalab.org/navigo/views/goset.php (enter multiple GO IDs to see their relationships).

5. Save versions and session details

Database query dates

Unlike gprofiler, WebGestaltR does not have a function to list the version of the queried databases.

For this reason, we will save the analysis date to our rendered notebook, so the external database version could be back-calculated from the date if required:

cat("Date of analysis:\n")
## Date of analysis:
print(Sys.Date())
## [1] "2024-11-20"
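
It can also be handy to print the package version right next to the date, so both appear together in the rendered notebook. This is a small optional sketch; the requireNamespace() guard is only there so the chunk does not fail if the package is missing.

```r
# Print the analysis date in an unambiguous format
cat("Date of analysis:", format(Sys.Date(), "%Y-%m-%d"), "\n")

# Print the installed package version alongside it, if the package is available
pkg <- "WebGestaltR"
if (requireNamespace(pkg, quietly = TRUE)) {
  cat(pkg, "version:", as.character(utils::packageVersion(pkg)), "\n")
} else {
  cat(pkg, "is not installed in this session\n")
}
```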

R version and R package versions

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dplyr_1.1.4       readr_2.1.5       WebGestaltR_0.4.6
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9        utf8_1.2.4        generics_0.1.3    lattice_0.22-5   
##  [5] hms_1.1.3         digest_0.6.35     magrittr_2.0.3    evaluate_0.23    
##  [9] grid_4.4.2        iterators_1.0.14  fastmap_1.1.1     foreach_1.5.2    
## [13] doParallel_1.0.17 jsonlite_1.8.8    Matrix_1.7-1      whisker_0.4.1    
## [17] httr_1.4.7        apcluster_1.4.13  fansi_1.0.6       doRNG_1.8.6      
## [21] codetools_0.2-19  jquerylib_0.1.4   cli_3.6.2         crayon_1.5.2     
## [25] rlang_1.1.3       bit64_4.0.5       withr_3.0.0       cachem_1.0.8     
## [29] yaml_2.3.8        tools_4.4.2       parallel_4.4.2    tzdb_0.4.0       
## [33] rngtools_1.5.2    curl_5.2.1        vctrs_0.6.5       R6_2.5.1         
## [37] lifecycle_1.0.4   bit_4.0.5         vroom_1.6.5       pkgconfig_2.0.3  
## [41] pillar_1.9.0      bslib_0.7.0       glue_1.7.0        Rcpp_1.0.12      
## [45] systemfonts_1.0.6 xfun_0.43         tibble_3.2.1      tidyselect_1.2.1 
## [49] rstudioapi_0.16.0 knitr_1.46        htmltools_0.5.8.1 igraph_2.0.3     
## [53] rmarkdown_2.26    svglite_2.1.3     compiler_4.4.2

RStudio version

Typically, we would simply run RStudio.Version() to print the version details. However, when we knit this document to HTML, the RStudio.Version() function is not available and will cause an error. So to make sure our version details are saved to our static record of the work, we will save to a file, then print the file contents back into the notebook.

# Get RStudio version information
rstudio_info <- RStudio.Version()

# Convert the version information to a string
rstudio_version_str <- paste(
  "RStudio Version Information:\n",
  "Version: ", rstudio_info$version, "\n",
  "Release Name: ", rstudio_info$release_name, "\n",
  "Long Version: ", rstudio_info$long_version, "\n",
  "Mode: ", rstudio_info$mode, "\n",
  "Citation: ", rstudio_info$citation,
  sep = ""
)

# Write the output to a text file
writeLines(rstudio_version_str, "rstudio_version.txt")
# Read the saved version information from the file
rstudio_version_text <- readLines("rstudio_version.txt")

# Print the version information to the document
rstudio_version_text
## [1] "RStudio Version Information:"                                                                                                                                                                                                                                                                           
## [2] "Version: 2023.6.1.524"                                                                                                                                                                                                                                                                                  
## [3] "Release Name: Mountain Hydrangea"                                                                                                                                                                                                                                                                       
## [4] "Long Version: 2023.06.1+524"                                                                                                                                                                                                                                                                            
## [5] "Mode: server"                                                                                                                                                                                                                                                                                           
## [6] "Citation: list(title = \"RStudio: Integrated Development Environment for R\", author = list(list(given = \"Posit team\", family = NULL, role = NULL, email = NULL, comment = NULL)), organization = \"Posit Software, PBC\", address = \"Boston, MA\", year = \"2023\", url = \"http://www.posit.co/\")"
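
One possible way to make this pattern safe for both interactive use and knitting is to guard the version lookup. The sketch below assumes the rstudioapi package is installed and uses its isAvailable() and versionInfo() functions. The file is refreshed only inside a running RStudio session; otherwise, the previously saved file is read back.

```r
version_file <- "rstudio_version.txt"

# TRUE only when the code is running inside an RStudio session
in_rstudio <- requireNamespace("rstudioapi", quietly = TRUE) &&
  rstudioapi::isAvailable()

if (in_rstudio) {
  # Refresh the saved version details
  info <- rstudioapi::versionInfo()
  writeLines(
    c(paste("Version:", as.character(info$version)),
      paste("Mode:", info$mode)),
    version_file
  )
}

if (file.exists(version_file)) {
  # Works when knitting too, as long as the file was saved earlier
  print(readLines(version_file))
} else {
  message("No saved RStudio version file found; run this chunk interactively first")
}
```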

6. Knit workbook to HTML

The last task is to knit the notebook. The notebook is editable, and deleting code also deletes its output, so we could lose valuable details. Knitting the notebook to HTML gives us a permanent static copy of the work.

On the editor pane toolbar, under Preview, select Knit to HTML.

If you have already run Preview, you will see Knit instead of Preview.

The HTML file will be saved in the same directory as the notebook, and with the same filename, but the .Rmd suffix will be replaced by .html. The knit HTML will typically open automatically once complete. If you receive a popup blocker error, click cancel, and in the Files pane of RStudio, single click the HTML file (it has the same name as this notebook) and select View in Web Browser.

Note that the notebook will only knit successfully if there are no errors in the code. By contrast, you can still ‘preview’ the HTML when the code contains errors.
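
Knitting can also be triggered from the console instead of the toolbar. The sketch below wraps rmarkdown::render() in a hypothetical helper; the file name in the example call is a placeholder, so substitute the name of your own notebook.

```r
# Hypothetical helper: knit an R Markdown file to a static HTML document
knit_notebook <- function(rmd_path) {
  rmarkdown::render(input = rmd_path, output_format = "html_document")
}

# Example call (placeholder file name):
# knit_notebook("my_notebook.Rmd")
```

As with the Knit button, rendering stops at the first code error, so a successful render is also a useful check that the notebook runs from top to bottom.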