Ensure the ‘workshop’ directory is your current working directory:
getwd()
## [1] "/home/user3/workshop"
WebGestaltR outputs results to a folder containing
multiple files. Make a parent directory for the results of this
tool:
dir.create("WebGestaltR_results")
## Warning in dir.create("WebGestaltR_results"): 'WebGestaltR_results' already
## exists
WebGestaltR supports 12 species directly; however, you can also import your own database files to perform ORA and GSEA for novel species :-) We will do that in the next session.
Let’s view the list of supported organisms. We don’t have to specify any arguments, but we do need the empty brackets. Without them, R will print the function’s source code instead.
listOrganism()
## [1] "athaliana" "btaurus" "celegans" "cfamiliaris"
## [5] "drerio" "sscrofa" "dmelanogaster" "ggallus"
## [9] "hsapiens" "mmusculus" "rnorvegicus" "scerevisiae"
The next two commands have a default setting of
organism = "hsapiens", so running without any argument will
show the genesets (databases) and ID types (namespaces) that are
supported for human.
View databases for human. Use the black arrow on the right of the table to view the other 2 columns, and use the numbers below the table (or ‘next’) to view the next 10 rows.
listGeneSet()
And the supported human namespaces:
listIdType()
## [1] "The_Genotype-Tissue_ExpressionProjectGTEx"
## [2] "affy_Axiom_BioBank1"
## [3] "affy_Axiom_PMRA"
## [4] "affy_Axiom_tx_v1"
## [5] "affy_GenomeWideSNP_5"
## [6] "affy_GenomeWideSNP_6"
## [7] "affy_Mapping10K_Xba142"
## [8] "affy_Mapping250K_Nsp"
## [9] "affy_Mapping250K_Sty"
## [10] "affy_Mapping50K_Hind240"
## [11] "affy_Mapping50K_Xba240"
## [12] "affy_OncoScan"
## [13] "affy_RosettaMerck_Human_RSTA"
## [14] "affy_hc_g110"
## [15] "affy_hg_focus"
## [16] "affy_hg_u133_plus_2"
## [17] "affy_hg_u133a"
## [18] "affy_hg_u133a_2"
## [19] "affy_hg_u133b"
## [20] "affy_hg_u95a"
## [21] "affy_hg_u95av2"
## [22] "affy_hg_u95b"
## [23] "affy_hg_u95c"
## [24] "affy_hg_u95d"
## [25] "affy_hg_u95e"
## [26] "affy_hta_2_0"
## [27] "affy_huex_1_0_st_v2"
## [28] "affy_hugene_1_0_st_v1"
## [29] "affy_hugene_2_0_st_v1"
## [30] "affy_hugenefl"
## [31] "affy_primeview"
## [32] "affy_u133_x3p"
## [33] "agilent_cgh_44b"
## [34] "agilent_custom_SAGE_Bionetworks_GPL564"
## [35] "agilent_gpl6848"
## [36] "agilent_human_custom_GPL564"
## [37] "agilent_sureprint_g3_ge_8x60k"
## [38] "agilent_sureprint_g3_ge_8x60k_v2"
## [39] "agilent_wholegenome"
## [40] "agilent_wholegenome_4x44k_v1"
## [41] "agilent_wholegenome_4x44k_v2"
## [42] "codelink_codelink"
## [43] "dbSNP"
## [44] "embl"
## [45] "ensembl_gene_id"
## [46] "ensembl_peptide_id"
## [47] "entrezgene"
## [48] "entrezgene_protein-coding"
## [49] "genename"
## [50] "genesymbol"
## [51] "illumina_Infinium_HumanMethylation_beadchip"
## [52] "illumina_Sentrix_HumanRef-8v2_GPL2700"
## [53] "illumina_human-6v3"
## [54] "illumina_humanRef-8v2"
## [55] "illumina_human_methylation_27"
## [56] "illumina_human_methylation_450"
## [57] "illumina_humanht_12_v3"
## [58] "illumina_humanht_12_v4"
## [59] "illumina_humanref_8_v3"
## [60] "illumina_humanwg_6_v1"
## [61] "illumina_humanwg_6_v2"
## [62] "illumina_humanwg_6_v3"
## [63] "interpro"
## [64] "phalanx_onearray"
## [65] "phosphositeEnsembl"
## [66] "phosphositeRefseq"
## [67] "phosphositeSeq"
## [68] "phosphositeUniprot"
## [69] "protein_id"
## [70] "refseq_mrna"
## [71] "refseq_peptide"
## [72] "unigene"
## [73] "uniprotswissprot"
Pick your favourite species from the list of 12, using the same
spelling as shown in the listOrganism() function, and
investigate which namespaces and databases are available:
fave <- "cfamiliaris"
listIdType(organism = fave)
## [1] "affy_CanGene-1_0-st-v1" "affy_CanGene-1_1-st-v1"
## [3] "affy_Canine" "affy_canine_2"
## [5] "dbSNP" "embl"
## [7] "ensembl_gene_id" "ensembl_peptide_id"
## [9] "entrezgene" "entrezgene_protein-coding"
## [11] "genename" "genesymbol"
## [13] "interpro" "protein_id"
## [15] "refseq_mrna" "refseq_peptide"
## [17] "unigene" "uniprotswissprot"
listGeneSet(organism = "cfamiliaris")
We will use the same Pezzini RNAseq dataset as earlier. Since we previously saved our ranked list, DEGs and background genes to the workshop folder, we could import those. However, re-creating the gene lists within the notebook keeps a clear record of how the inputs were made, which enhances reproducibility, and the lists are quick and simple to extract from the input data. If the process were slow and compute-intensive, we would instead document the source and methods behind the gene lists in the notebook comments.
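If you did want to import the previously saved lists instead, reading them back in is a one-liner per file. The commented filenames below are hypothetical (use whatever names you saved under earlier); the live code round-trips a toy list through a temporary file so the chunk is self-contained:

```r
# Hypothetical filenames -- adjust to match the files you saved earlier:
# DEGs <- readLines("DEGs.txt")                   # one gene ID per line -> character vector
# background <- readLines("background_genes.txt")

# Self-contained round-trip demonstration using a temporary file:
tmp <- tempfile(fileext = ".txt")
writeLines(c("ENSG00000000971", "ENSG00000001617"), tmp)  # save a toy gene list
reimported <- readLines(tmp)                              # read it back as a character vector
print(reimported)
```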
Load and check the input dataset:
# Full dataset
data <- read_tsv("Pezzini_DE.txt", col_names = TRUE, show_col_types = FALSE)
head(data)
Before we extract the gene lists, we need to understand what class of object is required by the enrichment function. For this package, the single enrichment function shares the package name.
Bring up the help menu for the WebGestaltR function and
spend a few minutes reviewing the parameters.
?WebGestaltR
There are quite a few! For many of them (e.g. gene set size filters, multiple testing correction method, P value cutoff) the default settings are suitable.
In particular, look for the parameters that control:
whether ORA, GSEA or NTA is performed
which database/s to run enrichment on
what is the namespace/gene ID type for the gene list query
how to specify the input gene list/s
Hopefully you’ve discovered that the WebGestaltR function can accept gene lists EITHER from files (as long as the right column format and file suffix is provided) or as R objects.
Since we have decided to extract the gene lists from the DE matrix to
R objects, we need to provide the gene list object to
interestGene parameter (and referenceGene for
ORA background).
For ORA, the gene lists need to be vectors, and for GSEA, a 2-column dataframe (unlike clusterProfiler, which requires a named vector for GSEA).
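A minimal sketch of the two input shapes, using made-up gene IDs purely for illustration:

```r
# ORA inputs: plain character vectors of gene IDs (toy IDs, not real ENSEMBL accessions)
ora_query      <- c("ENSG000001", "ENSG000002", "ENSG000003")
ora_background <- c("ENSG000001", "ENSG000002", "ENSG000003", "ENSG000004")

# GSEA input: a 2-column data.frame of gene ID plus a ranking metric (e.g. log2FC)
gsea_ranked <- data.frame(
  gene  = c("ENSG000001", "ENSG000002", "ENSG000003"),
  score = c(3.2, -1.5, 0.8)
)

str(ora_query)    # character vector
str(gsea_ranked)  # data.frame with 2 columns
```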
Our input matrix contains ENSEMBL IDs as well as official gene
symbols, so we could use “ensembl_gene_id” or “genesymbol” for the
parameter interestGeneType. Let’s extract the ENSEMBL IDs
since they are more specific than symbol.
# Filter genes with adjusted p-value < 0.01 and absolute log2 fold change > 2, saved as the 'DEGs' vector
DEGs <- data %>%
filter(FDR < 0.01, abs(Log2FC) > 2) %>%
pull(Gene.ID)
# Extract all gene IDs as the 'background' vector
background <- data %>%
pull(Gene.ID)
# Check:
cat("Number of DEGs:", length(DEGs), "\n")
## Number of DEGs: 792
cat("Number of background genes:", length(background), "\n")
## Number of background genes: 14420
cat("First 6 DEGs:", head(DEGs), "\n")
## First 6 DEGs: ENSG00000000971 ENSG00000001617 ENSG00000002586 ENSG00000002746 ENSG00000003137 ENSG00000005243
cat("First 6 background genes:", head(background), "\n")
## First 6 background genes: ENSG00000000003 ENSG00000000419 ENSG00000000457 ENSG00000000460 ENSG00000000971 ENSG00000001036
# extract ranked dataframe, saved as 'ranked' object
ranked <- data %>%
arrange(desc(Log2FC)) %>%
dplyr::select(Gene.ID, Log2FC)
# check
head(ranked)
tail(ranked)
WebGestaltR makes it simple to enrich over many
databases at once in one run command. To do this, we just need to
provide the arguments to the enrichDatabase parameter as a
list of database names instead of a single database name.
For this task, let’s focus on the pathway gene sets. From skimming
the output of listGeneSet() there were a few. We could
manually locate these and copy them in to our list, or take advantage of
the fact that the WebGestaltR developers have been
systematic in the gene set naming, ensuring all database names are
prefixed with their type, ie geneontology_,
pathway_, network_, plus a few others.
# Save the databases for human
databases <- listGeneSet()
# Extract the pathways from the 'name' column that start with 'pathway'
pathway_dbs <- subset(databases, grepl("^pathway", name))
# Save the pathway 'names' column to a list
pathway_names <- pathway_dbs$name
# Check the list
print(pathway_names)
## [1] "pathway_KEGG" "pathway_Panther"
## [3] "pathway_Reactome" "pathway_Wikipathway"
## [5] "pathway_Wikipathway_cancer"
This gives us the same result as
pathway_names <- c("pathway_Reactome", "pathway_KEGG", "pathway_Panther", "pathway_Wikipathway", "pathway_Wikipathway_cancer")
but with less manual effort, and less room for typographic errors
:-)
Since these VMs have 8 cores, let’s set nThreads to 6.
For ORA, this will make very little difference, as the analysis is very
fast, yet for GSEA, it can speed up the analysis a lot.
In testing, running GSEA over these 5 pathway databases with our
ranked query and the default of 1,000 GSEA permutations required 24.1
minutes without threading, and 6.8 minutes with
nThreads = 6.
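Rather than hard-coding 6, you can derive a sensible thread count from the machine you are on. This is a sketch using parallel::detectCores() from base R’s parallel package; the “leave 2 cores free” margin is just a convention, not a WebGestaltR requirement:

```r
library(parallel)

# Leave a couple of cores free for the OS and RStudio itself
n_threads <- max(1, detectCores() - 2)
cat("Using", n_threads, "threads\n")
```

You could then pass `nThreads = n_threads` in the WebGestaltR call below.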
There is also a function WebGestaltRBatch, for processing multiple input query lists. With this function, you can also set isParallel = TRUE along with nThreads = N to run your batch of query lists with multiple threads and in parallel rather than sequentially, for much faster run times, assuming you have the compute resources to do so.
Since we don’t want to wait 7 minutes for a result, let’s run this multi-database query with ORA instead of GSEA.
# Specify output directory (must exist)
outdir <- "WebGestaltR_results"
# Specify project name
project = "ORA_pathways"
WebGestaltR(
organism = "hsapiens", # Species
enrichMethod = "ORA", # Perform ORA, GSEA or NTA
interestGene = DEGs, # Query gene list
interestGeneType = "ensembl_gene_id", # Gene ID type for gene list
referenceGene = background, # Background gene list
referenceGeneType = "ensembl_gene_id", # Gene ID type for background
enrichDatabase = pathway_names, # Database name or list of databases to enrich over
isOutput = TRUE, # Save report files to disk
fdrMethod = "BH", # Multiple testing correction method (BH = Benjamini-Hochberg)
sigMethod = "fdr", # Significance method ('fdr' = false discovery rate)
fdrThr = 0.05, # FDR significance threshold
minNum = 10, # Minimum number of genes in a gene set to include
maxNum = 500, # Maximum number of genes in a gene set to include
outputDirectory = outdir,
projectName = project,
nThreads = 6 # use 6 threads for faster run
)
## Loading the functional categories...
## Warning in loadGeneSet(organism = organism, enrichDatabase = enrichDatabase, :
## Duplicate gene set names in pathway_Wikipathway_cancer have been ignored.
## Loading the ID list...
## Loading the reference list...
## Warning in dir.create(projectDir): 'WebGestaltR_results/Project_ORA_pathways'
## already exists
## Summarizing the input ID list by GO Slim data...
## Performing the enrichment analysis...
## Begin affinity propagation...
## End affinity propagation...
## Begin weighted set cover...
## End weighted set cover...
## Generate the final report...
## Results can be found in the WebGestaltR_results/Project_ORA_pathways!
The results are saved within a new folder, WebGestaltR_results/Project_ORA_pathways, inside the parent directory we created earlier. There are a number of results files; the one we will focus on is the interactive HTML summary file.
STOP: to save time on the GSEA compute, skip ahead and run the code chunk labelled GSEA GO MF with redundant (it takes several minutes), then return here, where we will explore the ORA HTML while the GSEA runs!!!
In the Files pane, open the folder
WebGestaltR_results/Project_ORA_pathways then click on the
Report_ORA_pathways.html file. Select
View in Web Browser.
Some things to note:
The reports contain a GO Slim summary, which provides a high level summary of the enriched terms by grouping them into broader categories or “slims”
You can change the ‘Enrichment Results’ view from table to bar chart or volcano plot
You can increase the default view of 20 rows to ‘All’ for the enrichment table, but this does not necessarily show all significant enrichments! Check the output file enrichment_results_ORA_pathways.txt and you can see 85 significant terms, yet the ‘All’ view under the default settings shows only 30 or so. To increase the number of rows included in the HTML report, use the parameter reportNum
You can run algorithms to reduce the number of terms through
clustering, in order to make the results more manageable. This is
discussed in the WebGestalt 2019 update publication Liao et
al 2019. The authors maintain that “important biological themes are
all covered with these selected gene sets”. Built-in redundancy
handling/term clustering is a feature of WebGestaltR (and
the web version). To what extent this is appropriate for the database
you are using is up to you to determine. For example, in the next
analysis we will perform GSEA over the noRedundant GO MF
database. Applying a double layer of redundancy filters over a database
seems quite dubious to me...
Selecting a term from the ‘Enrichment Results’ table updates the term under ‘Select an enriched analyte set’, where more detailed results are shown, including the genes from your gene list present within the gene set for that term
At the top right of the report, there is a ‘Result Download’ link, making it easy to share all results files with collaborators via just one shared file
It’s well known that the GO hierarchy, by definition, includes redundancy. When performing enrichment, higher-order terms in the hierarchy are often significant yet not particularly informative.
Tools such as topGO
and REVIGO are dedicated to removing
redundancy from the Gene Ontology. Add WebGestaltR (and of course the WebGestalt web version!) to that list.
This tool runs its own redundancy filter over the GO databases to produce refined database versions:
We can read about their approach in the ‘description’ column of the database:
databases$description[databases$name == "geneontology_Molecular_Function_noRedundant"]
## [1] "The gene ontology molecular function database was downloaded from http://www.geneontology.org/. Then, we only contain the non-redundant categories by selecting the most general categories in each branch of the GO DAG structure from all categories with the number of annotated genes from 20 to 500."
Let’s run enrichment over the full and the non-redundant version of the GO MF databases, and compare the results. We expect to see fewer and more specific terms in the “noRedundant” results than the full GO MF results.
Let’s use GSEA since we have already tried ORA with this package. GSEA is slower and GO is large, so even with 7 threads these commands will take a few minutes (longer for redundant than non-redundant, of course). Feel free to use the compute time to ask questions on slack or explore the ORA pathways output some more!
There is no seed parameter for WebGestaltR
GSEA as there is for clusterProfiler. We can set it in R
instead with set.seed().
set.seed(123)
This is an advantage of using WebGestaltR over the web
counterpart :-)
However, a note from testing: without setting the seed in R, a slightly different number of enriched terms was returned for the GSEA below over 3 replicate runs. With the seed set, the same number and IDs of terms were significant among the replicate runs, BUT the NES and FDR were slightly different! The unadjusted ES and P values were the same.
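To see why set.seed() makes permutation-based results repeatable, here is a small demonstration with base R’s sample(): the same seed always yields the same random draw.

```r
set.seed(123)
a <- sample(1:100, 5)  # first random draw

set.seed(123)
b <- sample(1:100, 5)  # same seed -> identical draw

identical(a, b)  # TRUE
```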
outputDirectory <- "WebGestaltR_results"
project <- "GSEA_GO-MF_with-redundant"
database <- "geneontology_Molecular_Function"
suppressWarnings({ gomf <- WebGestaltR(
organism = "hsapiens", # Use your species (e.g., "hsapiens" for humans)
enrichMethod = "GSEA", # Perform ORA, GSEA or NTA
interestGene = ranked, # Your gene list
interestGeneType = "ensembl_gene_id", # Specify the gene ID type
enrichDatabase = database, # The database for enrichment analysis
isOutput = TRUE, # Set to FALSE if you don't want files saved to disk
fdrMethod = "BH", # Correction method (e.g., Benjamini-Hochberg)
sigMethod = "fdr", # Significance method ('fdr' or 'top')
fdrThr = 0.05, # FDR significance threshold
minNum = 10, # Minimum number of genes per category
maxNum = 500, # Maximum number of genes per category
boxplot = TRUE,
outputDirectory = outputDirectory,
projectName = project,
nThreads = 7
)
})
## Loading the functional categories...
## Loading the ID list...
## Summarizing the uploaded ID list by GO Slim data...
## Performing the enrichment analysis...
## 1000 permutations of score complete...
## Begin affinity propagation...
## End affinity propagation...
## Begin weighted set cover...
## End weighted set cover...
## Generate the final report...
## Results can be found in the WebGestaltR_results/Project_GSEA_GO_MF_with_redundant!
outputDirectory <- "WebGestaltR_results"
project <- "GSEA_GO-MF_non-redundant"
database <- "geneontology_Molecular_Function_noRedundant"
suppressWarnings({ gomf_nr <- WebGestaltR(
organism = "hsapiens", # Use your species (e.g., "hsapiens" for humans)
enrichMethod = "GSEA", # Perform ORA, GSEA or NTA
interestGene = ranked, # Your gene list
interestGeneType = "ensembl_gene_id", # Specify the gene ID type
enrichDatabase = database, # The database for enrichment analysis
isOutput = TRUE, # Set to FALSE if you don't want files saved to disk
fdrMethod = "BH", # Correction method (e.g., Benjamini-Hochberg)
sigMethod = "fdr", # Significance method ('fdr' or 'top')
fdrThr = 0.05, # FDR significance threshold
minNum = 10, # Minimum number of genes per category
maxNum = 500, # Maximum number of genes per category
boxplot = TRUE,
outputDirectory = outputDirectory,
projectName = project,
nThreads = 7
)
})
## Loading the functional categories...
## Loading the ID list...
## Summarizing the uploaded ID list by GO Slim data...
## Performing the enrichment analysis...
## 1000 permutations of score complete...
## Begin affinity propagation...
## End affinity propagation...
## Begin weighted set cover...
## End weighted set cover...
## Generate the final report...
## Results can be found in the WebGestaltR_results/Project_GSEA_GO_MF_non_redundant!
Notice that in the GSEA code chunks above, the R function suppressWarnings has been applied. This prevents every term that fails the term-size filters we set from being printed out!
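suppressWarnings() silences only the warnings raised by the expression it wraps; errors and regular output still come through. A quick illustration with a coercion that normally warns:

```r
# as.numeric() on a non-numeric string normally emits a coercion warning and returns NA
x <- suppressWarnings(as.numeric("not a number"))
print(x)  # NA, with no warning printed
```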
Now that we have both results saved in R objects, we can compare the enriched terms.
How many significant terms from each DB?
nr_terms <- gomf_nr$description
r_terms <- gomf$description
cat("Number of significant 'nonredundant' GO MF terms:", length(nr_terms), "\n")
## Number of significant 'nonredundant' GO MF terms: 29
cat("Number of significant 'with redundant' GO MF terms:", length(r_terms), "\n")
## Number of significant 'with redundant' GO MF terms: 53
Clearly we have refined the results using the WebGestaltR reduced GO MF database.
Open the 2 HTML reports files for these analyses:
WebGestaltR_results/GSEA_GO-MF_non-redundant/Report_GSEA_GO_MF_non_redundant.html
WebGestaltR_results/GSEA_GO-MF_with-redundant/Report_GSEA_GO_MF_with_redundant.html
Note the differences to the ORA reports we have seen.
The bar chart and volcano plots show positive and negative NES, indicating whether the leading edge genes were from the top (upregulated) or bottom (downregulated) end of the list
Each enriched set has a GSEA plot, and these are also saved locally as image files within the project folder, under a new folder ending with _GSEA, e.g. Project_GSEA_GO_MF_non_redundant/Project_GSEA_GO_MF_non_redundant_GSEA
A quick view of the bar charts shows similarities and differences between the ‘noRedundant’ and ‘with redundant’ analyses.
Let’s compare shared terms:
# Create unique and shared description lists
unique_nr_terms <- setdiff(nr_terms, r_terms)
unique_r_terms <- setdiff(r_terms, nr_terms)
shared_terms <- intersect(nr_terms, r_terms)
Print shared terms from both DBs:
print(shared_terms)
## [1] "extracellular matrix structural constituent"
## [2] "growth factor binding"
## [3] "catalytic activity, acting on DNA"
## [4] "DNA secondary structure binding"
## [5] "histone binding"
## [6] "cargo receptor activity"
## [7] "helicase activity"
## [8] "phosphatidylinositol bisphosphate kinase activity"
## [9] "antigen binding"
## [10] "phosphatidylinositol 3-kinase activity"
## [11] "cytokine binding"
## [12] "isoprenoid binding"
## [13] "damaged DNA binding"
## [14] "semaphorin receptor binding"
## [15] "structural constituent of nuclear pore"
## [16] "extracellular matrix binding"
## [17] "proteoglycan binding"
## [18] "Ran GTPase binding"
## [19] "copper ion binding"
## [20] "monooxygenase activity"
## [21] "glycosaminoglycan binding"
## [22] "G protein-coupled amine receptor activity"
Print terms only in non-redundant:
print(unique_nr_terms)
## [1] "hormone binding"
## [2] "transmembrane receptor protein kinase activity"
## [3] "catalytic activity, acting on RNA"
## [4] "single-stranded DNA binding"
## [5] "metal cluster binding"
## [6] "exopeptidase activity"
## [7] "collagen binding"
Print terms only in redundant:
print(unique_r_terms)
## [1] "chemokine activity"
## [2] "DNA helicase activity"
## [3] "chemokine receptor binding"
## [4] "single-stranded DNA-dependent ATPase activity"
## [5] "DNA-dependent ATPase activity"
## [6] "ATP-dependent helicase activity"
## [7] "purine NTP-dependent helicase activity"
## [8] "ATP-dependent DNA helicase activity"
## [9] "CCR chemokine receptor binding"
## [10] "serine-type endopeptidase inhibitor activity"
## [11] "phosphatidylinositol-4,5-bisphosphate 3-kinase activity"
## [12] "growth factor activity"
## [13] "extracellular matrix structural constituent conferring compression resistance"
## [14] "scavenger receptor activity"
## [15] "exonuclease activity"
## [16] "DNA replication origin binding"
## [17] "fibroblast growth factor receptor binding"
## [18] "retinoid binding"
## [19] "transforming growth factor beta binding"
## [20] "chemorepellent activity"
## [21] "DNA polymerase binding"
## [22] "four-way junction DNA binding"
## [23] "monocarboxylic acid binding"
## [24] "heparan sulfate proteoglycan binding"
## [25] "exonuclease activity, active with either ribo- or deoxyribonucleic acids and producing 5'-phosphomonoesters"
## [26] "5'-3' exonuclease activity"
## [27] "3'-5' DNA helicase activity"
## [28] "DNA polymerase activity"
## [29] "transforming growth factor beta receptor binding"
## [30] "DNA-directed DNA polymerase activity"
## [31] "RNA helicase activity"
Scanning the list of terms only within the full GO MF (including redundant terms) we see many terms to do with DNA activity and binding.
In the ‘non-redundant’ analysis, just 2 DNA activity functions are significant: “DNA secondary structure binding” (significant in both) and “single-stranded DNA binding” (unique to GO MF NR).
By grouping so many similar terms, the non-redundant analysis returns a smaller and more targeted set of enrichments, providing a more concise overview of the biology in your results.
For your own research, you could explore the relationships between these terms by viewing the neighborhood of GO terms on AmiGO: https://amigo.geneontology.org/amigo, or using NaviGO https://kiharalab.org/navigo/views/goset.php (enter multiple GO IDs to see their relationships).
Unlike gprofiler, WebGestaltR does not have
a function to list the version of the queried databases.
For this reason, we will save the analysis date to our rendered notebook, so the external database version could be back-calculated from the date if required:
cat("Date of analysis:\n")
## Date of analysis:
print(Sys.Date())
## [1] "2024-11-20"
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.4 readr_2.1.5 WebGestaltR_0.4.6
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3 lattice_0.22-5
## [5] hms_1.1.3 digest_0.6.35 magrittr_2.0.3 evaluate_0.23
## [9] grid_4.4.2 iterators_1.0.14 fastmap_1.1.1 foreach_1.5.2
## [13] doParallel_1.0.17 jsonlite_1.8.8 Matrix_1.7-1 whisker_0.4.1
## [17] httr_1.4.7 apcluster_1.4.13 fansi_1.0.6 doRNG_1.8.6
## [21] codetools_0.2-19 jquerylib_0.1.4 cli_3.6.2 crayon_1.5.2
## [25] rlang_1.1.3 bit64_4.0.5 withr_3.0.0 cachem_1.0.8
## [29] yaml_2.3.8 tools_4.4.2 parallel_4.4.2 tzdb_0.4.0
## [33] rngtools_1.5.2 curl_5.2.1 vctrs_0.6.5 R6_2.5.1
## [37] lifecycle_1.0.4 bit_4.0.5 vroom_1.6.5 pkgconfig_2.0.3
## [41] pillar_1.9.0 bslib_0.7.0 glue_1.7.0 Rcpp_1.0.12
## [45] systemfonts_1.0.6 xfun_0.43 tibble_3.2.1 tidyselect_1.2.1
## [49] rstudioapi_0.16.0 knitr_1.46 htmltools_0.5.8.1 igraph_2.0.3
## [53] rmarkdown_2.26 svglite_2.1.3 compiler_4.4.2
Typically, we would simply run RStudio.Version() to
print the version details. However, when we knit this document to HTML,
the RStudio.Version() function is not available and will
cause an error. So to make sure our version details are saved to our
static record of the work, we will save to a file, then print the file
contents back into the notebook.
# Get RStudio version information
rstudio_info <- RStudio.Version()
# Convert the version information to a string
rstudio_version_str <- paste(
"RStudio Version Information:\n",
"Version: ", rstudio_info$version, "\n",
"Release Name: ", rstudio_info$release_name, "\n",
"Long Version: ", rstudio_info$long_version, "\n",
"Mode: ", rstudio_info$mode, "\n",
"Citation: ", rstudio_info$citation,
sep = ""
)
# Write the output to a text file
writeLines(rstudio_version_str, "rstudio_version.txt")
# Read the saved version information from the file
rstudio_version_text <- readLines("rstudio_version.txt")
# Print the version information to the document
rstudio_version_text
## [1] "RStudio Version Information:"
## [2] "Version: 2023.6.1.524"
## [3] "Release Name: Mountain Hydrangea"
## [4] "Long Version: 2023.06.1+524"
## [5] "Mode: server"
## [6] "Citation: list(title = \"RStudio: Integrated Development Environment for R\", author = list(list(given = \"Posit team\", family = NULL, role = NULL, email = NULL, comment = NULL)), organization = \"Posit Software, PBC\", address = \"Boston, MA\", year = \"2023\", url = \"http://www.posit.co/\")"
The last task is to knit the notebook. Our notebook is editable and can be changed: deleting code deletes the output, so we could lose valuable details. If we knit the notebook to HTML, we have a permanent static copy of the work.
On the editor pane toolbar, under Preview, select Knit to HTML.
If you have already run Preview, you will see Knit instead of Preview.
The HTML file will be saved in the same directory as the notebook, and with the same filename, but the .Rmd suffix will be replaced by .html. The knit HTML will typically open automatically once complete. If you receive a popup blocker error, click cancel, and in the Files pane of RStudio, single click the knitted .html file and select View in Web Browser.
Note that the notebook will only knit successfully if there are no errors in the code; you can ‘preview’ HTML even with code errors.