0. Working directory

Ensure the ‘workshop’ directory is your current working directory:

getwd()
## [1] "/home/user3/workshop"

1. Load input data

Raw data from Pezzini et al 2017 was subjected to differential gene expression analysis with Degust and the results file saved to Pezzini_DE.txt.

The input data file is within the current working directory so we do not need to specify its directory path.

# save data file to an R object called 'data' 
data <- read_tsv("Pezzini_DE.txt", col_names = TRUE, show_col_types = FALSE)

# view the first few lines
head(data)

The dataframe shows genes with fold change and FDR values, along with some normalised counts values for the 6 samples (2 groups with 3 replicates each).

Look on the environment pane of RStudio, and you can see a description ‘14420 obs. of 10 variables’ - this shows your dataframe consists of 10 columns and 14,420 genes.

2. Get ORA gene list

Now we need to filter for differentially expressed genes (DEGs), and we will apply the thresholds adjusted P values/FDR < 0.01, and log2fold change of 2.

We will use ENSEMBL gene IDs (column 1).

# Filter DEGs and save to an object named 'degs'
degs <- data %>%
  filter(FDR <= 0.01 & abs(Log2FC) >= 2) %>%
  pull(Gene.ID)
cat("Number of genes passing FDR and fold change filter:", length(degs), "\n")
## Number of genes passing FDR and fold change filter: 792
# Save the DEG gene list to disk: 
title <- "Pezzini_DEGs.txt"
write.table(degs, "Pezzini_DEGs.txt", row.names = FALSE, col.names = FALSE, quote = FALSE)

cat("Table saved to", title, "\n")
## Table saved to Pezzini_DEGs.txt

We have 792 genes passing our filters.

3. Get background gene list

Recall from the webinar and day 1 of the workshop that an experimental background gene list is crucial to avoiding false positives and minimising tissue bias with ORA.

The analysis in Degust has already removed lowly expressed genes, so we can simply extract all genes from this data matrix as our background gene list and save it as our ‘background’ object, as well as save to disk so that we can include it within the supplementary materials of any resultant publications for reproducibility.

# select the column labelled 'Gene.ID' from the 'data' dataframe, save to object named 'background'
background <- data$Gene.ID

cat("Number of background genes:", length(background), "\n")
## Number of background genes: 14420
# Save the background gene list to disk:
title <- "Pezzini_background_genes.txt"
write.table(background, title, row.names = FALSE, col.names = FALSE, quote = FALSE)

cat("Table saved to", title, "\n")
## Table saved to Pezzini_background_genes.txt

4. Run ORA with gost function

Before running the below code chunk, review the parameters for the gost ORA function. We can do this easily by bringing up the help page for the function in the Help pane of RStudio. This is the same information that is displayed in the gprofiler2 user guide.

?gprofiler2::gost

Observe the similarities to the parameters available on the g:Profiler web interface, for example organism, the correction method (g:Profiler’s custom g_scs method), and domain scope (background genes).

Run the below code which explicitly includes all available gost parameters. Including all parameters, even if the defaults suit your needs, makes your parameter choices explicit. Sometimes, default settings can change between versions!

An error-free gost run should produce no console output. As the code is running, there wll be a green bar to the elft of the code chunk.

Our results are saved in the R object ora.

ora <- gost( 
    degs, # 'degs' gene list object 
    organism = "hsapiens", # human data
    ordered_query = FALSE, 
    multi_query = FALSE, 
    significant = TRUE, # only print significant terms
    exclude_iea = FALSE, # exclude GO electronic annotations
    measure_underrepresentation = FALSE, 
    evcodes = FALSE,    # don't include evidence codes in the results - good to have, but will make it run slower
    user_threshold = 0.05, # adj P value cutoff for terms 
    correction_method = "g_SCS",  # gprofiler's custom multiple testing correctionmethod (recommended)
    domain_scope = "custom_annotated", # custom background, restrict to only annotated genes 
    custom_bg = background, # 'background' gene list object
    numeric_ns = "", # we don't have numeric IDs 
    sources = NULL, # use all databases
    as_short_link = FALSE, # save our results here not as a weblink to gprofiler
    highlight = TRUE # highlight driver terms (will add a 'highlighted' column with TRUE/FALSE)
)

View the top-most significant enrichments with the R head command. Only significant enrichments passing your specified threshold (adjusted P value < 0.05) are included in the results object because we have included significant = TRUE.

Use the black arrow on the right of the table to scroll to other columns.

head(ora$result)

Let’s give our query a name:

# reassign query name to something more specific 
ora$result$query <- 'DEGs_Padj0.05_FC2'
head(ora$result)

We can obtain a list of queried databases:

unique(ora$result$source)
##  [1] "GO:BP" "GO:CC" "GO:MF" "HP"    "HPA"   "KEGG"  "MIRNA" "REAC"  "TF"   
## [10] "WP"

Same as the web tool, we have enrichment results for GP (BP, CC, MF), HP (human phenotype), HPA (human protein atlas), KEGG, MiRNA, Reactome, Transcription Factors, and WikiPathways.

5. Save the results to a file

Let’s save the results file to disk. This is handy when you want to export results elsewhere for further analysis, for example Excel.

First, we will re-order the columns so the output more closely matches the tables that are downloaded from the web version of the tool.

# reorder the table columns 
ora_reordered <- ora$result[, c("source", "term_name", "term_id", "p_value", "term_size", "query_size", "intersection_size", "effective_domain_size")]

# check the first few lines of the output
head(ora_reordered)
# print to CSV
title <- "gprofiler_ORA_results.csv"
write.csv(ora_reordered, title, row.names = FALSE)

cat("Table saved to", title, "\n")
## Table saved to gprofiler_ORA_results.csv

gprofiler has a function to print tables that mimic the web tool called publish_gosttable. These are image files, so not for importing to Excel like the CSV we just created.

Let’s extract the results for Reactome and save to a gosttable.

# Filter results for the 'Reactome' database
reactome_results <- ora$result %>% filter(source == "REAC")

# Extract all term_ids for Reactome
reac <- reactome_results$term_id

# Create the GOST table for Reactome terms
filename <- "gprofiler_Reactome_gosttable.pdf"

publish_gosttable(ora,
  highlight_terms = reac,
  use_colors = TRUE,
  show_columns = c("source", "term_name", "term_size", "intersection_size"),
  filename)
## The image is saved to gprofiler_Reactome_gosttable.pdf

6. Visualise the results

Manhattan plots

The gostplot function creates a Manhattan plot similar to the one shown on the web tool. By applying the parameter interactive=TRUE we can hover over the data points to see enriched term details.

The parameter capped = TRUE is an indicator whether the -log10(p-values) would be capped at 16 if bigger than 16. This fixes the scale of y-axis to keep Manhattan plots from different queries comparable and is also intuitive since p-values smaller than that can all be summarised as ‘highly significant’.

gostplot(ora, 
  capped = TRUE, 
  interactive = TRUE, 
  pal = c(`GO:MF` = "#dc3912", 
    `GO:BP` = "#ff9900", 
    `GO:CC` = "#109618", 
    KEGG = "#dd4477", 
    REAC = "#3366cc", 
    WP = "#0099c6", 
    TF = "#5574a6", 
    MIRNA = "#22aa99", 
    HPA = "#6633cc", 
    CORUM = "#66aa00", 
    HP = "#990099") 
)

There are a lot of significant enrichments for GO biological processes. Many of these are probably terms containing a large number of genes, so not particularly informative. Other R tools have default settings limiting the minimum and maximum number of genes in a geneset to be included in the analysis. Since there is no direct parameter to restrict term size to gostplot, we can filter the ORA results before plotting. Let’s apply a maximum gene set size of 500, and a minimum gene set size of 10, which are the default setting used by clusterProfiler.

# Filter the results for GO:BP terms with term_size <= 500 and >= 10
# save the filtered results in a new object called 'ora_filter_termsize'
ora_filter_termsize <- ora
ora_filter_termsize$result <- ora$result %>% filter(term_size <= 500) %>% filter(term_size >= 10)
# Plot with gostplot using the filtered results
gostplot(ora_filter_termsize, 
  capped = TRUE, 
  interactive = TRUE, 
  pal = c(
    `GO:MF` = "#dc3912", 
    `GO:BP` = "#ff9900", 
    `GO:CC` = "#109618", 
    KEGG = "#dd4477", 
    REAC = "#3366cc", 
    WP = "#0099c6", 
    TF = "#5574a6", 
    MIRNA = "#22aa99", 
    HPA = "#6633cc", 
    CORUM = "#66aa00", 
    HP = "#990099"
  )
)

This has cleaned up ‘Biological Process’ a little bit, enabling signals of more specific terms to be highlighted.

gprofiler2 includes a function for creating a publication-ready image that can optionally highlight specific terms. We need to first produce a plot with interactice = FALSE, save it to an object, and then provide that plot object to the publish_gostplot function.

# Plot with gostplot using the filtered results, save to object called 'plot'
plot <- gostplot(ora_filter_termsize, 
  capped = TRUE, 
  interactive = FALSE, 
  pal = c(
    `GO:MF` = "#dc3912", 
    `GO:BP` = "#ff9900", 
    `GO:CC` = "#109618", 
    KEGG = "#dd4477", 
    REAC = "#3366cc", 
    WP = "#0099c6", 
    TF = "#5574a6", 
    MIRNA = "#22aa99", 
    HPA = "#6633cc", 
    CORUM = "#66aa00", 
    HP = "#990099"
  )
)

The publish_gostplot parameter highlight_terms enables you to highlight specific terms on the plot, with a table showing enrichment details below for those highlighted terms.

Let’s highlight some selected terms manually. You need to provide the term ID not term name.

#specify term IDs for tmers of interest: 'Collagen degradation' and 'Collagen formation'
highlight <- c("REAC:R-HSA-1442490", "REAC:R-HSA-1474290")
filename <- "gprofiler_collagen_gostplot.pdf"

publish_gostplot(plot, 
  highlight_terms = highlight, 
  filename, 
  width = 10, 
  height = 10 )
## The image is saved to gprofiler_collagen_gostplot.pdf

Like g:Profiler web, the coloured boxes on the table are by adjusted P value, with darker colours indicating more significant results. Colours range from yellow through green to dark blue.

You can use R grepl function to search for terms with names matching some keyword. Let’s highlight all terms related to receptors. The code chunk applies an increased figure height, to ensure we can see the whole plot within the notebook.

# extract from ora results all terms containing "receptor" keyword and create a list of those term IDs
highlight <- ora$result %>% 
  filter(grepl("receptor", term_name, ignore.case = TRUE)) %>% 
  pull(term_id) 
filename <- "gprofiler_receptors_gostplot.pdf"

publish_gostplot(plot, 
  highlight_terms = highlight, 
  filename, 
  width = 10, 
  height = 10 )
## The image is saved to gprofiler_receptors_gostplot.pdf

Dotplots

One of the advantages of working in R is flexibility with visualisations. While the interactive Manhattan plots and publish_gostplot options are nice, it can also be useful to visualise P values against all term descriptions.

One way to do this is with a dotplot. We can loop through all databases and use the R package ggplot2 to make a dotplot for each database with significantly enriched terms for our gene list.

# List of databases
dbs <- unique(ora$result$source)

# Loop over databse list, and print a plot if there are significant enrichments, or else print a message
for (db in dbs) {
  # Extract results for this database, and filter by term size
  db_results <- ora$result %>% filter(source == db, term_size >= 10, term_size <= 500)
  
  # Check if there are any terms left after filtering
  if (nrow(db_results) > 0) {
    # Create the dot plot for this database
    p <- ggplot(db_results, aes(x = reorder(term_name, -p_value), y = -log10(p_value))) +
      geom_point(aes(size = term_size, color = significant)) +
      labs(title = paste(db), 
           x = "Term", 
           y = "-log10(p-value)") +
      theme_minimal() +
      coord_flip()  # Flips the coordinates for better visibility
    
    # Print the plot
    print(p)
  } else {
    # Print a message if there are no significant enrichments
    message("No significant enrichments for database: ", db)
  }
}

## No significant enrichments for database: HP

## No significant enrichments for database: MIRNA

## No significant enrichments for database: TF

For plots with a lot of enriched terms, such as GO Biological Process, the display within the notebook is less than ideal. Saving the plot to an image file enables better resolution:

# Filter for the GO:BP database
go_bp_results <- ora$result %>% filter(source == "GO:BP",term_size >= 10, term_size <= 500)

# Create the plot for GO:BP
p_go_bp <- ggplot(go_bp_results, aes(x = reorder(term_name, -p_value), y = -log10(p_value))) +
  geom_point(aes(size = term_size, color = significant)) +
  labs(title = "GO:BP", 
       x = "Term", 
       y = "-log10(p-value)") +
  theme_minimal() +
  coord_flip()  # Flips the coordinates for better visibility


# Open a PDF device to save the plot as a full size A4: 
title <- "gprofiler_GO_BP_dotplot.pdf"
pdf(title, width = 8.27, height = 11.69)  # A4 portrait size in inches

# Print the plot to the device
print(p_go_bp)

# Close the device (this saves the plot)
dev.off()
## png 
##   2
cat("Table saved to", title, "\n")
## Table saved to gprofiler_GO_BP_dotplot.pdf

Open this plot by clicking it from the ‘Files’ pane of RStudio. Notice how the term names are now readable :-)

7. Run a gost multi-query

By providing more than one gene list and setting multi_query = TRUE, results from all of the gene lists are grouped by term IDs for easier comparison. This can be handy when you have multiple comparisons within an experiment, or when you want to investigate enrichments within the up and down regulated genes separately.

First, we need to extracts separate gene lists for up-regulated and down-regulated genes.

# make an object for upregualted genes 
up_degs <- data %>%
  filter(FDR < 0.01 & Log2FC >= 2) %>%
  pull(Gene.ID)
cat("Number of upregulated DEGs:", length(up_degs), "\n")
## Number of upregulated DEGs: 577
# Save the DEG gene list to disk: 
title <- "up_DEGs.txt"
write.table(up_degs, title, row.names = FALSE, col.names = FALSE, quote = FALSE)
cat("Up-regulated DEGs saved to", title, "\n")
## Up-regulated DEGs saved to up_DEGs.txt
# make an object for downregualted genes 
down_degs <- data %>%
  filter(FDR < 0.01 & Log2FC <= -2) %>%
  pull(Gene.ID)
cat("Number of downregulated DEGs:", length(down_degs), "\n")
## Number of downregulated DEGs: 215
# Save the DEG gene list to disk: 
title <- "down_DEGs.txt"
write.table(down_degs, title, row.names = FALSE, col.names = FALSE, quote = FALSE)

cat("Down-regulated DEGs saved to", title, "\n")
## Down-regulated DEGs saved to down_DEGs.txt

Now run gost as multi-query. This may take a few moments.

The changes required for multi-query are providing a list of gene list objects to the query parameter instead of a single gene list object, and setting multi_query = TRUE, which is FALSE by default. By including all genes as well, we can efficiently compare up vs down vs no separation.

ora_multi <- gost( 
    query = list("upregulated" = up_degs, "downregulated" = down_degs, "all_DEGs" = degs), 
    organism = "hsapiens", 
    ordered_query = FALSE, 
    multi_query = TRUE, 
    significant = TRUE, 
    exclude_iea = FALSE, 
    measure_underrepresentation = FALSE, 
    evcodes = FALSE, 
    user_threshold = 0.05, 
    correction_method = "g_SCS", 
    domain_scope = "custom_annotated", 
    custom_bg = background, 
    numeric_ns = "", 
    sources = NULL, 
    as_short_link = FALSE, 
    highlight = FALSE 
)

Now create a multi-query interactive Manhattan plot with gostplot:

gostplot(ora_multi, capped = TRUE, interactive = TRUE)

Unfortunately, notebook view squashes the top plot over the bottom one, and adjusting figure height or plot layout options doesn’t seem to help. Plotting as non-interactive or plotting from the console to the plots pane both produce a correct looking plot.

p <- gostplot(ora_multi, capped = TRUE, interactive = FALSE)
filename <- "gprofiler_ORA_multiquery.pdf"

publish_gostplot(p, 
  highlight_terms = NULL, 
  filename, 
  width = 10, 
  height = 10 )
## The image is saved to gprofiler_ORA_multiquery.pdf

To access the tabular results separately, they need to be split, as a number of the columns are comma-delimited lists with one value for each of the 3 queries.

head(ora_multi$result)

If you run the command head(ora_multi$result) directly in the console (not from the notebook) you can see the list values.

One might want to explore the comparisons in more detail for example viewing the separate P values in Excel, so having these results as a file would be handy. The columns that are lists need to be converted to character format before printing to TSV.

sapply(ora_multi$result, class)
##               term_id              p_values           significant 
##           "character"                "list"                "list" 
##             term_size           query_sizes    intersection_sizes 
##             "integer"                "list"                "list" 
##                source             term_name effective_domain_size 
##           "character"           "character"             "integer" 
##          source_order               parents 
##             "integer"                "list"

Convert the columns that are lists to characters

#  convert lists into characters
list_columns <- names(ora_multi$result)[sapply(ora_multi$result, class) == "list"]
ora_multi$result[list_columns] <- lapply(ora_multi$result[list_columns], function(col) {
  sapply(col, function(x) paste(x, collapse = ","))
})
sapply(ora_multi$result, class)
##               term_id              p_values           significant 
##           "character"           "character"           "character" 
##             term_size           query_sizes    intersection_sizes 
##             "integer"           "character"           "character" 
##                source             term_name effective_domain_size 
##           "character"           "character"             "integer" 
##          source_order               parents 
##             "integer"           "character"

View the new table format:

head(ora_multi$result)
# Print TSV file
title <- "gprofiler_ORA_ora_multiquery.tsv"
write.table(ora_multi$result, title, row.names = F, quote = F, sep="\t") 

cat("ORA multi-query results written to", title, "\n")
## ORA multi-query results written to gprofiler_ORA_ora_multiquery.tsv

8. Compare gprofiler2 R results to g:Profiler web results

In day 1 of the workshop, you ran ORA with g:Profiler web tool and saved the results to a CSV. Let’s compare the results to those we have generated in R. Do we expect the results to be identical or differ slightly?

The input file here is one that we have created, but should match yours as long as you used the same P filters and gprofiler parameters.

web <- read.csv("gProfiler_hsapiens_07-11-2024_11-27-09__intersections.csv")
head(web)

Check the numbers: are there any terms significant from one tool but not the other?

# Extract significant term names
web_terms <- web$term_name
ora_terms <- ora$result$term_name

paste0("Number of significant terms from web: ", length(web_terms))
## [1] "Number of significant terms from web: 273"
paste0("Number of significant terms from R: ", length(ora_terms))
## [1] "Number of significant terms from R: 273"
# Find command and unique terms
common_terms <- intersect(web_terms, ora_terms)
if (length(common_terms) == length(web_terms) && length(common_terms) == length(ora_terms)) {
  # If the lengths match, all terms are shared
  print("All terms are shared")
} else {
  # If there are differences, report the number of terms
  unique_web <- setdiff(web_terms, ora_terms)
  unique_ora <- setdiff(ora_terms, web_terms)
  
  print(paste("Number of terms unique to web:", length(unique_web)))
  print(paste("Number of terms unique to gprofiler2 (R):", length(unique_ora)))
}
## [1] "All terms are shared"

That’s a good start! Do the P values differ? Let’s look closely at the GO ‘Molecular Function’ P values via a barplot.

Format the P values for plotting:

# Filter for GO:MF terms
go_mf_web <- web %>% filter(source == "GO:MF")
go_mf_r <- ora$result %>% filter(source == "GO:MF")

# Extract term names and p values
comparison_data_go_mf <- data.frame(
  term_name = go_mf_web$term_name,
  p_value_web = go_mf_web$adjusted_p_value,
  p_value_r = go_mf_r$p_value
)

# Reshape the data to long format
comparison_data_long <- comparison_data_go_mf %>%
  pivot_longer(cols = starts_with("p_value"), 
               names_to = "source", 
               values_to = "p_value")

Create barplot to compare P values web vs R:

# Create the bar plot with -log10 transformed p-values
print(ggplot(comparison_data_long, aes(x = term_name, y = -log10(p_value), fill = source)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +  # Side-by-side bars
  labs(title = "Adjusted P value comparison for GO:MF enrichments",
       x = "Term Name",
       y = "-log10(P-value)") +
  scale_fill_manual(values = c("p_value_web" = "#ff9900", "p_value_r" = "#3366cc")) +  # Custom colors
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) )  

Great! We know we applied the same parameters, and used the same input gene lists. The identical results must mean that gprofiler2 is using the same database version as g:Profiler web.

9. Save versions and session details

gProfiler database version

Let’s check: yesterday when you ran ORA on the web, hopefully you saved your ‘query parameters’ as well as your results.

From my run, I can see version as ‘e111_eg58_p18_f463989d’.

Let’s report the g:Profiler database version used in our analysis:

paste0("g:Profiler database version: ", ora$meta$version)
## [1] "g:Profiler database version: e111_eg58_p18_f463989d"
paste0("gprofiler2 package version: ", packageVersion("gprofiler2"))
## [1] "gprofiler2 package version: 0.2.3"

R version and R package versions

We can also capture the version of R and other session details including all loaded packages and versions with the sessionInfo() function:

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tidyr_1.3.1      ggplot2_3.5.1    gprofiler2_0.2.3 dplyr_1.1.4     
## [5] readr_2.1.5     
## 
## loaded via a namespace (and not attached):
##  [1] plotly_4.10.4     sass_0.4.9        utf8_1.2.4        generics_0.1.3   
##  [5] bitops_1.0-7      hms_1.1.3         digest_0.6.35     magrittr_2.0.3   
##  [9] evaluate_0.23     grid_4.4.2        fastmap_1.1.1     jsonlite_1.8.8   
## [13] promises_1.3.0    gridExtra_2.3     httr_1.4.7        purrr_1.0.2      
## [17] fansi_1.0.6       crosstalk_1.2.1   viridisLite_0.4.2 scales_1.3.0     
## [21] textshaping_0.3.7 lazyeval_0.2.2    jquerylib_0.1.4   shiny_1.8.1.1    
## [25] cli_3.6.2         rlang_1.1.3       crayon_1.5.2      bit64_4.0.5      
## [29] munsell_0.5.1     withr_3.0.0       cachem_1.0.8      yaml_2.3.8       
## [33] tools_4.4.2       parallel_4.4.2    tzdb_0.4.0        colorspace_2.1-0 
## [37] httpuv_1.6.15     mime_0.12         vctrs_0.6.5       R6_2.5.1         
## [41] lifecycle_1.0.4   htmlwidgets_1.6.4 bit_4.0.5         vroom_1.6.5      
## [45] ragg_1.3.1        pkgconfig_2.0.3   later_1.3.2       pillar_1.9.0     
## [49] bslib_0.7.0       gtable_0.3.5      Rcpp_1.0.12       data.table_1.15.4
## [53] glue_1.7.0        systemfonts_1.0.6 highr_0.10        xfun_0.43        
## [57] tibble_3.2.1      tidyselect_1.2.1  rstudioapi_0.16.0 knitr_1.46       
## [61] farver_2.1.2      xtable_1.8-4      htmltools_0.5.8.1 labeling_0.4.3   
## [65] rmarkdown_2.26    compiler_4.4.2    RCurl_1.98-1.14

RStudio version

Typically, we would simply run RStudio.Version() to print the version details. However, when we knit this document to HTML, the RStudio.Version() function is not available and will cause an error. So to make sure our version details are saved to our static record of the work, we will save to a file, then print the file contents back into the notebook.

# Get RStudio version information
rstudio_info <- RStudio.Version()

# Convert the version information to a string
rstudio_version_str <- paste(
  "RStudio Version Information:\n",
  "Version: ", rstudio_info$version, "\n",
  "Release Name: ", rstudio_info$release_name, "\n",
  "Long Version: ", rstudio_info$long_version, "\n",
  "Mode: ", rstudio_info$mode, "\n",
  "Citation: ", rstudio_info$citation,
  sep = ""
)

# Write the output to a text file
writeLines(rstudio_version_str, "rstudio_version.txt")
# Read the saved version information from the file
rstudio_version_text <- readLines("rstudio_version.txt")

# Print the version information to the document
rstudio_version_text
## [1] "RStudio Version Information:"                                                                                                                                                                                                                                                                           
## [2] "Version: 2023.6.1.524"                                                                                                                                                                                                                                                                                  
## [3] "Release Name: Mountain Hydrangea"                                                                                                                                                                                                                                                                       
## [4] "Long Version: 2023.06.1+524"                                                                                                                                                                                                                                                                            
## [5] "Mode: server"                                                                                                                                                                                                                                                                                           
## [6] "Citation: list(title = \"RStudio: Integrated Development Environment for R\", author = list(list(given = \"Posit team\", family = NULL, role = NULL, email = NULL, comment = NULL)), organization = \"Posit Software, PBC\", address = \"Boston, MA\", year = \"2023\", url = \"http://www.posit.co/\")"

10. Knit workbook to HTML

Make sure your document is saved if you have made any changes! (there will be an asterisk next to the filename on editor pane if unsaved changes are present).

The last task is to knit the notebook. Our notebook is editable, and can be changed. Deleting code deletes the output, so we could lose valuable details. If we knit the notebook to HTML, we have a permanent static copy of the work.

On the editor pane toolbar, under Preview, select Knit to HTML.

If you have already run Preview, you will see Knit instead of Preview.

The HTML file will be saved in the same directory as the notebook, and with the same filename, but the .Rmd prefix will be replaced by .html. The knit HTML will typically open automatically once complete. If you receive a popup blocker error, click cancel, and in the Files pane of RStudio, single click the gprofiler.html file and select View in Web Browser.

Note that the notebook will only successfully knit if there are no errors in the code. You can ‘preview’ HTML with code errors.