Ensure the ‘workshop’ directory is your current working directory:
getwd()
## [1] "/home/user3/workshop"
Raw data from Pezzini
et al 2017 was subjected to differential gene expression analysis
with Degust and the results
file saved to Pezzini_DE.txt.
The input data file is within the current working directory so we do not need to specify its directory path.
# save data file to an R object called 'data'
data <- read_tsv("Pezzini_DE.txt", col_names = TRUE, show_col_types = FALSE)
# view the first few lines
head(data)
The dataframe shows genes with fold change and FDR values, along with normalised count values for the 6 samples (2 groups with 3 replicates each).
Look on the environment pane of RStudio, and you can see a description ‘14420 obs. of 10 variables’ - this shows your dataframe consists of 10 columns and 14,420 genes.
Now we need to filter for differentially expressed genes (DEGs). We will apply the thresholds adjusted P value (FDR) < 0.01 and absolute log2 fold change of at least 2.
We will use ENSEMBL gene IDs (column 1).
# Filter DEGs and save to an object named 'degs'
degs <- data %>%
filter(FDR < 0.01 & abs(Log2FC) >= 2) %>%
pull(Gene.ID)
cat("Number of genes passing FDR and fold change filter:", length(degs), "\n")
## Number of genes passing FDR and fold change filter: 792
# Save the DEG gene list to disk:
title <- "Pezzini_DEGs.txt"
write.table(degs, title, row.names = FALSE, col.names = FALSE, quote = FALSE)
cat("Table saved to", title, "\n")
## Table saved to Pezzini_DEGs.txt
We have 792 genes passing our filters.
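To see how the two conditions in the filter combine, here is a minimal sketch on a made-up table. The gene names and values are invented for illustration; only the column names mirror the workshop data. Note that `abs()` lets a single condition catch both up- and down-regulated genes.

```r
# Toy table with invented values; column names mirror the workshop data
toy <- data.frame(
  Gene.ID = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  FDR     = c(0.001, 0.001, 0.200, 0.005, 0.009),
  Log2FC  = c(3.1, -2.5, 4.0, 0.4, 2.0),
  stringsAsFactors = FALSE
)

# Keep genes with FDR below 0.01 AND an absolute log2 fold change of at least 2
passes <- toy$FDR < 0.01 & abs(toy$Log2FC) >= 2
toy_degs <- toy$Gene.ID[passes]

# geneC fails on FDR, geneD fails on fold change
print(toy_degs)
```

A log2 fold change of 2 corresponds to a four-fold change on the linear scale, so this is a fairly strict cutoff.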
Recall from the webinar and day 1 of the workshop that an experimental background gene list is crucial to avoiding false positives and minimising tissue bias with ORA.
The analysis in Degust has already removed lowly expressed genes, so we can simply extract all genes from this data matrix as our background gene list. We will save it as our ‘background’ object and also write it to disk, so that we can include it within the supplementary materials of any resultant publications for reproducibility.
# select the column labelled 'Gene.ID' from the 'data' dataframe, save to object named 'background'
background <- data$Gene.ID
cat("Number of background genes:", length(background), "\n")
## Number of background genes: 14420
# Save the background gene list to disk:
title <- "Pezzini_background_genes.txt"
write.table(background, title, row.names = FALSE, col.names = FALSE, quote = FALSE)
cat("Table saved to", title, "\n")
## Table saved to Pezzini_background_genes.txt
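Before running ORA it can be worth confirming that every DEG is also present in the background list, and that the background has no duplicated IDs. The sketch below uses invented gene names; with the workshop objects you would substitute `degs` and `background`.

```r
# Invented gene lists for illustration only
toy_background <- c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF")
toy_degs <- c("geneA", "geneB", "geneE")

# Any query genes missing from the background? (expect none)
missing_from_bg <- setdiff(toy_degs, toy_background)
cat("DEGs missing from background:", length(missing_from_bg), "\n")

# Any duplicated IDs in the background? (expect none)
n_duplicates <- sum(duplicated(toy_background))
cat("Duplicated background IDs:", n_duplicates, "\n")
```

Because the DEGs here were filtered from the same table that supplies the background, both checks should pass, but the habit is useful when the two lists come from different files.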
Before running the code chunk below, review the parameters for the
gost ORA function. We can do this easily by bringing up the
help page for the function in the Help pane of RStudio.
This is the same information that is displayed in the gprofiler2
user guide.
?gprofiler2::gost
Observe the similarities to the parameters available on the
g:Profiler web interface, for example organism, the correction method
(g:Profiler’s custom g_scs method), and domain scope
(background genes).
Run the below code which explicitly includes all available
gost parameters. Including all parameters, even if the
defaults suit your needs, makes your parameter choices explicit.
Sometimes, default settings can change between versions!
An error-free gost run should produce no console output.
While the code is running, there will be a green bar to the left of the code
chunk.
Our results are saved in the R object ora.
ora <- gost(
degs, # 'degs' gene list object
organism = "hsapiens", # human data
ordered_query = FALSE,
multi_query = FALSE,
significant = TRUE, # only print significant terms
exclude_iea = FALSE, # FALSE = keep GO electronic annotations (IEA); set TRUE to exclude them
measure_underrepresentation = FALSE,
evcodes = FALSE, # don't include evidence codes in the results - good to have, but will make it run slower
user_threshold = 0.05, # adj P value cutoff for terms
correction_method = "g_SCS", # gprofiler's custom multiple testing correction method (recommended)
domain_scope = "custom_annotated", # custom background, restrict to only annotated genes
custom_bg = background, # 'background' gene list object
numeric_ns = "", # we don't have numeric IDs
sources = NULL, # use all databases
as_short_link = FALSE, # save our results here not as a weblink to gprofiler
highlight = TRUE # highlight driver terms (will add a 'highlighted' column with TRUE/FALSE)
)
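To make the statistic behind ORA more concrete, here is a rough sketch of a one-sided hypergeometric test for a single term, using base R's `phyper`. The term size and overlap are invented. The domain and query sizes reuse the gene counts from above, although with `custom_annotated` the effective domain size is restricted to annotated genes and will differ. This does not reproduce g:Profiler's g:SCS correction; it only illustrates the uncorrected test for one term.

```r
# Assumed numbers for illustration only
domain_size  <- 14420  # background genes
query_size   <- 792    # DEGs
term_size    <- 200    # hypothetical: genes annotated to the term
intersection <- 25     # hypothetical: DEGs annotated to the term

# Overlap expected by chance
expected <- query_size * term_size / domain_size

# P(overlap >= intersection) under the hypergeometric distribution
p_value <- phyper(intersection - 1, term_size, domain_size - term_size,
                  query_size, lower.tail = FALSE)

cat("Expected overlap:", round(expected, 2), "\n")
cat("Uncorrected P value:", signif(p_value, 3), "\n")
```

Subtracting 1 from the observed overlap is what makes the upper tail include the observed value itself.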
View the top-most significant enrichments with the R
head command. Only significant enrichments passing your
specified threshold (adjusted P value < 0.05) are included in the
results object because we have included
significant = TRUE.
Use the black arrow on the right of the table to scroll to other columns.
head(ora$result)
Let’s give our query a name:
# reassign query name to something more specific
ora$result$query <- 'DEGs_Padj0.01_Log2FC2'
head(ora$result)
We can obtain a list of queried databases:
unique(ora$result$source)
## [1] "GO:BP" "GO:CC" "GO:MF" "HP" "HPA" "KEGG" "MIRNA" "REAC" "TF"
## [10] "WP"
Same as the web tool, we have enrichment results for GO (BP, CC, MF), HP (human phenotype), HPA (human protein atlas), KEGG, MiRNA, Reactome, Transcription Factors, and WikiPathways.
Let’s save the results file to disk. This is handy when you want to export results elsewhere for further analysis, for example Excel.
First, we will re-order the columns so the output more closely matches the tables that are downloaded from the web version of the tool.
# reorder the table columns
ora_reordered <- ora$result[, c("source", "term_name", "term_id", "p_value", "term_size", "query_size", "intersection_size", "effective_domain_size")]
# check the first few lines of the output
head(ora_reordered)
# print to CSV
title <- "gprofiler_ORA_results.csv"
write.csv(ora_reordered, title, row.names = FALSE)
cat("Table saved to", title, "\n")
## Table saved to gprofiler_ORA_results.csv
gprofiler2 has a function called publish_gosttable that prints tables mimicking the
web tool. These are image files,
so they are not suitable for importing into Excel like the CSV we just created.
Let’s extract the results for Reactome and save to a
gosttable.
# Filter results for the 'Reactome' database
reactome_results <- ora$result %>% filter(source == "REAC")
# Extract all term_ids for Reactome
reac <- reactome_results$term_id
# Create the GOST table for Reactome terms
filename <- "gprofiler_Reactome_gosttable.pdf"
publish_gosttable(ora,
highlight_terms = reac,
use_colors = TRUE,
show_columns = c("source", "term_name", "term_size", "intersection_size"),
filename)
## The image is saved to gprofiler_Reactome_gosttable.pdf
The gostplot function creates a Manhattan plot similar
to the one shown on the web tool. By applying the parameter
interactive=TRUE we can hover over the data points to see
enriched term details.
Setting capped = TRUE caps the -log10(p-values) at 16, so any
value bigger than 16 is plotted at 16. This fixes the
scale of the y-axis to keep Manhattan plots from different queries
comparable, and is also intuitive since p-values smaller than that can
all be summarised as ‘highly significant’.
gostplot(ora,
capped = TRUE,
interactive = TRUE,
pal = c(`GO:MF` = "#dc3912",
`GO:BP` = "#ff9900",
`GO:CC` = "#109618",
KEGG = "#dd4477",
REAC = "#3366cc",
WP = "#0099c6",
TF = "#5574a6",
MIRNA = "#22aa99",
HPA = "#6633cc",
CORUM = "#66aa00",
HP = "#990099")
)
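The capping described above can be sketched in a couple of lines of base R. The P values are invented, and this is only an illustration of the idea, not gprofiler2's internal code.

```r
# Invented adjusted P values, from borderline to extremely small
p_values <- c(0.03, 1e-5, 1e-16, 1e-40)

# Transform to the Manhattan plot scale
neg_log10 <- -log10(p_values)

# Cap at 16: anything more extreme is drawn at the same height
capped <- pmin(neg_log10, 16)

print(data.frame(p_values, neg_log10, capped))
```

Only the last value changes: it would otherwise sit at 40 and stretch the y-axis, compressing every other point.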
There are a lot of significant enrichments for GO biological
processes. Many of these are probably terms containing a large number of
genes, so not particularly informative. Other R tools have default
settings limiting the minimum and maximum number of genes in a geneset
to be included in the analysis. Since there is no direct parameter to
restrict term size to gostplot, we can filter the ORA
results before plotting. Let’s apply a maximum gene set size of 500, and
a minimum gene set size of 10, which are the default setting used by
clusterProfiler.
# Filter the results from all databases for terms with term_size <= 500 and >= 10
# save the filtered results in a new object called 'ora_filter_termsize'
ora_filter_termsize <- ora
ora_filter_termsize$result <- ora$result %>% filter(term_size <= 500) %>% filter(term_size >= 10)
# Plot with gostplot using the filtered results
gostplot(ora_filter_termsize,
capped = TRUE,
interactive = TRUE,
pal = c(
`GO:MF` = "#dc3912",
`GO:BP` = "#ff9900",
`GO:CC` = "#109618",
KEGG = "#dd4477",
REAC = "#3366cc",
WP = "#0099c6",
TF = "#5574a6",
MIRNA = "#22aa99",
HPA = "#6633cc",
CORUM = "#66aa00",
HP = "#990099"
)
)
This has cleaned up ‘Biological Process’ a little bit, enabling signals of more specific terms to be highlighted.
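To judge how much a term size filter removes from each database, you can tabulate terms per source before and after filtering. The sketch below uses an invented results table; with the workshop objects the equivalent inputs would be `ora$result` and `ora_filter_termsize$result`.

```r
# Invented results table for illustration only
toy_terms <- data.frame(
  source    = c("GO:BP", "GO:BP", "GO:BP", "REAC", "REAC"),
  term_name = c("broad process", "specific process", "tiny process",
                "pathway one", "pathway two"),
  term_size = c(5200, 340, 8, 120, 45),
  stringsAsFactors = FALSE
)

# Same limits as used in the workshop: 10 to 500 genes per term
keep <- toy_terms$term_size >= 10 & toy_terms$term_size <= 500

# Count terms per source before and after the filter
before <- table(toy_terms$source)
after  <- table(factor(toy_terms$source[keep], levels = names(before)))

size_summary <- data.frame(
  source = names(before),
  before = as.integer(before),
  after  = as.integer(after)
)
print(size_summary)
```

Wrapping the filtered sources in `factor()` with fixed levels keeps a source in the summary even if all of its terms are filtered out.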
gprofiler2 includes a function for creating a
publication-ready image that can optionally highlight specific terms. We
need to first produce a plot with interactive = FALSE, save
it to an object, and then provide that plot object to the
publish_gostplot function.
# Plot with gostplot using the filtered results, save to object called 'plot'
plot <- gostplot(ora_filter_termsize,
capped = TRUE,
interactive = FALSE,
pal = c(
`GO:MF` = "#dc3912",
`GO:BP` = "#ff9900",
`GO:CC` = "#109618",
KEGG = "#dd4477",
REAC = "#3366cc",
WP = "#0099c6",
TF = "#5574a6",
MIRNA = "#22aa99",
HPA = "#6633cc",
CORUM = "#66aa00",
HP = "#990099"
)
)
The publish_gostplot parameter
highlight_terms enables you to highlight specific terms on
the plot, with a table showing enrichment details below for those
highlighted terms.
Let’s highlight some selected terms manually. You need to provide the term ID not term name.
# specify term IDs for terms of interest: 'Collagen degradation' and 'Collagen formation'
highlight <- c("REAC:R-HSA-1442490", "REAC:R-HSA-1474290")
filename <- "gprofiler_collagen_gostplot.pdf"
publish_gostplot(plot,
highlight_terms = highlight,
filename,
width = 10,
height = 10 )
## The image is saved to gprofiler_collagen_gostplot.pdf
Like g:Profiler web, the coloured boxes on the table are by adjusted P value, with darker colours indicating more significant results. Colours range from yellow through green to dark blue.
You can use R's grepl function to search for terms with
names matching some keyword. Let’s highlight all terms related to
receptors. The code chunk applies an increased figure height, to ensure
we can see the whole plot within the notebook.
# extract from ora results all terms containing "receptor" keyword and create a list of those term IDs
highlight <- ora$result %>%
filter(grepl("receptor", term_name, ignore.case = TRUE)) %>%
pull(term_id)
filename <- "gprofiler_receptors_gostplot.pdf"
publish_gostplot(plot,
highlight_terms = highlight,
filename,
width = 10,
height = 10 )
## The image is saved to gprofiler_receptors_gostplot.pdf
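For readers new to `grepl`, here is a small stand-alone sketch of the keyword search on invented term names and IDs. It shows that `ignore.case = TRUE` matches regardless of capitalisation, and that the logical vector returned can be used to subset the matching IDs.

```r
# Invented term names and IDs for illustration only
toy_names <- c("Signaling receptor binding",
               "Collagen formation",
               "G protein-coupled RECEPTOR activity",
               "Extracellular matrix organization")
toy_ids <- c("TERM:1", "TERM:2", "TERM:3", "TERM:4")

# TRUE wherever the name contains the keyword, ignoring case
hits <- grepl("receptor", toy_names, ignore.case = TRUE)

# IDs of the matching terms
matched_ids <- toy_ids[hits]
print(matched_ids)
```

The pattern is a regular expression, so something like "receptor|kinase" would match either keyword.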
One of the advantages of working in R is flexibility with
visualisations. While the interactive Manhattan plots and
publish_gostplot options are nice, it can also be useful to
visualise P values against all term descriptions.
One way to do this is with a dotplot. We can loop through all
databases and use the R package ggplot2 to make a dotplot
for each database with significantly enriched terms for our gene
list.
# List of databases
dbs <- unique(ora$result$source)
# Loop over database list, and print a plot if there are significant enrichments, or else print a message
for (db in dbs) {
# Extract results for this database, and filter by term size
db_results <- ora$result %>% filter(source == db, term_size >= 10, term_size <= 500)
# Check if there are any terms left after filtering
if (nrow(db_results) > 0) {
# Create the dot plot for this database
p <- ggplot(db_results, aes(x = reorder(term_name, -p_value), y = -log10(p_value))) +
geom_point(aes(size = term_size, color = significant)) +
labs(title = paste(db),
x = "Term",
y = "-log10(p-value)") +
theme_minimal() +
coord_flip() # Flips the coordinates for better visibility
# Print the plot
print(p)
} else {
# Print a message if no terms remain after the term size filter
message("No enrichments within the term size limits for database: ", db)
}
}
## No enrichments within the term size limits for database: HP
## No enrichments within the term size limits for database: MIRNA
## No enrichments within the term size limits for database: TF
For plots with a lot of enriched terms, such as GO Biological Process, the display within the notebook is less than ideal. Saving the plot to an image file enables better resolution:
# Filter for the GO:BP database
go_bp_results <- ora$result %>% filter(source == "GO:BP",term_size >= 10, term_size <= 500)
# Create the plot for GO:BP
p_go_bp <- ggplot(go_bp_results, aes(x = reorder(term_name, -p_value), y = -log10(p_value))) +
geom_point(aes(size = term_size, color = significant)) +
labs(title = "GO:BP",
x = "Term",
y = "-log10(p-value)") +
theme_minimal() +
coord_flip() # Flips the coordinates for better visibility
# Open a PDF device to save the plot as a full size A4:
title <- "gprofiler_GO_BP_dotplot.pdf"
pdf(title, width = 8.27, height = 11.69) # A4 portrait size in inches
# Print the plot to the device
print(p_go_bp)
# Close the device (this saves the plot)
dev.off()
## png
## 2
cat("Plot saved to", title, "\n")
## Plot saved to gprofiler_GO_BP_dotplot.pdf
Open this plot by clicking it from the ‘Files’ pane of RStudio. Notice how the term names are now readable :-)
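Another option for crowded plots, shown here only as a sketch on invented data, is to keep the N most significant terms before plotting. The cutoff `top_n` is a hypothetical choice, not a workshop setting.

```r
# Invented enrichment results for illustration only
toy_results <- data.frame(
  term_name = paste("term", 1:6),
  p_value   = c(1e-8, 0.04, 1e-3, 2e-6, 0.01, 5e-5),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: number of terms to keep
top_n <- 3

# Sort by adjusted P value (smallest first) and keep the top N rows
top_terms <- head(toy_results[order(toy_results$p_value), ], top_n)
print(top_terms)
```

The resulting data frame could be passed to the same ggplot code in place of the full set of results.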
By providing more than one gene list and setting
multi_query = TRUE, results from all of the gene lists are
grouped by term IDs for easier comparison. This can be handy when you
have multiple comparisons within an experiment, or when you want to
investigate enrichments within the up and down regulated genes
separately.
First, we need to extract separate gene lists for up-regulated and down-regulated genes.
# make an object for upregulated genes
up_degs <- data %>%
filter(FDR < 0.01 & Log2FC >= 2) %>%
pull(Gene.ID)
cat("Number of upregulated DEGs:", length(up_degs), "\n")
## Number of upregulated DEGs: 577
# Save the DEG gene list to disk:
title <- "up_DEGs.txt"
write.table(up_degs, title, row.names = FALSE, col.names = FALSE, quote = FALSE)
cat("Up-regulated DEGs saved to", title, "\n")
## Up-regulated DEGs saved to up_DEGs.txt
# make an object for downregulated genes
down_degs <- data %>%
filter(FDR < 0.01 & Log2FC <= -2) %>%
pull(Gene.ID)
cat("Number of downregulated DEGs:", length(down_degs), "\n")
## Number of downregulated DEGs: 215
# Save the DEG gene list to disk:
title <- "down_DEGs.txt"
write.table(down_degs, title, row.names = FALSE, col.names = FALSE, quote = FALSE)
cat("Down-regulated DEGs saved to", title, "\n")
## Down-regulated DEGs saved to down_DEGs.txt
Now run gost as multi-query. This may take a few
moments.
The changes required for multi-query are providing a list of gene
list objects to the query parameter instead of a single
gene list object, and setting multi_query = TRUE, which is
FALSE by default. By including the full DEG list as well, we can efficiently
compare up vs down vs no separation.
ora_multi <- gost(
query = list("upregulated" = up_degs, "downregulated" = down_degs, "all_DEGs" = degs),
organism = "hsapiens",
ordered_query = FALSE,
multi_query = TRUE,
significant = TRUE,
exclude_iea = FALSE,
measure_underrepresentation = FALSE,
evcodes = FALSE,
user_threshold = 0.05,
correction_method = "g_SCS",
domain_scope = "custom_annotated",
custom_bg = background,
numeric_ns = "",
sources = NULL,
as_short_link = FALSE,
highlight = FALSE
)
Now create a multi-query interactive Manhattan plot with
gostplot:
gostplot(ora_multi, capped = TRUE, interactive = TRUE)
Unfortunately, notebook view squashes the top plot over the bottom one, and adjusting figure height or plot layout options doesn’t seem to help. Plotting as non-interactive or plotting from the console to the plots pane both produce a correct looking plot.
p <- gostplot(ora_multi, capped = TRUE, interactive = FALSE)
filename <- "gprofiler_ORA_multiquery.pdf"
publish_gostplot(p,
highlight_terms = NULL,
filename,
width = 10,
height = 10 )
## The image is saved to gprofiler_ORA_multiquery.pdf
To access the tabular results separately, they need to be split, as a number of the columns are comma-delimited lists with one value for each of the 3 queries.
head(ora_multi$result)
If you run the command head(ora_multi$result) directly
in the console (not from the notebook) you can see the list values.
One might want to explore the comparisons in more detail, for example
by viewing the separate P values in Excel, so having these results as a
file would be handy. The columns that are lists need to be converted to
character format before printing to TSV.
sapply(ora_multi$result, class)
## term_id p_values significant
## "character" "list" "list"
## term_size query_sizes intersection_sizes
## "integer" "list" "list"
## source term_name effective_domain_size
## "character" "character" "integer"
## source_order parents
## "integer" "list"
Convert the columns that are lists to characters:
# convert lists into characters
list_columns <- names(ora_multi$result)[sapply(ora_multi$result, class) == "list"]
ora_multi$result[list_columns] <- lapply(ora_multi$result[list_columns], function(col) {
sapply(col, function(x) paste(x, collapse = ","))
})
sapply(ora_multi$result, class)
## term_id p_values significant
## "character" "character" "character"
## term_size query_sizes intersection_sizes
## "integer" "character" "character"
## source term_name effective_domain_size
## "character" "character" "integer"
## source_order parents
## "integer" "character"
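The same conversion can be tried on a tiny invented data frame to see exactly what happens to a list column. This sketch uses `is.list` to find the columns and `vapply` so that each row is guaranteed to collapse to a single string. These are small variations on the code above, not a different method.

```r
# Invented data frame with one list column (three values per row)
toy <- data.frame(term_id = c("TERM:1", "TERM:2"), stringsAsFactors = FALSE)
toy$p_values <- list(c(0.01, 0.5, 0.02), c(1, 0.003, 0.004))

# Identify list columns
is_list_col <- sapply(toy, is.list)

# Collapse each list element into one comma-separated string
toy[is_list_col] <- lapply(toy[is_list_col], function(col) {
  vapply(col, function(x) paste(x, collapse = ","), character(1))
})

print(sapply(toy, class))
print(toy)
```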
View the new table format:
head(ora_multi$result)
# Print TSV file
title <- "gprofiler_ORA_ora_multiquery.tsv"
write.table(ora_multi$result, title, row.names = FALSE, quote = FALSE, sep = "\t")
cat("ORA multi-query results written to", title, "\n")
## ORA multi-query results written to gprofiler_ORA_ora_multiquery.tsv
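If you would rather have one P value column per query than a comma-separated string, the strings can be split again. The sketch below uses invented strings and assumes the values follow the order in which the queries were supplied to gost. That assumption is worth confirming against the metadata stored in the results object before relying on it.

```r
# Invented comma-separated P values, one string per term
p_strings <- c("0.01,0.5,0.02", "1,0.003,0.004")

# Assumed order: the same as the names given in the multi-query list
query_names <- c("upregulated", "downregulated", "all_DEGs")

# Split each string, convert to numeric, and stack into a matrix
p_matrix <- do.call(rbind, lapply(strsplit(p_strings, ","), as.numeric))
colnames(p_matrix) <- paste0("p_value_", query_names)

print(p_matrix)
```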
In day 1 of the workshop, you ran ORA with g:Profiler web tool and saved the results to a CSV. Let’s compare the results to those we have generated in R. Do we expect the results to be identical or differ slightly?
The input file here is one that we have created, but should match yours as long as you used the same FDR and fold change filters and the same g:Profiler parameters.
web <- read.csv("gProfiler_hsapiens_07-11-2024_11-27-09__intersections.csv")
head(web)
Check the numbers: are there any terms significant from one tool but not the other?
# Extract significant term names
web_terms <- web$term_name
ora_terms <- ora$result$term_name
paste0("Number of significant terms from web: ", length(web_terms))
## [1] "Number of significant terms from web: 273"
paste0("Number of significant terms from R: ", length(ora_terms))
## [1] "Number of significant terms from R: 273"
# Find common and unique terms
common_terms <- intersect(web_terms, ora_terms)
if (length(common_terms) == length(web_terms) && length(common_terms) == length(ora_terms)) {
# If the lengths match, all terms are shared
print("All terms are shared")
} else {
# If there are differences, report the number of terms
unique_web <- setdiff(web_terms, ora_terms)
unique_ora <- setdiff(ora_terms, web_terms)
print(paste("Number of terms unique to web:", length(unique_web)))
print(paste("Number of terms unique to gprofiler2 (R):", length(unique_ora)))
}
## [1] "All terms are shared"
That’s a good start! Do the P values differ? Let’s look closely at the GO ‘Molecular Function’ P values via a barplot.
Format the P values for plotting:
# Filter for GO:MF terms
go_mf_web <- web %>% filter(source == "GO:MF")
go_mf_r <- ora$result %>% filter(source == "GO:MF")
# Match terms by name so each row compares the same term, whatever the row order
comparison_data_go_mf <- go_mf_web %>%
  select(term_name, p_value_web = adjusted_p_value) %>%
  inner_join(go_mf_r %>% select(term_name, p_value_r = p_value), by = "term_name")
# Reshape the data to long format
comparison_data_long <- comparison_data_go_mf %>%
pivot_longer(cols = starts_with("p_value"),
names_to = "source",
values_to = "p_value")
Create barplot to compare P values web vs R:
# Create the bar plot with -log10 transformed p-values
print(ggplot(comparison_data_long, aes(x = term_name, y = -log10(p_value), fill = source)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) + # Side-by-side bars
labs(title = "Adjusted P value comparison for GO:MF enrichments",
x = "Term Name",
y = "-log10(P-value)") +
scale_fill_manual(values = c("p_value_web" = "#ff9900", "p_value_r" = "#3366cc")) + # Custom colors
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) )
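A barplot gives a visual impression, and a numeric check can complement it. The sketch below uses two invented tables standing in for the web and R results. It matches rows by term name with `merge`, then reports the largest absolute difference. With real exports, small differences could arise purely from rounding in the CSV, so `all.equal` with its default tolerance may be more forgiving than an exact comparison.

```r
# Invented stand-ins for the web and R results (rows deliberately in different order)
toy_web <- data.frame(term_name = c("term A", "term B", "term C"),
                      adjusted_p_value = c(1e-6, 0.002, 0.04),
                      stringsAsFactors = FALSE)
toy_r <- data.frame(term_name = c("term C", "term A", "term B"),
                    p_value = c(0.04, 1e-6, 0.002),
                    stringsAsFactors = FALSE)

# Match rows by term name rather than by position
merged <- merge(toy_web, toy_r, by = "term_name")

# Largest absolute difference between the two sets of P values
max_diff <- max(abs(merged$adjusted_p_value - merged$p_value))

# Equality within a small numerical tolerance
same <- isTRUE(all.equal(merged$adjusted_p_value, merged$p_value))

cat("Largest absolute difference:", max_diff, "\n")
cat("P values equal within tolerance:", same, "\n")
```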
Great! We know we applied the same parameters, and used the same input gene lists. The identical results suggest that gprofiler2 is using the same database version as g:Profiler web.
Let’s check: yesterday when you ran ORA on the web, hopefully you saved your ‘query parameters’ as well as your results.
From my run, I can see version as ‘e111_eg58_p18_f463989d’.
Let’s report the g:Profiler database version used in our analysis:
paste0("g:Profiler database version: ", ora$meta$version)
## [1] "g:Profiler database version: e111_eg58_p18_f463989d"
paste0("gprofiler2 package version: ", packageVersion("gprofiler2"))
## [1] "gprofiler2 package version: 0.2.3"
We can also capture the version of R and other session details
including all loaded packages and versions with the
sessionInfo() function:
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tidyr_1.3.1 ggplot2_3.5.1 gprofiler2_0.2.3 dplyr_1.1.4
## [5] readr_2.1.5
##
## loaded via a namespace (and not attached):
## [1] plotly_4.10.4 sass_0.4.9 utf8_1.2.4 generics_0.1.3
## [5] bitops_1.0-7 hms_1.1.3 digest_0.6.35 magrittr_2.0.3
## [9] evaluate_0.23 grid_4.4.2 fastmap_1.1.1 jsonlite_1.8.8
## [13] promises_1.3.0 gridExtra_2.3 httr_1.4.7 purrr_1.0.2
## [17] fansi_1.0.6 crosstalk_1.2.1 viridisLite_0.4.2 scales_1.3.0
## [21] textshaping_0.3.7 lazyeval_0.2.2 jquerylib_0.1.4 shiny_1.8.1.1
## [25] cli_3.6.2 rlang_1.1.3 crayon_1.5.2 bit64_4.0.5
## [29] munsell_0.5.1 withr_3.0.0 cachem_1.0.8 yaml_2.3.8
## [33] tools_4.4.2 parallel_4.4.2 tzdb_0.4.0 colorspace_2.1-0
## [37] httpuv_1.6.15 mime_0.12 vctrs_0.6.5 R6_2.5.1
## [41] lifecycle_1.0.4 htmlwidgets_1.6.4 bit_4.0.5 vroom_1.6.5
## [45] ragg_1.3.1 pkgconfig_2.0.3 later_1.3.2 pillar_1.9.0
## [49] bslib_0.7.0 gtable_0.3.5 Rcpp_1.0.12 data.table_1.15.4
## [53] glue_1.7.0 systemfonts_1.0.6 highr_0.10 xfun_0.43
## [57] tibble_3.2.1 tidyselect_1.2.1 rstudioapi_0.16.0 knitr_1.46
## [61] farver_2.1.2 xtable_1.8-4 htmltools_0.5.8.1 labeling_0.4.3
## [65] rmarkdown_2.26 compiler_4.4.2 RCurl_1.98-1.14
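If you want the session details as a separate file, for example to include in supplementary materials, one possible approach is to capture the printed output and write it to disk. The filename here is a hypothetical choice.

```r
# Hypothetical output filename
session_file <- "session_info.txt"

# Capture what sessionInfo() would print, one element per line, and save it
session_lines <- capture.output(sessionInfo())
writeLines(session_lines, session_file)

cat("Session details saved to", session_file, "\n")
```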
Typically, we would simply run RStudio.Version() to
print the version details. However, when we knit this document to HTML,
the RStudio.Version() function is not available and will
cause an error. So to make sure our version details are saved to our
static record of the work, we will save to a file, then print the file
contents back into the notebook.
# Get RStudio version information
rstudio_info <- RStudio.Version()
# Convert the version information to a string
rstudio_version_str <- paste(
"RStudio Version Information:\n",
"Version: ", rstudio_info$version, "\n",
"Release Name: ", rstudio_info$release_name, "\n",
"Long Version: ", rstudio_info$long_version, "\n",
"Mode: ", rstudio_info$mode, "\n",
"Citation: ", rstudio_info$citation,
sep = ""
)
# Write the output to a text file
writeLines(rstudio_version_str, "rstudio_version.txt")
# Read the saved version information from the file
rstudio_version_text <- readLines("rstudio_version.txt")
# Print the version information to the document
rstudio_version_text
## [1] "RStudio Version Information:"
## [2] "Version: 2023.6.1.524"
## [3] "Release Name: Mountain Hydrangea"
## [4] "Long Version: 2023.06.1+524"
## [5] "Mode: server"
## [6] "Citation: list(title = \"RStudio: Integrated Development Environment for R\", author = list(list(given = \"Posit team\", family = NULL, role = NULL, email = NULL, comment = NULL)), organization = \"Posit Software, PBC\", address = \"Boston, MA\", year = \"2023\", url = \"http://www.posit.co/\")"
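One way to keep a chunk like this from failing outside RStudio, sketched here under the assumption that `RStudio.Version` is only visible in an RStudio session, is to guard the call with `exists()` and fall back to a previously saved file. The filename and the fallback message are hypothetical.

```r
# Hypothetical filename, kept separate from the workshop's own file
version_file <- "rstudio_version_guarded.txt"

# Only query RStudio when the function is actually available
if (exists("RStudio.Version")) {
  rstudio_info <- RStudio.Version()
  writeLines(paste("Version:", as.character(rstudio_info$version)), version_file)
}

# Read the saved details if present, otherwise note that none were recorded
if (file.exists(version_file)) {
  rstudio_version_text <- readLines(version_file)
} else {
  rstudio_version_text <- "RStudio version not recorded"
}

print(rstudio_version_text)
```

Run interactively in RStudio, this writes the file; when run elsewhere, it reuses whatever was saved earlier.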
Make sure your document is saved if you have made any changes! (There will be an asterisk next to the filename on the editor pane if unsaved changes are present.)
The last task is to knit the notebook. Our notebook is editable, and can be changed. Deleting code deletes the output, so we could lose valuable details. If we knit the notebook to HTML, we have a permanent static copy of the work.
On the editor pane toolbar, under Preview, select Knit to HTML.
If you have already run Preview, you will see Knit instead of Preview.
The HTML file will be saved in the same directory as the notebook, and with the same filename, but the .Rmd extension will be replaced by .html. The knit HTML will typically open automatically once complete. If you receive a popup blocker error, click cancel, and in the Files pane of RStudio, single click the gprofiler.html file and select View in Web Browser.
Note that the notebook will only successfully knit if there are no errors in the code. You can ‘preview’ HTML with code errors.