The code contained in this .Rmd document performs differential expression (DE) and enrichment analysis. It is performed after processing raw sequencing reads to analysis ready raw counts, e.g. with nfcore/rnaseq as we did in Part 1 of this workshop.
How to use this file:
Yes, more set up!
Load all the R libraries below.
suppressMessages({
library("DESeq2")
library("edgeR")
library("limma")
library("RColorBrewer")
library("gplots")
library("ggplot2")
library("factoextra")
library("devtools")
library("rstudioapi")
library("dplyr")
library("tibble")
library("tidyverse")
library("pheatmap")
library("biomaRt")
library("annotables")
library("org.Mm.eg.db")
library("biobroom")
library("clusterProfiler")
library("pathfindR")
})
Install pheatmap: There is one package (pheatmap) which has not been installed in our singularity container (whoopsie!). You should see a yellow banner at the top of this page warning you about this - please click install
Set the path so that R knows where our data lives and where we will write our output files.
setwd("/home/training/base_directory/working_directory")
The count matrix file contains raw (unnormalized) counts for every gene (rows, i) in mm10 and every sample (columns, j). The value in the i-th row and the j-th column of the matrix tells how many raw reads were assigned to gene i in sample j.
We created a count matrix file with subsetted data in Part 1 of this workshop using nf-core/rnaseq (salmon.merged.gene_counts.tsv). In Part 2, we will use a count matrix file produced from the full dataset (including more descriptive sample identifiers!), in order to have enough data to perform functional enrichment analysis.
Load the count matrix file and format the data frame:
# Read in the full count matrix file
# This file was included in the data we downloaded for Part 1
counttable_original <- read.delim("FULL_count_matrix.txt",
header = T,
row.names = 1)
# This is not in the exact format required by the functions used later in the analysis
# Data format is very important to ensure that functions read and analyse data correctly!
# Put gene symbol in the first column
counttable <- counttable_original[,c("Symbol","WT1","WT2","WT3","KO1","KO2","KO3")]
# We don't need the ensembl IDs - get rid of the rownames
row.names(counttable) <- NULL
# Make the gene symbol column rownames instead
rownames(counttable) <- counttable$Symbol
counttable <- counttable[,c("WT1","WT2","WT3","KO1","KO2","KO3")]
We now have our count matrix file (genes - rows, samples - columns) ready for analysis.
In the console, inspect counttable
DE requires some metadata that tells our R libraries about the experimental design of the study, so that it knows how to handle the data. In this analysis, we have two experimental groups, the wildtype (“Wild”) and the knockout (“KO”) groups.
We will create and store this metadata in a specific format required by the R libraries that we will use later. The samples are in rows (sample IDs as rownames), and columns are the experimental groupings. You can have more than one column, but need a minimum of one that describes your experimental groups.
# Define a condition variable, ensuring they match the order of sample IDs in counttable
condition = c("Wild","Wild","Wild","KO","KO","KO")
# Create a dataframe called meta with condition and sample IDs as rownames (taken from counttable)
meta <- data.frame(row.names = colnames(counttable), condition)
In the console, inspect meta
Before performing any DE analysis, it is good to explore and visualise our data. This helps us to understand and get a sense of the quality of our data at this stage of the analysis.
This session will be done in breakout rooms. Work through the code and challenge questions with your facilitators.
For each sample, check the total library size. This is essentially the total number of reads which we have aligned to each sample.
# Sum raw gene counts for each sample (each column)
colSums(counttable)
## WT1 WT2 WT3 KO1 KO2 KO3
## 25726848 20480345 20655220 18571112 28027328 20111698
How do you think the differences in total library size could affect DE analysis?
Here we plot a boxplot of gene level raw counts (y axis) for each sample (x axis).
boxplot(counttable,
col = "red")
How would you describe what you are seeing in the plot?
# Add 1 to make sure all values are > 0
boxplot(log2((counttable) +1),
col = "red")
The distribution of counts across samples is not comparable (although not too dissimilar in this case!). This is a consideration to take if you plan to use any statistical tests that assume an equal distribution of counts across samples.
The DESeq2 package contains functions that perform normalization, data transformation, visualization and DE. This is a highly regarded and popular tool. We will use some of its functions to perform exploratory analysis.
This is a large package and we will only scratch the surface of key concepts in this workshop. We recommend you read the DESeq2 paper and manual before performing your own analysis.
In order for DESeq2 to perform DE, we need to store data in a DESeqDataSet object (dds) which contains:
For exploratory analysis, we set design = ~ 1 which tells DESeq2 to be blind to experimental groups. We do not want DESeq2 to account for any within group variability during exploratory analysis and quality checking. This will allow us to observe for any unexpected batch effects.
We will spend more time understanding the dds object later in this workshop.
# We will call this object by name 'dds' as this is a standard practice
dds <- DESeqDataSetFromMatrix(countData = counttable,
colData = meta,
design = ~1)
Count data is transformed with regularized log (rlog) or variance stabilising transformation (vst), required before performing exploratory data analysis such as visualisation and clustering (e.g. PCA). Both methods produce data on the log2 scale, and normalize for other factors such as library size.
rlog performs slightly better, but can take a lot longer than vst if you have many samples. We will set blind = TRUE so that DESeq2 is blind to experimental groups, for the same reasons as previously described.
# Calculate rlog and store it in a dds-like object
rlog <- rlog(dds, blind = TRUE)
rlog.data <- assay(rlog)
We can check the effect of transformation by again using boxplots.
boxplot(rlog.data,
col = "red")
Notice that the count distribution across samples is much more comparable with rlog transformed data.
Principal component analysis (PCA) is a dimensionality reduction method that summarises high dimensionality data into a number of principal components (PCs). For RNA-Seq, our highly dimensional data is our per sample gene-expression data and the variance that exists across samples. We can observe the relationship between these in a 2D space, by plotting two components at a time (usually the top two that account for most of the variance).
Create a PCA plot using rlog transformed data. By default, DESeq2::plotPCA() will colour each dot by the sample’s experimental group, but we have included some additional code to remove these for our discussion :-).
# DESeq2's plotPCA function will create a PCA plot using an object that has rlog or vst values
pcaData <- plotPCA(rlog, returnData=TRUE)
# Convert to percentages
percentVar <- round(100 * attr(pcaData, "percentVar"))
# Plot table
ggplot(pcaData, aes(PC1, PC2)) +
geom_point(size = 3) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
coord_fixed()
Each dot represents one sample. Samples that appear closer together have a more similar gene expression profile. Can you speculate which samples belong to the same experimental group?
Let’s recreate the plot, now colouring samples by their experimental group.
# DESeq2's plotPCA function will create a PCA plot using an object that has rlog or vst values
DESeq2::plotPCA(rlog)
Can you comment on how the samples cluster together in the plot?
If you saw one red dot cluster more closely with the blue dots, what might this suggest?
Apart from experimental groups, what other relationships might be revealed when looking at PCA plots?
How much of the overall variance is explained by PC1 & PC2?
Plotting the other PCs is something you may want to do until you have explored most of the variation in the dataset and what their potential sources might be. We will not have time to cover this in the workshop, but do recommend you look into scree plots and observing the genes contributing to each PC.
Another way to visualise how similar or dissimilar our samples are is to plot sample distances in a heatmap and a hierarchical cluster.
# dist() calculates eucleadean distance, which requires data to be in a specific format
sampleDists <- dist(t(assay(rlog)))
sampleDistMatrix <- as.matrix(sampleDists)
# Get some pretty blue colours
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
# Plot the sampleDistMatrix in a heatmap
# pheatmap also calculates and plots hierachical clusters
pheatmap(sampleDistMatrix,
clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists,
col = colors)
Dark blue indicates samples that are more similar (distance = 0).
What do you notice about sample KO3?
This is the end of the breakout room session
We are happy with what we have observed in our exploratory analysis and are finally ready to start DE analysis.
In order for DESeq2 to perform DE, we need to revisit the DESeqDataSet object (dds), this time telling it our experimental design. In our case, this will be the column “condition”, taken from “meta”.
Note: design formulas can be much, much more complex! This gives you the power to model and account for other variation (e.g. you could model batch effects using something like ~ condition + batch)
dds <- DESeqDataSetFromMatrix(countData = counttable,
colData = meta,
design = ~ condition)
Let’s stop here and take some time to understand dds. In the console, type in the code below:
dds
counts(dds)
colData(dds)
design(dds)
When we perform differential expression and interpret the results, we want to see what changed in the knockout mice (“treatment”) compared to the wildtype mice (“control”) - not the other way around!
The experimental design information in dds is stored as a factor in R (check by running class(dds$condition - without the backslash). By default, R will choose a reference level for factors based on alphabetical order. That means, the knockout group is currently our baseline (check by typing in the console: dds$condition, without the backslash).
Note: the backslashes are required to escape the “$” as they are interpreted differently in Markdown vs R.
We will need to explicitly set “Wild” as the baseline level for easier interpretation of results.
# Set Wild to base level, using relevel
dds$condition <- relevel(dds$condition, "Wild")
# Check that Wild appears as the first level
dds$condition
## [1] Wild Wild Wild KO KO KO
## Levels: Wild KO
Before we commence with DE, there are some key concepts that you should know.
All DE methods account for the above in their own way. In this workshop, we will use and explore DESeq2’s method.
We are finally ready to perform DE analysis with DESeq2’s DESeq() function. This performs a number of steps required to perform DE - the console output gives you a clue as to what these steps are doing.
# Perform DE and store the results back in the dds object
dds <- DESeq(dds)
# Save the results to res
res <- results(dds)
In brief, by default, DESeq2 is:
Get a summary of results by running the code below.
summary(res)
##
## out of 19859 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 1860, 9.4%
## LFC < 0 (down) : 1531, 7.7%
## outliers [1] : 8, 0.04%
## low counts [2] : 5006, 25%
## (mean count < 8)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
Order by the smallest adjusted p value, and have a look at the top 5/bottom 5 DE genes.
res <- res[order(res$padj), ]
res
## log2 fold change (MLE): condition KO vs Wild
## Wald test p-value: condition KO vs Wild
## DataFrame with 19859 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue
## <numeric> <numeric> <numeric> <numeric> <numeric>
## Sp110 11885.49 7.86072 0.303680 25.8849 9.85207e-148
## Echdc3 1406.80 4.52230 0.218622 20.6854 4.68414e-95
## Krt2 32524.23 -4.47847 0.218163 -20.5281 1.20752e-93
## Tie1 5235.08 3.58214 0.175835 20.3722 2.94985e-92
## Dhtkd1 2395.38 4.34892 0.215641 20.1674 1.89282e-90
## ... ... ... ... ... ...
## Ttll8 0.571604 1.180333 3.05766 0.386024 0.699479
## Trav6-4 0.133636 1.050453 4.08047 0.257434 0.796844
## Gm5313 2.446997 0.623172 1.28897 0.483467 0.628764
## Vmn1r-ps66 0.202760 1.050453 4.08047 0.257434 0.796844
## Gm8546 0.520375 2.543885 2.60963 0.974808 0.329656
## padj
## <numeric>
## Sp110 1.46254e-143
## Echdc3 3.47680e-91
## Krt2 5.97521e-90
## Tie1 1.09476e-88
## Dhtkd1 5.61979e-87
## ... ...
## Ttll8 NA
## Trav6-4 NA
## Gm5313 NA
## Vmn1r-ps66 NA
## Gm8546 NA
From the above, we can see that DE was performed for KO vs Wild samples for 19,859 genes and 6 columns (6 samples). We then see a table of DE results. The column headers include:
Differentially expressed genes are usually defined by cut-offs for two metrics, which are the adjusted p-value and the fold change value. We commonly see differential expression defined as genes with:
This is somewhat arbitrary - we need to have just the right number of differentially expressed genes to perform functional enrichment analysis (around 100 - 3,000 is a general guide). Gene expression should be thought of in a biological context - we care about the “top” most differentially expressed genes.
Here we will use padj < 0.05 as our cut-off value for significance and use these genes for enrichment analysis.
# Redefine the significance cut-off used for independent filtering (default = 0.1).
# This should be done if we want to use p adj to a value other than 0.1
res_padj0.05 <- results(dds, alpha = 0.05)
# Subset the results and write these to an output file
resSig005_subset <- subset(res_padj0.05, padj < 0.05)
write.table(resSig005_subset,
"res_DeSeq2_FDR0.05_comparison_Wild_vs_KO_FUllMatrix.tab",
sep = "\t",
col.names = NA,
quote = F)
# Reformat the output results into a data.frame
resSig005_subset <- data.frame(genes = row.names(resSig005_subset), resSig005_subset)
# We can also order padj-filtered results by log fold change
resSig005_subset_lfc <- resSig005_subset[order(resSig005_subset$log2FoldChange), ]
# Notice how our summary of results has changed slightly now
summary(res_padj0.05)
##
## out of 19859 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 1353, 6.8%
## LFC < 0 (down) : 984, 5%
## outliers [1] : 8, 0.04%
## low counts [2] : 5006, 25%
## (mean count < 8)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
Normalized count data can be used for visualisation/other analyses. The code below extracts and prints normalized counts to file.
# Extract normalized counts from dds
normalised_counts <- counts(dds, normalized = TRUE)
# Save normalized counts (tab separated) to file
write.table(normalised_counts,
"normalised_all_samples_DeSeq2_FUllMatrix.tab",
sep = "\t",
col.names = NA,
quote = F)
The volcano plot is a scatterplot that shows magnitude of change (fold change, x axis) against statistical significance (p-value, y axis). It provides an overall visual snapshot of the number of up and downregulated genes that are statistically significant.
# Create a basic volcano plot (scatter plot) with x axis = LogFC, y axis = -log10(pvalue)
resdata <- as.data.frame(res)
# Define whether genes are significantly DE or not and store this in a new column called DE
resdata$Significant <- "No"
resdata$Significant[resdata$log2FoldChange > 1 & resdata$pvalue < 0.05 ] <- "Upregulated"
resdata$Significant[resdata$log2FoldChange < -1 & resdata$pvalue < 0.05 ] <- "Downregulated"
# Create the volcano plot
p <- ggplot(data=resdata,
aes(x=log2FoldChange, y=-log10(pvalue), col=Significant)) + geom_point()
# Add significance lines at log2FoldChange -1, 1 and pvalue 0.05
p2 <- p + geom_vline(xintercept=c(-1, 1), col = "red") +
geom_hline(yintercept=-log10(0.05), col = "red")
# Print the plot
p2
We have applied low read-count filtering followed by appropriate statistical tests using the DESeq2 package for identification of the differentially expressed geens across our conditions of interest.
However we recommend that you visualise a few genes (of specific interest or otherwise) to check if the identification of these genes is supported by sufficient read-counts.
Use plotCounts function to plot normalized counts for a single gene of interest. Here we plot
plotCounts(dds,
gene="Dip2b",
intgroup="condition")
Choose a significant gene that is downregulated in the knockout mice. Enter the plotCounts code in the grey box below to plot the normalized counts for each sample for the gene you have chosen.
Before we get too excited about our results, we need to confirm that DESeq2’s assumptions were met and that statistical analysis was performed appropriately. We will explore a few plots and concepts to better understand what is happening under the hood.
The MA plot provides an overview of the relationship between significantly differentially expressed genes, gene expression and log fold change in the form of a scatter plot. Each dot on the plot represents a single gene, significant genes are coloured as a blue dot. The average gene expression is on the x axis (expressed as a mean of normalized counts) and the log fold change is on the Y axis.
# There is another R function called plotMA, so we need to specify to use DESeq2's plotMA
DESeq2::plotMA(res, ylim = c(-12, 12))
There are a few things we notice:
The above describe the expected characteristics of an MA plot - if you have something that looks different, you will need to investigate further and confirm you have no unwanted technical/biological artefacts skewing the results.
The dispersion plot is useful to examine whether your data is meeting DESeq2’s assumptions around heteroskedasticity and that the data fits DESeq2’s model well. Dispersion is how DESeq2 quantifies variability in the data. It considers variance and mean expression within each experimental group.
Let’s use plotDispEsts() to generate the dispersion plot and discuss what this means.
# Plot dispersion estimates using dds
# Note - we have set our experimental design to ~ condition and it is using this to estimate dispersion
plotDispEsts(dds)
There are a few things to note:
There is a lot happening here, but the main point is that our dispersion plot looks as expected and plots should generally appear like this. Check this website for a deeper explanation of this concept, and for examples of what bad dispersion plots look like.
Remember that for every gene, we perform a statistical test to determine whether gene expression is significantly different in the knockout samples, compared to the wildtype. This results in thousands (~20,000 genes in the mouse genome) of p-values. We can look at the histogram of p-values to see how our well our statistical test behaves before we apply correction for multiple testing.
# Bin frequency of p-value counts by 0.05 incremets (i.e plot 20 columns from p-value of 0 to 1)
hist(res$pvalue,
breaks = 20,
col = "grey")
A nice histogram of p-values will have a peak at the 0.05 end, and a uniform frequency at all other p-value bins. Think back to your null and alternate hypothesis. Under the null hypothesis, there is a 5% chance of genes will fall under p-value 0.05, 10 % for p-value under 0.1, etc. The high peak at the first bin (p-value 0 - 0.5) represents genes that reject the null hypothesis (in addition to all the false discoveries - hence our need to adjust for multiple testing!).
A histogram of p-values that looks anything other than what is described above means that something weird has happened and you may need to contact your local statistician/bioinformatician.
This blog post has a nice explanation of each scenario if you want to explore this further.
We now have a list of significant DE genes. To gain greater biological insights on the DE genes we can determine if there is enrichment of known biological functions, pathways or interactions.
There are different methods for identifying functional enrichments, e.g.:
Over representation analysis (ORA) In ORA methods, we first identify a list of differentially expressed (DE) genes by applying an arbitrary statistical cutoff of say padj<0.05, similar to what we have done today. The list of DE genes are then grouped into specific categories such as gene ontology, pathways etc and these groups are checked for enrichment probability against the total number of genes belonging to the same category from the genome of the organism of interest, and check statistical significance.
Gene set enrichment analysis (GSEA) GSEA methods work on the premise that not only can large changes in individual genes have significant effects on functional categories such as gene ontolgies/pathways (as detected using ORA methods), but the weaker but coordinated changes in sets of functionally related genes can also be of high significance. GSEA methods do not use a set an arbitrary thresholds to identify ‘significant genes’ but rather use all genes for enrichment analysis. You can read more about the GSEA method at https://www.gsea-msigdb.org/gsea/index.jsp
Enrichment analysis can be done using both- freely available web-tools or R-based tools. Today we will use a R-package “clusterprofiler” to perform Over representation analysis (ORA) against the category gene ontology.
One of the most widely-used categorizations is Gene Ontology (GO) established by the Gene Ontology project. To describe the functional roles of genes and gene products, they are categorized into GO terms. GO terms are organized into three independent controlled vocabularies (ontologies) in a species-independent manner: - Biological process: Refers to the biological role involving the gene or gene product, and could include “transcription”, “signal transduction”, and “apoptosis”. A biological process generally involves a chemical or physical change of the starting material or input. - Molecular function: Represents the biochemical activity of the gene product, such activities could include “ligand”, “GTPase”, and “transporter”. - Cellular component: Refers to the location in the cell of the gene product. Cellular components could include “nucleus”, “lysosome”, and “plasma membrane”.
GO enrichment analysis tools will determine GO terms that are enriched when you supply it will a list of DE genes.
The clusterProfiler package implements methods to analyze and visualize functional profiles of genes. The clusterProfiler R-library supports functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation.
We will use clusterProfiler to perform enrichment analysis for upregulated and downregulated significant DE genes separately. We can use the resSig005_subset_lfc dataframe created earlier and prepare the data for clusterProfiler.
# We will use the resSig005_subset_lfc dataframe created earlier
# This has been filtered for padj < 0.05 and |LFC| > 1 DE genes
# Upregulated genes
# Filter for significant upregulated genes by log2 fold change > 1. Remove NAs.
sig.up <- resSig005_subset_lfc[resSig005_subset_lfc$log2FoldChange > 1, ]
sig.up <- na.omit(sig.up)
# Create list by decreasing log fold change for upregulated genes
sig.up.LFC <- sig.up$log2FoldChange
names(sig.up.LFC) <- rownames(sig.up)
# Sort by LFC, decreasing
sig.up.LFC <- sort(sig.up.LFC, decreasing = TRUE)
# Downregulated genes - let's do the same thing
sig.dn <- resSig005_subset_lfc[resSig005_subset_lfc$log2FoldChange < -1, ]
# Filter for significant upregulated genes by log2 fold change < -1. Remove NAs.
sig.dn <- na.omit(sig.dn)
# Create list by decreasing log fold change for upregulated genes
sig.dn.LFC <- sig.dn$log2FoldChange
names(sig.dn.LFC) <- rownames(sig.dn)
# Sort by LFC, decreasing
sig.dn.LFC <- sort(sig.dn.LFC, decreasing = TRUE)
You can check that you have correctly prepared the data by inspecting sig.up.LFC and sig.dn.LFC in the console. Use class() to check what type of R class we have stored the data in.
clusterprofiler implements the function enrichGO() for gene ontology over-representation test. Let’s run the analysis for up-regulated genes.
You can check what the different parameters in the function do by running the command ?enrichGO. The command will take a few minutes to run.
ego.up <- enrichGO(gene = names(sig.up.LFC),
OrgDb = org.Mm.eg.db,
keyType = 'SYMBOL',
readable = FALSE,
ont = "ALL",
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
qvalueCutoff = 0.05)
The bar plot is a commonly used method to visualize enriched terms. It depicts the enrichment scores (e.g. p-adj values) and gene count or ratio as bar height and colour.
barplot(ego.up,
showCategory = 20)
A dot plot is similar to a scatter plot and bar plot with the capability to encode another score as dot size. In R the dot plot displays the index (each category) in the vertical axis and the corresponding value in the horizontal axis, so you can see the value of each observation following a horizontal line from the label.
dotplot(ego.up,
showCategory = 20,
font.size = 8)
Both the barplot and dotplot only displayed most significant enriched terms, while users may want to know which genes are involved in these significant terms. The cnetplot depicts the linkages of genes and biological concepts (e.g. GO terms or KEGG pathways) as a network.
cnetplot(ego.up,
categorySize = "pvalue",
foldChange = sig.up.LFC,
cex_label_gene = 0.7,
showCategory = 5,
cex_label_category = 1.2,
shadowtext = 'category')
The heatplot is similar to cnetplot, while displaying the relationships as a heatmap. The gene-concept network may become too complicated if user want to show a large number significant terms. The heatplot can simplify the result and more easy to identify expression patterns.
heatplot(ego.up,
showCategory = 20,
foldChange = sig.up.LFC)
A breakout room task: Try re-doing the above steps for downregulated genes. Please remember to rename the variable differently so as to not over-write them e.g. ego.dn in place of ego.up etc…*
This will be a good time to familiarise yourself with the various functions you have used. Remember, you can try and run ?Function_NAME e.g. ?enrichGO to get the help manual pages for that function. You can play with pvalue and qvalue cutoffs, categories etc. Try and interpret the results. We will discuss them at the end of this session.
ClusterProfiler has other functions:
And many more …
Urgent need for consistent standards in functional enrichment analysis https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009935#sec007
It is good practice to record the version of R and all tools you are using for reproducibility purposes (and for the methods section in your paper!). R’s sessionInfo() prints all of this information.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] pathfindR_1.6.1 pathfindR.data_1.1.1
## [3] clusterProfiler_4.2.2 biobroom_1.26.0
## [5] broom_0.7.9 org.Mm.eg.db_3.14.0
## [7] AnnotationDbi_1.56.2 annotables_0.1.91
## [9] biomaRt_2.50.3 pheatmap_1.0.12
## [11] forcats_0.5.1 stringr_1.4.0
## [13] purrr_0.3.4 readr_2.0.0
## [15] tidyr_1.1.3 tidyverse_1.3.1
## [17] tibble_3.1.3 dplyr_1.0.7
## [19] rstudioapi_0.13 devtools_2.4.2
## [21] usethis_2.0.1 factoextra_1.0.7
## [23] ggplot2_3.3.5 gplots_3.1.1
## [25] RColorBrewer_1.1-2 edgeR_3.36.0
## [27] limma_3.50.3 DESeq2_1.34.0
## [29] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [31] MatrixGenerics_1.6.0 matrixStats_0.62.0-9000
## [33] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
## [35] IRanges_2.28.0 S4Vectors_0.32.4
## [37] BiocGenerics_0.40.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 tidyselect_1.1.1 RSQLite_2.2.7
## [4] grid_4.1.0 BiocParallel_1.28.3 scatterpie_0.1.6
## [7] munsell_0.5.0 codetools_0.2-18 withr_2.4.2
## [10] colorspace_2.0-2 GOSemSim_2.20.0 filelock_1.0.2
## [13] highr_0.9 knitr_1.33 DOSE_3.20.1
## [16] labeling_0.4.2 GenomeInfoDbData_1.2.7 polyclip_1.10-0
## [19] bit64_4.0.5 farver_2.1.0 rprojroot_2.0.2
## [22] downloader_0.4 treeio_1.18.1 vctrs_0.3.8
## [25] generics_0.1.0 xfun_0.25 BiocFileCache_2.2.1
## [28] doParallel_1.0.16 R6_2.5.0 graphlayouts_0.7.1
## [31] locfit_1.5-9.4 bitops_1.0-7 cachem_1.0.5
## [34] fgsea_1.20.0 DelayedArray_0.20.0 assertthat_0.2.1
## [37] scales_1.1.1 ggraph_2.0.5 enrichplot_1.14.2
## [40] gtable_0.3.0 processx_3.5.2 tidygraph_1.2.0
## [43] rlang_0.4.11 genefilter_1.76.0 splines_4.1.0
## [46] lazyeval_0.2.2 BiocManager_1.30.16 yaml_2.2.1
## [49] reshape2_1.4.4 modelr_0.1.8 backports_1.2.1
## [52] qvalue_2.26.0 tools_4.1.0 ellipsis_0.3.2
## [55] jquerylib_0.1.4 sessioninfo_1.1.1 Rcpp_1.0.7
## [58] plyr_1.8.6 progress_1.2.2 zlibbioc_1.40.0
## [61] RCurl_1.98-1.3 ps_1.6.0 prettyunits_1.1.1
## [64] viridis_0.6.1 haven_2.4.3 ggrepel_0.9.1
## [67] fs_1.5.0 magrittr_2.0.1 data.table_1.14.0
## [70] DO.db_2.9 reprex_2.0.1 ggnewscale_0.4.5
## [73] pkgload_1.2.1 hms_1.1.0 patchwork_1.1.1
## [76] evaluate_0.14 xtable_1.8-4 XML_3.99-0.6
## [79] readxl_1.3.1 gridExtra_2.3 testthat_3.0.4
## [82] compiler_4.1.0 shadowtext_0.0.8 KernSmooth_2.23-20
## [85] crayon_1.4.1 htmltools_0.5.1.1 ggfun_0.0.2
## [88] tzdb_0.1.2 geneplotter_1.72.0 aplot_0.0.6
## [91] lubridate_1.7.10 DBI_1.1.1 tweenr_1.0.2
## [94] dbplyr_2.1.1 MASS_7.3-54 rappdirs_0.3.3
## [97] Matrix_1.3-3 cli_3.0.1 parallel_4.1.0
## [100] igraph_1.2.6 pkgconfig_2.0.3 rvcheck_0.1.8
## [103] foreach_1.5.1 xml2_1.3.2 ggtree_3.2.1
## [106] annotate_1.72.0 bslib_0.2.5.1 XVector_0.34.0
## [109] rvest_1.0.1 yulab.utils_0.0.5.001 callr_3.7.0
## [112] digest_0.6.27 Biostrings_2.62.0 rmarkdown_2.10
## [115] cellranger_1.1.0 fastmatch_1.1-3 tidytree_0.3.4
## [118] curl_4.3.2 gtools_3.9.2 lifecycle_1.0.0
## [121] nlme_3.1-152 jsonlite_1.7.2 desc_1.3.0
## [124] viridisLite_0.4.0 fansi_0.5.0 pillar_1.6.2
## [127] lattice_0.20-44 KEGGREST_1.34.0 fastmap_1.1.0
## [130] httr_1.4.2 pkgbuild_1.2.0 survival_3.2-11
## [133] GO.db_3.14.0 glue_1.4.2 remotes_2.4.0
## [136] iterators_1.0.13 png_0.1-7 bit_4.0.4
## [139] ggforce_0.3.3 stringi_1.7.3 sass_0.4.0
## [142] blob_1.2.2 caTools_1.18.2 memoise_2.0.0
## [145] ape_5.5
If you want to knit this .Rmd file to a HTML file, click Knit > Knit to HTML. You will then need to copy the HTML file from Nimbus to your local computer.
Open a terminal on you local computer. Navigate to a path of your choice.
Run the following command in the terminal on your local computer (make sure to replace the IP address with your own)
scp training@146.118.64.117:~/base_directory/working_directory/RNASeq_Part2_*.html ./