DEprot
analyses examples - RIMEEnvironment details
# Working directory structure
dir.create(path = "./RIME/objects", recursive = T, showWarnings = F)
dir.create(path = "./RIME/tables", recursive = T, showWarnings = F)
dir.create(path = "./RIME/plots", recursive = T, showWarnings = F)
# Required libraries
require(DEprot)
require(dplyr)
require(ggplot2)
# Optional attachment
require(data.table)
require(patchwork)
require(ggpubr)
require(purrr)
# Not included in the DEprot dependencies
## To install:
## BiocManager::install("AnnotationDbi")
require(AnnotationDbi)
Data have been obtained from N.Eickhoff et al. (Communications Biology, 2025) and data are accessible at the ProteomeXchange Consortium through the PRIDE partner repository with the identifier PXD058914.
LFQ intensities, and corresponding metadata, for a selection of
samples is available in the ./source folder. This dataset
involves experiments of Rapid immunoprecipitation mass spectrometry of
endogenous proteins (RIME) for the
Androgen Receptor (AR) in prostate cancer cells. Notably, these data
have been generated in 3 different batches: each batch has been analyzed
independently using MaxQuant, later
the 3 unimputed intensities tables have been merged using accession
number as merging-key.
The data involve LNCaP and LNCaP-abl (resistant to hormone deprivation but still expressing AR) prostate cancer cells. LNCaP are either in the wild-type (WT) version, or monoclonal knock-out for the TRIM33 protein (TRIM33-5#MC-C2) - and respective non-targeting control (NT-3) -. Cells have been cultured using medium supplied with regular Fetal-Bovine Serum (FBS) or hormone deprived for 3 days before being stimulated with 5nM R1881 (synthetic androgen) for 4 hours. For each condition immunoprecipitation with AR and IgG (negative control) was performed in 4 biological replicates. Pulled-down proteins were detected using Label-Free Quantitation (LFQ) mass spectrometry. Peptide mixtures were analyzed by nanoLC-MS/MS on an Orbitrap Exploris 480 Mass Spectrometer equipped with an EASY-NLC 1200 system (Thermo Scientific).
## Import metadata
metadata <- data.table::fread("./sources/RIME_metadata_combined.txt", data.table = F)
metadata
## Import combined LFQ-intensities
lfq <- data.table::fread("./sources/RIME_LFQ_intensities_unimputed_combined.txt", data.table = F)
rownames(lfq) <- lfq$gene.id
head(lfq)
NOTE: LFQ-intensities contain NAs because they belong to unimputed data. Further,
gene.iddoes not contain unique values, therfore a themake.uniquefunction was applied: duplicated gene IDs contain a.Xindex and can be used as rownames/unique identifiers.
DEprot object (dpo)As above mentioned, the data belong to 3 different batches, therefore a batch correction will be required. For this reason, the data must be loaded as “raw” data.
dpo <- load.counts(counts = lfq[,-c(1:2)], # removing IDs
metadata = metadata,
log.base = 2,
imputation = NA,
normalization.method = NA,
column.id = "column.id")
dpo
## DEprot object:
## Samples: 32
## Proteins: 3285
## Counts available: raw
## Log transformation: log2
## Metadata columns: column.id, batch, cell, condition, treatment, target, group, replicate
Proteomics is highly susceptible to batch effects. Hereafter we use
the HarmonizR
package, developed by H.Voß &
S.Schlumbohm (Nat.Commun., 2022). This tool besides
handling multiple experiments, allows for the batch correction of data
derived by the combination of both DIA (Data-Independent Acquisition)
and DDA (Data-Dependent Acquisition).
To use the the harmonize.batches function, it is
sufficient to provide a DEprot object containing a combined
table of multiple experiments and indicate the identifier column in the
metadata table which corresponds to the batch annotation (in our
example: batch).
HarmonizRis not a required dependency, therefore if this function is used andHarmonizRis not already installed, a warning will indicate the requred installation. The package can be installed via:BiocManager::install("HarmonizR"), or alternativelydevtools::install_github("https://github.com/SimonSchlumbohm/HarmonizR/", subdir="HarmonizR").
dpo <- harmonize.batches(DEprot.object = dpo,
batch.column = "batch",
cores = 5)
dpo
## DEprot object:
## Samples: 32
## Proteins: 3153
## Counts available: raw, normalized
## Log transformation: log2
## Metadata columns: column.id, batch, cell, condition, treatment, target, group, replicate
The batch-corrected data will be stored in the “norm.counts” slot of the dpo object (notice the “Counts available: raw, normalized”).
It is possible to compare the distribution of the data before and
after normalization using the boxplots already created in the dpo object
or using the function plot.counts:
patchwork::wrap_plots(dpo@boxplot.raw + xlab(NULL),
dpo@boxplot.norm + xlab(NULL),
ncol = 1)
Often many NA/NaN values are present in the
LFQ tables due to the technical limitations of the protein detection by
Mass Spectrometry (MS) experiments, or due to biological reasons (e.g.,
IgG negative control in RIMEs).
protein.counts.bySample <-
protein.summary(DEprot.object = dpo,
group.column = "column.id",
n.labels = "counts",
show.frequency = T,
x.label.angle = 0,
title = "**# Proteins identified per _Sample_**",
colors = c("#9c9ab5", "navy")) +
theme(axis.text.x = element_text(angle = 30, hjust = 1))
protein.counts.bySample
Notably, visualization of the amount of proteins detected per group shows that IgG samples have - as expected - less reproducible protein detection:
protein.counts.byGroup <-
protein.summary(DEprot.object = dpo,
group.column = "group",
n.labels = "percentage",
show.frequency = T,
x.label.angle = 0,
title = "**# Proteins identified per _Group_**") +
theme(axis.text.x = element_text(angle = 30, hjust = 1))
protein.counts.byGroup
DEprot allows for the possibility to use 9 different
methods for the imputation of the data:
missForest:
developed by DJ.Stekhoven &
P.Buehlmann (Bioinformatics, 2012), this tool will impute
the NaN and assign and estimated value. It also yields an out-of-bag
(OOB) imputation error estimate (general, or per each sample). Moreover,
it can be run parallel to save computation time (both examples reported
here after).Of note, there are cases in which the missing data are not due to random missingness but rather consequential to real biological effects (i.e. knock-down, knock-out, degron systems and PROTACs, etc.). This may lead to the issue that a protein completely missing in all the replicates of an individual condition will be imputed with relatively high values This would ultimately lead to the incorrect definition of the differential status of this protein.
In the specific case of our RIME dataset these “non-random” missing values can be found in two cases: * IgG control: the IgG pull-down “background” proteins that might be detected in a non reproducible manner (e.g. 1 out of 4 replicates); * TRIM33 knock-out: TRIM33 protein is “biologically” not present in the LNCaP_TRIM33-5#MC-C2 cell line.
To overcome this hurdle, it is possible to use the function
randomize.missing.values to assign random values from the
bottom distribution of the full dataset. The user can define a
percentage threshold of minimal missing values within the group; for
instance if the threshold is 75% and in each group there are 4 samples
(as in our dataset), it means that only proteins with at least 3 out of
4 missing values will be imputed with this method. It will be sufficient
to indicate the ID of a column from the metadata to assign the
groups/conditions alongside the percentage of bottom distribution from
which to select random numbers (“tail.percentage” parameter).
The imputation will overwrite the normalized data adding the imputed values. For this reason we will store the unimputed results in another variable in order to be able to compare the data before and after random-imputation.
## Store old data
dpo_unimputed <- dpo
## Imputation non-random missing values
dpo <- randomize.missing.values(DEprot.object = dpo,
group.column = "group",
percentage.missing = 75, # 3~4 out of 4 missing
tail.percentage = 3,
seed = 168,
verbose = F)
Hereafter we will compare the differences upon non-random imputation for the two cases mentioned before.
In this comparison we will show the differences between an AR-RIME and the IgG for LNCaP cells (batch A). 25 random proteins will be displayed.
## Define random proteins to show
set.seed(168)
proteins.to.show <- sample(rownames(dpo@norm.counts), size = 25)
rime_unimputed <-
heatmap.counts(DEprot.object = dpo_unimputed,
which.data = "normalized",
sample.subset = metadata[metadata$batch == "A", "column.id"],
protein.subset = proteins.to.show,
clust.columns = F,
clust.rows = F,
show.protein.names = T,
na.color = "firebrick",
color.limits = c(19,31),
title = "Unimputed")
rime_random <-
heatmap.counts(DEprot.object = dpo,
which.data = "normalized",
sample.subset = metadata[metadata$batch == "A", "column.id"],
protein.subset = proteins.to.show,
clust.columns = F,
clust.rows = F,
show.protein.names = F,
na.color = "firebrick",
color.limits = c(19,31),
title = "Random")
patchwork::wrap_plots(rime_unimputed@heatmap, rime_random@heatmap)
In this comparison we will show the differences of TRIM33 intensities in the AR-RIME for LNCaP_NT-3 (control) vs LNCaP_TRIM33-5#MC-C2 (batch C, AR target).
rime_unimputed_trim33 <-
heatmap.counts(DEprot.object = dpo_unimputed,
which.data = "normalized",
sample.subset = metadata[metadata$batch == "C" & metadata$target == "AR", "column.id"],
protein.subset = "TRIM33",
clust.columns = F,
clust.rows = F,
show.protein.names = T,
na.color = "firebrick",
color.limits = c(22,28),
title = "Unimputed")
rime_random_trim33 <-
heatmap.counts(DEprot.object = dpo,
which.data = "normalized",
sample.subset = metadata[metadata$batch == "C" & metadata$target == "AR", "column.id"],
protein.subset = "TRIM33",
clust.columns = F,
clust.rows = F,
show.protein.names = T,
na.color = "firebrick",
color.limits = c(22,28),
title = "Random")
patchwork::wrap_plots(rime_unimputed_trim33@heatmap,
rime_random_trim33@heatmap +
theme(axis.text.x = element_blank(),
legend.position = "none"),
ncol = 1)
DEprot provides 9 methods to imputed data: missForest,
kNN, corkNN, tkNN, LLS, SVD, RegImpute, bPCA, pPCA.
We implemented
the function compare.imp.methods to compare the different
imputation methods and choose the best one for your data type. Indeed,
your dataset is subset and NAs are introduce simulating distribution and
proportion of the original data.
rime_imp_comparison <- compare.imp.methods(DEprot.object = dpo,
percentage.test = 100,
correlation.method = "pearson",
sample.group.column = "group",
seed = 1234,
pcaMethods.nPCs.to.test = 3)
saveRDS(object = rime_imp_comparison,
file = "./RIME/objects/rime_imputation_comparison.Rds")
rime_imp_comparison
pdf("./RIME/plots/rime_imputation_comparison_radar.plot.pdf",
width = 10, height = 6)
replayPlot(radar_plot_imp)
invisible(dev.off())
Considering the RMSE and correlation coefficients the best method for
these RIME data is RegImpute.
RegImpute imputationIn order to be able to compare the impact of
randomize.missing.values function on knock-out proteins, we
will impute the data on both the dpo and
dpo_unimputed objects.
## Before randomize.missing.values
dpo_unimputed <- impute.counts(DEprot.object = dpo_unimputed,
method = "RegImpute",
use.normalized.data = TRUE,
verbose = TRUE)
## After randomize.missing.values
dpo <- impute.counts(DEprot.object = dpo,
method = "RegImpute",
use.normalized.data = TRUE,
verbose = TRUE)
dpo
### Export objects
saveRDS(object = dpo_unimputed, "./RIME/objects/dpo_imputedRegImpute.wo.random.Rds")
saveRDS(object = dpo, "./RIME/objects/dpo_random.and.imputedRegImpute.Rds")
Similarly to the post-normalization data, it is now possible to compare the distribution of the data post-imputation using boxplots:
patchwork::wrap_plots(dpo@boxplot.raw + xlab(NULL),
dpo@boxplot.norm + xlab(NULL),
dpo@boxplot.imputed + xlab(NULL),
ncol = 1)
It is appreciable how the IgG negative controls - as expected - display a lower distribution compared to AR-RIME.
Remarkably, comparing the 2-steps imputation (random + RegImpute) and the RegImpute alone, is possible to observe that the double imputation method shows a strikingly different distribution for the bottom tale compare to the RegImpute-alone one. This implies that RegImpute alone is not able to properly impute non-random missing values and tends to “homogenize” the intensities.
comp_density_imputation =
ggpubr::ggdensity(reshape2::melt(data = dpo@raw.counts)[,3],
fill = "gray",
title = "Raw data") +
xlab("LFQ") +
theme(aspect.ratio = 1) +
ggpubr::ggdensity(reshape2::melt(data = dpo_unimputed@imputed.counts)[,3],
fill = "gray",
title = "Imputed RegImpute") +
xlab("LFQ") +
theme(aspect.ratio = 1) +
ggpubr::ggdensity(reshape2::melt(data = dpo@imputed.counts)[,3],
fill = "gray",
title = "Imputed Random + RegImpute") +
theme(aspect.ratio = 1) +
xlab("LFQ")
comp_density_imputation
Now it is possible to add a third heatmap (imputed counts) to the previous comparisons.
rime_random_reg <-
heatmap.counts(DEprot.object = dpo,
which.data = "imputed",
sample.subset = metadata[metadata$batch == "A", "column.id"],
protein.subset = proteins.to.show,
clust.columns = FALSE,
clust.rows = FALSE,
show.protein.names = FALSE,
na.color = "firebrick",
color.limits = c(19,31),
title = "Random + RegImpute")
patchwork::wrap_plots(rime_unimputed@heatmap, rime_random@heatmap, rime_random_reg@heatmap)
Here we can compare the effects of omitting the
randomize.missing.values step for proteins that are missing
not at random.
rime_RegImpute.only_trim33 <-
heatmap.counts(DEprot.object = dpo_unimputed,
which.data = "imputed",
sample.subset = metadata[metadata$batch == "C" & metadata$target == "AR", "column.id"],
protein.subset = "TRIM33",
clust.columns = F,
clust.rows = F,
show.protein.names = T,
na.color = "firebrick",
color.limits = c(22,28),
title = "RegImpute only")
patchwork::wrap_plots(rime_random_trim33@heatmap,
rime_RegImpute.only_trim33@heatmap +
theme(axis.text.x = element_blank(),
legend.position = "none"),
ncol = 1)
Once the data are normalized and imputed, it is good practice to
validate the quality and reproducibility of the data.
DEprot can generate correlations and perform principal
component analyses (PCA) between samples.
The correlations can be performed globally as well as within sample subgroups. The clustering distance is equivalent to the \(1 - correlation\).
correlation_all <-
plot.correlation.heatmap(DEprot.object = dpo,
correlation.method = "spearman",
which.data = "imputed",
display.values = FALSE,
plot.subtitle = "All samples")
correlation_all
correlation_AR <-
plot.correlation.heatmap(DEprot.object = dpo,
correlation.method = "spearman",
which.data = "imputed",
sample.subset = metadata[metadata$target == "AR", "column.id"],
display.values = TRUE,
plot.subtitle = "AR RIME")
correlation_IgG <-
plot.correlation.heatmap(DEprot.object = dpo,
correlation.method = "spearman",
which.data = "imputed",
sample.subset = metadata[metadata$target == "IgG", "column.id"],
display.values = TRUE,
plot.subtitle = "IgG RIME")
patchwork::wrap_plots(correlation_AR@heatmap, correlation_IgG@heatmap)
Similarly to the correlation, it is possible to asses the quality of the the data analyzing the segregation of the data in principal-component space.
## Compute PCs for all samples
PCA <- perform.PCA(DEprot.object = dpo, which.data = "imputed")
## Plotting PC1-vs-PC2 and PC2-vs-PC3 (all samples)
plot.PC.scatter.123(DEprot.PCA.object = PCA,
color.column = "group",
shape.column = "batch",
label.column = "replicate",
plot.zero.line.x.12 = FALSE,
dot.colors = rev(RColorBrewer::brewer.pal(n = 8, name = "Paired")))
AR-RIMEs only
## Compute PCs for AR-RIMES
PCA_AR <- perform.PCA(DEprot.object = dpo,
which.data = "imputed",
sample.subset = metadata[metadata$target == "AR", "column.id"])
## Plotting PC1-vs-PC2 and PC2-vs-PC3 (all samples)
plot.PC.scatter.123(DEprot.PCA.object = PCA_AR,
color.column = "group",
shape.column = "batch",
label.column = "replicate",
plot.zero.line.x.12 = FALSE,
dot.colors = rev(RColorBrewer::brewer.pal(n = 8, name = "Paired"))[c(1,3,5,7)])
Two possible type of differential analysis are available in
DEprot:
limma: limma is a software
originally developed for the analyses of microarray expression data; it
fits the data into a negative binomial model in order to define
differences between groups.In the next examples we made use of limma to establish
differential protein enrichment. However, limma assumes
that the distribution of the intensity is normal. To validate this
assumption, the function check.normality can be employed to
perform an Anderson-Darling normality test per each sample, and plot Q-Q
plots and data distributions.
## Test normality
norm.test <- check.normality(DEprot.object = dpo, p.threshold = 0.5, which.data = "imputed")
## All samples display a normal distribution.
patchwork::wrap_plots(norm.test@qqplots)
patchwork::wrap_plots(norm.test@densities)
limmaHereafter, we will compare each AR-RIME with the corresponding IgG control, study the effects of TRIM33-KO on AR-interactome composition comparing LNCaP_NT-3_FBS_AR vs LNCaP_TRIM33-5#MC-C2_FBS_AR, and finally evaluate the AR-interactome when the cells are hormone-deprived LNCaP_WT_R1881_AR vs LNCaP-abl_WT_R1881_AR.
dpo_analyses <-
diff.analyses.limma(DEprot.object = dpo,
contrast.list = list(c("group", "LNCaP_NT-3_FBS_AR", "LNCaP_NT-3_FBS_IgG"),
c("group", "LNCaP_TRIM33-5#MC-C2_FBS_AR", "LNCaP_TRIM33-5#MC-C2_FBS_IgG"),
c("group", "LNCaP_WT_R1881_AR", "LNCaP_WT_R1881_IgG"),
c("group", "LNCaP-abl_WT_R1881_AR", "LNCaP-abl_WT_R1881_IgG"),
c("group", "LNCaP_TRIM33-5#MC-C2_FBS_AR", "LNCaP_NT-3_FBS_AR"),
c("group", "LNCaP_WT_R1881_AR", "LNCaP-abl_WT_R1881_AR")),
linear.FC.th = 2,
padj.th = 0.05,
padj.method = "BH", #Benjamini-Hochberg
fitting.method = "ls",
include.rep.model = TRUE,
replicate.column = "replicate",
which.data = "imputed")
saveRDS(dpo_analyses, "./RIME/objects/limma_differential_analyses.Rds")
dpo_analyses
## DEprot.analyses object:
## Counts used: imputed
## Fold Change threshold: 2 (linear)
## FC unresponsive range: [0.9090909,1.1] (linear)
## padj threshold: 0.05 (linear)
## padj method: BH
##
##
## Differential results summary:
## contrast.id
## 1 group: LNCaP_NT-3_FBS_AR vs LNCaP_NT-3_FBS_IgG
## 2 group: LNCaP_NT-3_FBS_AR vs LNCaP_NT-3_FBS_IgG
## 3 group: LNCaP_NT-3_FBS_AR vs LNCaP_NT-3_FBS_IgG
## 4 group: LNCaP_NT-3_FBS_AR vs LNCaP_NT-3_FBS_IgG
## 5 group: LNCaP_TRIM33-5#MC-C2_FBS_AR vs LNCaP_TRIM33-5#MC-C2_FBS_IgG
## 6 group: LNCaP_TRIM33-5#MC-C2_FBS_AR vs LNCaP_TRIM33-5#MC-C2_FBS_IgG
## 7 group: LNCaP_TRIM33-5#MC-C2_FBS_AR vs LNCaP_TRIM33-5#MC-C2_FBS_IgG
## 8 group: LNCaP_TRIM33-5#MC-C2_FBS_AR vs LNCaP_TRIM33-5#MC-C2_FBS_IgG
## 9 group: LNCaP_WT_R1881_AR vs LNCaP_WT_R1881_IgG
## 10 group: LNCaP_WT_R1881_AR vs LNCaP_WT_R1881_IgG
## 11 group: LNCaP_WT_R1881_AR vs LNCaP_WT_R1881_IgG
## 12 group: LNCaP_WT_R1881_AR vs LNCaP_WT_R1881_IgG
## 13 group: LNCaP-abl_WT_R1881_AR vs LNCaP-abl_WT_R1881_IgG
## 14 group: LNCaP-abl_WT_R1881_AR vs LNCaP-abl_WT_R1881_IgG
## 15 group: LNCaP-abl_WT_R1881_AR vs LNCaP-abl_WT_R1881_IgG
## 16 group: LNCaP-abl_WT_R1881_AR vs LNCaP-abl_WT_R1881_IgG
## 17 group: LNCaP_TRIM33-5#MC-C2_FBS_AR vs LNCaP_NT-3_FBS_AR
## 18 group: LNCaP_TRIM33-5#MC-C2_FBS_AR vs LNCaP_NT-3_FBS_AR
## 19 group: LNCaP_TRIM33-5#MC-C2_FBS_AR vs LNCaP_NT-3_FBS_AR
## 20 group: LNCaP_TRIM33-5#MC-C2_FBS_AR vs LNCaP_NT-3_FBS_AR
## 21 group: LNCaP_WT_R1881_AR vs LNCaP-abl_WT_R1881_AR
## 22 group: LNCaP_WT_R1881_AR vs LNCaP-abl_WT_R1881_AR
## 23 group: LNCaP_WT_R1881_AR vs LNCaP-abl_WT_R1881_AR
## 24 group: LNCaP_WT_R1881_AR vs LNCaP-abl_WT_R1881_AR
## group.factor group1 group2
## 1 group LNCaP_NT-3_FBS_AR LNCaP_NT-3_FBS_IgG
## 2 group LNCaP_NT-3_FBS_AR LNCaP_NT-3_FBS_IgG
## 3 group LNCaP_NT-3_FBS_AR LNCaP_NT-3_FBS_IgG
## 4 group LNCaP_NT-3_FBS_AR LNCaP_NT-3_FBS_IgG
## 5 group LNCaP_TRIM33-5#MC-C2_FBS_AR LNCaP_TRIM33-5#MC-C2_FBS_IgG
## 6 group LNCaP_TRIM33-5#MC-C2_FBS_AR LNCaP_TRIM33-5#MC-C2_FBS_IgG
## 7 group LNCaP_TRIM33-5#MC-C2_FBS_AR LNCaP_TRIM33-5#MC-C2_FBS_IgG
## 8 group LNCaP_TRIM33-5#MC-C2_FBS_AR LNCaP_TRIM33-5#MC-C2_FBS_IgG
## 9 group LNCaP_WT_R1881_AR LNCaP_WT_R1881_IgG
## 10 group LNCaP_WT_R1881_AR LNCaP_WT_R1881_IgG
## 11 group LNCaP_WT_R1881_AR LNCaP_WT_R1881_IgG
## 12 group LNCaP_WT_R1881_AR LNCaP_WT_R1881_IgG
## 13 group LNCaP-abl_WT_R1881_AR LNCaP-abl_WT_R1881_IgG
## 14 group LNCaP-abl_WT_R1881_AR LNCaP-abl_WT_R1881_IgG
## 15 group LNCaP-abl_WT_R1881_AR LNCaP-abl_WT_R1881_IgG
## 16 group LNCaP-abl_WT_R1881_AR LNCaP-abl_WT_R1881_IgG
## 17 group LNCaP_TRIM33-5#MC-C2_FBS_AR LNCaP_NT-3_FBS_AR
## 18 group LNCaP_TRIM33-5#MC-C2_FBS_AR LNCaP_NT-3_FBS_AR
## 19 group LNCaP_TRIM33-5#MC-C2_FBS_AR LNCaP_NT-3_FBS_AR
## 20 group LNCaP_TRIM33-5#MC-C2_FBS_AR LNCaP_NT-3_FBS_AR
## 21 group LNCaP_WT_R1881_AR LNCaP-abl_WT_R1881_AR
## 22 group LNCaP_WT_R1881_AR LNCaP-abl_WT_R1881_AR
## 23 group LNCaP_WT_R1881_AR LNCaP-abl_WT_R1881_AR
## 24 group LNCaP_WT_R1881_AR LNCaP-abl_WT_R1881_AR
## paired.test diff.status n median.FoldChange
## 1 TRUE LNCaP_NT-3_FBS_IgG 131 -1.905494859
## 2 TRUE LNCaP_NT-3_FBS_AR 868 2.840001726
## 3 TRUE unresponsive 362 0.001355977
## 4 TRUE null 1792 -0.179844475
## 5 TRUE LNCaP_TRIM33-5#MC-C2_FBS_IgG 130 -1.827148750
## 6 TRUE LNCaP_TRIM33-5#MC-C2_FBS_AR 922 2.936158396
## 7 TRUE unresponsive 370 -0.002209037
## 8 TRUE null 1731 -0.167189320
## 9 TRUE LNCaP_WT_R1881_IgG 185 -1.923493012
## 10 TRUE LNCaP_WT_R1881_AR 1011 2.267629438
## 11 TRUE unresponsive 302 0.003921392
## 12 TRUE null 1655 0.228013526
## 13 TRUE LNCaP-abl_WT_R1881_IgG 133 -1.636583027
## 14 TRUE LNCaP-abl_WT_R1881_AR 992 2.676888784
## 15 TRUE unresponsive 354 -0.012761041
## 16 TRUE null 1674 -0.143302902
## 17 TRUE LNCaP_NT-3_FBS_AR 46 -2.862151277
## 18 TRUE LNCaP_TRIM33-5#MC-C2_FBS_AR 30 2.655605197
## 19 TRUE unresponsive 1122 0.002161318
## 20 TRUE null 1955 -0.162504964
## 21 TRUE LNCaP-abl_WT_R1881_AR 43 -2.226926883
## 22 TRUE LNCaP_WT_R1881_AR 1056 3.027185641
## 23 TRUE unresponsive 774 0.011976388
## 24 TRUE null 1280 0.233719743
The RIME is a technique that involves nuclear isolation, this implies
that the presence of cytoplasmic proteins is most likely to be
attributed to “cytoplasmic contamination”. For this reason we will
select only nuclear proteins, and remove the ribosomal and mitochondrial
ones in the dpo_analyses object.
nucleus <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
keytype = "GOALL",
keys = "GO:0005634", #nucleus
columns = c("SYMBOL", "UNIPROT"))
ribo_mito <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
keytype = "GOALL",
keys = c("GO:0005840", "GO:0005739"), #ribosome, mitochondria
columns = c("SYMBOL", "UNIPROT"))
## Certain proteins have the .X suffix, so we collect the full names
nucleus_full <- rownames(dpo@raw.counts)[gsub("[.].*", "", rownames(dpo@raw.counts)) %in% nucleus$SYMBOL]
ribo_mito_full <- rownames(dpo@raw.counts)[gsub("[.].*", "", rownames(dpo@raw.counts)) %in% ribo_mito$SYMBOL]
## Filter dpo_analyses project
dpo_analyses_nuclear <- filter.proteins(DEprot.object = dpo_analyses,
proteins = nucleus_full,
mode = "keep")
dpo_analyses_nuclear <- filter.proteins(DEprot.object = dpo_analyses_nuclear,
proteins = ribo_mito_full,
mode = "remove")
Now the analyses can be exported using the automated function:
export.analyses(DEprot.analyses.object = dpo_analyses_nuclear, output.folder = "./RIME/export")
For the comparison of AR-RIMEs vs the respective IgG (contrasts 1-4),
we first expect that the AR-RIME displays more proteins than the IgG
negative control, and secondly that AR is among the top-enriched
proteins in the AR-RIME itself.
To verify this we will generate
volcano plots of the log2(Fold Change) AR-vs-IgG for each
group and highlight AR.
volcano_IgG <-
purrr::pmap(.l = list(contrast = 1:4,
up.color = rev(RColorBrewer::brewer.pal(n = 8, name = "Paired"))[c(1,3,5,7)]),
.f = function(contrast, up.color) {
plot.volcano(DEprot.analyses.object = dpo_analyses_nuclear,
contrast = contrast,
up.color = up.color,
down.color = "black",
dot.labels = "AR",
label.font.size = 5)
})
patchwork::wrap_plots(volcano_IgG)
On the other hand, for the TRIM33-KO we expect that the interaction of TRIM33 with AR is lost when TRIM33 is knocked-out.
plot.volcano(DEprot.analyses.object = dpo_analyses_nuclear,
contrast = 5,
up.color = "#FF7F00",
down.color = "#E31A1C",
dot.labels = c("AR", "TRIM33"),
label.font.size = 5)
As internal control, being the two conditions biologically similar, we do expect comparable results for the AR-RIME in LNCaP_WT_R1881_AR and LNCaP_NT-3_FBS_AR. For this we will compare the protein fold change for AR-vs-IgG for the two conditions and the overlap of differential proteins.
contrast.scatter(DEprot.analyses.object = dpo_analyses_nuclear,
contrast.x = 1,
contrast.y = 3,
correlation.method = "pearson")
plot.upset(DEprot.analyses.object = dpo_analyses_nuclear,
contrast.subset = c(1,3),
title = "Overalp AR-interactome",
subtitle = "LNCaP_WT_R1881 vs LNCaP_NT-3_FBS")
We are going to perform ORA analyses on the differential proteins between LNCaP-WT and the LNCaP-abl in order to understand how long-term hormone deprivation impacts the AR-interactome composition.
For these analyses a geneSet table is required as a reference. The
latter must be a data.frame with two columns: “gs_name” (geneSet name)
and “gene_symbol” (corresponding to the protein id).
In
DEprot we provide a ready-to-use list of protein complexes
obtained by the CORUM database.
Notice that the “gene_symbol” values must match the “prot.id” in the result tables. To this goal, the
geneset.enrichmentfunction contains the parametergsub.pattern.prot.idthat will be used to remove a pattern from the “prot.id”s (e.g., the duplicates suffix).
## Make CORUM geneSet
data("corum_v5.0", package = "DEprot")
corum_geneSet =
corum_v5.0 %>%
dplyr::filter(organism == "Human") %>%
dplyr::rename(gs_name = complex.name,
gene_symbol = protein.members) %>%
dplyr::select(gs_name, gene_symbol)
head(corum_geneSet)
## Run ORA
ORA.results =
geneset.enrichment(DEprot.analyses.object = dpo_analyses_nuclear,
contrast = 6,
TERM2GENE = corum_geneSet,
enrichment.type = "ORA",
gsub.pattern.prot.id = "[.].*",
pvalueCutoff = 0.05,
qvalueCutoff = 0.05,
pAdjustMethod = "BH",
diff.status.category = "LNCaP_WT_R1881_AR",
dotplot.n = 10)
The ORA object contains several plots that can help the interpretation of the results. These include networks of proteins and complexes, and dot plots of GeneRatio and FoldEnrichment:
ORA.results@protein.network
ORA.results@pathway.network$plot
patchwork::wrap_plots(ORA.results@dotplot_gene.ratio, ORA.results@dotplot_fold.enrichment)
## $`R version:`
## [1] R version 4.4.3 (2025-02-28)
##
## $`Base packages:`
## [1] base datasets graphics grDevices methods stats stats4
## [8] utils
##
## $`Other attached packages:`
## [1] AnnotationDbi_1.66.0 Biobase_2.64.0 BiocGenerics_0.50.0
## [4] data.table_1.17.8 DEprot_0.1.0 dplyr_1.1.4
## [7] ggplot2_4.0.0 ggpubr_0.6.2 IRanges_2.38.1
## [10] patchwork_1.3.2 purrr_1.1.0 S4Vectors_0.42.1
##
## $`Loaded via a namespace (and not attached):`
## [1] abind_1.4-8 apcluster_1.4.14 ape_5.8-1
## [4] aPEAR_1.0.0 aplot_0.2.9 arules_1.7-9
## [7] backports_1.5.0 bayesbio_1.0.0 BiocManager_1.30.26
## [10] BiocParallel_1.38.0 Biostrings_2.72.1 bit_4.6.0
## [13] bit64_4.6.0-1 bitops_1.0-9 blob_1.2.4
## [16] boot_1.3-31 broom_1.0.10 bslib_0.9.0
## [19] cachem_1.1.0 car_3.1-3 carData_3.0-5
## [22] class_7.3-23 cli_3.6.5 clusterProfiler_4.12.6
## [25] codetools_0.2-20 colorspace_2.1-2 commonmark_2.0.0
## [28] compiler_4.4.3 ComplexUpset_1.3.3 conflicted_1.2.0
## [31] cowplot_1.2.0 crayon_1.5.3 curl_7.0.0
## [34] DBI_1.2.3 DEoptimR_1.1-4 dichromat_2.0-0.1
## [37] digest_0.6.37 doParallel_1.0.17 doRNG_1.8.6.2
## [40] DOSE_3.30.5 e1071_1.7-16 enrichplot_1.24.4
## [43] evaluate_1.0.5 expm_1.0-0 farver_2.1.2
## [46] fastmap_1.2.0 fastmatch_1.1-6 fgsea_1.30.0
## [49] fmsb_0.7.6 fontBitstreamVera_0.1.1 fontLiberation_0.1.0
## [52] fontquiver_0.2.1 forcats_1.0.1 foreach_1.5.2
## [55] Formula_1.2-5 fs_1.6.6 gdtools_0.4.4
## [58] generics_0.1.4 GenomeInfoDb_1.40.1 GenomeInfoDbData_1.2.12
## [61] ggdendro_0.2.0 ggforce_0.5.0 ggfun_0.2.0
## [64] ggiraph_0.9.2 ggplotify_0.1.3 ggraph_2.2.2
## [67] ggrepel_0.9.6 ggridges_0.5.7 ggsignif_0.6.4
## [70] ggtext_0.1.2 ggtree_3.12.0 glmnet_4.1-10
## [73] glue_1.8.0 GO.db_3.19.1 GOSemSim_2.30.2
## [76] graphlayouts_1.2.2 grid_4.4.3 gridExtra_2.3
## [79] gridGraphics_0.5-1 gridtext_0.1.5 gson_0.1.0
## [82] gtable_0.3.6 HarmonizR_1.2.0 hms_1.1.3
## [85] htmltools_0.5.8.1 htmlwidgets_1.6.4 httr_1.4.7
## [88] httr2_1.2.1 igraph_2.1.4 iterators_1.0.14
## [91] itertools_0.1-3 jquerylib_0.1.4 jsonlite_2.0.0
## [94] KEGGREST_1.44.1 knitr_1.50 labeling_0.4.3
## [97] laeken_0.5.3 lattice_0.22-6 lazyeval_0.2.2
## [100] legendry_0.2.4 lifecycle_1.0.4 limma_3.60.6
## [103] litedown_0.7 lmtest_0.9-40 lsa_0.73.3
## [106] lubridate_1.9.4 magrittr_2.0.4 markdown_2.0
## [109] MASS_7.3-64 Matrix_1.7-2 MBQN_2.16.0
## [112] MCL_1.0 memoise_2.0.1 mgcv_1.9-1
## [115] missForest_1.5 nlme_3.1-167 nnet_7.3-20
## [118] nortest_1.0-4 org.Hs.eg.db_3.19.1 parallel_4.4.3
## [121] pcaMethods_1.96.0 pheatmap_1.0.13 pillar_1.11.1
## [124] pkgconfig_2.0.3 plotly_4.11.0 plyr_1.8.9
## [127] png_0.1-8 polyclip_1.10-7 prettyunits_1.2.0
## [130] progress_1.2.3 prolfqua_1.3.9 proxy_0.4-27
## [133] qvalue_2.36.0 R.methodsS3_1.8.2 R.oo_1.27.1
## [136] R.utils_2.13.0 R6_2.6.1 randomForest_4.7-1.2
## [139] ranger_0.17.0 rappdirs_0.3.3 RColorBrewer_1.1-3
## [142] Rcpp_1.1.0 RCurl_1.98-1.17 reshape2_1.4.4
## [145] rlang_1.1.6 rmarkdown_2.30 rngtools_1.5.2
## [148] robustbase_0.99-6 RSQLite_2.4.3 rstatix_0.7.3
## [151] rstudioapi_0.17.1 rvcheck_0.2.1 S7_0.2.0
## [154] sass_0.4.10 scales_1.4.0 scatterpie_0.2.6
## [157] shadowtext_0.1.6 shape_1.4.6.1 SnowballC_0.7.1
## [160] sp_2.2-0 splines_4.4.3 statmod_1.5.1
## [163] stringi_1.8.7 stringr_1.5.2 survival_3.8-3
## [166] systemfonts_1.3.1 tibble_3.3.0 tidygraph_1.3.1
## [169] tidyr_1.3.1 tidyselect_1.2.1 tidytree_0.4.6
## [172] timechange_0.3.0 tools_4.4.3 treeio_1.28.0
## [175] tweenr_2.0.3 UCSC.utils_1.0.0 vcd_1.4-13
## [178] vctrs_0.6.5 VIM_6.2.6 viridis_0.6.5
## [181] viridisLite_0.4.2 withr_3.0.2 xfun_0.53
## [184] xml2_1.4.1 XVector_0.44.0 yaml_2.3.10
## [187] yulab.utils_0.2.1 zlibbioc_1.50.0 zoo_1.8-14