Transcriptomic Analyses

Transcriptomic Analyses for Deng et al. 2026

This R notebook contains the code used for performing differential expression analysis as well as for creating the figures and tables related to transcriptomic data in Deng et al., 2026. The code uses output from Homer and can be run in RStudio. The code was tested with Windows 11 Pro; no non-standard hardware is required.

Packages can be installed as below (for “example_package”). Estimated time for installing all packages and running all code on typical desktop computer: 2-4 hrs. Instructions are included as comments on relevant lines of code, and quantitative results that are used in the figures and listed in supplementary data tables can be reproduced by following the respective sections.

#install.packages("example_package")
#OR
#if (!require("BiocManager", quietly = TRUE))
#    install.packages("BiocManager")
#BiocManager::install("example_package")

DESeq2 differential expression analysis

library(htmltools)
library(DESeq2)

## Loading required package: S4Vectors

## Loading required package: stats4

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which.max, which.min

## 
## Attaching package: 'S4Vectors'

## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname

## Loading required package: IRanges

## 
## Attaching package: 'IRanges'

## The following object is masked from 'package:grDevices':
## 
##     windows

## Loading required package: GenomicRanges

## Loading required package: GenomeInfoDb

## Loading required package: SummarizedExperiment

## Loading required package: MatrixGenerics

## Loading required package: matrixStats

## 
## Attaching package: 'MatrixGenerics'

## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## 
## Attaching package: 'Biobase'

## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians

## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.2.3

library(factoextra)

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

library(FactoMineR)

#Read in Homer data and metadata files, process for various pairwise comparisons
countData <- read.csv("raw.csv") # Raw counts file output from Homer
countData <- countData[,-(1:7)] #only keep annotation
countData[,(2:20)] <- ceiling(countData[,(2:20)]) #rounded up, to avoid non-integers in DESeq


metaData <- read.csv("RNAseq_metadata.csv", header = TRUE, sep = ",")
metaData_wtonly <- metaData[(10:19),]
metaData_koonly <- metaData[(1:9),]
metaData_crenegonly <- metaData[metaData$Cre=='minus',]
metaData_comp5 <- metaData[c(12,14,17,19,3,4,6,9),]

countData_wtonly <- countData[,-(2:10)]
countData_koonly <- countData[,-(11:20)]
negative_columns <-  c(TRUE,metaData$Cre=='minus') #need extra column in countData
countData_crenegonly <- countData[,negative_columns]
positive_columns <-  c(TRUE,metaData$Cre=='plus') #need extra column in countData
countData_comp5 <- countData[,c(1,13,15,18,20,4,5,7,10)]



dds_wtonly <- DESeqDataSetFromMatrix(countData = countData_wtonly, colData = metaData_wtonly, design = ~Cre, tidy = TRUE)

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

dds_wtonly <- DESeq(dds_wtonly)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

rownames(dds_wtonly) <- sub("\\|.*", "", row.names(dds_wtonly))

dds_koonly <- DESeqDataSetFromMatrix(countData = countData_koonly, colData = metaData_koonly, design = ~Cre, tidy = TRUE)

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

dds_koonly <- DESeq(dds_koonly)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

rownames(dds_koonly) <- sub("\\|.*", "", row.names(dds_koonly))

dds_crenegonly <- DESeqDataSetFromMatrix(countData = countData_crenegonly, colData = metaData_crenegonly, design = ~Fmr, tidy = TRUE)

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

dds_crenegonly$Fmr <- relevel(dds_crenegonly$Fmr, ref = "WT")
dds_crenegonly <- DESeq(dds_crenegonly)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

rownames(dds_crenegonly) <- sub("\\|.*", "", row.names(dds_crenegonly))

dds_comp5 <- DESeqDataSetFromMatrix(countData = countData_comp5, colData = metaData_comp5, design = ~Cre, tidy = TRUE)

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

dds_comp5 <- DESeq(dds_comp5)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

rownames(dds_comp5) <- sub("\\|.*", "", row.names(dds_comp5))


#Create results tables
res_wtonly <- results(dds_wtonly)
res_koonly <- results(dds_koonly)
res_crenegonly <- results(dds_crenegonly)
res_comp5 <- results(dds_comp5)


#Sort results by adjusted p-value
res_wtonly <- res_wtonly[order(res_wtonly$padj),]
res_koonly <- res_koonly[order(res_koonly$padj),]
res_crenegonly <- res_crenegonly[order(res_crenegonly$padj),]
res_comp5 <- res_comp5[order(res_comp5$padj),]

Create Supplementary Table 2

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:Biobase':
## 
##     combine

## The following object is masked from 'package:matrixStats':
## 
##     count

## The following objects are masked from 'package:GenomicRanges':
## 
##     intersect, setdiff, union

## The following object is masked from 'package:GenomeInfoDb':
## 
##     intersect

## The following objects are masked from 'package:IRanges':
## 
##     collapse, desc, intersect, setdiff, slice, union

## The following objects are masked from 'package:S4Vectors':
## 
##     first, intersect, rename, setdiff, setequal, union

## The following objects are masked from 'package:BiocGenerics':
## 
##     combine, intersect, setdiff, union

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tibble)
library(writexl)

## Warning: package 'writexl' was built under R version 4.2.3

tpmData <- read.csv("tpm.csv") #TPM output file from Homer
tpmFinal <- tpmData[,-(2:7)] #Remove extraneous annotation information
tpmFinal <- tpmFinal[,c(1,2,14,16,19,21,12,13,15,17,18,20,3,4,7,9,10,5,6,8,11)]

rownames(tpmFinal) <- sub("\\|.*", "", tpmFinal$Annotation)

tpmFinal <- rownames_to_column(tpmFinal)

## DESeq2 with standard parameters was used to perform pairwise comparisons:
## (1) WT to FXS [termed "crenegonly"]
## (2) FXS to FXS;Smad4 cKO [termed "koonly"]
## (3) WT to WT;Smad4 cKO [termed "wtonly"]
## (4) WT to FXS;Smad4 cKO [termed "kokowtwt" or "comp5"]

##load the DESeq_results.RData workspace
load("DESeq_results.Rdata")

res_wtonly_frame <- rownames_to_column(as.data.frame(res_wtonly)) #DESeq output comparing WT to WT;Smad4 cKO
res_wtonly_frame <- res_wtonly_frame[,-c(2,4,5,6)]
tpmFinal <- full_join(tpmFinal, res_wtonly_frame, by='rowname')

res_koonly_frame <- rownames_to_column(as.data.frame(res_koonly)) #DESeq output comparing FXS to FXS;Smad4 cKO
res_koonly_frame <- res_koonly_frame[,-c(2,4,5,6)]
tpmFinal <- full_join(tpmFinal, res_koonly_frame, by='rowname')

res_crenegonly_frame <- rownames_to_column(as.data.frame(res_crenegonly)) #DESeq output comparing WT to FXS
res_crenegonly_frame <- res_crenegonly_frame[,-c(2,4,5,6)]
tpmFinal <- full_join(tpmFinal, res_crenegonly_frame, by='rowname')

colnames(tpmFinal) <- c('Gene','Transcript','Annotation','WT1','WT2','WT3','WT4',
                        'WT;Smad4cKO1','WT;Smad4cKO2','WT;Smad4cKO3','WT;Smad4cKO4','WT;Smad4cKO5','WT;Smad4cKO6',
                        'FXS1','FXS2','FXS3','FXS4','FXS5',
                        'FXS;Smad4cKO1','FXS;Smad4cKO2','FXS;Smad4cKO3','FXS;Smad4cKO4',
                        'L2FC WT;Smad4cKO vs. WT', 'padj WT;Smad4cKO vs. WT',
                        'L2FC FXS;Smad4cKO vs. FXS', 'padj FXS;Smad4cKO vs. FXS',
                        'L2FC FXS vs WT', 'padj FXS vs. WT')

tpmFinal$"WT AvgTPM" <- rowMeans(tpmFinal[4:7])
tpmFinal$"WT;Smad4cKO AvgTPM" <- rowMeans(tpmFinal[8:13])
tpmFinal$"FXS AvgTPM" <- rowMeans(tpmFinal[14:18])
tpmFinal$"FXS;Smad4cKO AvgTPM" <- rowMeans(tpmFinal[19:22])

tpmFinal <- tpmFinal[order(tpmFinal$Gene),] 
tpmFinal$all_above_1 <- rowMaxs(as.matrix(tpmFinal[,c(29:32)]))>1
#length(which(tpmFinal$all_above_1==TRUE)) #15560 of the 24537 genes have at least one group TPM average > 1

tpmFinal_forpaper <- tpmFinal[,c(1:22)]

write_xlsx(tpmFinal_forpaper, path = "Supplementary Table 2.xlsx")

Creating Figure S5b

library(pheatmap)
library(viridis)

## Loading required package: viridisLite

library(ggplot2)

# cell type specificity 
cell_type <- read.csv("Boisvert_cell type genes.csv") #from Boisvert et al., 2018.

cell_type_astro <- tpmFinal[tpmFinal$Gene %in% cell_type$Astrocyte,]
cell_type_astro <- rbind(cell_type_astro, tpmFinal[tpmFinal$Gene %in% c("Kcnj10","Gja1","Itgb5","Gpc5","Apoe","Clu","Ezr","Glul"),])
cell_type_astro$type <- "astro"

cell_type_endo <- tpmFinal[tpmFinal$Gene %in% cell_type$Endo,]
cell_type_endo$type <- "endo"

cell_type_microglia <- tpmFinal[tpmFinal$Gene %in% cell_type$Microglia,]
cell_type_microglia$type <- "microglia"

cell_type_neuron <- tpmFinal[tpmFinal$Gene %in% cell_type$Neuron,]
cell_type_neuron <- rbind(cell_type_neuron, tpmFinal[tpmFinal$Gene %in% c("Rbfox3","Nefl","Eno2"),])
cell_type_neuron$type <- "neuron"

cell_type_opc <- tpmFinal[tpmFinal$Gene %in% cell_type$OPC,]
cell_type_opc$type <- "opc"

cell_type_oligo <- tpmFinal[tpmFinal$Gene %in% cell_type$Oligo,]
cell_type_oligo$type <- "oligo"

cell_type_combined <- rbind(cell_type_astro,cell_type_neuron,cell_type_oligo,cell_type_opc,cell_type_endo,cell_type_microglia)

cell_type_combined$"log2(TPM)" <- log2(cell_type_combined$`WT AvgTPM`)


# Heatmaps of cell type
rowInfo <- cell_type_combined[,c("Gene", "type")]
row.names(rowInfo) <- rowInfo$Gene
rowInfo <- dplyr::select(rowInfo, -c("Gene"))

data <- cell_type_combined[,c("Gene", "log2(TPM)")]
row.names(data) <- data$Gene
data<- dplyr::select(data, -c("Gene"))


mybreaks <- c(seq(0,10, length =50))

map <- pheatmap((data), 
                annotation_row = rowInfo,
                annotation_col = NA,
                annotation_colors = NA,
                color = inferno(50),
                breaks = mybreaks,
                show_rownames = FALSE, 
                show_colnames = TRUE,
                annotation_legend = TRUE,
                fontsize_row = 6,
                fontsize = 8,
                cluster_rows = FALSE,
                cluster_cols =  FALSE,
                cellwidth = 20,
                cellheight = 1,
                width = 7, 
                height = 7, 
                gaps_row = c(110,205,231,247,277))

ggsave("celltype.pdf", map, width = 2, height = 6, unit = "in")

Figure 2c: filtering for DEGs

##Filters were: |FC| > 1.5, p-adj < 0.05, TPM in WT > 1

res_wtonly_up <- as.data.frame(subset(subset(res_wtonly, log2FoldChange > 0.585), padj<0.05))
res_wtonly_up <- subset(res_wtonly_up, tpmFinal[row.names(res_wtonly_up),29]>1)
res_wtonly_up <- res_wtonly_up[order(res_wtonly_up$padj),]


res_wtonly_down <- as.data.frame(subset(subset(res_wtonly, log2FoldChange < -0.585), padj<0.05))
res_wtonly_down <- subset(res_wtonly_down, tpmFinal[row.names(res_wtonly_down),29]>1)
res_wtonly_down <- res_wtonly_down[order(res_wtonly_down$padj),]


res_koonly_up <- as.data.frame(subset(subset(res_koonly, log2FoldChange > 0.585), padj<0.05))
res_koonly_up <- subset(res_koonly_up, tpmFinal[row.names(res_koonly_up),29]>1)
res_koonly_up  <- res_koonly_up[order(res_koonly_up$padj),]


res_koonly_down <- as.data.frame(subset(subset(res_koonly, log2FoldChange < -0.585), padj<0.05))
res_koonly_down <- subset(res_koonly_down, tpmFinal[row.names(res_koonly_down),29]>1)
res_koonly_down <- res_koonly_down[order(res_koonly_down$padj),]


res_crenegonly_up <- as.data.frame(subset(subset(res_crenegonly, log2FoldChange > 0.585), padj<0.05))
res_crenegonly_up <- subset(res_crenegonly_up, tpmFinal[row.names(res_crenegonly_up),29]>1)
res_crenegonly_up <- res_crenegonly_up[order(res_crenegonly_up$padj),]


res_crenegonly_down <- as.data.frame(subset(subset(res_crenegonly, log2FoldChange < -0.585), padj<0.05))
res_crenegonly_down <- subset(res_crenegonly_down, tpmFinal[row.names(res_crenegonly_down),29]>1)
res_crenegonly_down <- res_crenegonly_down[order(res_crenegonly_down$padj),]


res_comp5_up <- as.data.frame(subset(subset(res_comp5, log2FoldChange > 0.585), padj<0.05))
res_comp5_up <- subset(res_comp5_up, tpmFinal[row.names(res_comp5_up),29]>1)
res_comp5_up <- res_comp5_up[order(res_comp5_up$padj),]


res_comp5_down <- as.data.frame(subset(subset(res_comp5, log2FoldChange < -0.585), padj<0.05))
res_comp5_down <- subset(res_comp5_down, tpmFinal[row.names(res_comp5_down),29]>1)
res_comp5_down <- res_comp5_down[order(res_comp5_down$padj),]

Create Supplementary Table 3

library(tibble)
library(scales)

## Warning: package 'scales' was built under R version 4.2.3

## 
## Attaching package: 'scales'

## The following object is masked from 'package:viridis':
## 
##     viridis_pal

library(fasstr)

## Warning: package 'fasstr' was built under R version 4.2.3

library(writexl)

rownames(tpmFinal) <- sub("\\|.*", "", tpmFinal$Annotation)

res_wtonly_up_final <- res_wtonly_up[,-c(1,3,4,5)]
res_wtonly_up_final$"WT AvgTPM" <- tpmFinal[rownames(res_wtonly_up_final),29]
res_wtonly_up_final$"WT;Smad4cKO AvgTPM" <- tpmFinal[rownames(res_wtonly_up_final),30]
res_wtonly_up_final <- res_wtonly_up_final[order(res_wtonly_up_final$padj),]
res_wtonly_up_final <- rownames_to_column(res_wtonly_up_final)
colnames(res_wtonly_up_final) <- c('Gene','log2FoldChange','padj','WT Avg TPM','WT;Smad4cKO Avg TPM')
res_wtonly_up_final[,c(2,4,5)] <- round(res_wtonly_up_final[,c(2,4,5)], digits=3)
res_wtonly_up_final[,c(3)] <- scientific(res_wtonly_up_final[,c(3)], digits=3)


res_wtonly_down_final <- res_wtonly_down[,-c(1,3,4,5)]
res_wtonly_down_final$"WT AvgTPM" <- tpmFinal[rownames(res_wtonly_down_final),29]
res_wtonly_down_final$"WT;Smad4cKO AvgTPM" <- tpmFinal[rownames(res_wtonly_down_final),30]
res_wtonly_down_final <- res_wtonly_down_final[order(res_wtonly_down_final$padj),]
res_wtonly_down_final <- rownames_to_column(res_wtonly_down_final)
colnames(res_wtonly_down_final) <- c('Gene','log2FoldChange','padj','WT Avg TPM','WT;Smad4cKO Avg TPM')
res_wtonly_down_final[,c(2,4,5)] <- round(res_wtonly_down_final[,c(2,4,5)], digits=3)
res_wtonly_down_final[,c(3)] <- scientific(res_wtonly_down_final[,c(3)], digits=3)


res_koonly_up_final <- res_koonly_up[,-c(1,3,4,5)]
res_koonly_up_final$"FXS AvgTPM" <- tpmFinal[rownames(res_koonly_up_final),31]
res_koonly_up_final$"FXS;Smad4cKO AvgTPM" <- tpmFinal[rownames(res_koonly_up_final),32]
res_koonly_up_final <- res_koonly_up_final[order(res_koonly_up_final$padj),]
res_koonly_up_final <- rownames_to_column(res_koonly_up_final)
colnames(res_koonly_up_final) <- c('Gene','log2FoldChange','padj','FXS Avg TPM','FXS;Smad4cKO Avg TPM')
res_koonly_up_final[,c(2,4,5)] <- round(res_koonly_up_final[,c(2,4,5)], digits=3)
res_koonly_up_final[,c(3)] <- scientific(res_koonly_up_final[,c(3)], digits=3)



res_koonly_down_final <- res_koonly_down[,-c(1,3,4,5)]
res_koonly_down_final$"FXS AvgTPM" <- tpmFinal[rownames(res_koonly_down_final),31]
res_koonly_down_final$"FXS;Smad4cKO AvgTPM" <- tpmFinal[rownames(res_koonly_down_final),32]
res_koonly_down_final <- res_koonly_down_final[order(res_koonly_down_final$padj),]
res_koonly_down_final <- rownames_to_column(res_koonly_down_final)
colnames(res_koonly_down_final) <- c('Gene','log2FoldChange','padj','FXS Avg TPM','FXS;Smad4cKO Avg TPM')
res_koonly_down_final[,c(2,4,5)] <- round(res_koonly_down_final[,c(2,4,5)], digits=3)
res_koonly_down_final[,c(3)] <- scientific(res_koonly_down_final[,c(3)], digits=3)



res_crenegonly_up_final <- res_crenegonly_up[,-c(1,3,4,5)]
res_crenegonly_up_final$"WT AvgTPM" <- tpmFinal[rownames(res_crenegonly_up_final),29]
res_crenegonly_up_final$"FXS AvgTPM" <- tpmFinal[rownames(res_crenegonly_up_final),31]
res_crenegonly_up_final <- res_crenegonly_up_final[order(res_crenegonly_up_final$padj),]
res_crenegonly_up_final <- rownames_to_column(res_crenegonly_up_final)
colnames(res_crenegonly_up_final) <- c('Gene','log2FoldChange','padj','WT Avg TPM','FXS Avg TPM')
res_crenegonly_up_final[,c(2,4,5)] <- round(res_crenegonly_up_final[,c(2,4,5)], digits=3)
res_crenegonly_up_final[,c(3)] <- scientific(res_crenegonly_up_final[,c(3)], digits=3)


res_crenegonly_down_final <- res_crenegonly_down[,-c(1,3,4,5)]
res_crenegonly_down_final$"WT AvgTPM" <- tpmFinal[rownames(res_crenegonly_down_final),29]
res_crenegonly_down_final$"FXS AvgTPM" <- tpmFinal[rownames(res_crenegonly_down_final),31]
res_crenegonly_down_final <- res_crenegonly_down_final[order(res_crenegonly_down_final$padj),]
res_crenegonly_down_final <- rownames_to_column(res_crenegonly_down_final)
colnames(res_crenegonly_down_final) <- c('Gene','log2FoldChange','padj','WT Avg TPM','FXS Avg TPM')
res_crenegonly_down_final[,c(2,4,5)] <- round(res_crenegonly_down_final[,c(2,4,5)], digits=3)
res_crenegonly_down_final[,c(3)] <- scientific(res_crenegonly_down_final[,c(3)], digits=3)



res_comp5_up_final <- res_comp5_up[,-c(1,3,4,5)]
res_comp5_up_final$"WT AvgTPM" <- tpmFinal[rownames(res_comp5_up_final),29]
res_comp5_up_final$"FXS;Smad4cKO AvgTPM" <- tpmFinal[rownames(res_comp5_up_final),32]
res_comp5_up_final <- res_comp5_up_final[order(res_comp5_up_final$padj),]
res_comp5_up_final <- rownames_to_column(res_comp5_up_final)
colnames(res_comp5_up_final) <- c('Gene','log2FoldChange','padj','WT Avg TPM','FXS;Smad4cKO Avg TPM')
res_comp5_up_final[,c(2,4,5)] <- round(res_comp5_up_final[,c(2,4,5)], digits=3)
res_comp5_up_final[,c(3)] <- scientific(res_comp5_up_final[,c(3)], digits=3)



res_comp5_down_final <- res_comp5_down[,-c(1,3,4,5)]
res_comp5_down_final$"WT AvgTPM" <- tpmFinal[rownames(res_comp5_down_final),29]
res_comp5_down_final$"FXS;Smad4cKO AvgTPM" <- tpmFinal[rownames(res_comp5_down_final),32]
res_comp5_down_final <- res_comp5_down_final[order(res_comp5_down_final$padj),]
res_comp5_down_final <- rownames_to_column(res_comp5_down_final)
colnames(res_comp5_down_final) <- c('Gene','log2FoldChange','padj','WT Avg TPM','FXS;Smad4cKO Avg TPM')
res_comp5_down_final[,c(2,4,5)] <- round(res_comp5_down_final[,c(2,4,5)], digits=3)
res_comp5_down_final[,c(3)] <- scientific(res_comp5_down_final[,c(3)], digits=3)




list_of_DEGs <- list(Comparison_1_UP = res_crenegonly_up_final,
                     Comparison_1_DOWN = res_crenegonly_down_final,
                     Comparison_2_UP = res_koonly_up_final,
                     Comparison_2_DOWN = res_koonly_down_final,
                     Comparison_3_UP = res_wtonly_up_final,
                     Comparison_3_DOWN = res_wtonly_down_final,
                     Comparison_4_UP = res_comp5_up_final,
                     Comparison_4_DOWN = res_comp5_down_final)
  
  
write_xlsx(list_of_DEGs, path = "Supplementary Table 3.xlsx")

Figure 2E: density plot

library(tibble)
library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

## The following object is masked from 'package:matrixStats':
## 
##     count

## The following object is masked from 'package:IRanges':
## 
##     desc

## The following object is masked from 'package:S4Vectors':
## 
##     rename

library(ggplot2)
library(hrbrthemes)

## Warning: package 'hrbrthemes' was built under R version 4.2.3

library(geomtextpath)

## Warning: package 'geomtextpath' was built under R version 4.2.3

library(devtools)

## Warning: package 'devtools' was built under R version 4.2.3

## Loading required package: usethis

## Warning: package 'usethis' was built under R version 4.2.3

library(BSDA)

## Loading required package: lattice

## 
## Attaching package: 'BSDA'

## The following object is masked from 'package:datasets':
## 
##     Orange

res_wtonly_df <- as.data.frame(res_wtonly)
res_koonly_df <- as.data.frame(res_koonly)
res_comp5_df <- as.data.frame(res_comp5)
res_crenegonly_df <- as.data.frame(res_crenegonly)


#### Get data on Darnell FMRP
Darnell <- read.csv("Darnell_s2a.csv") #From Darnell et al., 2011.

res_crenegonly_darnell <- res_crenegonly_df[Darnell[,3],]
res_crenegonly_darnell <- na.omit(res_crenegonly_darnell) #went from 842 to 727 genes
#hist(res_crenegonly_darnell[,2],breaks=40, 
#     main="L2FC of FMRP mRNA targets (Darnell et al., 2011)", 
#     xlab="L2FC",
#     ylab="# genes")
#sum(res_crenegonly_darnell[,2]>0)
#sum(res_crenegonly_darnell[,2]<0)
#sum(res_crenegonly_df[,2]>0, na.rm=TRUE)
#sum(res_crenegonly_df[,2]<0, na.rm=TRUE)

#Get astro-enriched genes (Zhang 2014 dataset)
astro_genes <- read.csv("Zhang-astro-enriched.csv") #From Zhang et al., 2014.

res_crenegonly_darnell_astro <- res_crenegonly_df[Darnell[is.element(Darnell[,3],astro_genes[,1]),3],]
res_crenegonly_darnell_astro <- na.omit(res_crenegonly_darnell_astro) #153 to 141
#hist(res_crenegonly_darnell_astro[,2],breaks=20, 
     # main="Astrocyte-Enriched FMRP mRNA targets", 
     # xlab="Log2 Fold Change",
     # ylab="# genes", 
     # cex.lab=1.5, 
     # cex.axis=1.2, 
     # cex.main=1.5)
write.csv(res_crenegonly_darnell_astro, "./res_crenegonly_darnell_astro.csv")
#sum(res_crenegonly_darnell_astro[,2]>0)
#sum(res_crenegonly_darnell_astro[,2]<0)

###Plot overlaid distribution of astrocyte FMRP-target genes vs. overall gene distribution
overlaid_fxs_map <- res_crenegonly_df
overlaid_fxs_map <- overlaid_fxs_map[,c(1,2)]
overlaid_fxs_map$Gene_Set <- 'All Genes'
overlaid_darnell_astro <- res_crenegonly_darnell_astro
overlaid_darnell_astro <- overlaid_darnell_astro[,c(1,2)]
overlaid_darnell_astro$Gene_Set <- 'Astrocyte-Enriched FMRP Targets (141)'
overlaid_darnell <- res_crenegonly_darnell
overlaid_darnell <- overlaid_darnell[,c(1,2)]
overlaid_darnell$Gene_Set <- 'darnell'

overlaid_fxs_map <- rbind(overlaid_fxs_map,overlaid_darnell_astro)
overlaid_fxs_map <- na.omit(overlaid_fxs_map)
mu <- ddply(overlaid_fxs_map, 'Gene_Set', summarise, grp.mean=mean(log2FoldChange))

#count(overlaid_darnell_astro)

density_plot <- ggplot(overlaid_fxs_map, aes(log2FoldChange, fill=Gene_Set, label=Gene_Set)) +
  geom_density(alpha=0.4) +
  xlim(-0.5,0.5) +
  ylim(0,3.5) +
  geom_vline(data = mu, aes(xintercept=grp.mean, color=Gene_Set),linetype='dashed') +
  theme_bw()

ggsave("density_plot_darnell.pdf", density_plot, width = 8, height = 4, unit = "in")

## Warning: Removed 4972 rows containing non-finite outside the scale range
## (`stat_density()`).

Processing data from Divolis et al. for comparison in Figure 3B

library(htmltools)
library(DESeq2)
library(ggplot2)
library(factoextra)
library(FactoMineR)

countData <- read.csv("Divolis_rawBoth.csv") #Homer output from analysis of Divolis et al., 2017 raw data.
countData <- countData[,-(1:7)] #remove excess metadata
countData_24h <- countData[,-(11:19)]
metadata_24h <- read.csv("Divolis_metadata_24hr.csv")

dds_24h <- DESeqDataSetFromMatrix(countData = countData_24h, colData = metadata_24h, design = ~Treatment, tidy = TRUE)

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

dds_24h <- DESeq(dds_24h)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

rownames(dds_24h) <- sub("\\|.*", "", row.names(dds_24h))

res_24h_BMP <- results(dds_24h, contrast=c("Treatment","BMP4","Vehicle"))
res_24h_TGF <- results(dds_24h, contrast=c("Treatment","TGFb1","Vehicle"))


#stimulation of cultured astrocytes with 24hr BMP4 or 24hr TGFB1

Figure 3B: comparison to other BMP-related datasets

# Correlations between my comparisons and Divolis and Alie and Buckwater
library(corrplot)

## corrplot 0.92 loaded

library(pheatmap)


Alie_RNAseq_BMP <- read.csv("Caldwell_s3 RNAseq BMP.csv") #From Caldwell et al., 2022; stimulation of cultured astrocytes with 5d BMP6.

res_wtonly_up <- as.data.frame(subset(subset(res_wtonly, log2FoldChange > 0.585), padj<0.05))
res_wtonly_up <- subset(res_wtonly_up, tpmFinal[row.names(res_wtonly_up),29]>1)
res_wtonly_up <- res_wtonly_up[order(res_wtonly_up$padj),]


res_wtonly_down <- as.data.frame(subset(subset(res_wtonly, log2FoldChange < -0.585), padj<0.05))
res_wtonly_down <- subset(res_wtonly_down, tpmFinal[row.names(res_wtonly_down),29]>1)
res_wtonly_down <- res_wtonly_down[order(res_wtonly_down$padj),]


res_koonly_up <- as.data.frame(subset(subset(res_koonly, log2FoldChange > 0.585), padj<0.05))
res_koonly_up <- subset(res_koonly_up, tpmFinal[row.names(res_koonly_up),29]>1)
res_koonly_up  <- res_koonly_up[order(res_koonly_up$padj),]


res_koonly_down <- as.data.frame(subset(subset(res_koonly, log2FoldChange < -0.585), padj<0.05))
res_koonly_down <- subset(res_koonly_down, tpmFinal[row.names(res_koonly_down),29]>1)
res_koonly_down <- res_koonly_down[order(res_koonly_down$padj),]


res_crenegonly_up <- as.data.frame(subset(subset(res_crenegonly, log2FoldChange > 0.585), padj<0.05))
res_crenegonly_up <- subset(res_crenegonly_up, tpmFinal[row.names(res_crenegonly_up),29]>1)
res_crenegonly_up <- res_crenegonly_up[order(res_crenegonly_up$padj),]


res_crenegonly_down <- as.data.frame(subset(subset(res_crenegonly, log2FoldChange < -0.585), padj<0.05))
res_crenegonly_down <- subset(res_crenegonly_down, tpmFinal[row.names(res_crenegonly_down),29]>1)
res_crenegonly_down <- res_crenegonly_down[order(res_crenegonly_down$padj),]


res_comp5_up <- as.data.frame(subset(subset(res_comp5, log2FoldChange > 0.585), padj<0.05))
res_comp5_up <- subset(res_comp5_up, tpmFinal[row.names(res_comp5_up),29]>1)
res_comp5_up <- res_comp5_up[order(res_comp5_up$padj),]


res_comp5_down <- as.data.frame(subset(subset(res_comp5, log2FoldChange < -0.585), padj<0.05))
res_comp5_down <- subset(res_comp5_down, tpmFinal[row.names(res_comp5_down),29]>1)
res_comp5_down <- res_comp5_down[order(res_comp5_down$padj),]

corr_divolis <- as.data.frame(res_wtonly)[,c(1:2)]
corr_divolis_test <- as.data.frame(res_koonly)[,c(1:2)]
corr_divolis <- merge(corr_divolis,corr_divolis_test,by="row.names",all.x=TRUE)
corr_divolis <- corr_divolis[,-c(2,4)]
colnames(corr_divolis) <- c("Annotation.Divergence","Comp1","Comp2")


res_24h_BMP_df <- as.data.frame(res_24h_BMP)
res_24h_TGF_df <- as.data.frame(res_24h_TGF)

res_24h_BMP_df$gene <- rownames(res_24h_BMP_df)
res_24h_TGF_df$gene <- rownames(res_24h_TGF_df)

res_24h_BMP_df <- res_24h_BMP_df[order(res_24h_BMP_df$gene),]
res_24h_TGF_df <- res_24h_TGF_df[order(res_24h_TGF_df$gene),]

 
corr_divolis$Alie <- Alie_RNAseq_BMP$BMP6.vs..Alone.Log2.FC
corr_divolis$BMP_24h <- res_24h_BMP_df$log2FoldChange
corr_divolis$TGF_24h <- res_24h_TGF_df$log2FoldChange

corr_divolis_all <- corr_divolis[,-(3)]


#Divolis DEGs for R21
rownames(corr_divolis) <- corr_divolis$Annotation.Divergence
corr_divolis_DEGs <- corr_divolis[c(rownames(res_wtonly_down),rownames(res_wtonly_up)),]
corr_divolis_DEGs <- corr_divolis_DEGs[,-(1)]
cor_divolis_DEGs <- cor(corr_divolis_DEGs, method = 'spearman', use = "pairwise.complete.obs")

corr_divolis_DEGs_R21 <- corr_divolis_DEGs[,c(1,3,4,5)]
cor_divolis_DEGs_R21 <- cor(corr_divolis_DEGs_R21, method = 'pearson', use = "pairwise.complete.obs")
rownames(cor_divolis_DEGs_R21) <- c("Smad4cKO","+BMP6","+BMP4","+Tgfb1")
colnames(cor_divolis_DEGs_R21) <- c("Smad4cKO","+BMP6","+BMP4","+Tgfb1")
paletteLength <- 100
myColor <- colorRampPalette(c("blue", "white", "red"))(paletteLength)

#map <- pheatmap(rbind(cor_divolis_DEGs_R21[1,],c(0,0,0,0)), color=myColor, show_rownames=TRUE, show_colnames = FALSE, cluster_cols=FALSE, cluster_rows=FALSE, fontsize=12, display_numbers = rbind(round(cor_divolis_DEGs_R21[1,],2),c("Smad4 cKO","+BMP6","+BMP4","+Tgfb1")), breaks=seq(-1,1,length.out=100))

map <- pheatmap(cor_divolis_DEGs_R21, 
                cluster_cols=FALSE, 
                color=myColor, 
                cluster_rows=FALSE, 
                display_numbers = round(cor_divolis_DEGs_R21,2),
                number_color = "black")

ggsave("corr_RNAseq_BMP.pdf", map, width = 3, height = 2.5, unit = "in")

GSEA analysis and creating Supplementary Table 4

# GSEA in R, importing mouse msigdb hallmark, canonical (reactome, wikipathways, biocarta), GO]

library(clusterProfiler)

##

## clusterProfiler v4.6.0  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
## 
## If you use clusterProfiler in published research, please cite:
## T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141

## 
## Attaching package: 'clusterProfiler'

## The following object is masked from 'package:lattice':
## 
##     dotplot

## The following objects are masked from 'package:plyr':
## 
##     arrange, mutate, rename, summarise

## The following object is masked from 'package:IRanges':
## 
##     slice

## The following object is masked from 'package:S4Vectors':
## 
##     rename

## The following object is masked from 'package:stats':
## 
##     filter

library(enrichplot)

## 
## Attaching package: 'enrichplot'

## The following object is masked from 'package:lattice':
## 
##     dotplot

library(pathview)

##

## ##############################################################################
## Pathview is an open source software package distributed under GNU General
## Public License version 3 (GPLv3). Details of GPLv3 is available at
## http://www.gnu.org/licenses/gpl-3.0.html. Particullary, users are required to
## formally cite the original Pathview paper (not just mention it) in publications
## or products. For details, do citation("pathview") within R.
## 
## The pathview downloads and uses KEGG data. Non-academic uses may require a KEGG
## license agreement (details at http://www.kegg.jp/kegg/legal.html).
## ##############################################################################

library(ggplot2)
library(ReactomePA)

## ReactomePA v1.42.0  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
## 
## If you use ReactomePA in published research, please cite:
## Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Molecular BioSystems 2016, 12(2):477-479

## 
## Attaching package: 'ReactomePA'

## The following object is masked from 'package:lattice':
## 
##     dotplot

library(GO.db)

## Loading required package: AnnotationDbi

## 
## Attaching package: 'AnnotationDbi'

## The following object is masked from 'package:clusterProfiler':
## 
##     select

## The following object is masked from 'package:dplyr':
## 
##     select

#library(biomaRt)
library(msigdbr)

## Warning: package 'msigdbr' was built under R version 4.2.3

library(ggridges)
library(DOSE)

## DOSE v3.24.2  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
## 
## If you use DOSE in published research, please cite:
## Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609

organism = "org.Mm.eg.db"
library(organism, character.only = TRUE)

##

library(tibble)
library(gson)

## Warning: package 'gson' was built under R version 4.2.3

tpmData <- read.csv("tpm.csv")
tpmData <- tpmData[,-(1:7)]
rownames(tpmData) <- sub("\\|.*", "", tpmData$Annotation.Divergence)
tpmData$WTavg <- rowMeans(tpmData[11:20])
tpmData$KOavg <- rowMeans(tpmData[2:10])
tpmData$all_above_1 <- rowMins(as.matrix(tpmData[,c(2:20)]))>1


GSEA_wtonly <- subset(as.data.frame(res_wtonly),tpmFinal[row.names(res_wtonly),33]==TRUE)
GSEA_wtonly$rank <- sign(GSEA_wtonly$log2FoldChange) * -log10(GSEA_wtonly$pvalue)
GSEA_wtonly <- rownames_to_column(GSEA_wtonly)
GSEA_wtonly <- GSEA_wtonly[,-c(2:7)]
GSEA_wtonly <- GSEA_wtonly[order(-GSEA_wtonly$rank),]
GSEA_wtonly <- na.omit(GSEA_wtonly)
write.table(GSEA_wtonly, "./GSEA_wtonly.rnk", col.name=TRUE, sep="\t",row.names=FALSE,quote=FALSE)

GSEA_koonly <- subset(as.data.frame(res_koonly),tpmFinal[row.names(res_koonly),33]==TRUE)
GSEA_koonly$rank <- sign(GSEA_koonly$log2FoldChange) * -log10(GSEA_koonly$pvalue)
GSEA_koonly <- rownames_to_column(GSEA_koonly)
GSEA_koonly <- GSEA_koonly[,-c(2:7)]
GSEA_koonly <- GSEA_koonly[order(-GSEA_koonly$rank),]
GSEA_koonly <- na.omit(GSEA_koonly)
write.table(GSEA_koonly, "./GSEA_koonly.rnk", col.name=TRUE, sep="\t",row.names=FALSE,quote=FALSE)

GSEA_crenegonly <- subset(as.data.frame(res_crenegonly),tpmFinal[row.names(res_crenegonly),33]==TRUE)
GSEA_crenegonly$rank <- sign(GSEA_crenegonly$log2FoldChange) * -log10(GSEA_crenegonly$pvalue)
GSEA_crenegonly <- rownames_to_column(GSEA_crenegonly)
GSEA_crenegonly <- GSEA_crenegonly[,-c(2:7)]
GSEA_crenegonly <- GSEA_crenegonly[order(-GSEA_crenegonly$rank),]
GSEA_crenegonly <- na.omit(GSEA_crenegonly)
write.table(GSEA_crenegonly, "./GSEA_crenegonly.rnk", col.name=TRUE, sep="\t",row.names=FALSE,quote=FALSE)

GSEA_comp5 <- subset(as.data.frame(res_comp5),tpmFinal[row.names(res_comp5),33]==TRUE)
GSEA_comp5$rank <- sign(GSEA_comp5$log2FoldChange) * -log10(GSEA_comp5$pvalue)
GSEA_comp5 <- rownames_to_column(GSEA_comp5)
GSEA_comp5 <- GSEA_comp5[,-c(2:7)]
GSEA_comp5 <- GSEA_comp5[order(-GSEA_comp5$rank),]
GSEA_comp5 <- na.omit(GSEA_comp5)
write.table(GSEA_comp5, "./GSEA_comp5.rnk", col.name=TRUE, sep="\t",row.names=FALSE,quote=FALSE)

GSEA_wtonly_forR <- GSEA_wtonly$rank
names(GSEA_wtonly_forR) <- GSEA_wtonly$rowname

GSEA_koonly_forR <- GSEA_koonly$rank
names(GSEA_koonly_forR) <- GSEA_koonly$rowname

GSEA_crenegonly_forR <- GSEA_crenegonly$rank
names(GSEA_crenegonly_forR) <- GSEA_crenegonly$rowname

GSEA_comp5_forR <- GSEA_comp5$rank
names(GSEA_comp5_forR) <- GSEA_comp5$rowname


# GSEA_wtonly_forR <- GSEA_wtonly_l2fc$rank
# names(GSEA_wtonly_forR) <- GSEA_wtonly_l2fc$rowname
# 
# GSEA_koonly_forR <- GSEA_koonly_l2fc$rank
# names(GSEA_koonly_forR) <- GSEA_koonly_l2fc$rowname
# 
# GSEA_crenegonly_forR <- GSEA_crenegonly_l2fc$rank
# names(GSEA_crenegonly_forR) <- GSEA_crenegonly_l2fc$rowname
# 
# GSEA_comp5_forR <- GSEA_comp5_l2fc$rank
# names(GSEA_comp5_forR) <- GSEA_comp5_l2fc$rowname


gene_sets_h = msigdbr(species = "mouse", category = "H") #hallmark
gene_sets_cp = msigdbr(species = "mouse", category = "C2") #canonical pathways including KEGG, BIOCARTA, REACTOME, WP (wikipathways)
gene_sets_go = msigdbr(species = "mouse", category = "C5") #go including mf, cc, bp

msigdbr_t2g_h = gene_sets_h %>% dplyr::distinct(gs_name, gene_symbol) %>% as.data.frame()
msigdbr_t2g_cp = gene_sets_cp %>% dplyr::distinct(gs_name, gene_symbol) %>% as.data.frame()
msigdbr_t2g_go = gene_sets_go %>% dplyr::distinct(gs_name, gene_symbol) %>% as.data.frame()


GSEA_wtonly_h <- GSEA(gene = GSEA_wtonly_forR, pAdjustMethod = "BH", pvalueCutoff = 1, TERM2GENE = msigdbr_t2g_h)

## preparing geneSet collections...

## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.14% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...

## done...

GSEA_koonly_h <- GSEA(gene = GSEA_koonly_forR, pAdjustMethod = "BH", pvalueCutoff = 1, TERM2GENE = msigdbr_t2g_h)

## preparing geneSet collections...

## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.35% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...

## done...

GSEA_crenegonly_h <-  GSEA(gene = GSEA_crenegonly_forR, pAdjustMethod = "BH", pvalueCutoff = 1,TERM2GENE = msigdbr_t2g_h)

## preparing geneSet collections...

## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.33% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...

## done...

GSEA_comp5_h <- GSEA(gene = GSEA_comp5_forR, pAdjustMethod = "BH", pvalueCutoff = 1,TERM2GENE = msigdbr_t2g_h)

## preparing geneSet collections...

## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.37% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...

## done...

GSEA_wtonly_h_df <- as.data.frame(GSEA_wtonly_h)
GSEA_wtonly_h_df[,c(4,5)] <- round(GSEA_wtonly_h_df[,c(4,5)], digits=3)
GSEA_wtonly_h_df[,c(6)] <- scientific(GSEA_wtonly_h_df[,c(6)], digits=3)
GSEA_wtonly_h_df[,c(7)] <- scientific(GSEA_wtonly_h_df[,c(7)], digits=3)
GSEA_wtonly_h_df[,c(8)] <- scientific(GSEA_wtonly_h_df[,c(8)], digits=3)

GSEA_koonly_h_df <- as.data.frame(GSEA_koonly_h)
GSEA_koonly_h_df[,c(4,5)] <- round(GSEA_koonly_h_df[,c(4,5)], digits=3)
GSEA_koonly_h_df[,c(6)] <- scientific(GSEA_koonly_h_df[,c(6)], digits=3)
GSEA_koonly_h_df[,c(7)] <- scientific(GSEA_koonly_h_df[,c(7)], digits=3)
GSEA_koonly_h_df[,c(8)] <- scientific(GSEA_koonly_h_df[,c(8)], digits=3)

GSEA_crenegonly_h_df <- as.data.frame(GSEA_crenegonly_h)
GSEA_crenegonly_h_df[,c(4,5)] <- round(GSEA_crenegonly_h_df[,c(4,5)], digits=3)
GSEA_crenegonly_h_df[,c(6)] <- scientific(GSEA_crenegonly_h_df[,c(6)], digits=3)
GSEA_crenegonly_h_df[,c(7)] <- scientific(GSEA_crenegonly_h_df[,c(7)], digits=3)
GSEA_crenegonly_h_df[,c(8)] <- scientific(GSEA_crenegonly_h_df[,c(8)], digits=3)

GSEA_comp5_h_df <- as.data.frame(GSEA_comp5_h)
GSEA_comp5_h_df[,c(4,5)] <- round(GSEA_comp5_h_df[,c(4,5)], digits=3)
GSEA_comp5_h_df[,c(6)] <- scientific(GSEA_comp5_h_df[,c(6)], digits=3)
GSEA_comp5_h_df[,c(7)] <- scientific(GSEA_comp5_h_df[,c(7)], digits=3)
GSEA_comp5_h_df[,c(8)] <- scientific(GSEA_comp5_h_df[,c(8)], digits=3)


GSEA_wtonly_h_df_up <- GSEA_wtonly_h_df[GSEA_wtonly_h_df$NES>0,-c(1)]
GSEA_wtonly_h_df_down <- GSEA_wtonly_h_df[GSEA_wtonly_h_df$NES<0,-c(1)]
GSEA_koonly_h_df_up <- GSEA_koonly_h_df[GSEA_koonly_h_df$NES>0,-c(1)]
GSEA_koonly_h_df_down <- GSEA_koonly_h_df[GSEA_koonly_h_df$NES<0,-c(1)]
GSEA_crenegonly_h_df_up <- GSEA_crenegonly_h_df[GSEA_crenegonly_h_df$NES>0,-c(1)]
GSEA_crenegonly_h_df_down <- GSEA_crenegonly_h_df[GSEA_crenegonly_h_df$NES<0,-c(1)]
GSEA_comp5_h_df_up <- GSEA_comp5_h_df[GSEA_comp5_h_df$NES>0,-c(1)]
GSEA_comp5_h_df_down <- GSEA_comp5_h_df[GSEA_comp5_h_df$NES<0,-c(1)]


GSEA_wtonly_cp <- GSEA(gene = GSEA_wtonly_forR, pAdjustMethod = "BH", pvalueCutoff = 1, TERM2GENE = msigdbr_t2g_cp)

## preparing geneSet collections...

## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.14% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...

## done...

GSEA_koonly_cp <- GSEA(gene = GSEA_koonly_forR, pAdjustMethod = "BH", pvalueCutoff = 1, TERM2GENE = msigdbr_t2g_cp)

## preparing geneSet collections...

## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.35% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...

## done...

GSEA_crenegonly_cp <-  GSEA(gene = GSEA_crenegonly_forR, pAdjustMethod = "BH", pvalueCutoff = 1,TERM2GENE = msigdbr_t2g_cp)

## preparing geneSet collections...

## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.33% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
## minSize, : For some pathways, in reality P-values are less than 1e-10. You can
## set the `eps` argument to zero for better estimation.

## leading edge analysis...

## done...

GSEA_comp5_cp <- GSEA(gene = GSEA_comp5_forR, pAdjustMethod = "BH", pvalueCutoff = 1,TERM2GENE = msigdbr_t2g_cp)

## preparing geneSet collections...

## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.37% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...

## done...

GSEA_wtonly_cp_df <- as.data.frame(GSEA_wtonly_cp)
GSEA_wtonly_cp_df[,c(4,5)] <- round(GSEA_wtonly_cp_df[,c(4,5)], digits=3)
GSEA_wtonly_cp_df[,c(6)] <- scientific(GSEA_wtonly_cp_df[,c(6)], digits=3)
GSEA_wtonly_cp_df[,c(7)] <- scientific(GSEA_wtonly_cp_df[,c(7)], digits=3)
GSEA_wtonly_cp_df[,c(8)] <- scientific(GSEA_wtonly_cp_df[,c(8)], digits=3)

GSEA_koonly_cp_df <- as.data.frame(GSEA_koonly_cp)
GSEA_koonly_cp_df[,c(4,5)] <- round(GSEA_koonly_cp_df[,c(4,5)], digits=3)
GSEA_koonly_cp_df[,c(6)] <- scientific(GSEA_koonly_cp_df[,c(6)], digits=3)
GSEA_koonly_cp_df[,c(7)] <- scientific(GSEA_koonly_cp_df[,c(7)], digits=3)
GSEA_koonly_cp_df[,c(8)] <- scientific(GSEA_koonly_cp_df[,c(8)], digits=3)

GSEA_crenegonly_cp_df <- as.data.frame(GSEA_crenegonly_cp)
GSEA_crenegonly_cp_df[,c(4,5)] <- round(GSEA_crenegonly_cp_df[,c(4,5)], digits=3)
GSEA_crenegonly_cp_df[,c(6)] <- scientific(GSEA_crenegonly_cp_df[,c(6)], digits=3)
GSEA_crenegonly_cp_df[,c(7)] <- scientific(GSEA_crenegonly_cp_df[,c(7)], digits=3)
GSEA_crenegonly_cp_df[,c(8)] <- scientific(GSEA_crenegonly_cp_df[,c(8)], digits=3)

GSEA_comp5_cp_df <- as.data.frame(GSEA_comp5_cp)
GSEA_comp5_cp_df[,c(4,5)] <- round(GSEA_comp5_cp_df[,c(4,5)], digits=3)
GSEA_comp5_cp_df[,c(6)] <- scientific(GSEA_comp5_cp_df[,c(6)], digits=3)
GSEA_comp5_cp_df[,c(7)] <- scientific(GSEA_comp5_cp_df[,c(7)], digits=3)
GSEA_comp5_cp_df[,c(8)] <- scientific(GSEA_comp5_cp_df[,c(8)], digits=3)




#BiocManager::install('tidyverse')
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.2.3

## Warning: package 'tidyr' was built under R version 4.2.3

## Warning: package 'readr' was built under R version 4.2.3

## Warning: package 'forcats' was built under R version 4.2.3

## Warning: package 'lubridate' was built under R version 4.2.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ lubridate 1.9.3     ✔ stringr   1.5.0
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::%within%()        masks IRanges::%within%()
## ✖ clusterProfiler::arrange()   masks plyr::arrange(), dplyr::arrange()
## ✖ readr::col_factor()          masks scales::col_factor()
## ✖ dplyr::collapse()            masks IRanges::collapse()
## ✖ dplyr::combine()             masks Biobase::combine(), BiocGenerics::combine()
## ✖ purrr::compact()             masks plyr::compact()
## ✖ plyr::count()                masks dplyr::count(), matrixStats::count()
## ✖ plyr::desc()                 masks dplyr::desc(), IRanges::desc()
## ✖ purrr::discard()             masks scales::discard()
## ✖ tidyr::expand()              masks S4Vectors::expand()
## ✖ plyr::failwith()             masks dplyr::failwith()
## ✖ clusterProfiler::filter()    masks dplyr::filter(), stats::filter()
## ✖ dplyr::first()               masks S4Vectors::first()
## ✖ plyr::id()                   masks dplyr::id()
## ✖ dplyr::lag()                 masks stats::lag()
## ✖ clusterProfiler::mutate()    masks plyr::mutate(), dplyr::mutate()
## ✖ ggplot2::Position()          masks BiocGenerics::Position(), base::Position()
## ✖ purrr::reduce()              masks GenomicRanges::reduce(), IRanges::reduce()
## ✖ clusterProfiler::rename()    masks plyr::rename(), dplyr::rename(), S4Vectors::rename()
## ✖ lubridate::second()          masks S4Vectors::second()
## ✖ lubridate::second<-()        masks S4Vectors::second<-()
## ✖ AnnotationDbi::select()      masks clusterProfiler::select(), dplyr::select()
## ✖ purrr::simplify()            masks clusterProfiler::simplify()
## ✖ clusterProfiler::slice()     masks dplyr::slice(), IRanges::slice()
## ✖ clusterProfiler::summarise() masks plyr::summarise(), dplyr::summarise()
## ✖ plyr::summarize()            masks dplyr::summarize()
## ℹ Use the ]8;;http://conflicted.r-lib.org/ conflicted package]8;;  to force all conflicts to become errors

strings <- c("KEGG_","REACTOME_","WP_","BIOCARTA_")

GSEA_wtonly_cp_df_up <- GSEA_wtonly_cp_df[GSEA_wtonly_cp_df$NES>0,-c(1)] #681 reactome 314 WP 99 biocarta 110 kegg
GSEA_wtonly_cp_df_up <- GSEA_wtonly_cp_df_up %>%
  filter(str_detect(GSEA_wtonly_cp_df_up$Description, paste(strings,collapse= "|"))) #1204 pathways, as expected

GSEA_wtonly_cp_df_down <- GSEA_wtonly_cp_df[GSEA_wtonly_cp_df$NES<0,-c(1)]
GSEA_wtonly_cp_df_down <- GSEA_wtonly_cp_df_down %>%
  filter(str_detect(GSEA_wtonly_cp_df_down$Description, paste(strings,collapse= "|")))

GSEA_koonly_cp_df_up <- GSEA_koonly_cp_df[GSEA_koonly_cp_df$NES>0,-c(1)]
GSEA_koonly_cp_df_up <- GSEA_koonly_cp_df_up %>%
  filter(str_detect(GSEA_koonly_cp_df_up$Description, paste(strings,collapse= "|")))

GSEA_koonly_cp_df_down <- GSEA_koonly_cp_df[GSEA_koonly_cp_df$NES<0,-c(1)]
GSEA_koonly_cp_df_down <- GSEA_koonly_cp_df_down %>%
  filter(str_detect(GSEA_koonly_cp_df_down$Description, paste(strings,collapse= "|")))


GSEA_crenegonly_cp_df_up <- GSEA_crenegonly_cp_df[GSEA_crenegonly_cp_df$NES>0,-c(1)]
GSEA_crenegonly_cp_df_up <- GSEA_crenegonly_cp_df_up %>%
  filter(str_detect(GSEA_crenegonly_cp_df_up$Description, paste(strings,collapse= "|")))

GSEA_crenegonly_cp_df_down <- GSEA_crenegonly_cp_df[GSEA_crenegonly_cp_df$NES<0,-c(1)]
GSEA_crenegonly_cp_df_down <- GSEA_crenegonly_cp_df_down %>%
  filter(str_detect(GSEA_crenegonly_cp_df_down$Description, paste(strings,collapse= "|")))

GSEA_comp5_cp_df_up <- GSEA_comp5_cp_df[GSEA_comp5_cp_df$NES>0,-c(1)]
GSEA_comp5_cp_df_up <- GSEA_comp5_cp_df_up %>%
  filter(str_detect(GSEA_comp5_cp_df_up$Description, paste(strings,collapse= "|")))

GSEA_comp5_cp_df_down <- GSEA_comp5_cp_df[GSEA_comp5_cp_df$NES<0,-c(1)]
GSEA_comp5_cp_df_down <- GSEA_comp5_cp_df_down %>%
  filter(str_detect(GSEA_comp5_cp_df_down$Description, paste(strings,collapse= "|")))


GSEA_wtonly_go <- GSEA(gene = GSEA_wtonly_forR, pAdjustMethod = "BH", pvalueCutoff = 1, TERM2GENE = msigdbr_t2g_go)

## preparing geneSet collections...
## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.14% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...
## done...

GSEA_koonly_go <- GSEA(gene = GSEA_koonly_forR, pAdjustMethod = "BH", pvalueCutoff = 1, TERM2GENE = msigdbr_t2g_go)

## preparing geneSet collections...
## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.35% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...
## done...

GSEA_crenegonly_go <-  GSEA(gene = GSEA_crenegonly_forR, pAdjustMethod = "BH", pvalueCutoff = 1,TERM2GENE = msigdbr_t2g_go)

## preparing geneSet collections...
## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.33% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
## minSize, : For some pathways, in reality P-values are less than 1e-10. You can
## set the `eps` argument to zero for better estimation.

## leading edge analysis...
## done...

GSEA_comp5_go <- GSEA(gene = GSEA_comp5_forR, pAdjustMethod = "BH", pvalueCutoff = 1,TERM2GENE = msigdbr_t2g_go)

## preparing geneSet collections...
## GSEA analysis...

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.37% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

## leading edge analysis...
## done...

GSEA_wtonly_go_df <- as.data.frame(GSEA_wtonly_go)
GSEA_wtonly_go_df[,c(4,5)] <- round(GSEA_wtonly_go_df[,c(4,5)], digits=3)
GSEA_wtonly_go_df[,c(6)] <- scientific(GSEA_wtonly_go_df[,c(6)], digits=3)
GSEA_wtonly_go_df[,c(7)] <- scientific(GSEA_wtonly_go_df[,c(7)], digits=3)
GSEA_wtonly_go_df[,c(8)] <- scientific(GSEA_wtonly_go_df[,c(8)], digits=3)

GSEA_koonly_go_df <- as.data.frame(GSEA_koonly_go)
GSEA_koonly_go_df[,c(4,5)] <- round(GSEA_koonly_go_df[,c(4,5)], digits=3)
GSEA_koonly_go_df[,c(6)] <- scientific(GSEA_koonly_go_df[,c(6)], digits=3)
GSEA_koonly_go_df[,c(7)] <- scientific(GSEA_koonly_go_df[,c(7)], digits=3)
GSEA_koonly_go_df[,c(8)] <- scientific(GSEA_koonly_go_df[,c(8)], digits=3)

GSEA_crenegonly_go_df <- as.data.frame(GSEA_crenegonly_go)
GSEA_crenegonly_go_df[,c(4,5)] <- round(GSEA_crenegonly_go_df[,c(4,5)], digits=3)
GSEA_crenegonly_go_df[,c(6)] <- scientific(GSEA_crenegonly_go_df[,c(6)], digits=3)
GSEA_crenegonly_go_df[,c(7)] <- scientific(GSEA_crenegonly_go_df[,c(7)], digits=3)
GSEA_crenegonly_go_df[,c(8)] <- scientific(GSEA_crenegonly_go_df[,c(8)], digits=3)

GSEA_comp5_go_df <- as.data.frame(GSEA_comp5_go)
GSEA_comp5_go_df[,c(4,5)] <- round(GSEA_comp5_go_df[,c(4,5)], digits=3)
GSEA_comp5_go_df[,c(6)] <- scientific(GSEA_comp5_go_df[,c(6)], digits=3)
GSEA_comp5_go_df[,c(7)] <- scientific(GSEA_comp5_go_df[,c(7)], digits=3)
GSEA_comp5_go_df[,c(8)] <- scientific(GSEA_comp5_go_df[,c(8)], digits=3)


GSEA_wtonly_go_df_up <- GSEA_wtonly_go_df[GSEA_wtonly_go_df$NES>0,-c(1)]
GSEA_wtonly_go_df_down <- GSEA_wtonly_go_df[GSEA_wtonly_go_df$NES<0,-c(1)]
GSEA_koonly_go_df_up <- GSEA_koonly_go_df[GSEA_koonly_go_df$NES>0,-c(1)]
GSEA_koonly_go_df_down <- GSEA_koonly_go_df[GSEA_koonly_go_df$NES<0,-c(1)]
GSEA_crenegonly_go_df_up <- GSEA_crenegonly_go_df[GSEA_crenegonly_go_df$NES>0,-c(1)]
GSEA_crenegonly_go_df_down <- GSEA_crenegonly_go_df[GSEA_crenegonly_go_df$NES<0,-c(1)]
GSEA_comp5_go_df_up <- GSEA_wtonly_go_df[GSEA_comp5_go_df$NES>0,-c(1)]
GSEA_comp5_go_df_down <- GSEA_wtonly_go_df[GSEA_comp5_go_df$NES<0,-c(1)]




library(writexl)

list_of_gsea <- list(Comparison_1_Hallmark_UP = GSEA_crenegonly_h_df_up,
                     Comparison_1_Hallmark_DOWN = GSEA_crenegonly_h_df_down,
                     Comparison_2_Hallmark_UP = GSEA_koonly_h_df_up,
                     Comparison_2_Hallmark_DOWN = GSEA_koonly_h_df_down,
                     Comparison_3_Hallmark_UP = GSEA_wtonly_h_df_up,
                     Comparison_3_Hallmark_DOWN = GSEA_wtonly_h_df_down,
                     Comparison_4_Hallmark_UP = GSEA_comp5_h_df_up,
                     Comparison_4_Hallmark_DOWN = GSEA_comp5_h_df_down,
                     
                     Comparison_1_CP_UP = GSEA_crenegonly_cp_df_up,
                     Comparison_1_CP_DOWN = GSEA_crenegonly_cp_df_down,
                     Comparison_2_CP_UP = GSEA_koonly_cp_df_up,
                     Comparison_2_CP_DOWN = GSEA_koonly_cp_df_down,
                     Comparison_3_CP_UP = GSEA_wtonly_cp_df_up,
                     Comparison_3_CP_DOWN = GSEA_wtonly_cp_df_down,
                     Comparison_4_CP_UP = GSEA_comp5_cp_df_up,
                     Comparison_4_CP_DOWN = GSEA_comp5_cp_df_down,
                     
                     Comparison_1_GO_UP = GSEA_crenegonly_go_df_up,
                     Comparison_1_GO_DOWN = GSEA_crenegonly_go_df_down,
                     Comparison_2_GO_UP = GSEA_koonly_go_df_up,
                     Comparison_2_GO_DOWN = GSEA_koonly_go_df_down,
                     Comparison_3_GO_UP = GSEA_wtonly_go_df_up,
                     Comparison_3_GO_DOWN = GSEA_wtonly_go_df_down,
                     Comparison_4_GO_UP = GSEA_comp5_go_df_up,
                     Comparison_4_GO_DOWN = GSEA_comp5_go_df_down
                     )
write_xlsx(list_of_gsea, path = "Supplementary Table 4.xlsx")

Creating Figure 3E: GSEA top pathway bar graph

library(dplyr)
library(forcats)
library('stringr')
library('ggplot2')

GSEA_crenegonly_h_df$Comparison <- 1
GSEA_koonly_h_df$Comparison <- 2
GSEA_comp5_h_df$Comparison <- 4

hallmark_all <- rbind(GSEA_crenegonly_h_df,GSEA_koonly_h_df,GSEA_comp5_h_df)

hallmark_all <- hallmark_all %>% 
  group_by(Comparison,sign(NES)) %>%
  arrange((-sign(NES)*NES), .by_group=TRUE) %>%
  dplyr::slice(1:3)

hallmark_all$Description <- str_wrap(hallmark_all$Description, width = 40)
hallmark_all <- hallmark_all[order(hallmark_all$NES),]
hallmark_all$Comparison <- factor(hallmark_all$Comparison,levels=c(1,2,4))

hallmark_all$Description <- gsub("^.*?_","",hallmark_all$Description)
hallmark_all <- hallmark_all[order(hallmark_all$Comparison),]

labels <- c('Comparison 1', 'Comparison 2', 'Comparison 4')
names(labels) <- c(1,2,4)


hallmark_all %>%
  mutate(ordering = as.numeric(Comparison) + NES,
         Description = fct_reorder(Description, ordering)) %>%
  mutate(Description = factor(Description),
         Description = factor(Description, levels = rev(levels(Description))))%>%
  ggplot(aes(x=NES, y=Description, fill = NES)) + 
    geom_bar(stat = 'identity') + 
    ggtitle("")+ 
    facet_grid(~Comparison, labeller = labeller(Comparison = labels))+
    xlab("Normalized Enrichment Score") + ylab("Hallmark Pathways")+
    scale_fill_gradient2(low = "blue",midpoint = 0, mid = "white", high ="red") +
    #theme_base(base_size = 16) +
    coord_cartesian(xlim = c(-3,3))+
    theme(axis.text.x = element_text(size = 12),
          axis.title.y = element_text(size = 18)) +
    geom_vline(xintercept = 0, color = "black", linetype = "dashed", linewidth = .5)

ggsave("Hallmark_TopInAll.pdf", height = 3, width = 6)

Figures 3F/G, Figures S5D/E, Supplementary Table 5: Preparing Z scores for export to Prism

##heatmap of glycolysis genes
library(matrixStats)

glyco_creneg <- read.csv("glycolysis rnaseq creneg.csv") # List of genes from KEGGM11521

glyco_creneg <- as.data.frame(gene_sets_cp[gene_sets_cp$gs_name=="KEGG_GLYCOLYSIS_GLUCONEOGENESIS",])
glyco_heatmap <- tpmFinal[glyco_creneg$gene_symbol,]
glyco_heatmap <- glyco_heatmap[glyco_heatmap$all_above_1==TRUE,] #61 genes to 41 genes

glyco_heatmap <- glyco_heatmap[order(glyco_heatmap[,27]),]
glyco_heatmap$"mean" <- rowMeans(glyco_heatmap[,c(4:22)])
glyco_heatmap$"sd" <- rowSds(as.matrix(glyco_heatmap[,(4:22)]))
glyco_heatmap[,(4:22)] <- (glyco_heatmap[,(4:22)]-glyco_heatmap$mean)/glyco_heatmap$sd
glyco_heatmap$Fmr1WTSmad4WTavg <- rowMeans(glyco_heatmap[,c(4:7)])
glyco_heatmap$Fmr1WTSmad4cKOavg <- rowMeans(glyco_heatmap[,c(8:13)])
glyco_heatmap$Fmr1KOSmad4WTavg <- rowMeans(glyco_heatmap[,c(14:18)])
glyco_heatmap$Fmr1KOSmad4cKOavg <- rowMeans(glyco_heatmap[,c(19:22)])

glyco_heatmap$up_down <- (glyco_heatmap[,36]-glyco_heatmap[,38])>0
glyco_heatmap$rescue_amount <- glyco_heatmap[,39]-glyco_heatmap[,38]
glyco_heatmap <- glyco_heatmap[order(-glyco_heatmap[,40], glyco_heatmap[,41]),]

write.csv(glyco_heatmap, "./glyco_heatmap_zscore.csv")
##Prism was used to create the heatmap
##Average z-scores in the CSV file were used to create Supplementary Table 5.


#Heatmap of oxphos genes
oxphos_creneg <- read.csv("oxphos rnaseq creneg.csv") ## List of genes from MSigDB MM3893
rownames(oxphos_creneg) <- oxphos_creneg[,2]
oxphos_heatmap <- tpmFinal[rownames(oxphos_creneg),]
oxphos_heatmap <- oxphos_heatmap[oxphos_heatmap$all_above_1==TRUE,] #193 to 193!
oxphos_heatmap <- oxphos_heatmap[order(oxphos_heatmap[,27]),]
oxphos_heatmap$"mean" <- rowMeans(oxphos_heatmap[,c(4:22)])
oxphos_heatmap$"sd" <- rowSds(as.matrix(oxphos_heatmap[,(4:22)]))
oxphos_heatmap[,(4:22)] <- (oxphos_heatmap[,(4:22)]-oxphos_heatmap$mean)/oxphos_heatmap$sd
oxphos_heatmap$Fmr1WTSmad4WTavg <- rowMeans(oxphos_heatmap[,c(4:7)])
oxphos_heatmap$Fmr1WTSmad4cKOavg <- rowMeans(oxphos_heatmap[,c(8:13)])
oxphos_heatmap$Fmr1KOSmad4WTavg <- rowMeans(oxphos_heatmap[,c(14:18)])
oxphos_heatmap$Fmr1KOSmad4cKOavg <- rowMeans(oxphos_heatmap[,c(19:22)])

oxphos_heatmap$up_down <- (oxphos_heatmap[,35]-oxphos_heatmap[,37])>0
oxphos_heatmap$rescue_amount <- oxphos_heatmap[,38]-oxphos_heatmap[,37]
oxphos_heatmap <- oxphos_heatmap[order(-oxphos_heatmap[,39], oxphos_heatmap[,40]),]

write.csv(oxphos_heatmap, "./oxphos_heatmap_zscore.csv")
##Prism was used to create the heatmap
##Average z-scores in the CSV file were used to create Supplementary Table 5.

Compiling data related to Supplementary Table 1

#List of BMP-related genes from Blanco-Suarez et al., 2018.

res_wtonly_bmp <- as.data.frame(res_wtonly)[c('Acvr1b','Acvr1c','Acvr2a','Acvr2b','Acvrl1','Bambi','Bmp1','Bmp10','Bmp15','Bmp2','Bmp2k','Bmp3','Bmp4','Bmp5','Bmp6',
                                              'Bmp7','Bmp8a','Bmper','Bmpr1a','Bmpr1b','Bmpr2','Cer1','Chrd','Chrdl1','Chrdl2','Crim1','Dand5','Grem1','Grem2',
                                              'Id1','Id2','Id3','Id4','Inha','Inhba','Inhbb','Inhbc','Inhbe','Kcp','Nbl1','Nog','Pmepa1','Smad1','Smad2','Smad3','Smad4',
                                              'Smad5','Smad6','Smad7','Smad9','Tgfa','Tgfb1','Tgfb2','Tgfb3','Tgfbr1','Tgfbr2','Tgfbr3','Tgfbrap1','Twsg1'),]
write.csv(res_wtonly_bmp, "./res_wtonly_bmp.csv") ##Comparison 3

res_koonly_bmp <- as.data.frame(res_koonly)[c('Acvr1b','Acvr1c','Acvr2a','Acvr2b','Acvrl1','Bambi','Bmp1','Bmp10','Bmp15','Bmp2','Bmp2k','Bmp3','Bmp4','Bmp5','Bmp6',
                                              'Bmp7','Bmp8a','Bmper','Bmpr1a','Bmpr1b','Bmpr2','Cer1','Chrd','Chrdl1','Chrdl2','Crim1','Dand5','Grem1','Grem2',
                                              'Id1','Id2','Id3','Id4','Inha','Inhba','Inhbb','Inhbc','Inhbe','Kcp','Nbl1','Nog','Pmepa1','Smad1','Smad2','Smad3','Smad4',
                                              'Smad5','Smad6','Smad7','Smad9','Tgfa','Tgfb1','Tgfb2','Tgfb3','Tgfbr1','Tgfbr2','Tgfbr3','Tgfbrap1','Twsg1'),]
write.csv(res_koonly_bmp, "./res_koonly_bmp.csv") ##Comparison 2

res_crenegonly_bmp <- as.data.frame(res_crenegonly)[c('Acvr1b','Acvr1c','Acvr2a','Acvr2b','Acvrl1','Bambi','Bmp1','Bmp10','Bmp15','Bmp2','Bmp2k','Bmp3','Bmp4','Bmp5','Bmp6',
                                              'Bmp7','Bmp8a','Bmper','Bmpr1a','Bmpr1b','Bmpr2','Cer1','Chrd','Chrdl1','Chrdl2','Crim1','Dand5','Grem1','Grem2',
                                              'Id1','Id2','Id3','Id4','Inha','Inhba','Inhbb','Inhbc','Inhbe','Kcp','Nbl1','Nog','Pmepa1','Smad1','Smad2','Smad3','Smad4',
                                              'Smad5','Smad6','Smad7','Smad9','Tgfa','Tgfb1','Tgfb2','Tgfb3','Tgfbr1','Tgfbr2','Tgfbr3','Tgfbrap1','Twsg1'),]
write.csv(res_crenegonly_bmp, "./res_crenegonly_bmp.csv") ##Comparison 1


res_kokowtwt_bmp <- as.data.frame(res_comp5)[c('Acvr1b','Acvr1c','Acvr2a','Acvr2b','Acvrl1','Bambi','Bmp1','Bmp10','Bmp15','Bmp2','Bmp2k','Bmp3','Bmp4','Bmp5','Bmp6',
                                    'Bmp7','Bmp8a','Bmper','Bmpr1a','Bmpr1b','Bmpr2','Cer1','Chrd','Chrdl1','Chrdl2','Crim1','Dand5','Grem1','Grem2',
                                              'Id1','Id2','Id3','Id4','Inha','Inhba','Inhbb','Inhbc','Inhbe','Kcp','Nbl1','Nog','Pmepa1','Smad1','Smad2','Smad3','Smad4',
                                              'Smad5','Smad6','Smad7','Smad9','Tgfa','Tgfb1','Tgfb2','Tgfb3','Tgfbr1','Tgfbr2','Tgfbr3','Tgfbrap1','Twsg1'),]
write.csv(res_kokowtwt_bmp, "./res_kokowtwt_bmp.csv") ##Comparison 4

##The above CSV files as well as Supplementary Table 2 were combined in Excel to create Supplementary Table 1.

Figures S6D,E; comparisons to Caldwell et al., 2022

library(ggplot2)
library(ggrepel)

Alie_RNAseq_ND <- read.csv("Caldwell_s3 RNAseq ND.csv")


corr_Alie_FXS <- as.data.frame(res_crenegonly)[,c(1:2)]
corr_Alie_FXS$Gene <- rownames(corr_Alie_FXS)
corr_Alie_FXS <- corr_Alie_FXS[order(corr_Alie_FXS$Gene),]
corr_Alie_FXS$Alie <- Alie_RNAseq_ND$FXS.vs..WT.Log2.FC
corr_Alie_FXS <- corr_Alie_FXS[,-c(1)]

corr_Alie_FXS[sapply(corr_Alie_FXS, is.infinite)] <- NA


map <- ggplot(corr_Alie_FXS, aes(x=Alie,y=log2FoldChange))+geom_point(size=0.2) + 
  geom_vline(xintercept=0, col="red") +
  geom_hline(yintercept=0, col="red") +
  labs(x = "log2FC_invitro", y = "log2_invivo") +
  theme_bw() +
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=14),
        aspect.ratio=1) +
  stat_smooth(method=lm, se=FALSE)
ggsave("corr_Alie_FXS.pdf", map, width = 3, height = 3, unit = "in")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 6161 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 6161 rows containing missing values or values outside the scale range
## (`geom_point()`).

cor_corr_Alie_FXS <- cor(corr_Alie_FXS[,c(1,3)], method = 'pearson', use = "pairwise.complete.obs") #correlation is -0.0133



corr_Alie_FXS_DEGs <- corr_Alie_FXS[c(rownames(res_crenegonly_down),rownames(res_crenegonly_up)),]

map <- ggplot(corr_Alie_FXS_DEGs, aes(x=Alie,y=log2FoldChange, label = Gene))+geom_point(size=0.2) + 
  geom_vline(xintercept=0, col="red") +
  geom_hline(yintercept=0, col="red") +
  labs(x = "log2FC_invitro", y = "log2_invivo") +
  theme_bw() +
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=14),
        aspect.ratio=1) +
  stat_smooth(method=lm) + 
  geom_text_repel(size=2,max.overlaps=10)
ggsave("corr_Alie_FXS_DEGs.pdf", map, width = 3, height = 3, unit = "in")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: The following aesthetics were dropped during statistical transformation: label.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).

## Warning: ggrepel: 132 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

cor_corr_Alie_FXS_DEGs <- cor(corr_Alie_FXS_DEGs[,c(1,3)], method = 'pearson', use = "pairwise.complete.obs") #correlation is -0.0241

Figures S6F,G; comparisons to Caldwell et al., 2022

# BMP6 DEGs
library(ggplot2)
library(ggrepel)


corr_Alie_BMP <- as.data.frame(res_wtonly)[,c(1:2)]
corr_Alie_BMP$Gene <- rownames(corr_Alie_BMP)
corr_Alie_BMP <- corr_Alie_BMP[order(corr_Alie_BMP$Gene),]
corr_Alie_BMP$Alie <- Alie_RNAseq_BMP$BMP6.vs..Alone.Log2.FC
corr_Alie_BMP <- corr_Alie_BMP[,-c(1)]
corr_Alie_BMP[sapply(corr_Alie_BMP, is.infinite)] <- NA

map <- ggplot(corr_Alie_BMP, aes(x=Alie,y=log2FoldChange))+geom_point(size=0.2) + 
  geom_vline(xintercept=0, col="red") +
  geom_hline(yintercept=0, col="red") +
  labs(x = "log2FC_invitro", y = "log2_invivo") +
  theme_bw() +
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=14),
        aspect.ratio=1) +
  stat_smooth(method=lm, se=FALSE)
ggsave("corr_Alie_BMP.pdf", map, width = 3, height = 3, unit = "in")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 6237 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 6237 rows containing missing values or values outside the scale range
## (`geom_point()`).

cor_corr_Alie_BMP <- cor(corr_Alie_BMP[,c(1,3)], method = 'pearson', use = "pairwise.complete.obs") #correlation is -0.0134

corr_Alie_BMP_DEGs <- corr_Alie_BMP[c(rownames(res_wtonly_down),rownames(res_wtonly_up)),]

map <- ggplot(corr_Alie_BMP_DEGs, aes(x=Alie,y=log2FoldChange, label = Gene))+geom_point(size=0.2) + 
  labs(x = "log2FC_invitro", y = "log2_invivo") +
  theme_bw() +
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=14),
        aspect.ratio=1) +
  stat_smooth(method=lm) +
  geom_vline(xintercept=0, col="red") +
  geom_hline(yintercept=0, col="red") +
  geom_text_repel(size=2,max.overlaps=25)
ggsave("corr_Alie_BMP_DEGs.pdf", map, width = 3, height = 3, unit = "in")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: The following aesthetics were dropped during statistical transformation: label.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_text_repel()`).

## Warning: ggrepel: 116 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

cor_corr_Alie_BMP_DEGs <- cor(corr_Alie_BMP_DEGs[,c(1,3)], method = 'pearson', use = "pairwise.complete.obs") #correlation is -0.581

Session Info: packages and versions

sessionInfo()

## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22631)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggrepel_0.9.2               lubridate_1.9.3            
##  [3] forcats_1.0.0               stringr_1.5.0              
##  [5] purrr_1.0.1                 readr_2.1.5                
##  [7] tidyr_1.3.0                 tidyverse_2.0.0            
##  [9] gson_0.1.0                  org.Mm.eg.db_3.16.0        
## [11] DOSE_3.24.2                 ggridges_0.5.4             
## [13] msigdbr_7.5.1               GO.db_3.16.0               
## [15] AnnotationDbi_1.60.0        ReactomePA_1.42.0          
## [17] pathview_1.38.0             enrichplot_1.18.3          
## [19] clusterProfiler_4.6.0       corrplot_0.92              
## [21] BSDA_1.2.2                  lattice_0.20-45            
## [23] devtools_2.4.5              usethis_2.2.2              
## [25] geomtextpath_0.1.1          hrbrthemes_0.8.7           
## [27] plyr_1.8.8                  fasstr_0.5.1               
## [29] scales_1.3.0                viridis_0.6.2              
## [31] viridisLite_0.4.1           pheatmap_1.0.12            
## [33] writexl_1.4.2               tibble_3.1.8               
## [35] dplyr_1.0.10                FactoMineR_2.7             
## [37] factoextra_1.0.7            ggplot2_3.5.0              
## [39] DESeq2_1.38.3               SummarizedExperiment_1.28.0
## [41] Biobase_2.58.0              MatrixGenerics_1.10.0      
## [43] matrixStats_0.63.0          GenomicRanges_1.50.2       
## [45] GenomeInfoDb_1.34.6         IRanges_2.32.0             
## [47] S4Vectors_0.36.1            BiocGenerics_0.44.0        
## [49] htmltools_0.5.4            
## 
## loaded via a namespace (and not attached):
##   [1] estimability_1.4.1      rappdirs_0.3.3          pbdZMQ_0.3-8           
##   [4] coda_0.19-4             ragg_1.2.7              bit64_4.0.5            
##   [7] knitr_1.41              DelayedArray_0.23.2     data.table_1.14.6      
##  [10] KEGGREST_1.38.0         RCurl_1.98-1.9          generics_0.1.3         
##  [13] snow_0.4-4              callr_3.7.3             cowplot_1.1.1          
##  [16] RSQLite_2.2.20          shadowtext_0.1.2        proxy_0.4-27           
##  [19] bit_4.0.5               tzdb_0.3.0              httpuv_1.6.7           
##  [22] xfun_0.36               hms_1.1.2               jquerylib_0.1.4        
##  [25] babelgene_22.9          evaluate_0.19           promises_1.2.0.1       
##  [28] fansi_1.0.3             Rgraphviz_2.42.0        igraph_1.3.5           
##  [31] DBI_1.1.3               geneplotter_1.76.0      htmlwidgets_1.6.1      
##  [34] ellipsis_0.3.2          fontLiberation_0.1.0    annotate_1.76.0        
##  [37] fontBitstreamVera_0.1.1 vctrs_0.6.3             remotes_2.4.2          
##  [40] cachem_1.0.6            withr_2.5.0             ggforce_0.4.1          
##  [43] HDO.db_0.99.1           emmeans_1.8.3           treeio_1.22.0          
##  [46] prettyunits_1.1.1       cluster_2.1.4           ape_5.6-2              
##  [49] lazyeval_0.2.2          crayon_1.5.2            crul_1.4.0             
##  [52] pkgconfig_2.0.3         labeling_0.4.2          tweenr_2.0.2           
##  [55] nlme_3.1-160            pkgload_1.3.2           rlang_1.1.1            
##  [58] lifecycle_1.0.3         miniUI_0.1.1.1          downloader_0.4         
##  [61] fontquiver_0.2.1        httpcode_0.3.0          extrafontdb_1.0        
##  [64] polyclip_1.10-4         graph_1.76.0            Matrix_1.6-1.1         
##  [67] aplot_0.1.9             processx_3.8.0          png_0.1-8              
##  [70] bitops_1.0-7            Biostrings_2.66.0       blob_1.2.3             
##  [73] qvalue_2.30.0           multcompView_0.1-8      gridGraphics_0.5-1     
##  [76] reactome.db_1.82.0      leaps_3.1               memoise_2.0.1          
##  [79] graphite_1.44.0         magrittr_2.0.3          zlibbioc_1.44.0        
##  [82] compiler_4.2.2          scatterpie_0.1.8        RColorBrewer_1.1-3     
##  [85] KEGGgraph_1.58.3        cli_3.6.0               XVector_0.38.0         
##  [88] urlchecker_1.0.1        patchwork_1.1.2         ps_1.7.2               
##  [91] mgcv_1.8-41             MASS_7.3-58.1           tidyselect_1.2.0       
##  [94] stringi_1.7.12          textshaping_0.3.7       highr_0.10             
##  [97] yaml_2.3.6              GOSemSim_2.24.0         locfit_1.5-9.7         
## [100] grid_4.2.2              sass_0.4.8              timechange_0.3.0       
## [103] fastmatch_1.1-3         tools_4.2.2             parallel_4.2.2         
## [106] rstudioapi_0.14         gridExtra_2.3           scatterplot3d_0.3-42   
## [109] farver_2.1.1            ggraph_2.1.0            digest_0.6.31          
## [112] shiny_1.7.4             gfonts_0.2.0            Rcpp_1.0.9             
## [115] later_1.3.0             org.Hs.eg.db_3.16.0     httr_1.4.7             
## [118] gdtools_0.3.7           colorspace_2.0-3        XML_3.99-0.13          
## [121] fs_1.5.2                splines_4.2.2           yulab.utils_0.0.6      
## [124] tidytree_0.4.2          graphlayouts_0.8.4      ggplotify_0.1.0        
## [127] sessioninfo_1.2.2       systemfonts_1.0.4       xtable_1.8-4           
## [130] jsonlite_1.8.4          ggtree_3.6.2            tidygraph_1.2.2        
## [133] flashClust_1.01-2       ggfun_0.0.9             R6_2.5.1               
## [136] profvis_0.3.8           pillar_1.9.0            mime_0.12              
## [139] glue_1.6.2              fastmap_1.1.0           DT_0.26                
## [142] BiocParallel_1.32.5     class_7.3-20            codetools_0.2-18       
## [145] fgsea_1.24.0            pkgbuild_1.4.0          mvtnorm_1.1-3          
## [148] utf8_1.2.2              bslib_0.4.2             curl_4.3.3             
## [151] Rttf2pt1_1.3.12         rmarkdown_2.19          munsell_0.5.0          
## [154] e1071_1.7-14            GenomeInfoDbData_1.2.9  reshape2_1.4.4         
## [157] gtable_0.3.1            extrafont_0.19

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter. Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.