Document prepared by Eveline Pinseel, March 2023.
This document gives an overview of the reanalysis of the RNA-seq data published in Pinseel et al. 2022, ISME. The pipeline follows the exact steps described in Pinseel et al. 2022, but now uses the S. marinoi reference genome Ref v1.1.2. I therefore start this analysis from the quality-controlled, trimmed reads obtained by Pinseel et al. 2022 and immediately proceed with mapping to the genome using STAR.
We made the STAR index as follows:
# extract nuclear genome (= remove plastid and mitochondrial genome from GFF)
head -n 84053 Sm_ManualCuration.v1.1.2.gff > Sm_ManualCuration.v1.1.2_nuclear.gff
# create output directory
mkdir STAR_index_STAR2.7.10_100bpread_Ref_v1.1.2
# run STAR v2.7.10.a
module load STAR
STAR \
--runThreadN 10 \
--runMode genomeGenerate \
--genomeDir STAR_index_STAR2.7.10_100bpread_Ref_v1.1.2 \
--genomeFastaFiles Skeletonema_marinoi_Ref_v1.1.2_nuclear.fst \
--sjdbGTFfile Sm_ManualCuration.v1.1.2_nuclear.gff \
--sjdbGTFtagExonParentTranscript Parent \
--sjdbOverhang 99 \
--genomeSAindexNbases 11
## --genomeSAindexNbases was set to 12 in the Ref1.1 analysis. However, STAR complained about this value with the new genome version and suggested using 11 instead
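The suggested value of 11 is consistent with the scaling rule in the STAR manual, which recommends min(14, log2(GenomeLength)/2 - 1) for small genomes. Below is a minimal Python sketch of that calculation. The function names are hypothetical, and truncating the result to an integer is an assumption about how the recommendation is rounded.

```python
import math

def fasta_total_length(path):
    """Sum the sequence lengths of all records in a FASTA file."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total

def suggested_sa_index_nbases(genome_length):
    """STAR manual rule: min(14, log2(GenomeLength)/2 - 1), truncated (assumption)."""
    return min(14, int(math.log2(genome_length) / 2 - 1))

# Example usage with the nuclear genome FASTA used above:
# print(suggested_sa_index_nbases(
#     fasta_total_length("Skeletonema_marinoi_Ref_v1.1.2_nuclear.fst")))
```

Under this rule, any genome between 2^24 and 2^26 bases (roughly 17 to 67 Mb) gives 11, which is why a value of 12 can be too large for a genome of this size.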
Next, I extracted information on intron length from the gff. To extract the introns, I work in python:
#if gffutils is not installed, install first
#pip install gffutils
import gffutils
#create GFF database (you only need to do this once)
gffutils.create_db('Sm_ManualCuration.v1.1.2_nuclear.gff',
'Sm_ManualCuration.v1.1.2_nuclear.db', keep_order=False,
merge_strategy='merge',sort_attribute_values=False, id_spec=['ID', 'Name'], force=True)
#import the database
db = gffutils.FeatureDB('Sm_ManualCuration.v1.1.2_nuclear.db', keep_order=True)
#create introns
data = gffutils.FeatureDB.create_introns(db, exon_featuretype='exon', grandparent_featuretype=None, parent_featuretype='mRNA', new_featuretype='intron', merge_attributes=True)
#print all the introns
for intron in data:
    print(intron)
#copy output to text file to be treated as txt
Introns were stored in: Sm_ManualCuration.v1.1.2_nuclear-introns.txt.
Note that when calculating the introns for Ref1.1, I used grandparent_featuretype='Gene' and parent_featuretype=None. I had to adjust this here because the new GFF does not contain gene features; instead, it appears to mark the positions of complete genes with mRNA features.
Then I calculated intron lengths:
import csv
# import the file
GFF = 'Sm_ManualCuration.v1.1.2_nuclear-introns.txt'
# create empty list for the intron lengths
intron_lengths = []
#create a dictionary for the intron lengths and their IDs
dict_intron_length = {}
# read GFF file, line by line
with open(GFF, 'r') as gff_file:
    # create a tab-delimited csv reader
    reader = csv.reader(gff_file, delimiter="\t")
    for line in reader:
        # skip blank lines and commented lines
        if not line or line[0].startswith('#'):
            continue
        # extract information from the GFF
        start = int(line[3])
        end = int(line[4])
        attributes = line[8]
        # calculate the length of the intron
        length = end - start + 1
        # add the length to the list of all intron lengths
        intron_lengths.append(length)
        # link the intron length to its attributes (introns with the same length overwrite each other)
        dict_intron_length[length] = attributes
# calculate the min - max intron lengths
min_length = min(intron_lengths)
max_length = max(intron_lengths)
# print the min - max intron lengths
print("The minimum intron length is " + str(min_length))
print("The maximum intron length is " + str(max_length))
# export the intron lengths to a file
with open("Skeletonema_marinoi_intron_lengths.txt", "w") as out_file:
    out_file.write(str(intron_lengths))
#look for the minimum maximum values in the dictionary
min_intron = dict_intron_length.get(min_length)
max_intron = dict_intron_length.get(max_length)
print("The intron ID of the minimum intron length is: " + min_intron)
print("The intron ID of the maximum intron length is: " + max_intron)
#The minimum intron length is 4
#The maximum intron length is 17105
#The intron ID of the minimum intron length is: Parent=Sm_t00018725-RA
#The intron ID of the maximum intron length is: Parent=Sm_t00004715-RA
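Because the dictionary above is keyed by intron length, only the last intron seen for each length is kept. If several introns share the minimum or maximum length, the sketch below reports all of them. It is an illustration only: the function name is hypothetical, and it assumes the same tab-delimited, GFF-formatted intron lines as the script above.

```python
from collections import defaultdict

def intron_length_extremes(gff_lines):
    """Return (min_len, ids_at_min, max_len, ids_at_max) from GFF intron lines."""
    ids_by_length = defaultdict(list)
    for raw in gff_lines:
        line = raw.rstrip("\n")
        # skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        start, end = int(fields[3]), int(fields[4])
        # GFF coordinates are 1-based and inclusive, hence the +1
        ids_by_length[end - start + 1].append(fields[8])
    shortest, longest = min(ids_by_length), max(ids_by_length)
    return shortest, ids_by_length[shortest], longest, ids_by_length[longest]

# Example usage:
# with open("Sm_ManualCuration.v1.1.2_nuclear-introns.txt") as handle:
#     print(intron_length_extremes(handle))
```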
We ran STAR:
# create output directory
mkdir STAR_output
# create list of file names
ls ktrim_output/*read1.fq | sed "s/trimmed_Ktrim.read1.fq//" | sed "s,ktrim_output/,," > names_STAR_ktrim.txt
# run STAR v2.7.3.a
for i in $(cat names_STAR_ktrim.txt);do STAR \
--runThreadN 15 \
--genomeDir Skmarinoi_Ref_v1.1.2_2021-12-06/STAR_index_STAR2.7.10_100bpread_Ref_v1.1.2 \
--outSAMtype BAM SortedByCoordinate \
--alignIntronMin 4 \
--alignIntronMax 17105 \
--outReadsUnmapped Fastx \
--readFilesIn ktrim_output/${i}trimmed_Ktrim.read1.fq ktrim_output/${i}trimmed_Ktrim.read2.fq \
--outFileNamePrefix STAR_output/$i; done
Let’s extract information on read mapping:
# grep for mapping information
grep "Uniquely mapped reads %" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_uniquely_mapped_reads.txt
grep "% of reads mapped to multiple loci" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_reads_mapped_to_multiple_loci.txt
grep "% of reads mapped to too many loci" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_reads_mapped_to_too_many_loci.txt
grep "% of reads unmapped: too many mismatches" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_unmapped_too_many_mismatches.txt
grep "% of reads unmapped: too short" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_unmapped_too_short.txt
grep "% of reads unmapped: other" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_unmapped_other.txt
grep "% of chimeric reads" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_chimeric_reads.txt
#the first sed command removes all occurrences of the %-sign on each line: necessary for visualization in R
#the second sed command adds a title to the file, necessary for downstream analysis in R. Note that this code does not work on a Mac.
#the tr command removes spaces in the lines (problem in R)
# change names in files
for i in STAR*.txt;do paste $i short_name_STAR.txt > SN_$i;done
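As a portable alternative to the grep/sed chain (which relies on GNU sed), the percentages can also be read from the log text directly. This is a minimal sketch, assuming the usual "name | value" layout of STAR's Log.final.out; the function name is hypothetical.

```python
def parse_star_percentages(log_text):
    """Map each percentage metric in a STAR Log.final.out to a float."""
    metrics = {}
    for line in log_text.splitlines():
        # metric lines have the form "<name> |<tab><value>"; section titles have no "|"
        if "|" not in line:
            continue
        key, value = (part.strip() for part in line.split("|", 1))
        # keep only the values reported as percentages
        if value.endswith("%"):
            metrics[key] = float(value.rstrip("%"))
    return metrics

# Example usage:
# import glob
# for path in sorted(glob.glob("STAR_output/*_Log.final.out")):
#     with open(path) as handle:
#         print(path, parse_star_percentages(handle.read())["Uniquely mapped reads %"])
```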
Plot the mapping data:
STAR_uniquely_mapped_reads = read.table("SN_STAR_uniquely_mapped_reads.txt", header = TRUE)
barplot(STAR_uniquely_mapped_reads$Percentage, main="STAR: uniquely mapped reads", ylim = c(0,100), ylab = "percentage (%)", names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_uniquely_mapped_reads$Colour))
STAR_reads_mapped_to_multiple_loci = read.table("SN_STAR_reads_mapped_to_multiple_loci.txt", header = TRUE)
barplot(STAR_reads_mapped_to_multiple_loci$Percentage, main="STAR: reads mapped to multiple loci", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_reads_mapped_to_multiple_loci$Colour))
STAR_reads_mapped_to_too_many_loci = read.table("SN_STAR_reads_mapped_to_too_many_loci.txt", header = TRUE)
barplot(STAR_reads_mapped_to_too_many_loci$Percentage, main="STAR: reads mapped to too many loci", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_reads_mapped_to_too_many_loci$Colour))
STAR_unmapped_too_many_mismatches = read.table("SN_STAR_unmapped_too_many_mismatches.txt", header = TRUE)
barplot(STAR_unmapped_too_many_mismatches$Percentage, main="STAR: unmapped reads - too many mismatches", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_unmapped_too_many_mismatches$Colour))
STAR_unmapped_too_short = read.table("SN_STAR_unmapped_too_short.txt", header = TRUE)
barplot(STAR_unmapped_too_short$Percentage, main="STAR: unmapped reads - too short", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_unmapped_too_short$Colour))
STAR_chimeric_reads = read.table("SN_STAR_chimeric_reads.txt", header = TRUE)
barplot(STAR_chimeric_reads$Percentage, main="STAR: chimeric reads", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_chimeric_reads$Colour))
STAR_uniquely_mapped_reads = read.table("SN_STAR_uniquely_mapped_reads.txt", header = TRUE)
STAR_reads_mapped_to_multiple_loci = read.table("SN_STAR_reads_mapped_to_multiple_loci.txt", header = TRUE)
mapped_total = STAR_uniquely_mapped_reads$Percentage + STAR_reads_mapped_to_multiple_loci$Percentage
barplot(mapped_total, main="STAR: uniquely mapped reads + multimapped reads", ylab = "percentage (%)", ylim = c(0,100), names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_uniquely_mapped_reads$Colour))
I used the GFF file of S. marinoi that does not contain sequences and only includes the nuclear genome: Sm_ManualCuration.v1.1.2_nuclear.gff.
We ran HTSeq as outlined below. Note that sorting the BAM files was not necessary because STAR had already sorted the output:
# create list of file names (BAM output STAR)
ls STAR_output/*bam | sed "s/Aligned.sortedByCoord.out.bam//" | sed "s,STAR_output/,," > names_BAMfiles.txt
# create index files for all bam files (in STAR_output folder)
module load samtools #loads samtools v1.10
# loop over the BAM files directly: names_BAMfiles.txt must keep the sample prefixes for the HTSeq loop below
for i in *.bam; do \
samtools index "$i"; done
# load HTSeq [since July 2022]
module load gcc/11.2.1 mkl/19.0.5 python/3.10-anaconda;source /share/apps/bin/conda-3.10.sh;conda activate htseq-3.10
# run HTSeq (conda environment htseq-3.10) on gene-level
for i in $(cat names_BAMfiles.txt); do \
htseq-count \
--format=bam \
--order=pos \
--stranded=reverse \
--minaqual=10 \
--type=mRNA \
--idattr=ID \
--mode=union \
--nonunique=none \
--samout="$i"HTSeq.gene-level.out \
STAR_output/"$i"Aligned.sortedByCoord.out.bam \
Skmarinoi_Ref_v1.1.2_2021-12-06/Sm_ManualCuration.v1.1.2_nuclear.gff \
>> "$i"HTSeq.gene-level.out.STDOUT 2>> "$i"HTSeq.gene-level.out.STERROR;done
#for type=XXX: look at the third column in the GFF file! Running this with 'gene' did not work because the GFF does not contain any gene features.
The STDOUT file contains information on the number of reads that were counted and those that were not counted. To get a better grasp on these results, we need to extract them from the files:
grep "__no_feature" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_nofeature.txt
grep "__ambiguous" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_ambiguous.txt
grep "__too_low_aQual" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_toolowaQual.txt
grep "__not_aligned" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_not-aligned.txt
grep "__alignment_not_unique" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_alignment_not_unique.txt
#sed introduces a header to the file
The grep commands above only give us the reads that were not counted. To get the number of counted reads, we need to sum the counts for all the genes in each file:
#run recursively for multiple files
ls HTSeq_output/*STDOUT > HTSeq_gene_output_STDOUT_list.txt
for i in $(cat HTSeq_gene_output_STDOUT_list.txt); do head -n 17203 $i | cut -f 2 | awk '{s+=$1}END{print s}';done > HTSeq_gene-level_countedreads.txt
#add labels to the counts
tail -n 72 HTSeq_gene-level_nofeature.txt | cut -f 1 > labels.txt; paste labels.txt HTSeq_gene-level_countedreads.txt | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_countedreads_labels.txt; mv HTSeq_gene-level_countedreads_labels.txt HTSeq_gene-level_countedreads.txt; rm labels.txt
HTSeq gives absolute read counts, but I’m interested in relative read counts. We therefore need a file that contains the total read count per sample. This can also be derived from the HTSeq output files:
for i in $(cat HTSeq_gene_output_STDOUT_list.txt); do cut -f 2 $i | awk '{s+=$1}END{print s}';done > HTSeq_gene-level_totalreads.txt
tail -n 72 HTSeq_gene-level_nofeature.txt | cut -f 1 > labels.txt; paste labels.txt HTSeq_gene-level_totalreads.txt | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_totalreads_labels.txt; mv HTSeq_gene-level_totalreads_labels.txt HTSeq_gene-level_totalreads.txt; rm labels.txt
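The shell commands above depend on the hard-coded number of genes (head -n 17203) and samples (tail -n 72). The sketch below derives the same numbers from the file content instead, using the fact that htseq-count prefixes its special counters with "__". The function name is hypothetical.

```python
def summarize_htseq_counts(lines):
    """Split htseq-count output into counted reads, '__' counters, and the total."""
    counted = 0
    special = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        feature, count = line.split("\t")
        if feature.startswith("__"):
            # e.g. __no_feature, __ambiguous, __alignment_not_unique
            special[feature] = int(count)
        else:
            counted += int(count)
    total = counted + sum(special.values())
    return counted, special, total

# Example usage:
# import glob
# for path in sorted(glob.glob("HTSeq_output/*STDOUT")):
#     with open(path) as handle:
#         counted, special, total = summarize_htseq_counts(handle)
#     print(path, counted, total, sep="\t")
```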
Add shorter labels to the count data for visualization in R:
for i in HTSeq_gene-level*.txt;do paste $i STAR_output/short_name_STAR.txt > SN_$i;done
We will also need the total number of reads that were used as input in STAR:
grep "Number of input reads" STAR_output/*_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tNumber\n/" > STAR_input_reads.txt
paste STAR_input_reads.txt STAR_output/short_name_STAR.txt > SN_STAR_input_reads.txt
Visualization in R:
The code below calculates the percentage of the STAR input reads that were counted by HTSeq:
HTSeq_gene_countedreads=read.table("SN_HTSeq_gene-level_countedreads.txt", header = TRUE)
STAR_input=read.table("SN_STAR_input_reads.txt", header = TRUE)
ratio = (HTSeq_gene_countedreads$Number / STAR_input$Number) * 100
barplot(ratio, main="HTSeq gene-level: counted reads [of STAR input]", ylim=c(0,70), ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_countedreads$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_countedreads$Colour))
The code below calculates the percentage of the reads in the STAR output that were counted by HTSeq. Note that multimapped reads were excluded from the HTSeq analysis:
HTSeq_gene_countedreads=read.table("SN_HTSeq_gene-level_countedreads.txt", header = TRUE)
HTSeq_gene_total=read.table("SN_HTSeq_gene-level_totalreads.txt", header = TRUE)
ratio = (HTSeq_gene_countedreads$Number / HTSeq_gene_total$Number) * 100
barplot(ratio, main="HTSeq gene-level: counted reads [of STAR output]", ylim=c(0,80),ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_countedreads$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_countedreads$Colour))
The code below calculates the percentage of the reads uniquely mapped by STAR that were counted by HTSeq:
HTSeq_gene_countedreads=read.table("SN_HTSeq_gene-level_countedreads.txt", header = TRUE)
HTSeq_gene_total=read.table("SN_HTSeq_gene-level_totalreads.txt", header = TRUE)
HTSeq_gene_notunique=read.table("SN_HTSeq_gene-level_alignment_not_unique.txt", header = TRUE)
ratio = (HTSeq_gene_countedreads$Number / (HTSeq_gene_total$Number - HTSeq_gene_notunique$Number)) * 100
barplot(ratio, main="HTSeq gene-level: counted reads [of STAR uniquely mapped reads]", ylim=c(0,100), ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_countedreads$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_countedreads$Colour))
The code below calculates the percentage of the reads uniquely mapped by STAR that overlap no feature:
HTSeq_gene_nofeature=read.table("SN_HTSeq_gene-level_nofeature.txt", header = TRUE)
HTSeq_gene_total=read.table("SN_HTSeq_gene-level_totalreads.txt", header = TRUE)
HTSeq_gene_notunique=read.table("SN_HTSeq_gene-level_alignment_not_unique.txt", header = TRUE)
ratio = (HTSeq_gene_nofeature$Number / (HTSeq_gene_total$Number - HTSeq_gene_notunique$Number)) * 100
barplot(ratio, main="HTSeq gene-level: reads without feature [of STAR uniquely mapped reads]", ylim=c(0,30), ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_countedreads$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_countedreads$Colour))
The code below calculates the percentage of the reads uniquely mapped by STAR that were ambiguous:
HTSeq_gene_ambiguous=read.table("SN_HTSeq_gene-level_ambiguous.txt", header = TRUE)
HTSeq_gene_total=read.table("SN_HTSeq_gene-level_totalreads.txt", header = TRUE)
HTSeq_gene_notunique=read.table("SN_HTSeq_gene-level_alignment_not_unique.txt", header = TRUE)
ratio = (HTSeq_gene_ambiguous$Number / (HTSeq_gene_total$Number - HTSeq_gene_notunique$Number)) * 100
barplot(ratio, main="HTSeq gene-level: ambiguous reads [of STAR uniquely mapped reads]", ylim=c(0,1.2), ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_countedreads$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_countedreads$Colour))
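The last three plots share one formula: a read category divided by the uniquely mapped reads, where uniquely mapped reads are taken as the HTSeq total minus the __alignment_not_unique counter. A small worked sketch of that arithmetic follows; the function name and the numbers in the usage comment are illustrative only.

```python
def percent_of_unique(category_reads, total_reads, not_unique_reads):
    """Percentage of a read category relative to uniquely mapped reads."""
    uniquely_mapped = total_reads - not_unique_reads
    if uniquely_mapped <= 0:
        raise ValueError("no uniquely mapped reads")
    return 100.0 * category_reads / uniquely_mapped

# Illustrative numbers: 15 counted reads out of 21 total, 1 of which is not unique
# percent_of_unique(15, 21, 1) gives 100 * 15 / 20
```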
Finally, I combined all the count data into one file which will be used as input in R:
paste HTSeq_output/*STDOUT | cut -f 1,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,142,144 > Skmarinoi8x3_reanalysis_ref1.1.2_gene-level_counts.txt
cat short_name_header.txt Skmarinoi8x3_reanalysis_ref1.1.2_gene-level_counts.txt > Skmarinoi8x3_reanalysis_ref1.1.2_gene-level_counts_FINAL.txt
#short_name_header.txt contains the short identifiers of all the samples in the order of the STDOUT files
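The long column list passed to cut means "keep the first ID column plus every even-numbered column", and the paste step assumes that every STDOUT file lists the features in the same order. The sketch below makes that assumption explicit by checking the order before merging. The function name and the input layout (a dictionary of per-sample lists) are hypothetical.

```python
def merge_count_tables(tables):
    """Merge {sample: [(feature, count), ...]} into a header and one row per feature."""
    samples = list(tables)
    reference_ids = [feature for feature, _ in tables[samples[0]]]
    # refuse to merge if any sample lists the features in a different order
    for sample in samples[1:]:
        ids = [feature for feature, _ in tables[sample]]
        if ids != reference_ids:
            raise ValueError(f"feature order differs in sample {sample}")
    header = ["feature"] + samples
    rows = [
        [feature] + [tables[sample][i][1] for sample in samples]
        for i, feature in enumerate(reference_ids)
    ]
    return header, rows
```

Compared with paste followed by cut, this does not depend on the number of samples, so no column list has to be typed out.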
As a first step, I extract the protein translations from the gff3 of the S. marinoi genome, using gffread from the cufflinks package.
# extract the protein and transcript files from the Maker gff3
/share/apps/bioinformatics/cufflinks/cufflinks-2.2.1.Linux_x86_64/gffread \
Sm_ManualCuration.v1.1.2_nuclear.gff \
-g Skeletonema_marinoi_Ref_v1.1.2_nuclear.fst \
-y Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.fasta
This command creates a file that contains the protein translations of all the CDS regions. To make sure that this file matches the gff, I checked whether the number of proteins equals the number of genes. I also removed all the dots from the sequences because these cause problems in later steps:
# calculate number of proteins by grepping for ">"
grep ">" Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.fasta | wc -l
# result = 17203
# calculate number of genes + mRNA by grepping for "Sm_g" (this pattern is unique for lines with genes or mRNA)
grep "Sm_g" Sm_ManualCuration.v1.1.2_nuclear.gff | wc -l
# result = 17203
# remove dots from the input file
sed 's/\.//1g' Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.fasta > Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta
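Note that the sed command removes dots from every line, including the FASTA headers. That is harmless as long as the headers contain no dots, but if they ever do, the IDs would silently change. A minimal sketch that only touches sequence lines is given below; the function name is hypothetical.

```python
def strip_dots_from_sequences(fasta_lines):
    """Remove '.' from sequence lines while leaving header lines untouched."""
    cleaned = []
    for line in fasta_lines:
        if line.startswith(">"):
            cleaned.append(line)
        else:
            cleaned.append(line.replace(".", ""))
    return cleaned

# Example usage:
# with open("Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.fasta") as src, \
#         open("Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta", "w") as dst:
#     dst.writelines(strip_dots_from_sequences(src))
```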
I used the Swissprot version of June 1, 2020, which I already had available and had used for Ref 1.1:
# load blast/2.13.0+
module load blast
# run blastp
blastp -db June1_2020/swissprot_db \
-query Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
-out Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.blastp.out \
-evalue 1e-6 \
-outfmt 6 \
-num_alignments 1 \
-seg yes \
-soft_masking true \
-lcase_masking \
-max_hsps 1 \
-num_threads 8 \
> stdout.txt 2> stderror.txt
I then also ran blastp against the latest version of Swissprot, downloaded on September 28th, 2022:
module load blast
# make blastp database using the swissprot.fasta as input file
makeblastdb -in swissprot_Sep28_2022.fasta -dbtype prot -out swissprot_db -title swissprot_db
# run blastp
blastp -db September28_2022/swissprot_db \
-query Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
-out Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.blastp.Sep2022.out \
-evalue 1e-6 \
-outfmt 6 \
-num_alignments 1 \
-seg yes \
-soft_masking true \
-lcase_masking \
-max_hsps 1 \
-num_threads 8 \
> stdout.Sep2022.txt 2> stderror.Sep2022.txt
I will use the local copy of uniprot, downloaded by Wade in July 2019 (/home/wader/databases/ref-proteomes/).
The Uniprot data contains two databases: a diamond database (uniprot_ref_proteomes.dmnd) and an NCBI blast database (.phr, .pin, .psq). I will use diamond, which is much faster than a standard blast search, although it is less accurate than true blast:
# run diamond blast on Uniprot [diamond/2.0.1]
module load diamond
diamond blastp --db /home/wader/databases/ref-proteomes-2020/uniprot_ref_proteomes.dmnd \
--query Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
--outfmt 6 \
--evalue 1e-6 \
--max-target-seqs 1 \
--sensitive \
--max-hsps 1 \
--out Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.uniprot.diamond.out \
--threads 16
# select uniprot IDs from output file
cut -f2 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.uniprot.diamond.out > Smarinoi_Ref1.1.2_uniprot_IDs.txt
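Both blastp (-outfmt 6) and diamond (--outfmt 6) write the same 12 default tab-separated columns, with the query ID in column 1, the subject ID in column 2, the E-value in column 11 and the bit score in column 12. The sketch below keeps the subject together with its query and scores, rather than the bare ID list produced by cut. The function name is hypothetical.

```python
def best_hits(tabular_lines):
    """Keep the highest-bitscore hit per query from outfmt 6 lines."""
    best = {}
    for raw in tabular_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # replace the stored hit only if this one scores higher
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Example usage:
# with open("Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.uniprot.diamond.out") as handle:
#     for query, (subject, evalue, bitscore) in best_hits(handle).items():
#         print(query, subject, evalue, bitscore, sep="\t")
```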
Newer versions of uniprot can be downloaded here.
The output is a list of protein IDs. More information associated with these protein IDs can be retrieved by inputting the list on the Uniprot website: use the list option.
For KEGG, first remove the space in the header of the protein fasta:
#remove space in headers
sed 's/\s/_/1g' Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta > Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta
KofamKOALA only accepts 5,000 genes per submission. Since S. marinoi has ~17,000 genes in its genome, I need to split the fasta file of the protein sequences into 4 subsets. To do this, I used the seqkit toolkit, after installing it locally on razor:
#sequences 1-5,000
seqkit head -n 5000 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta > Skmarinoi_Ref1.1.2_proteins_1-5000.fasta
#sequences 5,001-10,000
seqkit range -r 5001:10000 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta > Skmarinoi_Ref1.1.2_proteins_5001-10000.fasta
#sequences 10,001-15,000
seqkit range -r 10001:15000 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta > Skmarinoi_Ref1.1.2_proteins_10001-15000.fasta
#sequences 15,001-17,203
seqkit range -r 15001:20000 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta > Skmarinoi_Ref1.1.2_proteins_15001-17203.fasta
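The same split can be sketched without hard-coding the four ranges, which helps if the number of proteins changes. This is an illustration only and not a replacement for seqkit; the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def chunk_records(records, size=5000):
    """Group records into consecutive lists of at most `size` entries."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With 17,203 proteins and a batch size of 5,000, this yields three batches of 5,000 records and a final batch of 2,203.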
The output files of above code are separately submitted to the KofamKOALA tool on the koala webserver. The E-value parameter is set to 0.01 (default value). Results are returned via mail. I used the KofamKOALA version of 2022-08-01 (KEGG release 103.0).
Combine the four resulting output files into a single file:
#remove headers of all files, except the first
grep -v "#" Skmarinoi8x3_Ref1.1.2_genes5001-10000_KEGGresults.txt > Skmarinoi8x3_Ref1.1.2_genes5001-10000_KEGGresults_nohead.txt
grep -v "#" Skmarinoi8x3_Ref1.1.2_genes10001-15000_KEGGresults.txt > Skmarinoi8x3_Ref1.1.2_genes10001-15000_KEGGresults_nohead.txt
grep -v "#" Skmarinoi8x3_Ref1.1.2_genes15001-17203_KEGGresults.txt > Skmarinoi8x3_Ref1.1.2_genes15001-17203_KEGGresults_nohead.txt
#combine output files
cat Skmarinoi8x3_Ref1.1.2_genes1-5000_KEGGresults.txt Skmarinoi8x3_Ref1.1.2_genes5001-10000_KEGGresults_nohead.txt Skmarinoi8x3_Ref1.1.2_genes10001-15000_KEGGresults_nohead.txt Skmarinoi8x3_Ref1.1.2_genes15001-17203_KEGGresults_nohead.txt > Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28.txt
#remove redundant files
rm *nohead.txt
How many genes received KEGG annotations?
grep -v "#" Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28.txt | wc -l
#3638
This is far fewer than in the Ref1.1 genome: possibly sections that were considered to be different genes in Ref1.1 were combined into one gene in Ref1.1.2?
Finally, let’s reduce the file to only include the columns relevant for combining all data into one database (see below):
# get genes and KEGG numbers
sed 's/* //' Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28.txt | sed 's/# //' | sed 's/#//' | sed '1d' | sed '1d' | sed 's/ /\t/' | sed 's/ /_/g' | sed 's/_[0-9]/\t/' | cut -f 1,2 | sed 's/_//g' | sed 's/gene=/\t/' | cut -f 1,3 | sed 's/Smt/Sm_t/' > KEGG_genes.txt
# get KEGG info
sed 's/* //' Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28.txt | sed 's/# //' | sed 's/#//' | sed '1d' | sed '1d' | sed 's/ /\t/' | sed 's/ /_/g' | sed 's/[0-9]_/\t/' | cut -f 3 | sed 's/[0-9]_/\t/' | cut -f 2 | sed 's/[0-9]_/\t/' | cut -f 2 | sed 's/[0-9]_/\t/' | cut -f 2 > KEGG_info.txt
# combine both files
paste KEGG_genes.txt KEGG_info.txt > Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28_reduced.txt
Wade ran InterProScan for me, probably using the following code:
# load the required java version
module load java/openjdk_14.0.1
# run InterProScan (using PBS script on scratch)
interproscan-5.44-79.0/interproscan.sh \
-appl Pfam,PRINTS,PANTHER,SMART,SignalP_EUK,TMHMM \
-iprlookup \
-goterms \
-cpu 8 \
-f tsv \
-i Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
2> InterProScan.stderror
Output file: Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta.tsv.NEW
I want to know which gene IDs of Ref v1.1 correspond to those of Ref v1.1.2:
# load blast/2.13.0+
module load blast
# run blastp
blastp -db /functional_annotation/blast_Smarinoi/blast_db/S.marinoi_db \
-query ../Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
-out Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.ref1.1.blastp.out \
-evalue 1e-6 \
-outfmt 6 \
-num_alignments 1 \
-seg yes \
-soft_masking true \
-lcase_masking \
-max_hsps 1 \
-num_threads 8 \
> stdout.txt 2> stderror.txt
Because Ref v1.1 contains many more genes, I also ran the search in the opposite direction, querying the Ref v1.1 proteins against a database of the Ref v1.1.2 proteins:
# make blastp database using the Ref v1.1.2 protein fasta as input file
makeblastdb -in Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta -dbtype prot -out S.marinoi_ref1.1.2_db -title S.marinoi_db_ref1.1.2
# run blastp
blastp -db /functional_annotation/blast_db/S.marinoi_ref1.1.2_db \
-query Skeletonema_marinoi_Ref_v1.1_Primary.OnemRNAPerGene.proteins_shortproteinremoved2rmdot.fasta \
-out Skeletonema_marinoi_Ref_v1.1_nuclear.proteins.sprot.ref1.1.2.blastp.out \
-evalue 1e-6 \
-outfmt 6 \
-num_alignments 1 \
-seg yes \
-soft_masking true \
-lcase_masking \
-max_hsps 1 \
-num_threads 8 \
> stdout.txt 2> stderror.txt
Now let’s add the InterProScan matches to the GFF of S. marinoi:
# load required modules
module load perl/5.24.0
module load exonerate/2.4.0
module load maker
# add InterProScan info to the GFF
ipr_update_gff Sm_ManualCuration.v1.1.2_nuclear.gff \
Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta.tsv.NEW > \
Sm_ManualCuration.v1.1.2_nuclear.functional_ipr.gff
However, there is an issue with the resulting gff file. If no annotations were present, a whole list of GO terms is added to a gene. We need to remove this:
# get a list of gene IDs in the whole genome
grep 'geneID' Sm_ManualCuration.v1.1.2_nuclear.gff | cut -f 9 | sed 's/=/\t/g' | sed 's/;geneID/\t/g' | cut -f 2 > Smarinoi_geneIDs_all.txt
wc -l Smarinoi_geneIDs_all.txt ##17203
# get a list of genes with annotations
cut -f 1 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta.tsv.NEW | uniq > Smarinoi_geneIDs_genes-with-annotations.txt
wc -l Smarinoi_geneIDs_genes-with-annotations.txt ##12358
# get a list of genes without annotations
grep -v -f Smarinoi_geneIDs_genes-with-annotations.txt Smarinoi_geneIDs_all.txt > Smarinoi_geneIDS_genes-without-annotations.txt
wc -l Smarinoi_geneIDS_genes-without-annotations.txt ##4845
# reduce gff to lines with only gene IDs
grep 'geneID' Sm_ManualCuration.v1.1.2_nuclear.functional_ipr.gff > Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff
wc -l Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff ##17203
# grep for the massive list of GO terms added to genes without annotations (e.g. number 9)
grep Sm_g00000009 Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff | sed 's/Ontology_term=/\t/' | cut -f 10 > test
# check whether the strange string of GO term count corresponds with the number of genes without GO terms
grep -f test Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff | wc -l ##10711
## doesn't fit: probably because some genes with annotations also don't have GO terms
# remove the massive list of GO terms from the gff
sed 's/Ontology_term=GO:0000009,GO:0000015,GO:0000027,GO:0000030,GO:0000045,GO:0000049,GO:0000055,GO:0000056,GO:0000062,GO:0000070,GO:0000077,GO:0000079,GO:0000105,GO:0000123,GO:0000124,GO:0000126,GO:0000139,GO:0000145,GO:0000148,GO:0000151,GO:0000154,GO:0000155,GO:0000159,GO:0000160,GO:0000164,GO:0000166,GO:0000172,GO:0000178,GO:0000179,GO:0000184,GO:0000213,GO:0000221,GO:0000225,GO:0000226,GO:0000228,GO:0000244,GO:0000245,GO:0000256,GO:0000275,GO:0000276,GO:0000278,GO:0000287,GO:0000289,GO:0000290,GO:0000338,GO:0000339,GO:0000340,GO:0000347,GO:0000350,GO:0000386,GO:0000387,GO:0000398,GO:0000408,GO:0000413,GO:0000422,GO:0000439,GO:0000462,GO:0000469,GO:0000470,GO:0000493,GO:0000502,GO:0000723,GO:0000724,GO:0000774,GO:0000776,GO:0000784,GO:0000786,GO:0000796,GO:0000808,GO:0000811,GO:0000812,GO:0000813,GO:0000814,GO:0000829,GO:0000906,GO:0000922,GO:0000930,GO:0000938,GO:0000956,GO:0000974,GO:0000981,GO:0000995,GO:0001164,GO:0001188,GO:0001510,GO:0001522,GO:0001671,GO:0001682,GO:0001731,GO:0001735,GO:0001882,GO:0002098,GO:0002100,GO:0002161,GO:0002943,GO:0002949,GO:0002953,GO:0003333,GO:0003341,GO:0003676,GO:0003677,GO:0003678,GO:0003682,GO:0003684,GO:0003688,GO:0003689,GO:0003690,GO:0003697,GO:0003700,GO:0003712,GO:0003713,GO:0003714,GO:0003721,GO:0003723,GO:0003724,GO:0003725,GO:0003729,GO:0003735,GO:0003743,GO:0003746,GO:0003747,GO:0003755,GO:0003756,GO:0003774,GO:0003777,GO:0003779,GO:0003824,GO:0003826,GO:0003830,GO:0003839,GO:0003843,GO:0003847,GO:0003848,GO:0003849,GO:0003852,GO:0003854,GO:0003855,GO:0003857,GO:0003860,GO:0003862,GO:0003863,GO:0003864,GO:0003868,GO:0003872,GO:0003873,GO:0003876,GO:0003879,GO:0003883,GO:0003884,GO:0003885,GO:0003887,GO:0003896,GO:0003899,GO:0003906,GO:0003910,GO:0003911,GO:0003916,GO:0003917,GO:0003918,GO:0003922,GO:0003923,GO:0003924,GO:0003934,GO:0003937,GO:0003951,GO:0003952,GO:0003954,GO:0003964,GO:0003975,GO:0003979,GO:0003980,GO:0003989,GO:0003993,GO:0003994,GO:0003995,GO:0003997,GO:0003998,GO:0004000,GO:0004001,GO:0004014
,GO:0004017,GO:0004019,GO:0004037,GO:0004040,GO:0004042,GO:0004045,GO:0004055,GO:0004056,GO:0004057,GO:0004066,GO:0004070,GO:0004071,GO:0004076,GO:0004089,GO:0004096,GO:0004106,GO:0004107,GO:0004109,GO:0004111,GO:0004112,GO:0004114,GO:0004129,GO:0004140,GO:0004141,GO:0004143,GO:0004144,GO:0004146,GO:0004151,GO:0004161,GO:0004164,GO:0004170,GO:0004174,GO:0004175,GO:0004176,GO:0004177,GO:0004181,GO:0004185,GO:0004190,GO:0004197,GO:0004198,GO:0004222,GO:0004252,GO:0004298,GO:0004315,GO:0004325,GO:0004326,GO:0004329,GO:0004332,GO:0004333,GO:0004334,GO:0004335,GO:0004337,GO:0004340,GO:0004345,GO:0004347,GO:0004348,GO:0004351,GO:0004356,GO:0004358,GO:0004359,GO:0004360,GO:0004362,GO:0004363,GO:0004364,GO:0004366,GO:0004367,GO:0004368,GO:0004371,GO:0004372,GO:0004375,GO:0004378,GO:0004379,GO:0004386,GO:0004392,GO:0004399,GO:0004402,GO:0004407,GO:0004408,GO:0004411,GO:0004418,GO:0004420,GO:0004421,GO:0004424,GO:0004425,GO:0004427,GO:0004435,GO:0004450,GO:0004451,GO:0004455,GO:0004470,GO:0004471,GO:0004474,GO:0004476,GO:0004482,GO:0004483,GO:0004484,GO:0004488,GO:0004489,GO:0004491,GO:0004497,GO:0004499,GO:0004512,GO:0004514,GO:0004517,GO:0004518,GO:0004519,GO:0004521,GO:0004523,GO:0004525,GO:0004527,GO:0004535,GO:0004540,GO:0004550,GO:0004553,GO:0004557,GO:0004559,GO:0004563,GO:0004565,GO:0004568,GO:0004571,GO:0004576,GO:0004584,GO:0004587,GO:0004590,GO:0004591,GO:0004592,GO:0004594,GO:0004595,GO:0004597,GO:0004601,GO:0004602,GO:0004605,GO:0004609,GO:0004612,GO:0004615,GO:0004616,GO:0004618,GO:0004619,GO:0004634,GO:0004635,GO:0004637,GO:0004640,GO:0004641,GO:0004643,GO:0004645,GO:0004651,GO:0004652,GO:0004654,GO:0004655,GO:0004657,GO:0004664,GO:0004665,GO:0004668,GO:0004671,GO:0004672,GO:0004674,GO:0004683,GO:0004712,GO:0004713,GO:0004719,GO:0004721,GO:0004722,GO:0004731,GO:0004739,GO:0004743,GO:0004748,GO:0004749,GO:0004751,GO:0004764,GO:0004781,GO:0004784,GO:0004788,GO:0004799,GO:0004801,GO:0004803,GO:0004806,GO:0004807,GO:0004809,GO:0004812,GO:0004813,GO:0004814,GO:00048
15,GO:0004820,GO:0004822,GO:0004823,GO:0004824,GO:0004825,GO:0004826,GO:0004827,GO:0004828,GO:0004829,GO:0004830,GO:0004832,GO:0004834,GO:0004842,GO:0004843,GO:0004844,GO:0004852,GO:0004853,GO:0004864,GO:0004865,GO:0004867,GO:0004888,GO:0004930,GO:0004965,GO:0005047,GO:0005049,GO:0005085,GO:0005086,GO:0005092,GO:0005096,GO:0005198,GO:0005200,GO:0005216,GO:0005230,GO:0005242,GO:0005246,GO:0005247,GO:0005249,GO:0005262,GO:0005267,GO:0005315,GO:0005319,GO:0005337,GO:0005375,GO:0005381,GO:0005384,GO:0005452,GO:0005457,GO:0005471,GO:0005484,GO:0005506,GO:0005507,GO:0005509,GO:0005515,GO:0005516,GO:0005524,GO:0005525,GO:0005534,GO:0005536,GO:0005542,GO:0005544,GO:0005576,GO:0005615,GO:0005634,GO:0005643,GO:0005655,GO:0005663,GO:0005665,GO:0005666,GO:0005667,GO:0005669,GO:0005673,GO:0005674,GO:0005675,GO:0005680,GO:0005681,GO:0005685,GO:0005694,GO:0005730,GO:0005732,GO:0005737,GO:0005739,GO:0005740,GO:0005741,GO:0005743,GO:0005744,GO:0005747,GO:0005750,GO:0005751,GO:0005758,GO:0005759,GO:0005761,GO:0005764,GO:0005765,GO:0005777,GO:0005778,GO:0005779,GO:0005783,GO:0005784,GO:0005786,GO:0005787,GO:0005788,GO:0005789,GO:0005794,GO:0005801,GO:0005815,GO:0005829,GO:0005832,GO:0005838,GO:0005839,GO:0005840,GO:0005846,GO:0005847,GO:0005852,GO:0005854,GO:0005858,GO:0005869,GO:0005871,GO:0005874,GO:0005886,GO:0005887,GO:0005929,GO:0005930,GO:0005956,GO:0005960,GO:0005965,GO:0005968,GO:0005971,GO:0005975,GO:0005992,GO:0005996,GO:0006000,GO:0006002,GO:0006003,GO:0006006,GO:0006012,GO:0006013,GO:0006021,GO:0006030,GO:0006032,GO:0006071,GO:0006072,GO:0006075,GO:0006078,GO:0006086,GO:0006090,GO:0006094,GO:0006096,GO:0006097,GO:0006098,GO:0006099,GO:0006106,GO:0006108,GO:0006120,GO:0006122,GO:0006139,GO:0006164,GO:0006165,GO:0006166,GO:0006177,GO:0006183,GO:0006189,GO:0006190,GO:0006206,GO:0006207,GO:0006213,GO:0006221,GO:0006226,GO:0006228,GO:0006231,GO:0006241,GO:0006259,GO:0006260,GO:0006261,GO:0006265,GO:0006269,GO:0006270,GO:0006275,GO:0006281,GO:0006283,GO:0006284,GO:0006289,GO:000
6298,GO:0006301,GO:0006302,GO:0006303,GO:0006306,GO:0006307,GO:0006310,GO:0006313,GO:0006325,GO:0006333,GO:0006334,GO:0006338,GO:0006348,GO:0006351,GO:0006352,GO:0006355,GO:0006357,GO:0006360,GO:0006364,GO:0006366,GO:0006367,GO:0006368,GO:0006370,GO:0006376,GO:0006378,GO:0006379,GO:0006383,GO:0006384,GO:0006388,GO:0006396,GO:0006397,GO:0006400,GO:0006401,GO:0006402,GO:0006406,GO:0006412,GO:0006413,GO:0006414,GO:0006415,GO:0006417,GO:0006418,GO:0006419,GO:0006420,GO:0006422,GO:0006426,GO:0006428,GO:0006429,GO:0006430,GO:0006431,GO:0006432,GO:0006433,GO:0006434,GO:0006435,GO:0006436,GO:0006438,GO:0006452,GO:0006457,GO:0006464,GO:0006465,GO:0006468,GO:0006470,GO:0006478,GO:0006479,GO:0006480,GO:0006481,GO:0006486,GO:0006487,GO:0006488,GO:0006490,GO:0006491,GO:0006493,GO:0006499,GO:0006506,GO:0006508,GO:0006511,GO:0006513,GO:0006520,GO:0006525,GO:0006526,GO:0006529,GO:0006535,GO:0006536,GO:0006537,GO:0006542,GO:0006545,GO:0006546,GO:0006555,GO:0006559,GO:0006562,GO:0006568,GO:0006570,GO:0006571,GO:0006596,GO:0006597,GO:0006605,GO:0006606,GO:0006614,GO:0006621,GO:0006623,GO:0006625,GO:0006627,GO:0006629,GO:0006631,GO:0006633,GO:0006635,GO:0006644,GO:0006650,GO:0006656,GO:0006661,GO:0006665,GO:0006694,GO:0006725,GO:0006729,GO:0006741,GO:0006744,GO:0006749,GO:0006750,GO:0006751,GO:0006777,GO:0006779,GO:0006780,GO:0006783,GO:0006784,GO:0006788,GO:0006796,GO:0006801,GO:0006807,GO:0006809,GO:0006811,GO:0006812,GO:0006813,GO:0006814,GO:0006817,GO:0006820,GO:0006821,GO:0006825,GO:0006850,GO:0006862,GO:0006869,GO:0006885,GO:0006886,GO:0006887,GO:0006888,GO:0006890,GO:0006891,GO:0006897,GO:0006904,GO:0006913,GO:0006914,GO:0006952,GO:0006974,GO:0006979,GO:0007005,GO:0007010,GO:0007015,GO:0007017,GO:0007018,GO:0007020,GO:0007021,GO:0007023,GO:0007030,GO:0007031,GO:0007033,GO:0007034,GO:0007062,GO:0007064,GO:0007076,GO:0007093,GO:0007094,GO:0007095,GO:0007131,GO:0007155,GO:0007165,GO:0007186,GO:0007205,GO:0007219,GO:0007224,GO:0007264,GO:0008017,GO:0008022,GO:0008033,GO:0008047,GO:0
008061,GO:0008076,GO:0008080,GO:0008081,GO:0008097,GO:0008113,GO:0008121,GO:0008124,GO:0008131,GO:0008134,GO:0008137,GO:0008138,GO:0008146,GO:0008168,GO:0008171,GO:0008173,GO:0008175,GO:0008176,GO:0008180,GO:0008198,GO:0008199,GO:0008233,GO:0008234,GO:0008235,GO:0008236,GO:0008237,GO:0008251,GO:0008270,GO:0008272,GO:0008278,GO:0008289,GO:0008290,GO:0008295,GO:0008299,GO:0008308,GO:0008312,GO:0008318,GO:0008320,GO:0008324,GO:0008353,GO:0008374,GO:0008375,GO:0008380,GO:0008408,GO:0008409,GO:0008410,GO:0008413,GO:0008417,GO:0008418,GO:0008420,GO:0008444,GO:0008452,GO:0008476,GO:0008478,GO:0008483,GO:0008484,GO:0008495,GO:0008519,GO:0008521,GO:0008531,GO:0008534,GO:0008536,GO:0008537,GO:0008540,GO:0008541,GO:0008559,GO:0008609,GO:0008610,GO:0008612,GO:0008616,GO:0008622,GO:0008641,GO:0008649,GO:0008652,GO:0008654,GO:0008661,GO:0008685,GO:0008686,GO:0008703,GO:0008705,GO:0008734,GO:0008757,GO:0008762,GO:0008767,GO:0008804,GO:0008810,GO:0008818,GO:0008824,GO:0008836,GO:0008837,GO:0008839,GO:0008864,GO:0008883,GO:0008887,GO:0008897,GO:0008914,GO:0008929,GO:0008935,GO:0008939,GO:0008942,GO:0008963,GO:0008964,GO:0008970,GO:0008974,GO:0008977,GO:0008986,GO:0008987,GO:0008990,GO:0009001,GO:0009008,GO:0009039,GO:0009052,GO:0009055,GO:0009058,GO:0009072,GO:0009073,GO:0009082,GO:0009083,GO:0009086,GO:0009089,GO:0009094,GO:0009098,GO:0009102,GO:0009107,GO:0009113,GO:0009116,GO:0009117,GO:0009143,GO:0009165,GO:0009166,GO:0009190,GO:0009228,GO:0009229,GO:0009231,GO:0009234,GO:0009235,GO:0009236,GO:0009247,GO:0009263,GO:0009298,GO:0009308,GO:0009312,GO:0009330,GO:0009331,GO:0009341,GO:0009349,GO:0009376,GO:0009396,GO:0009416,GO:0009435,GO:0009439,GO:0009443,GO:0009446,GO:0009451,GO:0009452,GO:0009496,GO:0009507,GO:0009523,GO:0009535,GO:0009584,GO:0009611,GO:0009642,GO:0009644,GO:0009654,GO:0009678,GO:0009765,GO:0009773,GO:0009877,GO:0009916,GO:0009966,GO:0009976,GO:0009982,GO:0010024,GO:0010038,GO:0010181,GO:0010207,GO:0010212,GO:0010242,GO:0010265,GO:0010277,GO:0010309,GO:0010389,GO
:0010390,GO:0010468,GO:0010485,GO:0010756,GO:0010997,GO:0015031,GO:0015035,GO:0015074,GO:0015075,GO:0015078,GO:0015095,GO:0015097,GO:0015098,GO:0015109,GO:0015114,GO:0015116,GO:0015144,GO:0015165,GO:0015204,GO:0015267,GO:0015276,GO:0015297,GO:0015299,GO:0015321,GO:0015385,GO:0015629,GO:0015689,GO:0015693,GO:0015694,GO:0015696,GO:0015703,GO:0015708,GO:0015914,GO:0015930,GO:0015934,GO:0015935,GO:0015936,GO:0015937,GO:0015940,GO:0015969,GO:0015977,GO:0015979,GO:0015986,GO:0015995,GO:0016020,GO:0016021,GO:0016035,GO:0016036,GO:0016042,GO:0016051,GO:0016070,GO:0016114,GO:0016151,GO:0016192,GO:0016209,GO:0016226,GO:0016255,GO:0016272,GO:0016279,GO:0016301,GO:0016307,GO:0016310,GO:0016311,GO:0016316,GO:0016409,GO:0016422,GO:0016428,GO:0016429,GO:0016435,GO:0016459,GO:0016462,GO:0016471,GO:0016480,GO:0016485,GO:0016491,GO:0016504,GO:0016531,GO:0016538,GO:0016559,GO:0016560,GO:0016567,GO:0016570,GO:0016571,GO:0016572,GO:0016573,GO:0016575,GO:0016578,GO:0016579,GO:0016586,GO:0016592,GO:0016593,GO:0016597,GO:0016598,GO:0016603,GO:0016614,GO:0016615,GO:0016616,GO:0016620,GO:0016624,GO:0016627,GO:0016630,GO:0016636,GO:0016638,GO:0016651,GO:0016661,GO:0016670,GO:0016671,GO:0016679,GO:0016701,GO:0016702,GO:0016705,GO:0016706,GO:0016714,GO:0016715,GO:0016717,GO:0016730,GO:0016740,GO:0016742,GO:0016743,GO:0016746,GO:0016747,GO:0016756,GO:0016757,GO:0016758,GO:0016763,GO:0016765,GO:0016772,GO:0016773,GO:0016779,GO:0016780,GO:0016783,GO:0016785,GO:0016787,GO:0016788,GO:0016791,GO:0016798,GO:0016799,GO:0016805,GO:0016810,GO:0016811,GO:0016817,GO:0016818,GO:0016829,GO:0016831,GO:0016832,GO:0016836,GO:0016844,GO:0016846,GO:0016849,GO:0016851,GO:0016852,GO:0016853,GO:0016857,GO:0016866,GO:0016868,GO:0016872,GO:0016874,GO:0016884,GO:0016887,GO:0016889,GO:0016899,GO:0016903,GO:0016925,GO:0016971,GO:0016972,GO:0016973,GO:0016987,GO:0016992,GO:0016998,GO:0017004,GO:0017009,GO:0017025,GO:0017038,GO:0017056,GO:0017070,GO:0017108,GO:0017112,GO:0017116,GO:0017119,GO:0017121,GO:0017128,GO:0017137,
GO:0017150,GO:0017176,GO:0017183,GO:0017186,GO:0017196,GO:0018024,GO:0018025,GO:0018193,GO:0018298,GO:0018342,GO:0018343,GO:0018344,GO:0018580,GO:0019001,GO:0019005,GO:0019008,GO:0019079,GO:0019205,GO:0019211,GO:0019237,GO:0019239,GO:0019242,GO:0019264,GO:0019288,GO:0019310,GO:0019346,GO:0019432,GO:0019464,GO:0019509,GO:0019538,GO:0019722,GO:0019752,GO:0019773,GO:0019774,GO:0019781,GO:0019789,GO:0019825,GO:0019843,GO:0019856,GO:0019867,GO:0019887,GO:0019888,GO:0019894,GO:0019898,GO:0019901,GO:0019903,GO:0019904,GO:0019915,GO:0019948,GO:0019985,GO:0019988,GO:0020037,GO:0022625,GO:0022857,GO:0022900,GO:0022904,GO:0030001,GO:0030008,GO:0030014,GO:0030015,GO:0030026,GO:0030036,GO:0030042,GO:0030058,GO:0030071,GO:0030091,GO:0030117,GO:0030123,GO:0030126,GO:0030127,GO:0030130,GO:0030131,GO:0030132,GO:0030145,GO:0030150,GO:0030151,GO:0030163,GO:0030170,GO:0030171,GO:0030173,GO:0030176,GO:0030234,GO:0030242,GO:0030246,GO:0030259,GO:0030261,GO:0030286,GO:0030328,GO:0030332,GO:0030337,GO:0030433,GO:0030488,GO:0030515,GO:0030532,GO:0030604,GO:0030623,GO:0030628,GO:0030677,GO:0030686,GO:0030688,GO:0030870,GO:0030880,GO:0030896,GO:0030906,GO:0030915,GO:0030942,GO:0030955,GO:0030975,GO:0030976,GO:0030983,GO:0030992,GO:0031011,GO:0031071,GO:0031083,GO:0031122,GO:0031124,GO:0031145,GO:0031146,GO:0031151,GO:0031167,GO:0031177,GO:0031201,GO:0031204,GO:0031207,GO:0031251,GO:0031262,GO:0031297,GO:0031369,GO:0031390,GO:0031417,GO:0031418,GO:0031419,GO:0031422,GO:0031491,GO:0031514,GO:0031515,GO:0031571,GO:0031588,GO:0031625,GO:0031683,GO:0031902,GO:0031929,GO:0031931,GO:0031932,GO:0032006,GO:0032007,GO:0032008,GO:0032012,GO:0032039,GO:0032040,GO:0032049,GO:0032259,GO:0032264,GO:0032299,GO:0032300,GO:0032366,GO:0032456,GO:0032469,GO:0032509,GO:0032515,GO:0032549,GO:0032574,GO:0032777,GO:0032784,GO:0032786,GO:0032957,GO:0032963,GO:0032968,GO:0032977,GO:0032981,GO:0033014,GO:0033063,GO:0033177,GO:0033178,GO:0033179,GO:0033180,GO:0033384,GO:0033539,GO:0033567,GO:0033573,GO:0033588,GO:003361
7,GO:0033674,GO:0033743,GO:0033897,GO:0034066,GO:0034128,GO:0034198,GO:0034219,GO:0034220,GO:0034450,GO:0034457,GO:0034474,GO:0034477,GO:0034511,GO:0034553,GO:0034729,GO:0034755,GO:0035082,GO:0035091,GO:0035098,GO:0035101,GO:0035194,GO:0035246,GO:0035267,GO:0035299,GO:0035312,GO:0035368,GO:0035434,GO:0035435,GO:0035494,GO:0035515,GO:0035516,GO:0035522,GO:0035552,GO:0035553,GO:0035556,GO:0035591,GO:0035596,GO:0035999,GO:0036085,GO:0036159,GO:0036265,GO:0036297,GO:0036310,GO:0036361,GO:0036374,GO:0036402,GO:0036459,GO:0036524,GO:0040014,GO:0042023,GO:0042026,GO:0042073,GO:0042128,GO:0042132,GO:0042147,GO:0042162,GO:0042176,GO:0042242,GO:0042245,GO:0042254,GO:0042256,GO:0042264,GO:0042273,GO:0042274,GO:0042281,GO:0042283,GO:0042373,GO:0042450,GO:0042549,GO:0042555,GO:0042558,GO:0042578,GO:0042597,GO:0042623,GO:0042626,GO:0042651,GO:0042719,GO:0042720,GO:0042721,GO:0042765,GO:0042803,GO:0042819,GO:0042823,GO:0042908,GO:0042910,GO:0043015,GO:0043022,GO:0043023,GO:0043039,GO:0043043,GO:0043044,GO:0043047,GO:0043066,GO:0043085,GO:0043087,GO:0043130,GO:0043138,GO:0043139,GO:0043154,GO:0043161,GO:0043190,GO:0043231,GO:0043240,GO:0043248,GO:0043399,GO:0043419,GO:0043461,GO:0043486,GO:0043531,GO:0043547,GO:0043564,GO:0043565,GO:0043622,GO:0043625,GO:0043631,GO:0043666,GO:0043752,GO:0043967,GO:0043968,GO:0043998,GO:0044237,GO:0044238,GO:0044341,GO:0044458,GO:0044571,GO:0044666,GO:0044877,GO:0045038,GO:0045039,GO:0045047,GO:0045048,GO:0045116,GO:0045131,GO:0045239,GO:0045261,GO:0045292,GO:0045300,GO:0045337,GO:0045454,GO:0045737,GO:0045859,GO:0045892,GO:0045893,GO:0045900,GO:0045901,GO:0045905,GO:0045910,GO:0046034,GO:0046081,GO:0046168,GO:0046314,GO:0046406,GO:0046416,GO:0046422,GO:0046429,GO:0046488,GO:0046540,GO:0046654,GO:0046677,GO:0046695,GO:0046777,GO:0046835,GO:0046854,GO:0046856,GO:0046872,GO:0046873,GO:0046907,GO:0046912,GO:0046923,GO:0046933,GO:0046938,GO:0046961,GO:0046982,GO:0046983,GO:0047057,GO:0047325,GO:0047429,GO:0047661,GO:0047793,GO:0048015,GO:0048029,GO:0048
034,GO:0048037,GO:0048038,GO:0048188,GO:0048193,GO:0048278,GO:0048472,GO:0048478,GO:0048487,GO:0048500,GO:0048678,GO:0048870,GO:0050080,GO:0050113,GO:0050242,GO:0050290,GO:0050333,GO:0050483,GO:0050660,GO:0050661,GO:0050662,GO:0050897,GO:0050992,GO:0051016,GO:0051028,GO:0051056,GO:0051073,GO:0051082,GO:0051087,GO:0051103,GO:0051156,GO:0051168,GO:0051188,GO:0051213,GO:0051259,GO:0051260,GO:0051276,GO:0051287,GO:0051304,GO:0051315,GO:0051382,GO:0051499,GO:0051536,GO:0051537,GO:0051539,GO:0051603,GO:0051726,GO:0051745,GO:0051879,GO:0051920,GO:0051998,GO:0052725,GO:0052726,GO:0052824,GO:0052855,GO:0052861,GO:0055085,GO:0055087,GO:0055114,GO:0060090,GO:0060271,GO:0061575,GO:0061578,GO:0061608,GO:0061617,GO:0061630,GO:0065003,GO:0070008,GO:0070011,GO:0070072,GO:0070204,GO:0070286,GO:0070402,GO:0070403,GO:0070476,GO:0070481,GO:0070567,GO:0070569,GO:0070577,GO:0070628,GO:0070682,GO:0070772,GO:0070773,GO:0070860,GO:0070897,GO:0070940,GO:0070966,GO:0070985,GO:0070988,GO:0071013,GO:0071025,GO:0071203,GO:0071209,GO:0071586,GO:0071596,GO:0071704,GO:0071805,GO:0071821,GO:0071918,GO:0071949,GO:0071985,GO:0071986,GO:0072321,GO:0072357,GO:0072487,GO:0072546,GO:0080009,GO:0080019,GO:0080085,GO:0089701,GO:0090114,GO:0090481,GO:0090522,GO:0090730,GO:0097027,GO:0097056,GO:0097255,GO:0097361,GO:0097367,GO:0097428,GO:0098519,GO:0098656,GO:0099122,GO:0101005,GO:0106035,GO:0106050,GO:0120009,GO:0120013,GO:0140326,GO:1901135,GO:1901137,GO:1901642,GO:1902412,GO:1902445,GO:1902600,GO:1902979,GO:1904263,GO:1904668,GO:1905775,GO:1990112,GO:1990116,GO:1990316,GO:1990380,GO:1990745;//' Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff > Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes2.gff
Now, I can also add the blastp hits to the gff:
# add swissprot blastp hits to gff
maker_functional_gff swissprot/September28_2022/swissprot_Sep28_2022.fasta \
swissprot/Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.blastp.Sep2022.out \
Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes2.gff \
> Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.gff
Let’s do a quality check.
# test line numbers
wc -l Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.gff #17203
Other databases were not added, but can be looked at separately.
Extract GO annotations from the InterProScan file. The output file will be used for the GO enrichment analyses in R:
# grep for lines that contain GO information
grep 'GO:' Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.gff > Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.GO.gff
# get GO terms
python extract_GOterms.py Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.GO.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_GOterms.txt
This is the script:
#! /usr/bin/python3
##Python script to extract genes names and GO information from a GFF file
import argparse
import csv
#create an argument parser object
parser = argparse.ArgumentParser(description = "This script extracts gene names and GO information from a GFF file. Note that prior to running this script, the GFF needs to be reduced to only contain $
#add positional argument for the input GFF file
parser.add_argument("GFF", help="Name of the GFF file")
#parse the arguments
args = parser.parse_args()
#create two empty lists to store the information on gene names and GO terms
gene_names = []
GO_terms = []
#open the GFF file
with open(args.GFF,"r") as gff:
    #create csv.reader object
    reader = csv.reader(gff,delimiter="\t")
    #loop over the reader once, so that the first record is not skipped
    for field in reader:
        #skip blank lines
        if not field:
            continue
        #get gene names
        gene_field = field[8].split(";")
        gene = gene_field[0].split("ID=")
        gene_names.append(gene[1])
        #get GO terms
        GO_field = field[8].split("Ontology_term=")
        GO = GO_field[1].split(";")
        GO_terms.append(GO[0])
#check the lay-out of the lists
#print(gene_names)
#print(GO_terms)
#check the length of the lists
#print("The length of the gene names list is: ", len(gene_names))
#print("The length of the GO terms list is: ", len(GO_terms))
#create dictionary
zip_iterator = zip(gene_names,GO_terms)
genes_GO_dict = dict(zip_iterator)
#print(genes_GO_dict)
#print the dictionary in table format
for gene, GO in genes_GO_dict.items():
    print('{} {}'.format(gene, GO))
The output of the script looks as follows:
Sm_t00006568-RA GO:0005515
Sm_t00002445-RA GO:0004512,GO:0006021,GO:0008654
Sm_t00001746-RA GO:0003723,GO:0006396,GO:0008173
Sm_t00001746-RA GO:0008168
Sm_t00004904-RA GO:0005515
Sm_t00013811-RA GO:0005515
Sm_t00000656-RA GO:0006629
Sm_t00011110-RA GO:0003924,GO:0005525,GO:0006913
Sm_t00011110-RA GO:0003924,GO:0005525
Sm_t00000013-RA GO:0008061
#etc.
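Some transcripts appear on more than one line in this output (for example Sm_t00001746-RA). The following is a minimal sketch, assuming that a single row per transcript is wanted downstream, of how such lines could be collapsed in Python. The function name `merge_go_terms` is hypothetical and not part of the original scripts.

```python
from collections import OrderedDict


def merge_go_terms(lines):
    """Collapse 'gene<TAB>GO:..,GO:..' lines into one sorted, de-duplicated GO list per gene."""
    merged = OrderedDict()
    for line in lines:
        line = line.strip()
        # skip blank lines and comment lines such as '#etc.'
        if not line or line.startswith("#"):
            continue
        gene, terms = line.split(None, 1)
        merged.setdefault(gene, set()).update(t for t in terms.split(",") if t)
    return {gene: ",".join(sorted(terms)) for gene, terms in merged.items()}


# three lines taken from the output shown above
example = [
    "Sm_t00001746-RA\tGO:0003723,GO:0006396,GO:0008173",
    "Sm_t00001746-RA\tGO:0008168",
    "Sm_t00006568-RA\tGO:0005515",
]
for gene, terms in merge_go_terms(example).items():
    print(gene, terms, sep="\t")
```

Whether merging is appropriate depends on how the downstream GO enrichment reads its gene-to-GO mapping, so this is a suggestion rather than a description of what was done here.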
Next, I adapted the Python script to extract a separate list for each of the other annotation sources:
# prep the file
sed 's/Note=/Swissprot:/' Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.gff | sed 's/PANTHER://2g' | sed 's/InterPro://2g' | sed 's/Pfam://2g' | sed 's/SMART://2g' | sed 's/SignalP_EUK://2g' | sed 's/PRINTS://2g' | sed 's/,PANTHER:/;PANTHER:/' | sed 's/,Pfam:/;Pfam:/' | sed 's/,SMART:/;SMART:/' | sed 's/,SignalP_EUK:/;SignalP_EUK:/' | sed 's/,PRINTS:/;PRINTS:/'> prep.gff
# InterPro
grep 'InterPro:' prep.gff > InterPro.gff; python extract_InterPro.py InterPro.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_InterPro.txt; rm InterPro.gff
# Panther
grep 'PANTHER:' prep.gff > Panther.gff; python extract_Panther.py Panther.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_Panther.txt; rm Panther.gff
# PRINTS
grep 'PRINTS:' prep.gff > Prints.gff; python extract_Prints.py Prints.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_Prints.txt; rm Prints.gff
# Pfam
grep 'Pfam:' prep.gff > Pfam.gff; python extract_Pfam.py Pfam.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_Pfam.txt; rm Pfam.gff
# SMART
grep 'SMART:' prep.gff > SMART.gff; python extract_Smart.py SMART.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_SMART.txt; rm SMART.gff
# Swissprot
grep 'Swissprot:' prep.gff > Swissprot.gff; python extract_Swissprot.py Swissprot.gff | sed 's/ /\t/' | sed 's/ /_/g' | sort > Smarinoi_Ref1.1.2_Swissprot.txt; rm Swissprot.gff
# SignalP
grep 'SignalP_EUK:' prep.gff > SignalP.gff; python extract_SignalP.py SignalP.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_SignalP.txt; rm SignalP.gff
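The scripts `extract_InterPro.py`, `extract_Panther.py`, etc. are not shown here. Purely as an illustration, a generic extractor could look like the sketch below. It assumes that, after the `sed` preparation step, each database appears once in the attribute column as `KEY:value1,value2`, terminated by `;` or the end of the line. The function name `extract_db` and the example record are hypothetical.

```python
import csv
import io


def extract_db(gff_handle, key):
    """Yield (ID, values) for each GFF row whose attribute column contains `key`."""
    for row in csv.reader(gff_handle, delimiter="\t"):
        # skip blank or incomplete rows and rows without the requested database
        if len(row) < 9 or key not in row[8]:
            continue
        attributes = row[8]
        # assumes the ID is the first attribute, as in extract_GOterms.py
        gene = attributes.split(";")[0].split("ID=")[1]
        # everything between the key and the next ';' belongs to this database
        values = attributes.split(key, 1)[1].split(";")[0]
        yield gene, values


# hypothetical record, for illustration only
record = (
    "chr1\tmaker\tmRNA\t1\t100\t.\t+\t.\t"
    "ID=gene1-RA;Parent=gene1;Dbxref=InterPro:IPR000001,IPR000002;Pfam:PF00001\n"
)
print(list(extract_db(io.StringIO(record), "Pfam:")))
```

A single parameterized function like this would avoid maintaining seven nearly identical scripts, at the cost of relying on the attribute layout assumed above.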
Finally, all databases are joined in R:
# import data
GO = read.table("Smarinoi_Ref1.1.2_GOterms.txt", header=FALSE)
InterPro = read.table("Smarinoi_Ref1.1.2_InterPro.txt", header=FALSE)
Panther = read.table("Smarinoi_Ref1.1.2_Panther.txt", header=FALSE)
Pfam = read.table("Smarinoi_Ref1.1.2_Pfam.txt", header=FALSE)
Prints = read.table("Smarinoi_Ref1.1.2_Prints.txt", header=FALSE)
Smart = read.table("Smarinoi_Ref1.1.2_SMART.txt", header=FALSE)
Swissprot = read.table("Smarinoi_Ref1.1.2_Swissprot.txt", header=FALSE)
SignalP = read.table("Smarinoi_Ref1.1.2_SignalP.txt", header=FALSE)
KEGG = read.table("KEGG_kofamkoalaoutput/Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28_reduced.txt")
Uniprot = read.table("Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.uniprot.diamond.out.sel", header=FALSE)
Uniprot2 = read.csv("uniprot-compressed_true_download_true_fields_accession_2Creviewed_2C-2023.01.30-23.41.23.80.csv", header=TRUE)
blast = read.table("Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.ref1.1.blastp.out", header=FALSE)
# column names
colnames(GO) = c('genes', 'GO')
colnames(InterPro) = c('genes', 'InterPro')
colnames(Panther) = c('genes', 'Panther')
colnames(Pfam) = c('genes', 'Pfam')
colnames(Prints) = c('genes', 'Prints')
colnames(Smart) = c('genes', 'Smart')
colnames(Swissprot) = c('genes', 'Swissprot')
colnames(SignalP) = c('genes', 'SignalP')
colnames(KEGG) = c('genes', 'KEGG_ID', 'KEGG_info')
colnames(Uniprot) = c('genes', 'Uniprot_hit')
blast = blast[,c(1:2)]
colnames(blast) = c('genes', 'ID_ref1.1.1')
# merge data
tmp1 = merge(blast, Swissprot, by = c("genes"), all=TRUE)
tmp2 = merge(tmp1, Uniprot, by = c("genes"), all=TRUE)
tmp3 = merge(tmp2, Uniprot2, by = "Uniprot_hit", all.x = TRUE)
tmp4 = merge(tmp3, GO, by = c("genes"), all=TRUE)
tmp5 = merge(tmp4, KEGG, by = c("genes"), all=TRUE)
tmp6 = merge(tmp5, InterPro, by = c("genes"), all=TRUE)
tmp7 = merge(tmp6, Panther, by = c("genes"), all=TRUE)
tmp8 = merge(tmp7, Pfam, by = c("genes"), all=TRUE)
tmp9 = merge(tmp8, Prints, by = c("genes"), all=TRUE)
tmp10 = merge(tmp9, Smart, by = c("genes"), all=TRUE)
tmp11 = merge(tmp10, SignalP, by = c("genes"), all=TRUE)
data = tmp11
# export data
write.csv(data, "Smarinoi_Ref1.1.2_full-annotation.csv")
Note that for some genes, there are multiple output lines. This happened for genes with multiple KEGG hits.
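The duplicated rows follow from how a key-based merge treats repeated keys: every KEGG row for a gene is paired with that gene's row from the other table. The sketch below reproduces this behaviour in Python with plain dictionaries rather than R's `merge`. The helper `outer_merge`, the gene names and the KEGG IDs are made up for the example.

```python
def outer_merge(left, right):
    """Full outer join of two lists of (gene, value) pairs on gene; missing values become None."""
    left_map, right_map = {}, {}
    for gene, value in left:
        left_map.setdefault(gene, []).append(value)
    for gene, value in right:
        right_map.setdefault(gene, []).append(value)
    rows = []
    for gene in sorted(set(left_map) | set(right_map)):
        # one output row per combination of left and right values for this gene
        for lval in left_map.get(gene, [None]):
            for rval in right_map.get(gene, [None]):
                rows.append((gene, lval, rval))
    return rows


go = [("geneA", "GO:0005515"), ("geneB", "GO:0006629")]
kegg = [("geneA", "K00001"), ("geneA", "K00002"), ("geneC", "K00003")]
for row in outer_merge(go, kegg):
    print(row)
```

In this toy case geneA ends up on two rows, one per KEGG hit, while geneB and geneC each keep a single row with a missing value in one column.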
We used R v4.0.2 for our analyses.
The required R-packages:
library("edgeR")
## Warning: package 'edgeR' was built under R version 4.1.1
## Warning: package 'limma' was built under R version 4.1.3
library("stageR")
## Warning: package 'stageR' was built under R version 4.1.1
## Warning: package 'SummarizedExperiment' was built under R version 4.1.1
## Warning: package 'MatrixGenerics' was built under R version 4.1.1
## Warning: package 'matrixStats' was built under R version 4.1.2
## Warning: package 'GenomicRanges' was built under R version 4.1.2
## Warning: package 'BiocGenerics' was built under R version 4.1.1
## Warning: package 'IRanges' was built under R version 4.1.1
## Warning: package 'GenomeInfoDb' was built under R version 4.1.2
## Warning: package 'Biobase' was built under R version 4.1.1
library("limma")
library("topGO")
## Warning: package 'topGO' was built under R version 4.1.1
## Warning: package 'graph' was built under R version 4.1.1
## Warning: package 'AnnotationDbi' was built under R version 4.1.2
library("GO.db")
library("topconfects")
## Warning: package 'topconfects' was built under R version 4.1.1
library("UpSetR")
library("PoiClaClu")
library("RColorBrewer")
## Warning: package 'RColorBrewer' was built under R version 4.1.2
library("pheatmap")
library("ggplot2")
## Warning: package 'ggplot2' was built under R version 4.1.2
library("ComplexHeatmap")
## Warning: package 'ComplexHeatmap' was built under R version 4.1.1
library("VennDiagram")
## Warning: package 'VennDiagram' was built under R version 4.1.2
library("tidyr")
## Warning: package 'tidyr' was built under R version 4.1.2
library("plyr")
## Warning: package 'plyr' was built under R version 4.1.2
Used package versions:
# check package versions of all used packages
packageVersion("edgeR")
## [1] '3.36.0'
packageVersion("stageR")
## [1] '1.16.0'
packageVersion("limma")
## [1] '3.50.3'
packageVersion("topGO")
## [1] '2.46.0'
packageVersion("GO.db")
## [1] '3.14.0'
packageVersion("topconfects")
## [1] '1.10.0'
packageVersion("UpSetR")
## [1] '1.4.0'
packageVersion("PoiClaClu")
## [1] '1.0.2.1'
packageVersion("RColorBrewer")
## [1] '1.1.3'
packageVersion("pheatmap")
## [1] '1.0.12'
packageVersion("ggplot2")
## [1] '3.4.1'
packageVersion("ComplexHeatmap")
## [1] '2.10.0'
packageVersion("VennDiagram")
## [1] '1.7.3'
packageVersion("tidyr")
## [1] '1.3.0'
packageVersion("plyr")
## [1] '1.8.8'
We imported the dataset (output HTSeq) as follows:
# import the count data
x = read.table("02.Skmarinoi8x3_rna-seq_reanalysis/Skmarinoi8x3_reanalysis_ref1.1.2_gene-level_counts_FINAL.txt",header=TRUE)
x = x[-c(17204:17208), ] #drop the last lines that do not contain information on gene counts
total_gene_number = nrow(x) #total number of genes in the analysis
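The five dropped rows are presumably the summary counters that htseq-count appends to its output, whose names start with `__` (for example `__no_feature`). Under that assumption, the rows could also be dropped by name rather than by position, as in the Python sketch below. The helper name `drop_htseq_summary`, the gene IDs and the counts in the miniature table are invented.

```python
def drop_htseq_summary(rows):
    """Keep only rows whose feature ID does not start with the '__' prefix of htseq-count counters."""
    return [row for row in rows if not row[0].startswith("__")]


# invented miniature count table: (feature ID, count sample 1, count sample 2)
table = [
    ("Sm_g00000001", 12, 30),
    ("Sm_g00000002", 0, 4),
    ("__no_feature", 1500, 1720),
    ("__ambiguous", 80, 95),
    ("__too_low_aQual", 0, 0),
    ("__not_aligned", 0, 0),
    ("__alignment_not_unique", 310, 290),
]
genes = drop_htseq_summary(table)
print(len(genes))
```

Filtering by name keeps working if the number of genes in the annotation changes, whereas fixed row indices have to be updated by hand.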
edgeR works with DGEList objects. We created one from the count data and a group vector that assigns each sample to its genotype-salinity group:
# create a DGEList data class
group = c("A.16ppt","A.16ppt","A.16ppt","A.24ppt","A.24ppt","A.24ppt","A.8ppt","A.8ppt","A.8ppt","B.16ppt","B.16ppt","B.16ppt","B.24ppt","B.24ppt","B.24ppt","B.8ppt","B.8ppt","B.8ppt","D.16ppt","D.16ppt","D.16ppt","D.24ppt","D.24ppt","D.24ppt","D.8ppt","D.8ppt","D.8ppt","F.16ppt","F.16ppt","F.16ppt","F.24ppt","F.24ppt","F.24ppt","F.8ppt","F.8ppt","F.8ppt","I.16ppt","I.16ppt","I.16ppt","I.24ppt","I.24ppt","I.24ppt","I.8ppt","I.8ppt","I.8ppt","J.16ppt","J.16ppt","J.16ppt","J.24ppt","J.24ppt","J.24ppt","J.8ppt","J.8ppt","J.8ppt","K.16ppt","K.16ppt","K.16ppt","K.24ppt","K.24ppt","K.24ppt","K.8ppt","K.8ppt","K.8ppt","P.16ppt","P.16ppt","P.16ppt","P.24ppt","P.24ppt","P.24ppt","P.8ppt","P.8ppt","P.8ppt")
y = DGEList(counts=x, group=group)
In a next step, genes with very low counts across all libraries were removed. Filtering was based on CPM (counts per million): we retained all genes with a CPM above 1 in at least three samples:
# filter out lowly expressed genes
keep = rowSums(cpm(y)>1)>=3 #keep genes with more than one count per million in at least three samples
y = y[keep,]
y$samples$lib.size = colSums(y$counts)
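To make the filtering rule concrete, the sketch below computes CPM from raw counts and applies the same rule (CPM above 1 in at least three samples) in plain Python. It mirrors the R code only in spirit: edgeR's `cpm()` also takes normalization factors into account once they are set, which this sketch ignores. The function names and the counts are invented.

```python
def cpm(counts):
    """counts: list of rows (genes), each a list of per-sample counts. Returns counts per million."""
    lib_sizes = [sum(col) for col in zip(*counts)]
    return [[c / n * 1e6 for c, n in zip(row, lib_sizes)] for row in counts]


def keep_expressed(counts, min_cpm=1.0, min_samples=3):
    """True for genes with a CPM above min_cpm in at least min_samples samples."""
    return [sum(v > min_cpm for v in row) >= min_samples for row in cpm(counts)]


# invented counts for four genes in four samples; every library sums to one million reads
counts = [
    [500000, 400000, 600000, 450000],  # highly expressed
    [0, 1, 0, 2],                      # lowly expressed
    [499990, 599999, 399990, 549998],  # highly expressed
    [10, 0, 10, 0],                    # above 1 CPM in only two samples
]
print(keep_expressed(counts))
```

The last gene shows why the sample threshold matters: it is clearly expressed in two libraries, yet it is removed because the rule requires three.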
Next, we calculated a set of normalization factors (one for each sample) to eliminate composition biases between libraries:
# calculate normalization factors
y = calcNormFactors(y, method = 'TMM') #normalizes for RNA composition (highly expressed genes)
In a next step, we generated mean-difference (MD) plots for each sample. An MD plot allows the expression profile of an individual sample to be explored more closely: it plots the library size-adjusted log-fold change between two libraries (the difference) against the average log-expression across those libraries (the mean).
# plot MD plots for genotype A
par(mfrow=c(3,3))
for (library in c(1:9)){
plotMD(y, column = library)
abline(h=0, col="red",lty=2,lwd=2)}
# plot MD plots for genotype B
par(mfrow=c(3,3))
for (library in c(10:18)){
plotMD(y, column = library)
abline(h=0, col="red",lty=2,lwd=2)}
# plot MD plots for genotype D
par(mfrow=c(3,3))
for (library in c(19:27)){
plotMD(y, column = library)
abline(h=0, col="red",lty=2,lwd=2)}
# plot MD plots for genotype F
par(mfrow=c(3,3))
for (library in c(28:36)){
plotMD(y, column = library)
abline(h=0, col="red",lty=2,lwd=2)}
# plot MD plots for genotype I
par(mfrow=c(3,3))
for (library in c(37:45)){
plotMD(y, column = library)
abline(h=0, col="red",lty=2,lwd=2)}
# plot MD plots for genotype J
par(mfrow=c(3,3))
for (library in c(46:54)){
plotMD(y, column = library)
abline(h=0, col="red",lty=2,lwd=2)}
# plot MD plots for genotype K
par(mfrow=c(3,3))
for (library in c(55:63)){
plotMD(y, column = library)
abline(h=0, col="red",lty=2,lwd=2)}
# plot MD plots for genotype P
par(mfrow=c(3,3))
for (library in c(64:72)){
plotMD(y, column = library)
abline(h=0, col="red",lty=2,lwd=2)}
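The quantities shown in such a plot can be sketched as follows. The Python example below compares one library against an artificial reference built from the average of all other libraries and returns the mean and the difference of their log2-CPM values per gene. It is a simplified illustration and not edgeR's implementation: the pseudo-count of 0.5, the function name `md_values` and the counts are assumptions.

```python
import math


def md_values(counts, column, pseudo=0.5):
    """Return (mean, difference) per gene for one library versus the average of all others."""
    lib_sizes = [sum(col) for col in zip(*counts)]
    # log2 counts per million, with a small pseudo-count to avoid log2(0)
    logcpm = [
        [math.log2(c / n * 1e6 + pseudo) for c, n in zip(row, lib_sizes)]
        for row in counts
    ]
    result = []
    for row in logcpm:
        others = [v for j, v in enumerate(row) if j != column]
        reference = sum(others) / len(others)
        result.append(((row[column] + reference) / 2, row[column] - reference))
    return result


# invented counts: gene 1 is higher and gene 2 lower in the first library
counts = [
    [20, 10, 10],
    [80, 90, 90],
]
for mean, diff in md_values(counts, column=0):
    print(round(mean, 2), round(diff, 2))
```

Points far from the horizontal zero line correspond to genes whose expression in the selected library departs from the rest, which is why the bulk of the points is expected to scatter around zero in a well-normalized library.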
Next, we created a design matrix without an intercept, so that each coefficient corresponds to one genotype-salinity group. This allowed pairwise comparisons between groups in the differential expression analyses:
# create a design matrix without intercept
design = model.matrix(~0+group, data=y$samples)
colnames(design) = levels(y$samples$group)
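With `~0+group`, each column of the design matrix is an indicator for one genotype-salinity group, so every sample has exactly one 1 in its row. A toy Python sketch of such a design for genotype A only is given below; `design_matrix` is an illustrative helper, not an edgeR or R function.

```python
def design_matrix(groups):
    """Return (levels, matrix) with one indicator column per group level and no intercept."""
    levels = sorted(set(groups))
    matrix = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, matrix


# the nine samples of genotype A, in the same order as in the group vector
groups = ["A.16ppt"] * 3 + ["A.24ppt"] * 3 + ["A.8ppt"] * 3
levels, matrix = design_matrix(groups)
print(levels)
for row in matrix:
    print(row)
```

Because no column is shared between groups, a difference between two groups is obtained simply by subtracting the corresponding two coefficients, which is what the contrasts further down do.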
Prior to differential expression analysis, dispersion needed to be estimated:
# estimate dispersion
y = estimateDisp(y, design)
# visualize dispersion
plotBCV(y)
We fitted the actual model. We used the glmQLFit function, which is a quasi-likelihood (QL) method that accounts for gene-specific variability from both biological and technical sources:
# model fitting
fit_group_model = glmQLFit(y, design, robust=TRUE)
plotQLDisp(fit_group_model)
To explore the dataset, we made an MDS plot.
In an MDS plot, the distance between each pair of samples can be interpreted as the leading log-fold change between the samples for the genes that best distinguish that pair of samples. By default, leading fold-change is defined as the root-mean-square of the largest 500 log2-fold changes between that pair of samples.
# define colors for salinities
high = '#433E85FF'
med = '#1E9B8AFF'
low = "#C2DF23FF"
colors = rep(c(med,high,low),8)
# define symbols for genotypes
pch = c(21,21,21,24,24,24,10,10,10,22,22,22,25,25,25,8,8,8,23,23,23,12,12,12)
# plot MDS plot
plotMDS(y, top = 500, title = "MDS plot for top 500 genes", bg = colors[(y$samples$group)],
col = colors[(y$samples$group)], pch=pch[(y$samples$group)],cex = 1.25)
legend("bottomright", inset = c(0, 0), legend=levels(y$samples$group),
pch=pch, col='black', pt.bg=colors, ncol=8, cex = 0.55)
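The distance used in the MDS plot can be written out directly. The Python sketch below computes, for one pair of samples, the root-mean-square of the `top` largest log2-fold changes. It illustrates the definition only and is not limma's implementation. The function name `leading_lfc_distance` and the log-CPM values are invented.

```python
import math


def leading_lfc_distance(logcpm_a, logcpm_b, top=500):
    """Root-mean-square of the `top` largest log2-fold changes between two samples."""
    squared = sorted(((a - b) ** 2 for a, b in zip(logcpm_a, logcpm_b)), reverse=True)
    top = min(top, len(squared))
    return math.sqrt(sum(squared[:top]) / top)


# invented log2-CPM values for four genes in two samples
sample_1 = [5.0, 3.0, 8.0, 1.0]
sample_2 = [2.0, 3.0, 4.0, 1.0]
print(leading_lfc_distance(sample_1, sample_2, top=2))
```

Restricting the calculation to the largest fold changes means that the distance is driven by the genes that differ most between the two samples, rather than by the many genes that do not change.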
Similarity between samples was also explored by means of a heatmap:
# calculate Poisson distances from the filtered counts (PoissonDistance accounts for sequencing depth itself)
poisd = PoissonDistance(t(y$counts))
# create a list with sample names to be used in the plot
names = c("A.16ppt","A.16ppt","A.16ppt","A.24ppt","A.24ppt","A.24ppt","A.8ppt","A.8ppt","A.8ppt","B.16ppt","B.16ppt","B.16ppt","B.24ppt","B.24ppt","B.24ppt","B.8ppt","B.8ppt","B.8ppt","D.16ppt","D.16ppt","D.16ppt","D.24ppt","D.24ppt","D.24ppt","D.8ppt","D.8ppt","D.8ppt","F.16ppt","F.16ppt","F.16ppt","F.24ppt","F.24ppt","F.24ppt","F.8ppt","F.8ppt","F.8ppt","I.16ppt","I.16ppt","I.16ppt","I.24ppt","I.24ppt","I.24ppt","I.8ppt","I.8ppt","I.8ppt","J.16ppt","J.16ppt","J.16ppt","J.24ppt","J.24ppt","J.24ppt","J.8ppt","J.8ppt","J.8ppt","K.16ppt","K.16ppt","K.16ppt","K.24ppt","K.24ppt","K.24ppt","K.8ppt","K.8ppt","K.8ppt","P.16ppt","P.16ppt","P.16ppt","P.24ppt","P.24ppt","P.24ppt","P.8ppt","P.8ppt","P.8ppt")
# define colors to be used in the plot
colors_heatmap = colorRampPalette(rev(brewer.pal(9,"Purples")))(255)
# plot heatmap
samplePoisDistMatrix = as.matrix(poisd$dd)
rownames(samplePoisDistMatrix) = paste(names)
colnames(samplePoisDistMatrix) = paste(names)
pheatmap(samplePoisDistMatrix,
clustering_distance_rows=poisd$dd,
clustering_distance_cols=poisd$dd,
col=colors_heatmap)
In this omnibus test, we combined the tests for the average salinity effect with those for the responses of each individual genotype.
In a first step, we defined all the contrasts that need to be tested: 27 in total (24 for the genotypes and 3 for the average effect):
# define all contrasts to test
C_RQ1e2=matrix(0,nrow=ncol(fit_group_model$coefficients),ncol=27)
rownames(C_RQ1e2)=colnames(fit_group_model$coefficients)
colnames(C_RQ1e2)=c("A8-A16","A16-A24","A8-A24",
"B8-B16","B16-B24","B8-B24",
"D8-D16","D16-D24","D8-D24",
"F8-F16","F16-F24","F8-F24",
"I8-I16","I16-I24","I8-I24",
"J8-J16","J16-J24","J8-J24",
"K8-K16","K16-K24","K8-K24",
"P8-P16","P16-P24","P8-P24",
"avg8-16", "avg16-24","avg8-24")
# genotype A (salinity effect)
C_RQ1e2[c("A.8ppt","A.16ppt"),"A8-A16"]=c(1,-1)
C_RQ1e2[c("A.8ppt","A.24ppt"),"A8-A24"]=c(1,-1)
C_RQ1e2[c("A.16ppt","A.24ppt"),"A16-A24"]=c(1,-1)
# genotype B (salinity effect)
C_RQ1e2[c("B.8ppt","B.16ppt"),"B8-B16"]=c(1,-1)
C_RQ1e2[c("B.8ppt","B.24ppt"),"B8-B24"]=c(1,-1)
C_RQ1e2[c("B.16ppt","B.24ppt"),"B16-B24"]=c(1,-1)
# genotype D (salinity effect)
C_RQ1e2[c("D.8ppt","D.16ppt"),"D8-D16"]=c(1,-1)
C_RQ1e2[c("D.8ppt","D.24ppt"),"D8-D24"]=c(1,-1)
C_RQ1e2[c("D.16ppt","D.24ppt"),"D16-D24"]=c(1,-1)
# genotype F (salinity effect)
C_RQ1e2[c("F.8ppt","F.16ppt"),"F8-F16"]=c(1,-1)
C_RQ1e2[c("F.8ppt","F.24ppt"),"F8-F24"]=c(1,-1)
C_RQ1e2[c("F.16ppt","F.24ppt"),"F16-F24"]=c(1,-1)
# genotype I (salinity effect)
C_RQ1e2[c("I.8ppt","I.16ppt"),"I8-I16"]=c(1,-1)
C_RQ1e2[c("I.8ppt","I.24ppt"),"I8-I24"]=c(1,-1)
C_RQ1e2[c("I.16ppt","I.24ppt"),"I16-I24"]=c(1,-1)
# genotype J (salinity effect)
C_RQ1e2[c("J.8ppt","J.16ppt"),"J8-J16"]=c(1,-1)
C_RQ1e2[c("J.8ppt","J.24ppt"),"J8-J24"]=c(1,-1)
C_RQ1e2[c("J.16ppt","J.24ppt"),"J16-J24"]=c(1,-1)
# genotype K (salinity effect)
C_RQ1e2[c("K.8ppt","K.16ppt"),"K8-K16"]=c(1,-1)
C_RQ1e2[c("K.8ppt","K.24ppt"),"K8-K24"]=c(1,-1)
C_RQ1e2[c("K.16ppt","K.24ppt"),"K16-K24"]=c(1,-1)
# genotype P (salinity effect)
C_RQ1e2[c("P.8ppt","P.16ppt"),"P8-P16"]=c(1,-1)
C_RQ1e2[c("P.8ppt","P.24ppt"),"P8-P24"]=c(1,-1)
C_RQ1e2[c("P.16ppt","P.24ppt"),"P16-P24"]=c(1,-1)
# average salinity effect
C_RQ1e2[c("A.8ppt", "B.8ppt", "D.8ppt","F.8ppt", "I.8ppt", "J.8ppt", "K.8ppt", "P.8ppt",
"A.16ppt", "B.16ppt", "D.16ppt","F.16ppt", "I.16ppt", "J.16ppt", "K.16ppt", "P.16ppt"),
"avg8-16"]=c(1/8,1/8,1/8,1/8,1/8,1/8,1/8,1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8)
C_RQ1e2[c("A.8ppt", "B.8ppt", "D.8ppt","F.8ppt", "I.8ppt", "J.8ppt", "K.8ppt", "P.8ppt",
"A.24ppt", "B.24ppt", "D.24ppt","F.24ppt", "I.24ppt", "J.24ppt", "K.24ppt", "P.24ppt"),
"avg8-24"]=c(1/8,1/8,1/8,1/8,1/8,1/8,1/8,1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8)
C_RQ1e2[c("A.16ppt", "B.16ppt", "D.16ppt","F.16ppt", "I.16ppt", "J.16ppt", "K.16ppt", "P.16ppt",
"A.24ppt", "B.24ppt", "D.24ppt","F.24ppt", "I.24ppt", "J.24ppt", "K.24ppt", "P.24ppt"),
"avg16-24"]=c(1/8,1/8,1/8,1/8,1/8,1/8,1/8,1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8)
We performed the stage-wise testing procedure in stageR. StageR allows for simultaneous FDR control in all the contrasts, and consists of two steps: the screening stage, and the confirmation stage.
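The 27 contrasts that enter this procedure follow a regular pattern: within each genotype there are three pairwise salinity differences, and three further contrasts average those differences over the eight genotypes. The Python sketch below generates the same set programmatically to make this pattern explicit. It is illustrative only (the analysis itself used the R contrast matrix) and stores each contrast as a dictionary of its non-zero weights.

```python
from fractions import Fraction

genotypes = ["A", "B", "D", "F", "I", "J", "K", "P"]
pairs = [(8, 16), (16, 24), (8, 24)]

contrasts = {}
# genotype-specific salinity effects: +1 for the first salinity, -1 for the second
for g in genotypes:
    for a, b in pairs:
        contrasts[f"{g}{a}-{g}{b}"] = {
            f"{g}.{a}ppt": Fraction(1),
            f"{g}.{b}ppt": Fraction(-1),
        }
# average salinity effects: the same difference, averaged over all genotypes
for a, b in pairs:
    weights = {}
    for g in genotypes:
        weights[f"{g}.{a}ppt"] = Fraction(1, len(genotypes))
        weights[f"{g}.{b}ppt"] = Fraction(-1, len(genotypes))
    contrasts[f"avg{a}-{b}"] = weights

print(len(contrasts))
print(contrasts["avg8-16"]["A.8ppt"], contrasts["avg8-16"]["A.16ppt"])
```

Each set of weights sums to zero, and each average contrast is the mean of the eight corresponding genotype-specific contrasts.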
The screening stage tested whether any of the 27 contrasts was significant, i.e. whether there was any effect of the treatment, either within an individual genotype or on average. The screening stage gives P-values as output, but these are not yet FDR-controlled and should therefore not be used in downstream analyses.
# screening stage
alpha = 0.05
screenTest_RQ1e2 = glmQLFTest(fit_group_model, contrast=C_RQ1e2)
pScreen_RQ1e2 = screenTest_RQ1e2$table$PValue
names(pScreen_RQ1e2) = rownames(screenTest_RQ1e2$table)
The screening stage was followed by the confirmation stage, in which every contrast was assessed separately. The confirmation-stage P-values were adjusted to control the FWER across the hypotheses within a gene and were subsequently corrected to the BH-adjusted significance level of the screening stage. This allows the adjusted P-values of both the screening and the confirmation stage to be compared directly to the chosen significance level alpha. Here, we used the Holm method to correct the P-values.
# confirmation stage
confirmationResults_RQ1e2 = sapply(1:ncol(C_RQ1e2),function(i) glmQLFTest(fit_group_model, contrast = C_RQ1e2[,i]), simplify=FALSE) #calculates Ftest for each contrast
confirmationPList_RQ1e2 = lapply(confirmationResults_RQ1e2, function(x) x$table$PValue) # takes the P-values from all genes for each contrast and puts them in a list
confirmationP_RQ1e2 = as.matrix(Reduce(f=cbind,confirmationPList_RQ1e2))
rownames(confirmationP_RQ1e2) = rownames(confirmationResults_RQ1e2[[1]]$table)
colnames(confirmationP_RQ1e2) = colnames(C_RQ1e2)
stageRObj_RQ1e2 = stageR(pScreen=pScreen_RQ1e2, pConfirmation=confirmationP_RQ1e2) # constructs an object
stageRAdj_RQ1e2 = stageWiseAdjustment(object=stageRObj_RQ1e2, method="holm", alpha=0.05) # adjusts the P-values using FWER correction using the holm method
resRQ1e2 = getResults(stageRAdj_RQ1e2)
## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
# number of DE genes in every contrast
SignifGenesRQ1e2 = colSums(resRQ1e2)
SignifGenesRQ1e2
## padjScreen     A8-A16    A16-A24     A8-A24     B8-B16    B16-B24     B8-B24
##       8820        815        385       1649        469        443       1750
##     D8-D16    D16-D24     D8-D24     F8-F16    F16-F24     F8-F24     I8-I16
##       1026        451       1224       1159        867       1428        285
##    I16-I24     I8-I24     J8-J16    J16-J24     J8-J24     K8-K16    K16-K24
##        297        502        751        215       1042        197        638
##     K8-K24     P8-P16    P16-P24     P8-P24    avg8-16   avg16-24    avg8-24
##       1102       1378        648       1810       2586       2014       4276
# get adjusted P-values
adjusted_p_RQ1e2 = getAdjustedPValues(stageRAdj_RQ1e2, onlySignificantGenes = FALSE, order = FALSE)
## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
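To make the within-gene correction of the confirmation stage more concrete, the sketch below applies a plain Holm adjustment to the confirmation-stage P-values of a single gene with base R's p.adjust. The gene and its three P-values are invented for illustration. The sketch only mirrors the FWER step described above and does not reproduce the rescaling to the screening-stage significance level that stageR performs.

```r
# Hypothetical raw confirmation-stage P-values for one gene (three contrasts)
p_gene = c(`8-16` = 0.01, `16-24` = 0.04, `8-24` = 0.03)

# Holm step-down adjustment across the contrasts of this gene (controls the FWER)
p_gene_holm = p.adjust(p_gene, method = "holm")
p_gene_holm

# contrasts that would be called significant at alpha = 0.05 in this toy example
sig_contrasts = names(p_gene_holm)[p_gene_holm <= 0.05]
sig_contrasts
```

In this toy example only the smallest P-value survives the correction, because the two larger P-values are multiplied up to the same adjusted value.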
Upon finishing the stage-wise testing procedure, we checked the number of significant genes:
# visualize number of significant genes in each contrast
resRQ1e2_df = as.data.frame(resRQ1e2)
resRQ1e2_df2 = resRQ1e2_df
resRQ1e2_df2$gene = rownames(resRQ1e2_df2)
OnlySignGenes_RQ1e2 = resRQ1e2_df[resRQ1e2_df$padjScreen == 1,] # removes rows for which global test was non significant
dim(OnlySignGenes_RQ1e2) # still includes genes for which all posthoc tests were 0
## [1] 8820 28
Were there any genes that were significant in the screening stage but not in the confirmation stage?
# select genes that were only significant in the screening stage
genesSI_RQ1e2 = rownames(adjusted_p_RQ1e2)[adjusted_p_RQ1e2[,"padjScreen"]<=0.05]
genesNotFoundStageII_RQ1e2 = genesSI_RQ1e2[genesSI_RQ1e2 %in% rownames(resRQ1e2)[rowSums(resRQ1e2==0)==27]]
length(genesNotFoundStageII_RQ1e2) #stage I only genes
## [1] 1144
1144 genes were not significant in the confirmation stage, whereas they were found to be significant in the screening stage.
We removed the genes that were not significant after the confirmation stage:
# create object that only contains genes that are significant after the confirmation stage
OnlySignGenes_RQ1e2_ConStage = OnlySignGenes_RQ1e2 [!rownames(OnlySignGenes_RQ1e2 ) %in% genesNotFoundStageII_RQ1e2, ]
nrow(OnlySignGenes_RQ1e2_ConStage)
## [1] 7676
7676 genes were significant after the confirmation stage. These are the genes we continued our analyses with.
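The filtering logic used above can be checked on a small example. The sketch below uses an invented 0/1 matrix in the same layout as the getResults() output (first column for the screening stage, remaining columns for the contrasts); all object names ending in _toy are hypothetical.

```r
# toy 0/1 significance matrix: first column = screening stage, other columns = contrasts
res_toy = matrix(c(1, 1, 0,
                   1, 0, 0,
                   0, 0, 0,
                   1, 0, 1),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4),
                                 c("padjScreen", "c1", "c2")))

# genes passing the screening stage
screen_sig_toy = rownames(res_toy)[res_toy[, "padjScreen"] == 1]

# of those, genes without a single significant contrast in the confirmation stage
stage1_only_toy = screen_sig_toy[rowSums(res_toy[screen_sig_toy, -1, drop = FALSE] == 1) == 0]

# genes retained for downstream analyses
retained_toy = setdiff(screen_sig_toy, stage1_only_toy)
retained_toy
```

Here gene2 passes the screening stage but has no significant contrast, so it is dropped, in the same way the 1144 stage I only genes were removed above.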
Before we continued with the downstream analyses, we created a single data object that contains key information from the statistical pipeline outlined above. This included information on logFC, logCPM and P-values for each gene for each contrast.
First, we selected the FDR-adjusted P-values for each contrast from the output of the stageR stage-wise adjustment:
# select the adjusted P-values for each contrast
adjusted_p_RQ1e2 = getAdjustedPValues(stageRAdj_RQ1e2, onlySignificantGenes = FALSE, order = FALSE)
## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
# rename column headers in adjusted_p_RQ1e2
colnames(adjusted_p_RQ1e2)=c("padjScreen","A8vsA16_Padj","A16vsA24_Padj","A8vsA24_Padj","B8vsB16_Padj","B16vsB24_Padj","B8vsB24_Padj","D8vsD16_Padj","D16vsD24_Padj","D8vsD24_Padj","F8vsF16_Padj","F16vsF24_Padj","F8vsF24_Padj","I8vsI16_Padj","I16vsI24_Padj","I8vsI24_Padj","J8vsJ16_Padj","J16vsJ24_Padj","J8vsJ24_Padj","K8vsK16_Padj","K16vsK24_Padj","K8vsK24_Padj","P8vsP16_Padj","P16vsP24_Padj","P8vsP24_Padj","avg8vs16_Padj","avg16vs24_Padj","avg8vs24_Padj")
Second, we extracted the information on logFC, logCPM, F value and non-adjusted P-values from the confirmationResults_RQ1e2 object:
# create empty list to hold the data values
datalist = list()
# loop over the confirmationResults_RQ1e2 object to obtain the relevant information (table)
for (contrast in c(1:27)){
table = confirmationResults_RQ1e2[[contrast]]$table
datalist[[contrast]] = table
}
# turn list into data frame
confirmationResults_RQ1e2_total_dataset = data.frame(datalist)
# rename column names for tractability
colnames(confirmationResults_RQ1e2_total_dataset)=c("A8vsA16_logFC","A8vsA16_logCPM","A8vsA16_F","A8vsA16_nonadj_PValue","A16vsA24_logFC","A16vsA24_logCPM","A16vsA24_F","A16vsA24_nonadj_PValue","A8vsA24_logFC","A8vsA24_logCPM","A8vsA24_F","A8vsA24_nonadj_PValue","B8vsB16_logFC","B8vsB16_logCPM","B8vsB16_F","B8vsB16_nonadj_PValue","B16vsB24_logFC","B16vsB24_logCPM","B16vsB24_F","B16vsB24_nonadj_PValue","B8vsB24_logFC","B8vsB24_logCPM","B8vsB24_F","B8vsB24_nonadj_PValue","D8vsD16_logFC","D8vsD16_logCPM","D8vsD16_F","D8vsD16_nonadj_PValue","D16vsD24_logFC","D16vsD24_logCPM","D16vsD24_F","D16vsD24_nonadj_PValue","D8vsD24_logFC","D8vsD24_logCPM","D8vsD24_F","D8vsD24_nonadj_PValue","F8vsF16_logFC","F8vsF16_logCPM","F8vsF16_F","F8vsF16_nonadj_PValue","F16vsF24_logFC","F16vsF24_logCPM","F16vsF24_F","F16vsF24_nonadj_PValue","F8vsF24_logFC","F8vsF24_logCPM","F8vsF24_F","F8vsF24_nonadj_PValue","I8vsI16_logFC","I8vsI16_logCPM","I8vsI16_F","I8vsI16_nonadj_PValue","I16vsI24_logFC","I16vsI24_logCPM","I16vsI24_F","I16vsI24_nonadj_PValue","I8vsI24_logFC","I8vsI24_logCPM","I8vsI24_F","I8vsI24_nonadj_PValue","J8vsJ16_logFC","J8vsJ16_logCPM","J8vsJ16_F","J8vsJ16_nonadj_PValue","J16vsJ24_logFC","J16vsJ24_logCPM","J16vsJ24_F","J16vsJ24_nonadj_PValue","J8vsJ24_logFC","J8vsJ24_logCPM","J8vsJ24_F","J8vsJ24_nonadj_PValue","K8vsK16_logFC","K8vsK16_logCPM","K8vsK16_F","K8vsK16_nonadj_PValue","K16vsK24_logFC","K16vsK24_logCPM","K16vsK24_F","K16vsK24_nonadj_PValue","K8vsK24_logFC","K8vsK24_logCPM","K8vsK24_F","K8vsK24_nonadj_PValue","P8vsP16_logFC","P8vsP16_logCPM","P8vsP16_F","P8vsP16_nonadj_PValue","P16vsP24_logFC","P16vsP24_logCPM","P16vsP24_F","P16vsP24_nonadj_PValue","P8vsP24_logFC","P8vsP24_logCPM","P8vsP24_F","P8vsP24_nonadj_PValue","avg8vs16_logFC","avg8vs16_logCPM","avg8vs16_F","avg8vs16_nonadj_PValue","avg16vs24_logFC","avg16vs24_logCPM","avg16vs24_F","avg16vs24_nonadj_PValue","avg8vs24_logFC","avg8vs24_logCPM","avg8vs24_F","avg8vs24_nonadj_PValue")
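Because the 108 column names follow a regular pattern (27 contrasts times four statistics), they could also be generated rather than typed out. The sketch below is one possible way to do this, assuming the contrasts appear in the same order as in the contrast matrix; the helper objects (genotypes, salinity_pairs, stat_names, col_labels) are hypothetical and not part of the original pipeline.

```r
# genotypes, salinity contrasts and statistics in the order used above
genotypes = c("A", "B", "D", "F", "I", "J", "K", "P")
salinity_pairs = list(c(8, 16), c(16, 24), c(8, 24))
stat_names = c("logFC", "logCPM", "F", "nonadj_PValue")

# contrast labels per genotype (e.g. "A8vsA16"), followed by the three average contrasts
geno_contrasts = unlist(lapply(genotypes, function(g)
  sapply(salinity_pairs, function(p) paste0(g, p[1], "vs", g, p[2]))))
avg_contrasts = sapply(salinity_pairs, function(p) paste0("avg", p[1], "vs", p[2]))
contrast_labels = c(geno_contrasts, avg_contrasts)

# four statistics per contrast, with the statistic varying fastest
col_labels = paste(rep(contrast_labels, each = length(stat_names)), stat_names, sep = "_")
head(col_labels, 4)
```

The resulting vector could then be assigned with colnames(), after checking that its length matches the number of columns of the data frame.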
Then we combined the FDR-adjusted P-values and the table with information on logFC etc. into a single data frame:
# merge the data frames
table = merge(confirmationResults_RQ1e2_total_dataset,adjusted_p_RQ1e2, by = 0, all = TRUE)
# use the first column (gene names) for the row names
all_results_RQ1e2 = table[,-1]
rownames(all_results_RQ1e2) = table[,1]
The resulting data frame all_results_RQ1e2 was used in multiple analyses downstream to access basic statistical information on each gene for each contrast.
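The merge on row names (by = 0) used above matches genes by name rather than by position, so the two inputs do not need to be in the same order. A minimal sketch with invented values, in which all _toy objects are hypothetical:

```r
# toy per-gene statistics and adjusted P-values, deliberately in a different gene order
stats_toy = data.frame(logFC = c(1.2, -0.5), row.names = c("gene1", "gene2"))
padj_toy = matrix(c(0.01, 0.20), ncol = 1,
                  dimnames = list(c("gene2", "gene1"), "Padj"))

# merge on row names; the gene names end up in the first column ("Row.names")
merged_toy = merge(stats_toy, padj_toy, by = 0, all = TRUE)

# move the gene names back to the row names
combined_toy = merged_toy[, -1]
rownames(combined_toy) = merged_toy[, 1]
combined_toy
```

Each gene keeps its own adjusted P-value even though the two inputs were ordered differently.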
Core response genes were defined as genes that were DE in every genotype, in at least one of the three salinity contrasts per genotype. We selected these genes as follows:
# list of significant genes for each genotype
SignGenes_genA = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage,
OnlySignGenes_RQ1e2_ConStage$`A8-A24`== 1 |
OnlySignGenes_RQ1e2_ConStage$`A8-A16`== 1 |
OnlySignGenes_RQ1e2_ConStage$`A16-A24`== 1)))
SignGenes_genB = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage,
OnlySignGenes_RQ1e2_ConStage$`B8-B24`== 1 |
OnlySignGenes_RQ1e2_ConStage$`B8-B16`== 1 |
OnlySignGenes_RQ1e2_ConStage$`B16-B24`== 1)))
SignGenes_genD = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage,
OnlySignGenes_RQ1e2_ConStage$`D8-D24`== 1 |
OnlySignGenes_RQ1e2_ConStage$`D8-D16`== 1 |
OnlySignGenes_RQ1e2_ConStage$`D16-D24`== 1)))
SignGenes_genF = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage,
OnlySignGenes_RQ1e2_ConStage$`F8-F24`== 1 |
OnlySignGenes_RQ1e2_ConStage$`F8-F16`== 1 |
OnlySignGenes_RQ1e2_ConStage$`F16-F24`== 1)))
SignGenes_genI = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage,
OnlySignGenes_RQ1e2_ConStage$`I8-I24`== 1 |
OnlySignGenes_RQ1e2_ConStage$`I8-I16`== 1 |
OnlySignGenes_RQ1e2_ConStage$`I16-I24`== 1)))
SignGenes_genJ = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage,
OnlySignGenes_RQ1e2_ConStage$`J8-J24`== 1 |
OnlySignGenes_RQ1e2_ConStage$`J8-J16`== 1 |
OnlySignGenes_RQ1e2_ConStage$`J16-J24`== 1)))
SignGenes_genK = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage,
OnlySignGenes_RQ1e2_ConStage$`K8-K24`== 1 |
OnlySignGenes_RQ1e2_ConStage$`K8-K16`== 1 |
OnlySignGenes_RQ1e2_ConStage$`K16-K24`== 1)))
SignGenes_genP = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage,
OnlySignGenes_RQ1e2_ConStage$`P8-P24`== 1 |
OnlySignGenes_RQ1e2_ConStage$`P8-P16`== 1 |
OnlySignGenes_RQ1e2_ConStage$`P16-P24`== 1)))
# take intersect of all genotypes
RQ1e2_CoreResponse = Reduce(intersect, list(SignGenes_genA,SignGenes_genB,SignGenes_genD,SignGenes_genF,SignGenes_genI,SignGenes_genJ,SignGenes_genK,SignGenes_genP))
RQ1e2_CoreResponse
## [1] "Sm_t00000820-RA" "Sm_t00001191-RA" "Sm_t00002242-RA" "Sm_t00002835-RA"
## [5] "Sm_t00003616-RA" "Sm_t00003882-RA" "Sm_t00005258-RA" "Sm_t00005259-RA"
## [9] "Sm_t00005877-RA" "Sm_t00007121-RA" "Sm_t00007360-RA" "Sm_t00007543-RA"
## [13] "Sm_t00008098-RA" "Sm_t00008123-RA" "Sm_t00008820-RA" "Sm_t00009398-RA"
## [17] "Sm_t00009402-RA" "Sm_t00009981-RA" "Sm_t00010077-RA" "Sm_t00010552-RA"
## [21] "Sm_t00010556-RA" "Sm_t00011041-RA" "Sm_t00011042-RA" "Sm_t00012577-RA"
## [25] "Sm_t00013291-RA" "Sm_t00013313-RA" "Sm_t00014816-RA" "Sm_t00015478-RA"
## [29] "Sm_t00016600-RA" "Sm_t00017272-RA" "Sm_t00018475-RA" "Sm_t00018687-RA"
## [33] "Sm_t00018847-RA"
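The eight per-genotype selections above all follow the same pattern, so the same result could in principle be obtained with a loop over the genotype labels. The sketch below illustrates the idea on an invented 0/1 table with two genotypes; sig_toy, genotypes_toy and core_toy are hypothetical names.

```r
# toy 0/1 significance table: 3 genes, 2 genotypes (A, B), 3 contrasts each
sig_toy = data.frame(
  `A8-A16` = c(1, 0, 0), `A16-A24` = c(0, 0, 1), `A8-A24` = c(1, 0, 0),
  `B8-B16` = c(0, 0, 0), `B16-B24` = c(1, 1, 0), `B8-B24` = c(0, 0, 0),
  row.names = c("gene1", "gene2", "gene3"), check.names = FALSE)

genotypes_toy = c("A", "B")

# for each genotype, collect the genes significant in at least one of its contrasts
sig_by_genotype = lapply(genotypes_toy, function(g) {
  cols = paste0(g, c("8", "16", "8"), "-", g, c("16", "24", "24"))
  rownames(sig_toy)[rowSums(sig_toy[, cols] == 1) > 0]
})

# core response = genes shared by all genotypes
core_toy = Reduce(intersect, sig_by_genotype)
core_toy
```

Only gene1 is significant in at least one contrast of both genotypes, so it is the only core response gene in this toy example.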
How did these core response genes relate to the top genes ranked by stageR's FDR-adjusted P-value of the global null hypothesis (padjScreen)?
# top 25 genes
Padjscreen_sorted_top25 = Padjscreen_sorted[1:25,]
Padjscreen_sorted_top25_genes = rownames(Padjscreen_sorted_top25)
venn_top25 = venn.diagram(x = list(RQ1e2_CoreResponse, Padjscreen_sorted_top25_genes), NULL,
main = "top 25 genes & core response", main.fontface = "plain",
main.fontfamily = "sans", main.col = "black", main.cex = 1.5, #lay-out title
category.names = c("core response", "top 25"), alpha=c(0.5,0.5),
lwd = 2, lty = 'blank', fill = c("dodgerblue3", "gray60"), #lay-out circles
cex = 1, fontface = "bold", fontfamily = "sans", #lay-out numbers
cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", #lay-out names
cat.pos = c(-155, 155), cat.dist = c(0.055, 0.055), cat.fontfamily = "sans")
grid.draw(venn_top25)
# top 100 genes
Padjscreen_sorted_top100 = Padjscreen_sorted[1:100,]
Padjscreen_sorted_top100_genes = rownames(Padjscreen_sorted_top100)
venn_top100 = venn.diagram(x = list(RQ1e2_CoreResponse, Padjscreen_sorted_top100_genes), NULL,
main = "top 100 genes & core response", main.fontface = "plain",
main.fontfamily = "sans", main.col = "black", main.cex = 1.5, #lay-out title
category.names = c("core response", "top 100"), alpha=c(0.5,0.5),
lwd = 2, lty = 'blank', fill = c("dodgerblue3", "gray60"), #lay-out circles
cex = 1, fontface = "bold", fontfamily = "sans", #lay-out numbers
cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", #lay-out names
cat.pos = c(-165, 145), cat.dist = c(0.055, 0.055), cat.fontfamily = "sans")
grid.draw(venn_top100)
# top 225 genes
Padjscreen_sorted_top225 = Padjscreen_sorted[1:225,]
Padjscreen_sorted_top225_genes = rownames(Padjscreen_sorted_top225)
venn_top225 = venn.diagram(x = list(RQ1e2_CoreResponse, Padjscreen_sorted_top225_genes), NULL,
main = "top 225 genes & core response", main.fontface = "plain",
main.fontfamily = "sans", main.col = "black", main.cex = 1.5, #lay-out title
category.names = c("core response", "top 225"), alpha=c(0.5,0.5),
lwd = 2, lty = 'blank', fill = c("dodgerblue3", "gray60"), #lay-out circles
cex = 1, fontface = "bold", fontfamily = "sans", #lay-out numbers
cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", #lay-out names
cat.pos = c(-165, 45), cat.dist = c(0.055, 0.055), cat.fontfamily = "sans")
grid.draw(venn_top225)
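The chunks above rely on an object Padjscreen_sorted that is not created in this section. The sketch below shows one way such an object could be derived, under the assumption that it is simply the table of adjusted P-values ordered by the padjScreen column. The toy matrix and its values are invented, and the actual object may have been built differently.

```r
# toy matrix mimicking the layout of adjusted_p_RQ1e2 (values are invented)
adj_toy = matrix(c(0.020, 0.04,
                   0.001, 0.01,
                   0.300, NA),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("gene1", "gene2", "gene3"),
                                 c("padjScreen", "avg8vs24_Padj")))

# sort genes by the screening-stage adjusted P-value, smallest first
adj_toy_sorted = as.data.frame(adj_toy[order(adj_toy[, "padjScreen"]), ])

# names of the top two genes
top2_genes_toy = rownames(adj_toy_sorted)[1:2]
top2_genes_toy
```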
Next, we plotted a heatmap of the core response genes, showing logFC values of significant and non-significant contrasts:
# select logFC values
all_results_RQ1e2_logFC = all_results_RQ1e2[,grepl("logFC", colnames(all_results_RQ1e2))]
# change column names of logFC object
colnames(all_results_RQ1e2_logFC) = c('A8-A16','A16-A24','A8-A24','B8-B16','B16-B24','B8-B24','D8-D16','D16-D24','D8-D24','F8-F16','F16-F24','F8-F24','I8-I16','I16-I24','I8-I24','J8-J16','J16-J24','J8-J24','K8-K16','K16-K24','K8-K24','P8-P16','P16-P24','P8-P24','avg8-avg16','avg16-avg24','avg8-avg24')
# select core response genes
all_results_RQ1e2_logFC_core = subset(all_results_RQ1e2_logFC, rownames(all_results_RQ1e2_logFC)%in%RQ1e2_CoreResponse)
# reorder gene names (rows)
target = c("Sm_t00000820-RA", "Sm_t00001191-RA", "Sm_t00002242-RA", "Sm_t00002835-RA",
"Sm_t00003616-RA", "Sm_t00003882-RA", "Sm_t00005258-RA", "Sm_t00005259-RA",
"Sm_t00005877-RA", "Sm_t00007121-RA", "Sm_t00007360-RA", "Sm_t00007543-RA",
"Sm_t00008098-RA", "Sm_t00008123-RA", "Sm_t00008820-RA", "Sm_t00009398-RA",
"Sm_t00009402-RA", "Sm_t00009981-RA", "Sm_t00010077-RA", "Sm_t00010552-RA",
"Sm_t00010556-RA", "Sm_t00011041-RA", "Sm_t00011042-RA", "Sm_t00012577-RA",
"Sm_t00013291-RA", "Sm_t00013313-RA", "Sm_t00014816-RA", "Sm_t00015478-RA",
"Sm_t00016600-RA", "Sm_t00017272-RA", "Sm_t00018475-RA", "Sm_t00018687-RA",
"Sm_t00018847-RA")
all_results_RQ1e2_logFC_core_reordered = all_results_RQ1e2_logFC_core [match(target, rownames(all_results_RQ1e2_logFC_core)),]
# change row names (= gene names) to include more information on gene identity
#rownames(all_results_RQ1e2_logFC_core_reordered) =
#c("Sm_g00002242 slc38a11 - amino acid transporter",
#"Sm_g00003882 KEA3 - potassium transporter",
#"Sm_g00005258 SLC35F5 - solute transporter",
#"Sm_g00021791 SLC35F5 - solute transporter",
#"Sm_g00007543 MJ0079 - ATPase activity",
#"Sm_g00007121 ATP13A3 - cation transporting ATPase",
#"Sm_g00008820 HMA9 - cation transporting ATPase",
#"Sm_g00000820 Evolv2 - fatty acid/lipid metabolism",
#"Sm_g00005259 - fatty acid/lipid metabolism",
#"Sm_g00021792 - fatty acid/lipid metabolism",
#"Sm_g00013313 odc-1 - polyamine biosynthesis",
#"Sm_g00020016 aphA - polyamine biosynthesis",
#"Sm_g00015478 CALS1 - 1,3-beta-D-glucan biosynthesis",
#"Sm_g00012577 VDE1 - violaxanthin-de-epoxidase",
#"Sm_g00008098 - transcription factor",
#"Sm_g00009981 fusA - translation elongation",
#"Sm_g00009402 - protein binding activity",
#"Sm_g00013291 - unknown",
#"Sm_g00017716 - unknown",
#"Sm_g00018687 - unknown",
#"Sm_g00019737 - unknown",
#"Sm_g00014816 - unknown",
#"Sm_g00007360 Usp5 - deubiquitination",
#"Sm_g00015422 PSMD12 - proteasome subunit",
#"Sm_g00008123 - iron ion binding",
#"Sm_g00011041 AKHSDH1 - glycine/serine/threonine metabolism [ectoine?]",
#"Sm_g00011042 asd - glycine/serine/threonine metabolism [ectoine?]")
# reformat data for plotting
all_results_RQ1e2_logFC_core_reordered_reshaped = gather(all_results_RQ1e2_logFC_core_reordered,
"condition", "logFC", 1:27)
temp_rownames = rep(rownames(all_results_RQ1e2_logFC_core_reordered), 27)
all_results_RQ1e2_logFC_core_reordered_reshaped$gene = temp_rownames
# create data frame with TRUE/FALSE information on significance
all_results_RQ1e2_OnlySig_logFC_OnlySign_TF = all_results_RQ1e2_OnlySig_logFC_OnlySign
all_results_RQ1e2_OnlySig_logFC_OnlySign_TF[] = lapply(all_results_RQ1e2_OnlySig_logFC_OnlySign_TF, as.logical)
all_results_RQ1e2_OnlySig_logFC_OnlySign_TF[all_results_RQ1e2_OnlySig_logFC_OnlySign_TF == FALSE] = NA
# include information on significance
TF_logFC_core_response = subset(all_results_RQ1e2_OnlySig_logFC_OnlySign_TF,
rownames(all_results_RQ1e2_OnlySig_logFC_OnlySign_TF)%in%RQ1e2_CoreResponse)
TF_logFC_core_response_reordered = TF_logFC_core_response[match(target, rownames(TF_logFC_core_response)),]
TF_logFC_core_response_reordered_reshaped = gather(TF_logFC_core_response_reordered,
"condition2", "significance", 1:27)
temp_rownames = rep(rownames(TF_logFC_core_response_reordered), 27)
TF_logFC_core_response_reordered_reshaped$gene2 = temp_rownames
logFC_core_response_all = cbind(all_results_RQ1e2_logFC_core_reordered_reshaped,
TF_logFC_core_response_reordered_reshaped)
# create order for plotting
logFC_core_response_all$condition = factor(logFC_core_response_all$condition, levels = c(
'avg16-avg24','A16-A24','B16-B24','D16-D24','F16-F24','I16-I24','J16-J24','K16-K24','P16-P24',
'avg8-avg16','A8-A16','B8-B16','D8-D16','F8-F16','I8-I16','J8-J16','K8-K16','P8-P16',
'avg8-avg24','A8-A24','B8-B24','D8-D24','F8-F24','I8-I24','J8-J24','K8-K24','P8-P24'))
#logFC_core_response_all$gene = factor(logFC_core_response_all$gene,
# levels = rev(c("Sm_g00002242 slc38a11 - amino acid transporter",
#"Sm_g00003882 KEA3 - potassium transporter",
#"Sm_g00005258 SLC35F5 - solute transporter",
#"Sm_g00021791 SLC35F5 - solute transporter",
#"Sm_g00007543 MJ0079 - ATPase activity",
#"Sm_g00007121 ATP13A3 - cation transporting ATPase",
#"Sm_g00008820 HMA9 - cation transporting ATPase",
#"Sm_g00000820 Evolv2 - fatty acid/lipid metabolism",
#"Sm_g00005259 - fatty acid/lipid metabolism",
#"Sm_g00021792 - fatty acid/lipid metabolism",
#"Sm_g00013313 odc-1 - polyamine biosynthesis",
#"Sm_g00020016 aphA - polyamine biosynthesis",
#"Sm_g00015478 CALS1 - 1,3-beta-D-glucan biosynthesis",
#"Sm_g00012577 VDE1 - violaxanthin-de-epoxidase",
#"Sm_g00008098 - transcription factor",
#"Sm_g00009981 fusA - translation elongation",
#"Sm_g00009402 - protein binding activity",
#"Sm_g00013291 - unknown",
#"Sm_g00017716 - unknown",
#"Sm_g00018687 - unknown",
#"Sm_g00019737 - unknown",
#"Sm_g00014816 - unknown",
#"Sm_g00007360 Usp5 - deubiquitination",
#"Sm_g00015422 PSMD12 - proteasome subunit",
#"Sm_g00008123 - iron ion binding",
#"Sm_g00011041 AKHSDH1 - glycine/serine/threonine metabolism [ectoine?]",
#"Sm_g00011042 asd - glycine/serine/threonine metabolism [ectoine?]")))
# plot heatmap
heatmap_core_all = ggplot(logFC_core_response_all, aes(x = condition, y = gene, fill = logFC)) + geom_tile() +
geom_tile(data = logFC_core_response_all[!is.na(logFC_core_response_all$significance), ],
aes(color = significance), size = 0.5) +
theme(panel.background = element_blank()) +
scale_fill_gradient2(low="#313695", mid = 'white', high="#A50026", midpoint=0, name = 'logFC') +
scale_color_manual(guide = "none", values = c(`TRUE` = "black")) +
xlab("Contrast") +
ylab("Gene") +
theme(axis.text.x = element_text(angle = 45, vjust = 0.8, hjust = 0.8, size = 9),
axis.text.y = element_text(size = 9), strip.text = element_text(size = 9, family = "sans"),
axis.title = element_text(size = 12), strip.text.y = element_text(angle = 0, hjust = 0),
strip.background = element_blank())
heatmap_core_all
In this section, we plotted a volcano plot for each contrast (average response and genotype-dependent response).
First, the 8-24 contrasts:
par(mfrow=c(3,3))
# determine the non-adjusted P-value threshold at which genes are no longer significant after stage-wise adjustment
subset = subset(all_results_RQ1e2, avg8vs24_Padj>0.05 | is.na(avg8vs24_Padj)) # select all genes that are not significant after adjustment (including genes without an adjusted P-value)
sel = subset[with(subset, order(subset$avg8vs24_Padj)),] # sort Padj from small to large (NA values are placed last)
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) # select the first row = gene with the smallest non-significant adjusted P-value
threshold = sel2$avg8vs24_nonadj_PValue # take the non-adjusted P-value of that gene
# plot the volcano plot
plot(1, type="n", xlab = NA, ylab="-log10 nonadj P", main="average effect",
xlim=c(-10,10),
ylim=c(0, 45))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, avg8vs24_Padj > 0.05 | is.na(avg8vs24_Padj)),
points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(avg8vs24_logFC) >= 1 & avg8vs24_Padj <= 0.05),
points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(avg8vs24_logFC) < 1 & avg8vs24_Padj <= 0.05),
points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
# core genes
core_genes = as.data.frame(subset(all_results_RQ1e2, rownames(all_results_RQ1e2)%in%RQ1e2_CoreResponse))
with(subset(core_genes, avg8vs24_Padj <= 0.05), points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, avg8vs24_Padj > 0.05), points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype A [8-24]
subset = subset(all_results_RQ1e2, A8vsA24_Padj>0.05 | is.na(A8vsA24_Padj))
sel = subset[with(subset, order(subset$A8vsA24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$A8vsA24_nonadj_PValue
plot(1, type="n", xlab = NA, ylab = NA, main="genotype A",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, A8vsA24_Padj > 0.05 | is.na(A8vsA24_Padj)),
points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(A8vsA24_logFC) >= 1 & A8vsA24_Padj <= 0.05),
points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(A8vsA24_logFC) < 1 & A8vsA24_Padj <= 0.05),
points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, A8vsA24_Padj <= 0.05), points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, A8vsA24_Padj > 0.05), points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
# genotype B [8-24]
subset = subset(all_results_RQ1e2, B8vsB24_Padj>0.05 | is.na(B8vsB24_Padj))
sel = subset[with(subset, order(subset$B8vsB24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$B8vsB24_nonadj_PValue
plot(1, type="n",xlab = NA, ylab = NA, main="genotype B",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, B8vsB24_Padj > 0.05 | is.na(B8vsB24_Padj)),
points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(B8vsB24_logFC) >= 1 & B8vsB24_Padj <= 0.05),
points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(B8vsB24_logFC) < 1 & B8vsB24_Padj <= 0.05),
points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, B8vsB24_Padj <= 0.05), points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, B8vsB24_Padj > 0.05), points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype D [8-24]
subset = subset(all_results_RQ1e2, D8vsD24_Padj>0.05 | is.na(D8vsD24_Padj))
sel = subset[with(subset, order(subset$D8vsD24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$D8vsD24_nonadj_PValue
plot(1, type="n", xlab = NA, ylab="-log10 nonadj P", main="genotype D",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, D8vsD24_Padj > 0.05 | is.na(D8vsD24_Padj)),
points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(D8vsD24_logFC) >= 1 & D8vsD24_Padj <= 0.05),
points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(D8vsD24_logFC) < 1 & D8vsD24_Padj <= 0.05),
points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, D8vsD24_Padj <= 0.05), points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, D8vsD24_Padj > 0.05), points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype F [8-24]
subset = subset(all_results_RQ1e2, F8vsF24_Padj>0.05 | is.na(F8vsF24_Padj))
sel = subset[with(subset, order(subset$F8vsF24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$F8vsF24_nonadj_PValue
plot(1, type="n", xlab = NA, ylab = NA, main="genotype F",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, F8vsF24_Padj > 0.05 | is.na(F8vsF24_Padj)),
points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(F8vsF24_logFC) >= 1 & F8vsF24_Padj <= 0.05),
points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(F8vsF24_logFC) < 1 & F8vsF24_Padj <= 0.05),
points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, F8vsF24_Padj <= 0.05), points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, F8vsF24_Padj > 0.05), points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype I [8-24]
subset = subset(all_results_RQ1e2, I8vsI24_Padj>0.05 | is.na(I8vsI24_Padj))
sel = subset[with(subset, order(subset$I8vsI24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$I8vsI24_nonadj_PValue
plot(1, type="n", xlab = NA, ylab = NA, main="genotype I",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, I8vsI24_Padj > 0.05 | is.na(I8vsI24_Padj)),
points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(I8vsI24_logFC) >= 1 & I8vsI24_Padj <= 0.05),
points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(I8vsI24_logFC) < 1 & I8vsI24_Padj <= 0.05),
points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, I8vsI24_Padj <= 0.05), points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, I8vsI24_Padj > 0.05), points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype J [8-24]
subset = subset(all_results_RQ1e2, J8vsJ24_Padj>0.05 | is.na(J8vsJ24_Padj))
sel = subset[with(subset, order(subset$J8vsJ24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$J8vsJ24_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab="-log10 nonadj P", main="genotype J",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, J8vsJ24_Padj > 0.05 | is.na(J8vsJ24_Padj)),
points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(J8vsJ24_logFC) >= 1 & J8vsJ24_Padj <= 0.05),
points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(J8vsJ24_logFC) < 1 & J8vsJ24_Padj <= 0.05),
points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, J8vsJ24_Padj <= 0.05), points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, J8vsJ24_Padj > 0.05), points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype K [8-24]
subset = subset(all_results_RQ1e2, K8vsK24_Padj>0.05 | is.na(K8vsK24_Padj))
sel = subset[with(subset, order(subset$K8vsK24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$K8vsK24_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab = NA, main="genotype K",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, K8vsK24_Padj > 0.05 | is.na(K8vsK24_Padj)),
points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(K8vsK24_logFC) >= 1 & K8vsK24_Padj <= 0.05),
points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(K8vsK24_logFC) < 1 & K8vsK24_Padj <= 0.05),
points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, K8vsK24_Padj <= 0.05), points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, K8vsK24_Padj > 0.05), points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype P [8-24]
subset = subset(all_results_RQ1e2, P8vsP24_Padj>0.05 | is.na(P8vsP24_Padj))
sel = subset[with(subset, order(subset$P8vsP24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$P8vsP24_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab = NA, main="genotype P",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, P8vsP24_Padj > 0.05 | is.na(P8vsP24_Padj)),
points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(P8vsP24_logFC) >= 1 & P8vsP24_Padj <= 0.05),
points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(P8vsP24_logFC) < 1 & P8vsP24_Padj <= 0.05),
points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, P8vsP24_Padj <= 0.05), points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, P8vsP24_Padj > 0.05), points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
# plot title
mtext("Volcano plots contrasts 8vs24", side = 3, line = -1.25, outer = TRUE,font = 2)
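The volcano plot code above repeats the same steps for every contrast: find the non-adjusted P-value threshold, then split the genes into three groups. The sketch below wraps these steps in a small helper for a single contrast. The function name volcano_classes, its arguments and the toy data are hypothetical, and genes with a missing adjusted P-value are treated as not significant, which is an assumption about the intended behaviour.

```r
# hypothetical helper: threshold and point classes for one contrast of a results table
# with <contrast>_logFC, <contrast>_nonadj_PValue and <contrast>_Padj columns
volcano_classes = function(results, contrast, alpha = 0.05, lfc = 1) {
  padj = results[[paste0(contrast, "_Padj")]]
  praw = results[[paste0(contrast, "_nonadj_PValue")]]
  logfc = results[[paste0(contrast, "_logFC")]]
  # genes with a missing adjusted P-value are treated as not significant (assumption)
  not_sig = is.na(padj) | padj > alpha
  # non-adjusted P-value of the gene with the smallest non-significant adjusted P-value
  idx = which(not_sig & !is.na(padj))
  threshold = if (length(idx) > 0) praw[idx[which.min(padj[idx])]] else NA_real_
  cls = ifelse(not_sig, "not significant",
               ifelse(abs(logfc) >= lfc, "significant, large effect",
                      "significant, small effect"))
  list(threshold = threshold, class = setNames(cls, rownames(results)))
}

# toy results table (values are invented)
toy = data.frame(avg8vs24_logFC = c(2.5, 0.4, -0.2, 1.8),
                 avg8vs24_nonadj_PValue = c(1e-6, 1e-3, 0.20, 0.50),
                 avg8vs24_Padj = c(1e-4, 0.01, 0.30, NA),
                 row.names = paste0("gene", 1:4))
out = volcano_classes(toy, "avg8vs24")
out$threshold
out$class
```

The returned threshold corresponds to the horizontal dashed line in the plots, and the three classes correspond to the gray, light blue and dark blue points.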
The 16-24 contrasts:
par(mfrow=c(3,3))
# determine the non-adjusted P-value threshold at which genes are no longer significant after stage-wise adjustment
subset = subset(all_results_RQ1e2, avg16vs24_Padj>0.05 | is.na(avg16vs24_Padj))
sel = subset[with(subset, order(subset$avg16vs24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$avg16vs24_nonadj_PValue
# plot the volcano plot
plot(1, type="n", xlab=NA, ylab="-log10 nonadj P", main="average effect",
xlim=c(-10,10),
ylim=c(0, 45))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, avg16vs24_Padj > 0.05 | is.na(avg16vs24_Padj)),
points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(avg16vs24_logFC) >= 1 & avg16vs24_Padj <= 0.05),
points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(avg16vs24_logFC) < 1 & avg16vs24_Padj <= 0.05),
points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
# core genes
with(subset(core_genes, avg16vs24_Padj <= 0.05), points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, avg16vs24_Padj > 0.05), points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype A [16-24]
subset = subset(all_results_RQ1e2, A16vsA24_Padj>0.05 | is.na(A16vsA24_Padj))
sel = subset[with(subset, order(subset$A16vsA24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$A16vsA24_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype A",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, A16vsA24_Padj > 0.05 | is.na(A16vsA24_Padj)),
points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(A16vsA24_logFC) >= 1 & A16vsA24_Padj <= 0.05),
points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(A16vsA24_logFC) < 1 & A16vsA24_Padj <= 0.05),
points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, A16vsA24_Padj <= 0.05), points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, A16vsA24_Padj > 0.05), points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype B [16-24]
subset = subset(all_results_RQ1e2, B16vsB24_Padj>0.05 | is.na(B16vsB24_Padj))
sel = subset[with(subset, order(subset$B16vsB24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$B16vsB24_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype B",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, B16vsB24_Padj > 0.05 | is.na(B16vsB24_Padj)),
points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(B16vsB24_logFC) >= 1 & B16vsB24_Padj <= 0.05),
points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(B16vsB24_logFC) < 1 & B16vsB24_Padj <= 0.05),
points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, B16vsB24_Padj <= 0.05), points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, B16vsB24_Padj > 0.05), points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype D [16-24]
subset = subset(all_results_RQ1e2, D16vsD24_Padj>0.05 | is.na(D16vsD24_Padj))
sel = subset[with(subset, order(subset$D16vsD24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$D16vsD24_nonadj_PValue
plot(1, type="n", xlab=NA, ylab="-log10 nonadj P", main="genotype D",
xlim=c(min(all_results_RQ1e2$D16vsD24_logFC)-1, max(all_results_RQ1e2$D16vsD24_logFC)+1),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, D16vsD24_Padj > 0.05 | is.na(D16vsD24_Padj)),
points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(D16vsD24_logFC) >= 1 & D16vsD24_Padj <= 0.05),
points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(D16vsD24_logFC) < 1 & D16vsD24_Padj <= 0.05),
points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, D16vsD24_Padj <= 0.05), points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, D16vsD24_Padj > 0.05), points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype F [16-24]
subset = subset(all_results_RQ1e2, F16vsF24_Padj>0.05 | is.na(F16vsF24_Padj))
sel = subset[with(subset, order(subset$F16vsF24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$F16vsF24_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype F",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, F16vsF24_Padj > 0.05 | is.na(F16vsF24_Padj)),
points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(F16vsF24_logFC) >= 1 & F16vsF24_Padj <= 0.05),
points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(F16vsF24_logFC) < 1 & F16vsF24_Padj <= 0.05),
points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, F16vsF24_Padj <= 0.05), points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, F16vsF24_Padj > 0.05), points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype I [16-24]
subset = subset(all_results_RQ1e2, I16vsI24_Padj>0.05 | is.na(I16vsI24_Padj))
sel = subset[with(subset, order(subset$I16vsI24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$I16vsI24_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype I",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, I16vsI24_Padj > 0.05 | is.na(I16vsI24_Padj)),
points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(I16vsI24_logFC) >= 1 & I16vsI24_Padj <= 0.05),
points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(I16vsI24_logFC) < 1 & I16vsI24_Padj <= 0.05),
points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, I16vsI24_Padj <= 0.05), points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, I16vsI24_Padj > 0.05), points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype J [16-24]
subset = subset(all_results_RQ1e2, J16vsJ24_Padj>0.05 | is.na(J16vsJ24_Padj))
sel = subset[with(subset, order(subset$J16vsJ24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$J16vsJ24_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab="-log10 nonadj P", main="genotype J",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, J16vsJ24_Padj > 0.05 | is.na(J16vsJ24_Padj)),
points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(J16vsJ24_logFC) >= 1 & J16vsJ24_Padj <= 0.05),
points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(J16vsJ24_logFC) < 1 & J16vsJ24_Padj <= 0.05),
points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, J16vsJ24_Padj <= 0.05), points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, J16vsJ24_Padj > 0.05), points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype K [16-24]
subset = subset(all_results_RQ1e2, K16vsK24_Padj>0.05 | is.na(K16vsK24_Padj))
sel = subset[with(subset, order(subset$K16vsK24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$K16vsK24_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab=NA, main="genotype K",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, K16vsK24_Padj > 0.05 | is.na(K16vsK24_Padj)),
points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(K16vsK24_logFC) >= 1 & K16vsK24_Padj <= 0.05),
points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(K16vsK24_logFC) < 1 & K16vsK24_Padj <= 0.05),
points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, K16vsK24_Padj <= 0.05), points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, K16vsK24_Padj > 0.05), points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype P [16-24]
subset = subset(all_results_RQ1e2, P16vsP24_Padj>0.05 | is.na(P16vsP24_Padj))
sel = subset[with(subset, order(subset$P16vsP24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$P16vsP24_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab=NA, main="genotype P",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, P16vsP24_Padj > 0.05 | is.na(P16vsP24_Padj)),
points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(P16vsP24_logFC) >= 1 & P16vsP24_Padj <= 0.05),
points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(P16vsP24_logFC) < 1 & P16vsP24_Padj <= 0.05),
points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, P16vsP24_Padj <= 0.05), points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, P16vsP24_Padj > 0.05), points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
# plot title
mtext("Volcano plots contrasts 16vs24", side = 3, line = -1.25, outer = TRUE,font = 2)
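Since every panel repeats the same steps, the per-contrast code could be wrapped in a helper. The function below is only a sketch: volcano_panel and its arguments are hypothetical names that are not part of the original analysis, and it assumes the column naming scheme used above (prefix followed by _logFC, _nonadj_PValue and _Padj):

```r
# Hypothetical helper (not part of the original analysis): draws one volcano
# panel for a contrast prefix such as "A16vsA24" and returns the threshold.
volcano_panel = function(results, core, prefix, main, xlab = NA, ylab = NA,
                         xlim = c(-10, 10), ylim = c(0, 25), alpha = 0.05) {
  # collect logFC, non-adjusted P and adjusted P for this contrast by column name
  get_cols = function(df) data.frame(
    logFC = df[[paste0(prefix, "_logFC")]],
    p     = df[[paste0(prefix, "_nonadj_PValue")]],
    padj  = df[[paste0(prefix, "_Padj")]])
  res = get_cols(results)
  cg  = get_cols(core)
  # threshold = non-adjusted P of the non-significant gene with the smallest Padj
  not_sig = res[res$padj > alpha | is.na(res$padj), ]
  threshold = not_sig$p[order(not_sig$padj)][1]
  # empty panel with the two sets of dashed guide lines
  plot(1, type = "n", xlab = xlab, ylab = ylab, main = main, xlim = xlim, ylim = ylim)
  abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
  abline(v = c(-1, 1), col = "steelblue2", lty = 2, lwd = 1)
  # colour all genes by significance and effect size
  sig = !is.na(res$padj) & res$padj <= alpha
  big = abs(res$logFC) >= 1
  points(res$logFC[!sig], -log10(res$p[!sig]), pch = 20, cex = 0.75, col = "gray80")
  points(res$logFC[sig & big], -log10(res$p[sig & big]), pch = 20, cex = 0.75, col = "steelblue2")
  points(res$logFC[sig & !big], -log10(res$p[sig & !big]), pch = 20, cex = 0.75, col = "steelblue4")
  # core genes are drawn last so they sit on top
  csig = !is.na(cg$padj) & cg$padj <= alpha
  points(cg$logFC[csig], -log10(cg$p[csig]), pch = 15, cex = 0.75, col = "tomato3")
  points(cg$logFC[!csig], -log10(cg$p[!csig]), pch = 15, cex = 0.75, col = "gray30")
  invisible(threshold)
}
# example call, assuming all_results_RQ1e2 and core_genes exist as above:
# volcano_panel(all_results_RQ1e2, core_genes, "A16vsA24", main = "genotype A")
```

With such a helper, a panel that needs different axis limits (as for genotype D in the 16-24 contrasts) would only pass its own xlim instead of repeating the whole block.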
And the 8-16 contrasts:
par(mfrow=c(3,3))
# calculate threshold at which the non adjusted P-values are no longer significant in the FDR
subset = subset(all_results_RQ1e2, avg8vs16_Padj>0.05 | is.na(avg8vs16_Padj)) # select all genes that are not significant after FDR correction
sel = subset[with(subset, order(subset$avg8vs16_Padj)),] # sort Padj from small to large
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) # select the first row = non-significant gene with the smallest adjusted P-value
threshold = sel2$avg8vs16_nonadj_PValue # take the non-adjusted P-value of that gene as the threshold
# plot the volcano plot
plot(1, type="n", xlab=NA, ylab="-log10 nonadj P", main="average effect",
xlim=c(-10,10),
ylim=c(0, 45))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, avg8vs16_Padj > 0.05 | is.na(avg8vs16_Padj)),
points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(avg8vs16_logFC) >= 1 & avg8vs16_Padj <= 0.05),
points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(avg8vs16_logFC) < 1 & avg8vs16_Padj <= 0.05),
points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
# core genes
with(subset(core_genes, avg8vs16_Padj <= 0.05), points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, avg8vs16_Padj > 0.05), points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype A [8-16]
subset = subset(all_results_RQ1e2, A8vsA16_Padj>0.05 | is.na(A8vsA16_Padj))
sel = subset[with(subset, order(subset$A8vsA16_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$A8vsA16_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype A",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, A8vsA16_Padj > 0.05 | is.na(A8vsA16_Padj)),
points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(A8vsA16_logFC) >= 1 & A8vsA16_Padj <= 0.05),
points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(A8vsA16_logFC) < 1 & A8vsA16_Padj <= 0.05),
points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, A8vsA16_Padj <= 0.05), points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, A8vsA16_Padj > 0.05), points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype B [8-16]
subset = subset(all_results_RQ1e2, B8vsB16_Padj>0.05 | is.na(B8vsB16_Padj))
sel = subset[with(subset, order(subset$B8vsB16_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$B8vsB16_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype B",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, B8vsB16_Padj > 0.05 | is.na(B8vsB16_Padj)),
points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(B8vsB16_logFC) >= 1 & B8vsB16_Padj <= 0.05),
points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(B8vsB16_logFC) < 1 & B8vsB16_Padj <= 0.05),
points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, B8vsB16_Padj <= 0.05), points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, B8vsB16_Padj > 0.05), points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype D [8-16]
subset = subset(all_results_RQ1e2, D8vsD16_Padj>0.05 | is.na(D8vsD16_Padj))
sel = subset[with(subset, order(subset$D8vsD16_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$D8vsD16_nonadj_PValue
plot(1, type="n", xlab=NA, ylab="-log10 nonadj P", main="genotype D",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, D8vsD16_Padj > 0.05 | is.na(D8vsD16_Padj)),
points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(D8vsD16_logFC) >= 1 & D8vsD16_Padj <= 0.05),
points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(D8vsD16_logFC) < 1 & D8vsD16_Padj <= 0.05),
points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, D8vsD16_Padj <= 0.05), points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, D8vsD16_Padj > 0.05), points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype F [8-16]
subset = subset(all_results_RQ1e2, F8vsF16_Padj>0.05 | is.na(F8vsF16_Padj))
sel = subset[with(subset, order(subset$F8vsF16_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$F8vsF16_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype F",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, F8vsF16_Padj > 0.05 | is.na(F8vsF16_Padj)),
points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(F8vsF16_logFC) >= 1 & F8vsF16_Padj <= 0.05),
points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(F8vsF16_logFC) < 1 & F8vsF16_Padj <= 0.05),
points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, F8vsF16_Padj <= 0.05), points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, F8vsF16_Padj > 0.05), points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype I [8-16]
subset = subset(all_results_RQ1e2, I8vsI16_Padj>0.05 | is.na(I8vsI16_Padj))
sel = subset[with(subset, order(subset$I8vsI16_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$I8vsI16_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype I",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, I8vsI16_Padj > 0.05 | is.na(I8vsI16_Padj)),
points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(I8vsI16_logFC) >= 1 & I8vsI16_Padj <= 0.05),
points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(I8vsI16_logFC) < 1 & I8vsI16_Padj <= 0.05),
points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, I8vsI16_Padj <= 0.05), points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, I8vsI16_Padj > 0.05), points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype J [8-16]
subset = subset(all_results_RQ1e2, J8vsJ16_Padj>0.05 | is.na(J8vsJ16_Padj))
sel = subset[with(subset, order(subset$J8vsJ16_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$J8vsJ16_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab="-log10 nonadj P", main="genotype J",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, J8vsJ16_Padj > 0.05 | is.na(J8vsJ16_Padj)),
points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(J8vsJ16_logFC) >= 1 & J8vsJ16_Padj <= 0.05),
points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(J8vsJ16_logFC) < 1 & J8vsJ16_Padj <= 0.05),
points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, J8vsJ16_Padj <= 0.05), points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, J8vsJ16_Padj > 0.05), points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype K [8-16]
subset = subset(all_results_RQ1e2, K8vsK16_Padj>0.05 | is.na(K8vsK16_Padj))
sel = subset[with(subset, order(subset$K8vsK16_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$K8vsK16_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab=NA, main="genotype K",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, K8vsK16_Padj > 0.05 | is.na(K8vsK16_Padj)),
points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(K8vsK16_logFC) >= 1 & K8vsK16_Padj <= 0.05),
points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(K8vsK16_logFC) < 1 & K8vsK16_Padj <= 0.05),
points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, K8vsK16_Padj <= 0.05), points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, K8vsK16_Padj > 0.05), points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
## genotype P [8-16]
subset = subset(all_results_RQ1e2, P8vsP16_Padj>0.05 | is.na(P8vsP16_Padj))
sel = subset[with(subset, order(subset$P8vsP16_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),])
threshold = sel2$P8vsP16_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab=NA, main="genotype P",
xlim=c(-10,10),
ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, P8vsP16_Padj > 0.05 | is.na(P8vsP16_Padj)),
points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(P8vsP16_logFC) >= 1 & P8vsP16_Padj <= 0.05),
points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(P8vsP16_logFC) < 1 & P8vsP16_Padj <= 0.05),
points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, P8vsP16_Padj <= 0.05), points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue),
pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, P8vsP16_Padj > 0.05), points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue),
pch=15, cex = 0.75, col="gray30"))
# plot title
mtext("Volcano plots contrasts 8vs16", side = 3, line = -1.25, outer = TRUE,font = 2)
Next, we subdivided the DE genes according to their expression patterns as a function of salinity.
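To make the naming of the pattern groups concrete, here is a minimal sketch on toy data. In the object names, "g" stands for greater and "s" for smaller, so 24g8 means that mean expression at 24 ppt is higher than at 8 ppt. The gene names, values and the direction helper are hypothetical, and this sketch labels all three comparisons at once, whereas the analysis itself first splits genes by the contrasts in which they are significant:

```r
# toy log mean expression per salinity (hypothetical values)
toy_expr = data.frame(`24ppt` = c(5, 2, 3), `16ppt` = c(4, 3, 1), `8ppt` = c(3, 4, 2),
                      row.names = c("geneA", "geneB", "geneC"), check.names = FALSE)
# hypothetical helper: "g" if x is greater than y, otherwise "s"
# (ties end up as "s" here; with strict comparisons they would fall in neither group)
direction = function(x, y) ifelse(x > y, "g", "s")
# build one label per gene from the three pairwise comparisons
toy_pattern = paste0(
  "24", direction(toy_expr$`24ppt`, toy_expr$`8ppt`), "8_",
  "24", direction(toy_expr$`24ppt`, toy_expr$`16ppt`), "16_",
  "16", direction(toy_expr$`16ppt`, toy_expr$`8ppt`), "8")
names(toy_pattern) = rownames(toy_expr)
# list the genes belonging to each pattern
split(names(toy_pattern), toy_pattern)
```

Some label combinations cannot occur (for example 24g8 together with 24s16 and 16s8), so the corresponding groups are always empty.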
We started with the average response:
# extract the fitted expression values from the model (all genes; the significant genes are selected further below)
expression_RQ1e2 = fit_group_model$fitted.values
# calculate the average expression per gene for each salinity treatment over all genotypes
mean_16ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(1:3,10:12,19:21,28:30,37:39,46:48,55:57,64:66)], 1, mean))
mean_24ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(4:6,13:15,22:24,31:33,40:42,49:51,58:60,67:69)], 1, mean))
mean_8ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(7:9,16:18,25:27,34:36,43:45,52:54,61:63,70:72)], 1, mean))
# add column names to the matrices with the average values
colnames(mean_16ppt_RQ1e2) = c("16ppt")
colnames(mean_24ppt_RQ1e2) = c("24ppt")
colnames(mean_8ppt_RQ1e2) = c("8ppt")
# merge the matrices and take the natural log of the mean expression
mean_expression_RQ1e2_log = as.data.frame(log(cbind(mean_24ppt_RQ1e2, mean_16ppt_RQ1e2, mean_8ppt_RQ1e2)))
# extract expression levels of subsets of genes
## take only genes significant in all contrasts
OnlySignGenes_RQ1e2_ConStage_average_names_allcontrasts = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 1 &
OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 1 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 1)))
mean_expression_RQ1e2_allcontrasts_log = subset(mean_expression_RQ1e2_log,
rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_allcontrasts)
## take only genes significant in 16-24 and 8-24
OnlySignGenes_RQ1e2_ConStage_average_names_16vs24_8vs24 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 0 &
OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 1 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 1)))
mean_expression_RQ1e2_16vs24_8vs24_log = subset(mean_expression_RQ1e2_log,
rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_16vs24_8vs24)
## take only genes significant in 16-24 and 8-16
OnlySignGenes_RQ1e2_ConStage_average_names_16vs24_8vs16 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 1 &
OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 0 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 1)))
mean_expression_RQ1e2_16vs24_8vs16_log = subset(mean_expression_RQ1e2_log,
rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_16vs24_8vs16)
## take only genes significant in 8-24 and 8-16
OnlySignGenes_RQ1e2_ConStage_average_names_8vs24_8vs16 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 1 &
OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 1 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 0)))
mean_expression_RQ1e2_8vs24_8vs16_log = subset(mean_expression_RQ1e2_log,
rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_8vs24_8vs16)
## take only genes significant in 16-24
OnlySignGenes_RQ1e2_ConStage_average_names_16vs24 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 0 &
OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 0 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 1)))
mean_expression_RQ1e2_16vs24_log = subset(mean_expression_RQ1e2_log,
rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_16vs24)
## take only genes significant in 8-24
OnlySignGenes_RQ1e2_ConStage_average_names_8vs24 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 0 &
OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 1 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 0)))
mean_expression_RQ1e2_8vs24_log = subset(mean_expression_RQ1e2_log,
rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_8vs24)
## take only genes significant in 8-16
OnlySignGenes_RQ1e2_ConStage_average_names_8vs16 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 1 &
OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 0 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 0)))
mean_expression_RQ1e2_8vs16_log = subset(mean_expression_RQ1e2_log,
rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_8vs16)
# select genes with specific patterns of up- and downregulation
## significant in exactly one contrast
RQ1e2_16g8 = rownames(subset(mean_expression_RQ1e2_8vs16_log,
mean_expression_RQ1e2_8vs16_log$`16ppt` > mean_expression_RQ1e2_8vs16_log$`8ppt`))
RQ1e2_16s8 = rownames(subset(mean_expression_RQ1e2_8vs16_log,
mean_expression_RQ1e2_8vs16_log$`16ppt` < mean_expression_RQ1e2_8vs16_log$`8ppt`))
RQ1e2_24g8 = rownames(subset(mean_expression_RQ1e2_8vs24_log,
mean_expression_RQ1e2_8vs24_log$`24ppt` > mean_expression_RQ1e2_8vs24_log$`8ppt`))
RQ1e2_24s8 = rownames(subset(mean_expression_RQ1e2_8vs24_log,
mean_expression_RQ1e2_8vs24_log$`24ppt` < mean_expression_RQ1e2_8vs24_log$`8ppt`))
RQ1e2_24g16 = rownames(subset(mean_expression_RQ1e2_16vs24_log,
mean_expression_RQ1e2_16vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_log$`16ppt`))
RQ1e2_24s16 = rownames(subset(mean_expression_RQ1e2_16vs24_log,
mean_expression_RQ1e2_16vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_log$`16ppt`))
## significant in exactly two contrasts
RQ1e2_24g16_24g8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs24_log,
mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_24s16_24s8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs24_log,
mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_24g16_24s8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs24_log,
mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_24s16_24g8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs24_log,
mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_24g8_16g8 = rownames(subset(mean_expression_RQ1e2_8vs24_8vs16_log,
mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` > mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` > mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_24s8_16s8 = rownames(subset(mean_expression_RQ1e2_8vs24_8vs16_log,
mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` < mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` < mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_24g8_16s8 = rownames(subset(mean_expression_RQ1e2_8vs24_8vs16_log,
mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` > mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` < mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_24s8_16g8 = rownames(subset(mean_expression_RQ1e2_8vs24_8vs16_log,
mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` < mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` > mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_24g16_16g8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs16_log,
mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` > mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_24s16_16s8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs16_log,
mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` < mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_24g16_16s8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs16_log,
mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` < mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_24s16_16g8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs16_log,
mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` > mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
## significant in all three contrasts
RQ1e2_24g8_24g16_16g8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
mean_expression_RQ1e2_allcontrasts_log$`16ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24s8_24s16_16s8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
mean_expression_RQ1e2_allcontrasts_log$`16ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24g8_24s16_16g8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
mean_expression_RQ1e2_allcontrasts_log$`16ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24g8_24s16_16s8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
mean_expression_RQ1e2_allcontrasts_log$`16ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24g8_24g16_16s8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
mean_expression_RQ1e2_allcontrasts_log$`16ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24s8_24g16_16s8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
mean_expression_RQ1e2_allcontrasts_log$`16ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24s8_24s16_16g8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
mean_expression_RQ1e2_allcontrasts_log$`16ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24s8_24g16_16g8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
mean_expression_RQ1e2_allcontrasts_log$`16ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
# take subsets of relative expression values
RQ1e2_16g8_exp = subset(mean_expression_RQ1e2_8vs16_log,
rownames(mean_expression_RQ1e2_8vs16_log)%in%RQ1e2_16g8)
RQ1e2_16s8_exp = subset(mean_expression_RQ1e2_8vs16_log,
rownames(mean_expression_RQ1e2_8vs16_log)%in%RQ1e2_16s8)
RQ1e2_24g8_exp = subset(mean_expression_RQ1e2_8vs24_log,
rownames(mean_expression_RQ1e2_8vs24_log)%in%RQ1e2_24g8)
RQ1e2_24s8_exp = subset(mean_expression_RQ1e2_8vs24_log,
rownames(mean_expression_RQ1e2_8vs24_log)%in%RQ1e2_24s8)
RQ1e2_24g16_exp = subset(mean_expression_RQ1e2_16vs24_log,
rownames(mean_expression_RQ1e2_16vs24_log)%in%RQ1e2_24g16)
RQ1e2_24s16_exp = subset(mean_expression_RQ1e2_16vs24_log,
rownames(mean_expression_RQ1e2_16vs24_log)%in%RQ1e2_24s16)
RQ1e2_24g16_24g8_exp = subset(mean_expression_RQ1e2_16vs24_8vs24_log,
rownames(mean_expression_RQ1e2_16vs24_8vs24_log)%in%RQ1e2_24g16_24g8)
RQ1e2_24g16_24s8_exp = subset(mean_expression_RQ1e2_16vs24_8vs24_log,
rownames(mean_expression_RQ1e2_16vs24_8vs24_log)%in%RQ1e2_24g16_24s8)
RQ1e2_24s16_24g8_exp = subset(mean_expression_RQ1e2_16vs24_8vs24_log,
rownames(mean_expression_RQ1e2_16vs24_8vs24_log)%in%RQ1e2_24s16_24g8)
RQ1e2_24s16_24s8_exp = subset(mean_expression_RQ1e2_16vs24_8vs24_log,
rownames(mean_expression_RQ1e2_16vs24_8vs24_log)%in%RQ1e2_24s16_24s8)
RQ1e2_24g8_16g8_exp = subset(mean_expression_RQ1e2_8vs24_8vs16_log,
rownames(mean_expression_RQ1e2_8vs24_8vs16_log)%in%RQ1e2_24g8_16g8)
RQ1e2_24s8_16s8_exp = subset(mean_expression_RQ1e2_8vs24_8vs16_log,
rownames(mean_expression_RQ1e2_8vs24_8vs16_log)%in%RQ1e2_24s8_16s8)
RQ1e2_24g8_16s8_exp = subset(mean_expression_RQ1e2_8vs24_8vs16_log,
rownames(mean_expression_RQ1e2_8vs24_8vs16_log)%in%RQ1e2_24g8_16s8)
RQ1e2_24s8_16g8_exp = subset(mean_expression_RQ1e2_8vs24_8vs16_log,
rownames(mean_expression_RQ1e2_8vs24_8vs16_log)%in%RQ1e2_24s8_16g8)
RQ1e2_24g16_16g8_exp = subset(mean_expression_RQ1e2_16vs24_8vs16_log,
rownames(mean_expression_RQ1e2_16vs24_8vs16_log)%in%RQ1e2_24g16_16g8)
RQ1e2_24s16_16s8_exp = subset(mean_expression_RQ1e2_16vs24_8vs16_log,
rownames(mean_expression_RQ1e2_16vs24_8vs16_log)%in%RQ1e2_24s16_16s8)
RQ1e2_24g16_16s8_exp = subset(mean_expression_RQ1e2_16vs24_8vs16_log,
rownames(mean_expression_RQ1e2_16vs24_8vs16_log)%in%RQ1e2_24g16_16s8)
RQ1e2_24s16_16g8_exp = subset(mean_expression_RQ1e2_16vs24_8vs16_log,
rownames(mean_expression_RQ1e2_16vs24_8vs16_log)%in%RQ1e2_24s16_16g8)
RQ1e2_24g8_24g16_16g8_exp = subset(mean_expression_RQ1e2_allcontrasts_log,
rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24g8_24g16_16g8)
RQ1e2_24s8_24s16_16s8_exp = subset(mean_expression_RQ1e2_allcontrasts_log,
rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24s8_24s16_16s8)
RQ1e2_24g8_24s16_16g8_exp = subset(mean_expression_RQ1e2_allcontrasts_log,
rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24g8_24s16_16g8)
RQ1e2_24g8_24s16_16s8_exp = subset(mean_expression_RQ1e2_allcontrasts_log,
rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24g8_24s16_16s8)
RQ1e2_24g8_24g16_16s8_exp = subset(mean_expression_RQ1e2_allcontrasts_log,
rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24g8_24g16_16s8)
RQ1e2_24s8_24g16_16s8_exp = subset(mean_expression_RQ1e2_allcontrasts_log,
rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24s8_24g16_16s8)
RQ1e2_24s8_24s16_16g8_exp = subset(mean_expression_RQ1e2_allcontrasts_log,
rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24s8_24s16_16g8)
RQ1e2_24s8_24g16_16g8_exp = subset(mean_expression_RQ1e2_allcontrasts_log,
rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24s8_24g16_16g8)
# core response genes
## 24>8, 24>16 & 16>8
RQ1e2_allcontrasts_core1 = subset(RQ1e2_24g8_24g16_16g8_exp, rownames(RQ1e2_24g8_24g16_16g8_exp)%in%RQ1e2_CoreResponse)
## 24<8, 24<16 & 16<8
RQ1e2_allcontrasts_core2 = subset(RQ1e2_24s8_24s16_16s8_exp, rownames(RQ1e2_24s8_24s16_16s8_exp)%in%RQ1e2_CoreResponse)
## 24>8, 24<16 & 16>8
RQ1e2_allcontrasts_core3 = subset(RQ1e2_24g8_24s16_16g8_exp, rownames(RQ1e2_24g8_24s16_16g8_exp)%in%RQ1e2_CoreResponse)
# combine all the clusters into a list
RQ1e2_cluster_list = list(RQ1e2_16g8,RQ1e2_16s8,RQ1e2_24g8,RQ1e2_24s8,
RQ1e2_24g16,RQ1e2_24s16,RQ1e2_24g16_24g8,
RQ1e2_24s16_24s8,RQ1e2_24g16_24s8,RQ1e2_24s16_24g8,
RQ1e2_24g8_16g8,RQ1e2_24s8_16s8,RQ1e2_24g8_16s8,
RQ1e2_24s8_16g8,RQ1e2_24g16_16g8,RQ1e2_24s16_16s8,
RQ1e2_24g16_16s8,RQ1e2_24s16_16g8,RQ1e2_24g8_24g16_16g8,
RQ1e2_24s8_24s16_16s8,RQ1e2_24g8_24s16_16g8,
RQ1e2_24g8_24s16_16s8,RQ1e2_24g8_24g16_16s8,
RQ1e2_24s8_24g16_16s8,RQ1e2_24s8_24s16_16g8,
RQ1e2_24s8_24g16_16g8)
names(RQ1e2_cluster_list) = c("RQ1e2 avg: 16>8","RQ1e2 avg: 16<8","RQ1e2 avg: 24>8",
"RQ1e2 avg: 24<8","RQ1e2 avg: 24>16","RQ1e2 avg: 24<16",
"RQ1e2 avg: 24>16 and 24>8","RQ1e2 avg: 24<16 and 24<8",
"RQ1e2 avg: 24>16 and 24<8","RQ1e2 avg: 24<16 and 24>8",
"RQ1e2 avg: 24>8 and 16>8","RQ1e2 avg: 24<8 and 16<8",
"RQ1e2 avg: 24>8 and 16<8","RQ1e2 avg: 24<8 and 16>8",
"RQ1e2 avg: 24>16 and 16>8","RQ1e2 avg: 24<16 and 16<8",
"RQ1e2 avg: 24>16 and 16<8","RQ1e2 avg: 24<16 and 16>8",
"RQ1e2 avg: 24>8 and 24>16 and 16>8","RQ1e2 avg: 24<8 and 24<16 and 16<8",
"RQ1e2 avg: 24>8 and 24<16 and 16>8","RQ1e2 avg: 24>8 and 24<16 and 16<8",
"RQ1e2 avg: 24>8 and 24>16 and 16<8","RQ1e2 avg: 24<8 and 24>16 and 16<8",
"RQ1e2 avg: 24<8 and 24<16 and 16>8","RQ1e2 avg: 24<8 and 24>16 and 16>8")
# create name list (necessary for downstream code)
names = c(RQ1e2_16g8,RQ1e2_16s8,RQ1e2_24g8,RQ1e2_24s8,RQ1e2_24g16,RQ1e2_24s16,
RQ1e2_24g16_24g8,RQ1e2_24s16_24s8,RQ1e2_24g16_24s8,RQ1e2_24s16_24g8,
RQ1e2_24g8_16g8,RQ1e2_24s8_16s8,RQ1e2_24g8_16s8,RQ1e2_24s8_16g8,
RQ1e2_24g16_16g8,RQ1e2_24s16_16s8,RQ1e2_24g16_16s8,RQ1e2_24s16_16g8,
RQ1e2_24g8_24g16_16g8,RQ1e2_24s8_24s16_16s8,RQ1e2_24g8_24s16_16g8,RQ1e2_24g8_24s16_16s8,
RQ1e2_24g8_24g16_16s8,RQ1e2_24s8_24g16_16s8,RQ1e2_24s8_24s16_16g8,RQ1e2_24s8_24g16_16g8)
# plot the results
## set figure dimensions
par(mfrow = c(3,7))
## plots (only including sets for which at least one gene was significant)
### 16>8
matplot(t(RQ1e2_16g8_exp),type="l",lty=1,col=1,
ylab="log average estimated expression",main="16>8",xaxt="n")
### 24>16
matplot(t(RQ1e2_24g16_exp),type="l",lty=1,col=1,
ylab=NA,main="24>16",xaxt="n")
### 16<8
matplot(t(RQ1e2_16s8_exp),type="l",lty=1,col=1,
ylab=NA,main="16<8",xaxt="n")
### 24<16
matplot(t(RQ1e2_24s16_exp),type="l",lty=1,col=1,
ylab=NA,main="24<16",xaxt="n")
### 24<16, 16>8
matplot(t(RQ1e2_24s16_16g8_exp),type="l",lty=1,col=1,
ylab=NA,main="24<16, 16>8",xaxt="n")
### 24>8, 24<16
matplot(t(RQ1e2_24s16_24g8_exp),type="l",lty=1,col=1,
ylab=NA,main="24>8, 24<16",xaxt="n")
### 24<8, 16>8
matplot(t(RQ1e2_24s8_16g8_exp),type="l",lty=1,col=1,
ylab=NA,main="24<8, 16>8",xaxt="n")
### 24>8
matplot(t(RQ1e2_24g8_exp),type="l",lty=1,col=1,
ylab="log average estimated expression",main="24>8",xaxt="n")
### 24>8, 24>16
matplot(t(RQ1e2_24g16_24g8_exp),type="l",lty=1,col=1,
ylab=NA,main="24>8, 24>16",xaxt="n")
### 24<8
matplot(t(RQ1e2_24s8_exp),type="l",lty=1,col=1,
ylab=NA,main="24<8",xaxt="n")
### 24<8, 24<16
matplot(t(RQ1e2_24s16_24s8_exp),type="l",lty=1,col=1,
ylab=NA,main="24<8, 24<16",xaxt="n")
### 24>8, 24<16 & 16>8
matplot(t(RQ1e2_24g8_24s16_16g8_exp),type="l",lty=1,col=1,
ylab=NA,main="24>8, 24<16, 16>8",xaxt="n")
matlines(t(RQ1e2_allcontrasts_core3), type = "l", lty = 1, lwd = 1,col="red")
### 24<8, 24<16 & 16>8
matplot(t(RQ1e2_24s8_24s16_16g8_exp),type="l",lty=1,col=1,
ylab=NA,main="24<8, 24<16, 16>8",xaxt="n")
### 24>16, 16<8
matplot(t(RQ1e2_24g16_16s8_exp),type="l",lty=1,col=1,
ylab=NA,main="24>16, 16<8",xaxt="n")
### 24>8, 16>8
matplot(t(RQ1e2_24g8_16g8_exp),type="l",lty=1,col=1,
ylab="log average estimated expression",main="24>8, 16>8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24>8, 24>16 & 16>8
matplot(t(RQ1e2_24g8_24g16_16g8_exp),type="l",lty=1,col=1,
ylab=NA,main="24>8, 24>16, 16>8",xaxt="n")
matlines(t(RQ1e2_allcontrasts_core1), type = "l", lty = 1, lwd = 1,col="red")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24<8, 16<8
matplot(t(RQ1e2_24s8_16s8_exp),type="l",lty=1,col=1,
ylab=NA,main="24<8, 16<8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24<8, 24<16 & 16<8
matplot(t(RQ1e2_24s8_24s16_16s8_exp),type="l",lty=1,col=1,
ylab=NA,main="24<8, 24<16, 16<8",xaxt="n")
matlines(t(RQ1e2_allcontrasts_core2), type = "l", lty = 1, lwd = 1,col="red")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24>8, 16<8
matplot(t(RQ1e2_24g8_16s8_exp),type="l",lty=1,col=1,
ylab=NA,main="24>8, 16<8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24>8, 24>16 & 16<8
matplot(t(RQ1e2_24g8_24g16_16s8_exp),type="l",lty=1,col=1,
ylab=NA,main="24>8, 24>16, 16<8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24<8, 24>16 & 16<8
matplot(t(RQ1e2_24s8_24g16_16s8_exp),type="l",lty=1,col=1,
ylab=NA,main="24<8, 24>16, 16<8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
In the figure above, core response genes are shown in red.
Next, we did the same for each genotype separately, but without plotting the data.
The code for genotype A is shown below. The same code was run for the other genotypes (not shown). To rerun the entire analysis in this document, you will have to apply this code to the other genotypes as well, because the final output of all genotypes is combined into a data object that is used downstream.
# calculate the average expression per gene for each salinity treatment
genoA_mean_16ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(1:3)], 1, mean))
genoA_mean_24ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(4:6)], 1, mean))
genoA_mean_8ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(7:9)], 1, mean))
# add column names to the matrices with the average values
colnames(genoA_mean_16ppt_RQ1e2) = c("16ppt")
colnames(genoA_mean_24ppt_RQ1e2) = c("24ppt")
colnames(genoA_mean_8ppt_RQ1e2) = c("8ppt")
# merge the data frames & take the log value of the expression
genoA_mean_expression_RQ1e2_log = as.data.frame(log(cbind(genoA_mean_24ppt_RQ1e2,
genoA_mean_16ppt_RQ1e2,
genoA_mean_8ppt_RQ1e2)+1))
# extract expression levels of subsets of genes
## take only genes significant in all contrasts
genoA_OnlySignificantGenes_RQ1e2_allcontrasts = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 1 &
OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 1 &
OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 1)))
genoA_mean_expression_RQ1e2_allcontrasts_log = subset(genoA_mean_expression_RQ1e2_log,
rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_allcontrasts)
## take only genes significant in 24-16 and 24-8
genoA_OnlySignificantGenes_RQ1e2_16vs24_8vs24 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 0 &
OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 1 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 1)))
genoA_mean_expression_RQ1e2_16vs24_8vs24_log = subset(genoA_mean_expression_RQ1e2_log,
rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_16vs24_8vs24)
## take only genes significant in 24-16 and 16-8
genoA_OnlySignificantGenes_RQ1e2_16vs24_8vs16 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 1 &
OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 0 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 1)))
genoA_mean_expression_RQ1e2_16vs24_8vs16_log = subset(genoA_mean_expression_RQ1e2_log,
rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_16vs24_8vs16)
## take only genes significant in 24-8 and 16-8
genoA_OnlySignificantGenes_RQ1e2_8vs24_8vs16 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 1 &
OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 1 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 0)))
genoA_mean_expression_RQ1e2_8vs24_8vs16_log = subset(genoA_mean_expression_RQ1e2_log,
rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_8vs24_8vs16)
## take only genes significant in 24-16
genoA_OnlySignificantGenes_RQ1e2_16vs24 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 0 &
OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 0 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 1)))
genoA_mean_expression_RQ1e2_16vs24_log = subset(genoA_mean_expression_RQ1e2_log,
rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_16vs24)
## take only genes significant in 24-8
genoA_OnlySignificantGenes_RQ1e2_8vs24 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 0 &
OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 1 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 0)))
genoA_mean_expression_RQ1e2_8vs24_log = subset(genoA_mean_expression_RQ1e2_log,
rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_8vs24)
## take only genes significant in 16-8
genoA_OnlySignificantGenes_RQ1e2_8vs16 = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 1 &
OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 0 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 0)))
genoA_mean_expression_RQ1e2_8vs16_log = subset(genoA_mean_expression_RQ1e2_log,
rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_8vs16)
## take only genes that are not significant in the posthoc tests OR that are not selected in the average response
genoA_OnlySignGenes_RQ1e2_ConStage_average_names_posthoc = c(rownames(subset(
OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 0 &
OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 0 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 0)))
genoA_mean_expression_RQ1e2_posthoc_log = subset(genoA_mean_expression_RQ1e2_log,
rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignGenes_RQ1e2_ConStage_average_names_posthoc)
# define the clusters
## significant in one contrast
RQ1e2_genoA_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs16_log,
genoA_mean_expression_RQ1e2_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_8vs16_log$`8ppt`))
RQ1e2_genoA_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs16_log,
genoA_mean_expression_RQ1e2_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_8vs16_log$`8ppt`))
RQ1e2_genoA_24g8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_log,
genoA_mean_expression_RQ1e2_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_8vs24_log$`8ppt`))
RQ1e2_genoA_24s8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_log,
genoA_mean_expression_RQ1e2_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_8vs24_log$`8ppt`))
RQ1e2_genoA_24g16 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_log,
genoA_mean_expression_RQ1e2_16vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_log$`16ppt`))
RQ1e2_genoA_24s16 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_log,
genoA_mean_expression_RQ1e2_16vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_log$`16ppt`))
## significant in two contrasts
RQ1e2_genoA_24g16_24g8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs24_log,
genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_genoA_24s16_24s8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs24_log,
genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_genoA_24g16_24s8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs24_log,
genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_genoA_24s16_24g8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs24_log,
genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_genoA_24g8_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_8vs16_log,
genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` > genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24s8_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_8vs16_log,
genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` < genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24g8_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_8vs16_log,
genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` > genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24s8_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_8vs16_log,
genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` < genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24g16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs16_log,
genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24s16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs16_log,
genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24g16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs16_log,
genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24s16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs16_log,
genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
## significant in three contrasts
RQ1e2_genoA_24g8_24g16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24s8_24s16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24g8_24s16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24g8_24s16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24g8_24g16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24s8_24g16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24s8_24s16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24s8_24g16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
# combine all the clusters into a list
RQ1e2_genoA_cluster_list = list(RQ1e2_genoA_16g8, RQ1e2_genoA_16s8, RQ1e2_genoA_24g8, RQ1e2_genoA_24s8,
RQ1e2_genoA_24g16, RQ1e2_genoA_24s16, RQ1e2_genoA_24g16_24g8, RQ1e2_genoA_24s16_24s8,
RQ1e2_genoA_24g16_24s8, RQ1e2_genoA_24s16_24g8, RQ1e2_genoA_24g8_16g8,
RQ1e2_genoA_24s8_16s8, RQ1e2_genoA_24g8_16s8, RQ1e2_genoA_24s8_16g8,
RQ1e2_genoA_24g16_16g8, RQ1e2_genoA_24s16_16s8, RQ1e2_genoA_24g16_16s8,
RQ1e2_genoA_24s16_16g8, RQ1e2_genoA_24g8_24g16_16g8, RQ1e2_genoA_24s8_24s16_16s8,
RQ1e2_genoA_24g8_24s16_16g8, RQ1e2_genoA_24g8_24s16_16s8, RQ1e2_genoA_24g8_24g16_16s8,
RQ1e2_genoA_24s8_24g16_16s8, RQ1e2_genoA_24s8_24s16_16g8, RQ1e2_genoA_24s8_24g16_16g8)
names(RQ1e2_genoA_cluster_list) = c("RQ1e2 genA: 16>8","RQ1e2 genA: 16<8","RQ1e2 genA: 24>8","RQ1e2 genA: 24<8",
"RQ1e2 genA: 24>16","RQ1e2 genA: 24<16","RQ1e2 genA: 24>16 and 24>8",
"RQ1e2 genA: 24<16 and 24<8","RQ1e2 genA: 24>16 and 24<8",
"RQ1e2 genA: 24<16 and 24>8","RQ1e2 genA: 24>8 and 16>8",
"RQ1e2 genA: 24<8 and 16<8","RQ1e2 genA: 24>8 and 16<8",
"RQ1e2 genA: 24<8 and 16>8","RQ1e2 genA: 24>16 and 16>8",
"RQ1e2 genA: 24<16 and 16<8","RQ1e2 genA: 24>16 and 16<8",
"RQ1e2 genA: 24<16 and 16>8","RQ1e2 genA: 24>8 and 24>16 and 16>8",
"RQ1e2 genA: 24<8 and 24<16 and 16<8","RQ1e2 genA: 24>8 and 24<16 and 16>8",
"RQ1e2 genA: 24>8 and 24<16 and 16<8","RQ1e2 genA: 24>8 and 24>16 and 16<8",
"RQ1e2 genA: 24<8 and 24>16 and 16<8","RQ1e2 genA: 24<8 and 24<16 and 16>8",
"RQ1e2 genA: 24<8 and 24>16 and 16>8")
# create name list (necessary for downstream code)
namesA = c(RQ1e2_genoA_16g8,RQ1e2_genoA_16s8,RQ1e2_genoA_24g8,RQ1e2_genoA_24s8,RQ1e2_genoA_24g16,
RQ1e2_genoA_24s16,RQ1e2_genoA_24g16_24g8,RQ1e2_genoA_24s16_24s8,RQ1e2_genoA_24g16_24s8,
RQ1e2_genoA_24s16_24g8,RQ1e2_genoA_24g8_16g8,RQ1e2_genoA_24s8_16s8,RQ1e2_genoA_24g8_16s8,
RQ1e2_genoA_24s8_16g8,RQ1e2_genoA_24g16_16g8,RQ1e2_genoA_24s16_16s8,RQ1e2_genoA_24g16_16s8,
RQ1e2_genoA_24s16_16g8,RQ1e2_genoA_24g8_24g16_16g8,RQ1e2_genoA_24s8_24s16_16s8,
RQ1e2_genoA_24g8_24s16_16g8,RQ1e2_genoA_24g8_24s16_16s8,RQ1e2_genoA_24g8_24g16_16s8,
RQ1e2_genoA_24s8_24g16_16s8,RQ1e2_genoA_24s8_24s16_16g8,RQ1e2_genoA_24s8_24g16_16g8)
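Because the genotype A code has to be repeated for every genotype, the first steps (averaging replicates, taking log(x + 1), comparing salinities) could be wrapped in a small function. The sketch below is only an illustration on invented toy data: the function name `mean_log_expression`, the object names starting with `toy_`, and the gene IDs are hypothetical. The column indices in `toy_cols` are those used for genotype A above; the indices for the other genotypes are not given in this document and would have to be filled in.

```r
# Hypothetical helper: log(mean + 1) expression per salinity for one genotype.
# expr: genes x samples matrix; cols: named list mapping each salinity to its replicate columns.
mean_log_expression = function(expr, cols) {
  means = do.call(cbind, lapply(cols, function(idx) rowMeans(expr[, idx, drop = FALSE])))
  as.data.frame(log(means + 1))
}

# toy data (invented values): 2 genes, 9 samples, 3 replicates per salinity
toy_expr = matrix(c(rep(10, 3), rep(20, 3), rep(5, 3),
                    rep(1, 3),  rep(1, 3),  rep(8, 3)),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("geneX", "geneY"), paste0("s", 1:9)))

# column layout as used for genotype A above: 1-3 = 16ppt, 4-6 = 24ppt, 7-9 = 8ppt
toy_cols = list(`24ppt` = 4:6, `16ppt` = 1:3, `8ppt` = 7:9)
toy_log = mean_log_expression(toy_expr, toy_cols)

# same kind of comparison as in the cluster definitions: genes with 24ppt > 8ppt
toy_24g8 = rownames(subset(toy_log, `24ppt` > `8ppt`))
```

In this toy example only geneX ends up in `toy_24g8`, because its mean expression is higher at 24ppt than at 8ppt. The significance filtering on `OnlySignGenes_RQ1e2_ConStage` is not covered by this sketch.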
After running the code above for all genotypes, we combined the cluster information of the average response and of each genotype-specific response into one data object:
# create list of lists that gives cluster information for each gene for each genotype and the average response
RQ1e2_cluster_df = setNames(lapply(names,
function(x) names(which(sapply(RQ1e2_cluster_list,
function(y) x %in% y)))), names)
RQ1e2_genoA_cluster_df = setNames(lapply(namesA,
function(x) names(which(sapply(RQ1e2_genoA_cluster_list,
function(y) x %in% y)))), namesA)
RQ1e2_genoB_cluster_df = setNames(lapply(namesB,
function(x) names(which(sapply(RQ1e2_genoB_cluster_list,
function(y) x %in% y)))), namesB)
RQ1e2_genoD_cluster_df = setNames(lapply(namesD,
function(x) names(which(sapply(RQ1e2_genoD_cluster_list,
function(y) x %in% y)))), namesD)
RQ1e2_genoF_cluster_df = setNames(lapply(namesF,
function(x) names(which(sapply(RQ1e2_genoF_cluster_list,
function(y) x %in% y)))), namesF)
RQ1e2_genoI_cluster_df = setNames(lapply(namesI,
function(x) names(which(sapply(RQ1e2_genoI_cluster_list,
function(y) x %in% y)))), namesI)
RQ1e2_genoJ_cluster_df = setNames(lapply(namesJ,
function(x) names(which(sapply(RQ1e2_genoJ_cluster_list,
function(y) x %in% y)))), namesJ)
RQ1e2_genoK_cluster_df = setNames(lapply(namesK,
function(x) names(which(sapply(RQ1e2_genoK_cluster_list,
function(y) x %in% y)))), namesK)
RQ1e2_genoP_cluster_df = setNames(lapply(namesP,
function(x) names(which(sapply(RQ1e2_genoP_cluster_list,
function(y) x %in% y)))), namesP)
# combine the different lists
keys = unique(c(names(RQ1e2_cluster_df), names(RQ1e2_genoA_cluster_df), names(RQ1e2_genoB_cluster_df),
names(RQ1e2_genoD_cluster_df), names(RQ1e2_genoF_cluster_df),names(RQ1e2_genoI_cluster_df),
names(RQ1e2_genoJ_cluster_df), names(RQ1e2_genoK_cluster_df), names(RQ1e2_genoP_cluster_df)))
## SIMPLIFY = FALSE makes sure the result is always a list, even if all genes have the same number of entries
RQ1e2_ALL_cluster_df = setNames(mapply(c, RQ1e2_cluster_df[keys], RQ1e2_genoA_cluster_df[keys],
RQ1e2_genoB_cluster_df[keys], RQ1e2_genoD_cluster_df[keys],
RQ1e2_genoF_cluster_df[keys],RQ1e2_genoI_cluster_df[keys],
RQ1e2_genoJ_cluster_df[keys], RQ1e2_genoK_cluster_df[keys],
RQ1e2_genoP_cluster_df[keys], SIMPLIFY = FALSE), keys)
# example output for one gene
RQ1e2_ALL_cluster_df$Sm_g00016729
## NULL
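To make the inversion step above easier to follow, here is a minimal sketch on invented toy data. It turns "cluster -> genes" lists into "gene -> clusters" lists and then merges two such lists per gene. The helper `invert_clusters`, the gene IDs and the cluster labels are hypothetical and only serve as illustration.

```r
# toy cluster lists (invented gene IDs and cluster labels)
avg_clusters   = list("avg: 16>8" = c("g1", "g2"), "avg: 24<8" = c("g3"))
genoA_clusters = list("genA: 16>8" = c("g1"), "genA: 24>8" = c("g4"))

# invert a cluster list: for every gene, which cluster name(s) contain it?
invert_clusters = function(cluster_list) {
  genes = unique(unlist(cluster_list, use.names = FALSE))
  setNames(lapply(genes, function(x)
    names(which(sapply(cluster_list, function(y) x %in% y)))), genes)
}
avg_by_gene   = invert_clusters(avg_clusters)
genoA_by_gene = invert_clusters(genoA_clusters)

# combine per gene; genes missing from one list simply contribute NULL
keys = unique(c(names(avg_by_gene), names(genoA_by_gene)))
all_by_gene = setNames(mapply(c, avg_by_gene[keys], genoA_by_gene[keys],
                              SIMPLIFY = FALSE), keys)

all_by_gene$g1   # cluster labels of g1 in the average and genotype A response
all_by_gene$g5   # NULL, because g5 is not part of any cluster
```

A gene that is absent from every cluster list is not among the keys, so looking it up with `$` returns NULL. This is presumably what the NULL in the example output above reflects.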
In this section, we performed GO enrichment using Fisher’s exact test in topGO. GO enrichment was done using the cluster information of the average response and of each individual genotype, which was obtained in section 3.3.7.
We started by importing the GO annotations of the S. marinoi genome. The file Smarinoi_Ref1.1.2_GOterms.txt contains the GO terms that were obtained in the InterProScan analysis.
# import GO information
geneID2GO = readMappings(file = "01.Skeletonema_marinoi_genome_v1.1.2/Smarinoi_Ref1.1.2_GOterms.txt")
geneUniverse = names(geneID2GO)
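For readers who want to check their own annotation file: with its default separators, readMappings reads one gene per line, with the gene ID, a tab, and then the GO terms separated by commas. The sketch below writes a tiny invented file in that layout and parses it with base R only, to show the structure that results (a named list of GO IDs per gene). The gene IDs and the gene-to-GO assignments are made up, and the base-R parser is a stand-in for illustration, not the topGO implementation.

```r
# write a tiny, invented mapping file: gene ID, tab, comma-separated GO terms
toy_file = tempfile(fileext = ".txt")
writeLines(c("geneA\tGO:0006810, GO:0016020",
             "geneB\tGO:0005524"), toy_file)

# base-R parse that mimics the resulting structure (named list: gene -> GO IDs)
toy_lines = strsplit(readLines(toy_file), "\t", fixed = TRUE)
toy_gene2GO = setNames(
  lapply(toy_lines, function(p) trimws(strsplit(p[2], ",", fixed = TRUE)[[1]])),
  sapply(toy_lines, `[`, 1))

# the gene universe is simply the set of annotated gene IDs
toy_universe = names(toy_gene2GO)
```

Note that with this approach the gene universe only contains genes that have at least one GO term in the file.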
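Next, we defined four sets of genes with similar expression patterns, based on the cluster information on the average response from section 3.3.7: genes upregulated in low salinities, genes downregulated in low salinities, genes upregulated in intermediate salinities, and genes downregulated in intermediate salinities.
To create these categories, several clusters were combined: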
# define four sets of genes for GO enrichment
## genes that are upregulated in low salinities
RQ1e2_UpInLowSal = c(RQ1e2_16s8,RQ1e2_24s8,RQ1e2_24s16,RQ1e2_24s16_24s8,RQ1e2_24s8_16s8,RQ1e2_24s8_24s16_16s8)
length(RQ1e2_UpInLowSal)
## [1] 2637
## genes that are downregulated in low salinities
RQ1e2_UpInHighSal = c(RQ1e2_16g8,RQ1e2_24g8,RQ1e2_24g16,RQ1e2_24g16_24g8,RQ1e2_24g8_16g8,RQ1e2_24g8_24g16_16g8)
length(RQ1e2_UpInHighSal)
## [1] 2461
## genes that are upregulated in intermediate salinities
RQ1e2_24s16_16g8_c1 = c(RQ1e2_24s16_24g8,RQ1e2_24s8_16g8,RQ1e2_24s16_16g8,RQ1e2_24g8_24s16_16g8,RQ1e2_24s8_24s16_16g8)
length(RQ1e2_24s16_16g8_c1)
## [1] 100
## genes that are downregulated in intermediate salinities
RQ1e2_24g16_16s8_c2 = c(RQ1e2_24g8_16s8,RQ1e2_24g16_16s8,RQ1e2_24g8_24g16_16s8,RQ1e2_24s8_24g16_16s8)
length(RQ1e2_24g16_16s8_c2)
## [1] 87
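As a reminder of what the enrichment test does for a single GO term, the sketch below runs a one-sided Fisher's exact test on a 2x2 table with base R. All counts are invented and the object names are hypothetical. This only illustrates the basic per-term test; the elim algorithm used below additionally removes genes annotated to significant, more specific GO terms before testing their parent terms, which is not reproduced here.

```r
# invented counts for one GO term
universe_size    = 1000  # annotated genes in the gene universe
set_size         = 100   # genes in the gene set of interest
term_in_universe = 40    # genes in the universe annotated with the GO term
term_in_set      = 12    # genes in the set annotated with the GO term

# 2x2 contingency table: GO term membership versus gene set membership
contingency = matrix(c(term_in_set, set_size - term_in_set,
                       term_in_universe - term_in_set,
                       universe_size - set_size - (term_in_universe - term_in_set)),
                     nrow = 2,
                     dimnames = list(c("withTerm", "withoutTerm"),
                                     c("inSet", "notInSet")))

# one-sided test: is the GO term over-represented in the gene set?
toy_test = fisher.test(contingency, alternative = "greater")
toy_test$p.value
```

Here 12 of the 100 genes in the set carry the term, while only 4 would be expected by chance (40 of 1000 genes in the universe), so the test gives a small p-value.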
Next, we performed GO enrichment on these four sets of genes:
# topGO: downregulated in low salinities
## create gene list for input in topGO
geneList_cluster_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_UpInHighSal))
names(geneList_cluster_UpInHighSal) = geneUniverse
str(geneList_cluster_UpInHighSal)
## create a topGO object (for biological process GOs)
GOdata_BP_cluster_UpInHighSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_UpInHighSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_UpInHighSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## run Fisher's exact test
resultFisher_BP_cluster_UpInHighSal_elim = runTest(GOdata_BP_cluster_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_UpInHighSal_elim = runTest(GOdata_MF_cluster_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_UpInHighSal_elim = runTest(GOdata_CC_cluster_UpInHighSal,
algorithm = "elim", statistic = "fisher")
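## extract the top-ranked GO terms (topNodes sets how many terms are tabulated)
## note: the p-value column is named "classic", but it holds the p-values of the elim algorithm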
allRes_BP_cluster_UpInHighSal_elim = GenTable(GOdata_BP_cluster_UpInHighSal,
classic = resultFisher_BP_cluster_UpInHighSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 45, numChar=1000)
allRes_MF_cluster_UpInHighSal_elim = GenTable(GOdata_MF_cluster_UpInHighSal,
classic = resultFisher_MF_cluster_UpInHighSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 50, numChar=1000)
allRes_CC_cluster_UpInHighSal_elim = GenTable(GOdata_CC_cluster_UpInHighSal,
classic = resultFisher_CC_cluster_UpInHighSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 20, numChar=1000)
# topGO: upregulated in low salinities
## create gene list for input in topGO
geneList_cluster_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_UpInLowSal))
names(geneList_cluster_UpInLowSal) = geneUniverse
str(geneList_cluster_UpInLowSal)
## create a topGO object (for biological process GOs)
GOdata_BP_cluster_UpInLowSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_UpInLowSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_UpInLowSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## run Fisher's exact test
resultFisher_BP_cluster_UpInLowSal_elim = runTest(GOdata_BP_cluster_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_UpInLowSal_elim = runTest(GOdata_MF_cluster_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_UpInLowSal_elim = runTest(GOdata_CC_cluster_UpInLowSal,
algorithm = "elim", statistic = "fisher")
## extract the significant GO terms
allRes_BP_cluster_UpInLowSal_elim = GenTable(GOdata_BP_cluster_UpInLowSal,
classic = resultFisher_BP_cluster_UpInLowSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 110, numChar=1000)
allRes_MF_cluster_UpInLowSal_elim = GenTable(GOdata_MF_cluster_UpInLowSal,
classic = resultFisher_MF_cluster_UpInLowSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 75, numChar=1000)
allRes_CC_cluster_UpInLowSal_elim = GenTable(GOdata_CC_cluster_UpInLowSal,
classic = resultFisher_CC_cluster_UpInLowSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 20, numChar=1000)
# topGO: upregulated in intermediate salinities
## create gene list for input in topGO
geneList_cluster_UpInMedSal = factor(as.integer(geneUniverse %in% RQ1e2_24s16_16g8_c1))
names(geneList_cluster_UpInMedSal) = geneUniverse
str(geneList_cluster_UpInMedSal)
## create a topGO object (for biological process GOs)
GOdata_BP_cluster_UpInMedSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_UpInMedSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_UpInMedSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_UpInMedSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_UpInMedSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_UpInMedSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## run Fisher's exact test
resultFisher_BP_cluster_UpInMedSal_elim = runTest(GOdata_BP_cluster_UpInMedSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_UpInMedSal_elim = runTest(GOdata_MF_cluster_UpInMedSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_UpInMedSal_elim = runTest(GOdata_CC_cluster_UpInMedSal,
algorithm = "elim", statistic = "fisher")
## extract the significant GO terms
allRes_BP_cluster_UpInMedSal_elim = GenTable(GOdata_BP_cluster_UpInMedSal,
classic = resultFisher_BP_cluster_UpInMedSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 10, numChar=1000)
allRes_MF_cluster_UpInMedSal_elim = GenTable(GOdata_MF_cluster_UpInMedSal,
classic = resultFisher_MF_cluster_UpInMedSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 10, numChar=1000)
allRes_CC_cluster_UpInMedSal_elim = GenTable(GOdata_CC_cluster_UpInMedSal,
classic = resultFisher_CC_cluster_UpInMedSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 10, numChar=1000)
# topGO: downregulated in intermediate salinities
## create gene list for input in topGO
geneList_cluster_DownInMedSal = factor(as.integer(geneUniverse %in% RQ1e2_24g16_16s8_c2))
names(geneList_cluster_DownInMedSal) = geneUniverse
str(geneList_cluster_DownInMedSal)
## create a topGO object (for biological process GOs)
GOdata_BP_cluster_DownInMedSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_DownInMedSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_DownInMedSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_DownInMedSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_DownInMedSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_DownInMedSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## run Fisher's exact test
resultFisher_BP_cluster_DownInMedSal_elim = runTest(GOdata_BP_cluster_DownInMedSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_DownInMedSal_elim = runTest(GOdata_MF_cluster_DownInMedSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_DownInMedSal_elim = runTest(GOdata_CC_cluster_DownInMedSal,
algorithm = "elim", statistic = "fisher")
## extract the significant GO terms
allRes_BP_cluster_DownInMedSal_elim = GenTable(GOdata_BP_cluster_DownInMedSal,
classic = resultFisher_BP_cluster_DownInMedSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 20, numChar=1000)
allRes_MF_cluster_DownInMedSal_elim = GenTable(GOdata_MF_cluster_DownInMedSal,
classic = resultFisher_MF_cluster_DownInMedSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 25, numChar=1000)
allRes_CC_cluster_DownInMedSal_elim = GenTable(GOdata_CC_cluster_DownInMedSal,
classic = resultFisher_CC_cluster_DownInMedSal_elim,
orderBy = "elim", ranksOf = "elim",
topNodes = 10, numChar=1000)
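To illustrate what `statistic = "fisher"` tests for a single GO term, below is a minimal sketch in base R with made-up counts (all numbers and variable names are assumptions for illustration, not values from this analysis). It builds the 2x2 contingency table for one term and computes the one-sided P-value for over-representation. The `elim` algorithm used above goes further: it walks the GO graph bottom-up and removes genes annotated to significant child terms before testing their ancestors, which this sketch does not reproduce.

```r
# Toy numbers (assumptions, not taken from the analysis)
n_universe    <- 1000  # genes in the gene universe
n_interesting <- 100   # genes in the cluster of interest
n_term        <- 50    # universe genes annotated to one GO term
n_overlap     <- 15    # cluster genes annotated to that GO term

# 2x2 table: rows = annotated to the term or not, columns = in the cluster or not
contingency <- matrix(
  c(n_overlap,                 n_term - n_overlap,
    n_interesting - n_overlap, n_universe - n_term - n_interesting + n_overlap),
  nrow = 2, byrow = TRUE,
  dimnames = list(term    = c("annotated", "not_annotated"),
                  cluster = c("in_cluster", "not_in_cluster"))
)

# one-sided Fisher's exact test for over-representation of the term in the cluster
p_fisher <- fisher.test(contingency, alternative = "greater")$p.value

# the same P-value written as a hypergeometric tail probability
p_hyper <- phyper(n_overlap - 1, n_term, n_universe - n_term, n_interesting,
                  lower.tail = FALSE)

print(contingency)
print(c(fisher = p_fisher, hypergeometric = p_hyper))
```

With these toy counts, 5 annotated genes are expected in the cluster by chance (100 * 50 / 1000), so observing 15 gives a small P-value.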
In a next step, we reduced the list of significant GO terms using the online application REVIGO. As input for REVIGO, we used the GO terms with a Fisher’s Exact Test P-value <= 0.05, together with their P-values, and applied a similarity threshold of 0.5 with the SimRel algorithm.
REVIGO for the above analyses was accessed on July 6th 2021, and used the Gene Ontology database of May 1st 2021 and the UniProt-to-GO mapping database from April 9th 2021.
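A minimal sketch of how a REVIGO input table could be prepared from a `GenTable` output. It assumes that the P-values sit in a character column named `classic` (the name given to the result in the `GenTable` calls above) and that very small values may be printed with a leading `<` (e.g. `< 1e-30`). The toy data frame, the object names and the output file name are hypothetical and only serve as an illustration.

```r
# Toy stand-in for a GenTable output (assumption: P-values are stored as character strings)
allRes_toy <- data.frame(
  GO.ID   = c("GO:0006412", "GO:0015979", "GO:0006810", "GO:0008152"),
  Term    = c("translation", "photosynthesis", "transport", "metabolic process"),
  classic = c("< 1e-30", "0.0031", "0.0500", "0.2100"),
  stringsAsFactors = FALSE
)

# strip a leading "<" so that the strings can be converted to numbers
# (such values are then upper bounds rather than exact P-values)
p_values <- as.numeric(sub("^<\\s*", "", allRes_toy$classic))

# keep only GO terms with P <= 0.05, as two columns: GO ID and P-value
revigo_input <- data.frame(GO.ID = allRes_toy$GO.ID, p_value = p_values,
                           stringsAsFactors = FALSE)
revigo_input <- revigo_input[revigo_input$p_value <= 0.05, ]

# write a tab-separated file without header (hypothetical file name, written to a temp dir)
out_file <- file.path(tempdir(), "revigo_input_toy.txt")
write.table(revigo_input, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

print(revigo_input)
```

The resulting two-column text can be pasted into the REVIGO web form or uploaded as a file.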
We repeated the above analyses for each genotype separately. The code is annotated for genotype A; the same steps are then repeated for the other genotypes. If you want to rerun the entire analysis in this document, you need to run the code for all genotypes, because the final output is used to create a data object that is used downstream.
# define clusters to test
RQ1e2_genoA_UpInLowSal = c(RQ1e2_genoA_16s8,RQ1e2_genoA_24s8,RQ1e2_genoA_24s16,RQ1e2_genoA_24s16_24s8,
RQ1e2_genoA_24s8_16s8,RQ1e2_genoA_24s8_24s16_16s8)
RQ1e2_genoA_UpInHighSal = c(RQ1e2_genoA_16g8,RQ1e2_genoA_24g8,RQ1e2_genoA_24g16,RQ1e2_genoA_24g16_24g8,
RQ1e2_genoA_24g8_16g8,RQ1e2_genoA_24g8_24g16_16g8)
# genotype A: upregulated in high salinities
## create gene list for input in topGO
geneList_cluster_genoA_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoA_UpInHighSal))
names(geneList_cluster_genoA_UpInHighSal) = geneUniverse
str(geneList_cluster_genoA_UpInHighSal)
## create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoA_UpInHighSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoA_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoA_UpInHighSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoA_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoA_UpInHighSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoA_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## run Fisher's exact test
resultFisher_BP_cluster_genoA_UpInHighSal = runTest(GOdata_BP_cluster_genoA_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoA_UpInHighSal = runTest(GOdata_MF_cluster_genoA_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoA_UpInHighSal = runTest(GOdata_CC_cluster_genoA_UpInHighSal,
algorithm = "elim", statistic = "fisher")
## extract the significant GO terms
allRes_BP_cluster_genoA_UpInHighSal = GenTable(GOdata_BP_cluster_genoA_UpInHighSal,
classic = resultFisher_BP_cluster_genoA_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoA_UpInHighSal = GenTable(GOdata_MF_cluster_genoA_UpInHighSal,
classic = resultFisher_MF_cluster_genoA_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoA_UpInHighSal = GenTable(GOdata_CC_cluster_genoA_UpInHighSal,
classic = resultFisher_CC_cluster_genoA_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
# genotype A: upregulated in low salinities
## create gene list for input in topGO
geneList_cluster_genoA_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoA_UpInLowSal))
names(geneList_cluster_genoA_UpInLowSal) = geneUniverse
str(geneList_cluster_genoA_UpInLowSal)
## create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoA_UpInLowSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_genoA_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoA_UpInLowSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_genoA_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoA_UpInLowSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_genoA_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
## run Fisher's exact test
resultFisher_BP_cluster_genoA_UpInLowSal = runTest(GOdata_BP_cluster_genoA_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoA_UpInLowSal = runTest(GOdata_MF_cluster_genoA_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoA_UpInLowSal = runTest(GOdata_CC_cluster_genoA_UpInLowSal,
algorithm = "elim", statistic = "fisher")
## extract the significant GO terms
allRes_BP_cluster_genoA_UpInLowSal = GenTable(GOdata_BP_cluster_genoA_UpInLowSal,
classic = resultFisher_BP_cluster_genoA_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoA_UpInLowSal = GenTable(GOdata_MF_cluster_genoA_UpInLowSal,
classic = resultFisher_MF_cluster_genoA_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoA_UpInLowSal = GenTable(GOdata_CC_cluster_genoA_UpInLowSal,
classic = resultFisher_CC_cluster_genoA_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
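Because the per-genotype code only differs in the genotype letter and the direction of regulation, the gene lists could also be built in a loop. The sketch below is an alternative to the written-out code, not the code we ran: it uses toy gene sets and a toy `geneUniverse` (assumptions), and the helper `make_gene_list` and the list `gene_lists` are hypothetical names. Only the naming scheme `RQ1e2_geno<X>_<direction>` is taken from this document.

```r
# Toy stand-ins (assumptions): in the real analysis these objects are created earlier
geneUniverse <- paste0("gene", 1:10)
RQ1e2_genoA_UpInLowSal  <- c("gene1", "gene2")
RQ1e2_genoA_UpInHighSal <- c("gene3")
RQ1e2_genoB_UpInLowSal  <- c("gene2", "gene4", "gene5")
RQ1e2_genoB_UpInHighSal <- c("gene6", "gene7")

# hypothetical helper: named 0/1 factor over the gene universe, as required by topGO
make_gene_list <- function(gene_set, universe) {
  gene_list <- factor(as.integer(universe %in% gene_set))
  names(gene_list) <- universe
  gene_list
}

genotypes  <- c("A", "B")  # extend to all genotypes in the real analysis
directions <- c("UpInLowSal", "UpInHighSal")

# build one gene list per genotype and direction
gene_lists <- list()
for (g in genotypes) {
  for (d in directions) {
    set_name <- paste0("RQ1e2_geno", g, "_", d)
    gene_lists[[paste0("geno", g, "_", d)]] <- make_gene_list(get(set_name), geneUniverse)
  }
}

str(gene_lists)
```

Each element of `gene_lists` has the same structure as the `geneList_cluster_geno*` objects in this document and could be passed to `allGenes` when creating a `topGOdata` object.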
###genotype B
RQ1e2_genoB_UpInLowSal = c(RQ1e2_genoB_16s8,RQ1e2_genoB_24s8,RQ1e2_genoB_24s16,RQ1e2_genoB_24s16_24s8,RQ1e2_genoB_24s8_16s8,RQ1e2_genoB_24s8_24s16_16s8)
RQ1e2_genoB_UpInHighSal = c(RQ1e2_genoB_16g8,RQ1e2_genoB_24g8,RQ1e2_genoB_24g16,RQ1e2_genoB_24g16_24g8,RQ1e2_genoB_24g8_16g8,RQ1e2_genoB_24g8_24g16_16g8)
###genotype D
RQ1e2_genoD_UpInLowSal = c(RQ1e2_genoD_16s8,RQ1e2_genoD_24s8,RQ1e2_genoD_24s16,RQ1e2_genoD_24s16_24s8,RQ1e2_genoD_24s8_16s8,RQ1e2_genoD_24s8_24s16_16s8)
RQ1e2_genoD_UpInHighSal = c(RQ1e2_genoD_16g8,RQ1e2_genoD_24g8,RQ1e2_genoD_24g16,RQ1e2_genoD_24g16_24g8,RQ1e2_genoD_24g8_16g8,RQ1e2_genoD_24g8_24g16_16g8)
###genotype F
RQ1e2_genoF_UpInLowSal = c(RQ1e2_genoF_16s8,RQ1e2_genoF_24s8,RQ1e2_genoF_24s16,RQ1e2_genoF_24s16_24s8,RQ1e2_genoF_24s8_16s8,RQ1e2_genoF_24s8_24s16_16s8)
RQ1e2_genoF_UpInHighSal = c(RQ1e2_genoF_16g8,RQ1e2_genoF_24g8,RQ1e2_genoF_24g16,RQ1e2_genoF_24g16_24g8,RQ1e2_genoF_24g8_16g8,RQ1e2_genoF_24g8_24g16_16g8)
###genotype I
RQ1e2_genoI_UpInLowSal = c(RQ1e2_genoI_16s8,RQ1e2_genoI_24s8,RQ1e2_genoI_24s16,RQ1e2_genoI_24s16_24s8,RQ1e2_genoI_24s8_16s8,RQ1e2_genoI_24s8_24s16_16s8)
RQ1e2_genoI_UpInHighSal = c(RQ1e2_genoI_16g8,RQ1e2_genoI_24g8,RQ1e2_genoI_24g16,RQ1e2_genoI_24g16_24g8,RQ1e2_genoI_24g8_16g8,RQ1e2_genoI_24g8_24g16_16g8)
###genotype J
RQ1e2_genoJ_UpInLowSal = c(RQ1e2_genoJ_16s8,RQ1e2_genoJ_24s8,RQ1e2_genoJ_24s16,RQ1e2_genoJ_24s16_24s8,RQ1e2_genoJ_24s8_16s8,RQ1e2_genoJ_24s8_24s16_16s8)
RQ1e2_genoJ_UpInHighSal = c(RQ1e2_genoJ_16g8,RQ1e2_genoJ_24g8,RQ1e2_genoJ_24g16,RQ1e2_genoJ_24g16_24g8,RQ1e2_genoJ_24g8_16g8,RQ1e2_genoJ_24g8_24g16_16g8)
length(RQ1e2_genoJ_UpInLowSal)
length(RQ1e2_genoJ_UpInHighSal)
###genotype K
RQ1e2_genoK_UpInLowSal = c(RQ1e2_genoK_16s8,RQ1e2_genoK_24s8,RQ1e2_genoK_24s16,RQ1e2_genoK_24s16_24s8,RQ1e2_genoK_24s8_16s8,RQ1e2_genoK_24s8_24s16_16s8)
RQ1e2_genoK_UpInHighSal = c(RQ1e2_genoK_16g8,RQ1e2_genoK_24g8,RQ1e2_genoK_24g16,RQ1e2_genoK_24g16_24g8,RQ1e2_genoK_24g8_16g8,RQ1e2_genoK_24g8_24g16_16g8)
###genotype P
RQ1e2_genoP_UpInLowSal = c(RQ1e2_genoP_16s8,RQ1e2_genoP_24s8,RQ1e2_genoP_24s16,RQ1e2_genoP_24s16_24s8,RQ1e2_genoP_24s8_16s8,RQ1e2_genoP_24s8_24s16_16s8)
RQ1e2_genoP_UpInHighSal = c(RQ1e2_genoP_16g8,RQ1e2_genoP_24g8,RQ1e2_genoP_24g16,RQ1e2_genoP_24g16_24g8,RQ1e2_genoP_24g8_16g8,RQ1e2_genoP_24g8_24g16_16g8)
###genotype B Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoB_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoB_UpInHighSal))
names(geneList_cluster_genoB_UpInHighSal) = geneUniverse
str(geneList_cluster_genoB_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoB_UpInHighSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoB_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoB_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoB_UpInHighSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoB_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoB_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoB_UpInHighSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoB_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoB_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoB_UpInHighSal = runTest(GOdata_BP_cluster_genoB_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoB_UpInHighSal
resultFisher_MF_cluster_genoB_UpInHighSal = runTest(GOdata_MF_cluster_genoB_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoB_UpInHighSal
resultFisher_CC_cluster_genoB_UpInHighSal = runTest(GOdata_CC_cluster_genoB_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoB_UpInHighSal
####extract the significant GO terms
allRes_BP_cluster_genoB_UpInHighSal = GenTable(GOdata_BP_cluster_genoB_UpInHighSal,
classic = resultFisher_BP_cluster_genoB_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoB_UpInHighSal
allRes_MF_cluster_genoB_UpInHighSal = GenTable(GOdata_MF_cluster_genoB_UpInHighSal,
classic = resultFisher_MF_cluster_genoB_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoB_UpInHighSal
allRes_CC_cluster_genoB_UpInHighSal = GenTable(GOdata_CC_cluster_genoB_UpInHighSal,
classic = resultFisher_CC_cluster_genoB_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoB_UpInHighSal
###genotype B Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoB_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoB_UpInLowSal))
names(geneList_cluster_genoB_UpInLowSal) = geneUniverse
str(geneList_cluster_genoB_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoB_UpInLowSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoB_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoB_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoB_UpInLowSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoB_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoB_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoB_UpInLowSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoB_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoB_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoB_UpInLowSal = runTest(GOdata_BP_cluster_genoB_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoB_UpInLowSal
resultFisher_MF_cluster_genoB_UpInLowSal = runTest(GOdata_MF_cluster_genoB_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoB_UpInLowSal
resultFisher_CC_cluster_genoB_UpInLowSal = runTest(GOdata_CC_cluster_genoB_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoB_UpInLowSal
####extract the significant GO terms
allRes_BP_cluster_genoB_UpInLowSal = GenTable(GOdata_BP_cluster_genoB_UpInLowSal,
classic = resultFisher_BP_cluster_genoB_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoB_UpInLowSal
allRes_MF_cluster_genoB_UpInLowSal = GenTable(GOdata_MF_cluster_genoB_UpInLowSal,
classic = resultFisher_MF_cluster_genoB_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoB_UpInLowSal
allRes_CC_cluster_genoB_UpInLowSal = GenTable(GOdata_CC_cluster_genoB_UpInLowSal,
classic = resultFisher_CC_cluster_genoB_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoB_UpInLowSal
###genotype D Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoD_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoD_UpInHighSal))
names(geneList_cluster_genoD_UpInHighSal) = geneUniverse
str(geneList_cluster_genoD_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoD_UpInHighSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoD_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoD_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoD_UpInHighSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoD_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoD_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoD_UpInHighSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoD_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoD_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoD_UpInHighSal = runTest(GOdata_BP_cluster_genoD_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoD_UpInHighSal
resultFisher_MF_cluster_genoD_UpInHighSal = runTest(GOdata_MF_cluster_genoD_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoD_UpInHighSal
resultFisher_CC_cluster_genoD_UpInHighSal = runTest(GOdata_CC_cluster_genoD_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoD_UpInHighSal
####extract the significant GO terms
allRes_BP_cluster_genoD_UpInHighSal = GenTable(GOdata_BP_cluster_genoD_UpInHighSal,
classic = resultFisher_BP_cluster_genoD_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoD_UpInHighSal
allRes_MF_cluster_genoD_UpInHighSal = GenTable(GOdata_MF_cluster_genoD_UpInHighSal,
classic = resultFisher_MF_cluster_genoD_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoD_UpInHighSal
allRes_CC_cluster_genoD_UpInHighSal = GenTable(GOdata_CC_cluster_genoD_UpInHighSal,
classic = resultFisher_CC_cluster_genoD_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoD_UpInHighSal
###genotype D Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoD_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoD_UpInLowSal))
names(geneList_cluster_genoD_UpInLowSal) = geneUniverse
str(geneList_cluster_genoD_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoD_UpInLowSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoD_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoD_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoD_UpInLowSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoD_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoD_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoD_UpInLowSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoD_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoD_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoD_UpInLowSal = runTest(GOdata_BP_cluster_genoD_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoD_UpInLowSal
resultFisher_MF_cluster_genoD_UpInLowSal = runTest(GOdata_MF_cluster_genoD_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoD_UpInLowSal
resultFisher_CC_cluster_genoD_UpInLowSal = runTest(GOdata_CC_cluster_genoD_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoD_UpInLowSal
####extract the significant GO terms
allRes_BP_cluster_genoD_UpInLowSal = GenTable(GOdata_BP_cluster_genoD_UpInLowSal,
classic = resultFisher_BP_cluster_genoD_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoD_UpInLowSal
allRes_MF_cluster_genoD_UpInLowSal = GenTable(GOdata_MF_cluster_genoD_UpInLowSal,
classic = resultFisher_MF_cluster_genoD_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoD_UpInLowSal
allRes_CC_cluster_genoD_UpInLowSal = GenTable(GOdata_CC_cluster_genoD_UpInLowSal,
classic = resultFisher_CC_cluster_genoD_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoD_UpInLowSal
###genotype F Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoF_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoF_UpInHighSal))
names(geneList_cluster_genoF_UpInHighSal) = geneUniverse
str(geneList_cluster_genoF_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoF_UpInHighSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoF_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoF_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoF_UpInHighSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoF_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoF_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoF_UpInHighSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoF_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoF_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoF_UpInHighSal = runTest(GOdata_BP_cluster_genoF_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoF_UpInHighSal
resultFisher_MF_cluster_genoF_UpInHighSal = runTest(GOdata_MF_cluster_genoF_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoF_UpInHighSal
resultFisher_CC_cluster_genoF_UpInHighSal = runTest(GOdata_CC_cluster_genoF_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoF_UpInHighSal
####extract the significant GO terms
allRes_BP_cluster_genoF_UpInHighSal = GenTable(GOdata_BP_cluster_genoF_UpInHighSal,
classic = resultFisher_BP_cluster_genoF_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoF_UpInHighSal
allRes_MF_cluster_genoF_UpInHighSal = GenTable(GOdata_MF_cluster_genoF_UpInHighSal,
classic = resultFisher_MF_cluster_genoF_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoF_UpInHighSal
allRes_CC_cluster_genoF_UpInHighSal = GenTable(GOdata_CC_cluster_genoF_UpInHighSal,
classic = resultFisher_CC_cluster_genoF_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoF_UpInHighSal
###genotype F Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoF_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoF_UpInLowSal))
names(geneList_cluster_genoF_UpInLowSal) = geneUniverse
str(geneList_cluster_genoF_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoF_UpInLowSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoF_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoF_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoF_UpInLowSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoF_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoF_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoF_UpInLowSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoF_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoF_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoF_UpInLowSal = runTest(GOdata_BP_cluster_genoF_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoF_UpInLowSal
resultFisher_MF_cluster_genoF_UpInLowSal = runTest(GOdata_MF_cluster_genoF_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoF_UpInLowSal
resultFisher_CC_cluster_genoF_UpInLowSal = runTest(GOdata_CC_cluster_genoF_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoF_UpInLowSal
####extract the significant GO terms
allRes_BP_cluster_genoF_UpInLowSal = GenTable(GOdata_BP_cluster_genoF_UpInLowSal,
classic = resultFisher_BP_cluster_genoF_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoF_UpInLowSal
allRes_MF_cluster_genoF_UpInLowSal = GenTable(GOdata_MF_cluster_genoF_UpInLowSal,
classic = resultFisher_MF_cluster_genoF_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoF_UpInLowSal
allRes_CC_cluster_genoF_UpInLowSal = GenTable(GOdata_CC_cluster_genoF_UpInLowSal,
classic = resultFisher_CC_cluster_genoF_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoF_UpInLowSal
###genotype I Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoI_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoI_UpInHighSal))
names(geneList_cluster_genoI_UpInHighSal) = geneUniverse
str(geneList_cluster_genoI_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoI_UpInHighSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoI_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoI_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoI_UpInHighSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoI_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoI_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoI_UpInHighSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoI_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoI_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoI_UpInHighSal = runTest(GOdata_BP_cluster_genoI_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoI_UpInHighSal
resultFisher_MF_cluster_genoI_UpInHighSal = runTest(GOdata_MF_cluster_genoI_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoI_UpInHighSal
resultFisher_CC_cluster_genoI_UpInHighSal = runTest(GOdata_CC_cluster_genoI_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoI_UpInHighSal
####extract the significant GO terms
allRes_BP_cluster_genoI_UpInHighSal = GenTable(GOdata_BP_cluster_genoI_UpInHighSal,
classic = resultFisher_BP_cluster_genoI_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoI_UpInHighSal
allRes_MF_cluster_genoI_UpInHighSal = GenTable(GOdata_MF_cluster_genoI_UpInHighSal,
classic = resultFisher_MF_cluster_genoI_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoI_UpInHighSal
allRes_CC_cluster_genoI_UpInHighSal = GenTable(GOdata_CC_cluster_genoI_UpInHighSal,
classic = resultFisher_CC_cluster_genoI_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoI_UpInHighSal
###genotype I Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoI_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoI_UpInLowSal))
names(geneList_cluster_genoI_UpInLowSal) = geneUniverse
str(geneList_cluster_genoI_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoI_UpInLowSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoI_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoI_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoI_UpInLowSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoI_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoI_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoI_UpInLowSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoI_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoI_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoI_UpInLowSal = runTest(GOdata_BP_cluster_genoI_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoI_UpInLowSal
resultFisher_MF_cluster_genoI_UpInLowSal = runTest(GOdata_MF_cluster_genoI_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoI_UpInLowSal
resultFisher_CC_cluster_genoI_UpInLowSal = runTest(GOdata_CC_cluster_genoI_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoI_UpInLowSal
####extract the significant GO terms
allRes_BP_cluster_genoI_UpInLowSal = GenTable(GOdata_BP_cluster_genoI_UpInLowSal,
classic = resultFisher_BP_cluster_genoI_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoI_UpInLowSal
allRes_MF_cluster_genoI_UpInLowSal = GenTable(GOdata_MF_cluster_genoI_UpInLowSal,
classic = resultFisher_MF_cluster_genoI_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoI_UpInLowSal
allRes_CC_cluster_genoI_UpInLowSal = GenTable(GOdata_CC_cluster_genoI_UpInLowSal,
classic = resultFisher_CC_cluster_genoI_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoI_UpInLowSal
###genotype J Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoJ_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoJ_UpInHighSal))
names(geneList_cluster_genoJ_UpInHighSal) = geneUniverse
str(geneList_cluster_genoJ_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoJ_UpInHighSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoJ_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoJ_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoJ_UpInHighSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoJ_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoJ_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoJ_UpInHighSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoJ_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoJ_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoJ_UpInHighSal = runTest(GOdata_BP_cluster_genoJ_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoJ_UpInHighSal
resultFisher_MF_cluster_genoJ_UpInHighSal = runTest(GOdata_MF_cluster_genoJ_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoJ_UpInHighSal
resultFisher_CC_cluster_genoJ_UpInHighSal = runTest(GOdata_CC_cluster_genoJ_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoJ_UpInHighSal
####extract the 100 top-ranked GO terms and their P-values
allRes_BP_cluster_genoJ_UpInHighSal = GenTable(GOdata_BP_cluster_genoJ_UpInHighSal,
classic = resultFisher_BP_cluster_genoJ_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoJ_UpInHighSal
allRes_MF_cluster_genoJ_UpInHighSal = GenTable(GOdata_MF_cluster_genoJ_UpInHighSal,
classic = resultFisher_MF_cluster_genoJ_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoJ_UpInHighSal
allRes_CC_cluster_genoJ_UpInHighSal = GenTable(GOdata_CC_cluster_genoJ_UpInHighSal,
classic = resultFisher_CC_cluster_genoJ_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoJ_UpInHighSal
###genotype J Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoJ_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoJ_UpInLowSal))
names(geneList_cluster_genoJ_UpInLowSal) = geneUniverse
str(geneList_cluster_genoJ_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoJ_UpInLowSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoJ_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoJ_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoJ_UpInLowSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoJ_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoJ_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoJ_UpInLowSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoJ_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoJ_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoJ_UpInLowSal = runTest(GOdata_BP_cluster_genoJ_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoJ_UpInLowSal
resultFisher_MF_cluster_genoJ_UpInLowSal = runTest(GOdata_MF_cluster_genoJ_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoJ_UpInLowSal
resultFisher_CC_cluster_genoJ_UpInLowSal = runTest(GOdata_CC_cluster_genoJ_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoJ_UpInLowSal
####extract the 100 top-ranked GO terms and their P-values
allRes_BP_cluster_genoJ_UpInLowSal = GenTable(GOdata_BP_cluster_genoJ_UpInLowSal,
classic = resultFisher_BP_cluster_genoJ_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoJ_UpInLowSal
allRes_MF_cluster_genoJ_UpInLowSal = GenTable(GOdata_MF_cluster_genoJ_UpInLowSal,
classic = resultFisher_MF_cluster_genoJ_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoJ_UpInLowSal
allRes_CC_cluster_genoJ_UpInLowSal = GenTable(GOdata_CC_cluster_genoJ_UpInLowSal,
classic = resultFisher_CC_cluster_genoJ_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoJ_UpInLowSal
###genotype K Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoK_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoK_UpInHighSal))
names(geneList_cluster_genoK_UpInHighSal) = geneUniverse
str(geneList_cluster_genoK_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoK_UpInHighSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoK_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoK_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoK_UpInHighSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoK_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoK_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoK_UpInHighSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoK_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoK_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoK_UpInHighSal = runTest(GOdata_BP_cluster_genoK_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoK_UpInHighSal
resultFisher_MF_cluster_genoK_UpInHighSal = runTest(GOdata_MF_cluster_genoK_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoK_UpInHighSal
resultFisher_CC_cluster_genoK_UpInHighSal = runTest(GOdata_CC_cluster_genoK_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoK_UpInHighSal
####extract the 100 top-ranked GO terms and their P-values
allRes_BP_cluster_genoK_UpInHighSal = GenTable(GOdata_BP_cluster_genoK_UpInHighSal,
classic = resultFisher_BP_cluster_genoK_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoK_UpInHighSal
allRes_MF_cluster_genoK_UpInHighSal = GenTable(GOdata_MF_cluster_genoK_UpInHighSal,
classic = resultFisher_MF_cluster_genoK_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoK_UpInHighSal
allRes_CC_cluster_genoK_UpInHighSal = GenTable(GOdata_CC_cluster_genoK_UpInHighSal,
classic = resultFisher_CC_cluster_genoK_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoK_UpInHighSal
###genotype K Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoK_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoK_UpInLowSal))
names(geneList_cluster_genoK_UpInLowSal) = geneUniverse
str(geneList_cluster_genoK_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoK_UpInLowSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoK_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoK_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoK_UpInLowSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoK_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoK_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoK_UpInLowSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoK_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoK_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoK_UpInLowSal = runTest(GOdata_BP_cluster_genoK_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoK_UpInLowSal
resultFisher_MF_cluster_genoK_UpInLowSal = runTest(GOdata_MF_cluster_genoK_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoK_UpInLowSal
resultFisher_CC_cluster_genoK_UpInLowSal = runTest(GOdata_CC_cluster_genoK_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoK_UpInLowSal
####extract the 100 top-ranked GO terms and their P-values
allRes_BP_cluster_genoK_UpInLowSal = GenTable(GOdata_BP_cluster_genoK_UpInLowSal,
classic = resultFisher_BP_cluster_genoK_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoK_UpInLowSal
allRes_MF_cluster_genoK_UpInLowSal = GenTable(GOdata_MF_cluster_genoK_UpInLowSal,
classic = resultFisher_MF_cluster_genoK_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoK_UpInLowSal
allRes_CC_cluster_genoK_UpInLowSal = GenTable(GOdata_CC_cluster_genoK_UpInLowSal,
classic = resultFisher_CC_cluster_genoK_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoK_UpInLowSal
###genotype P Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoP_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoP_UpInHighSal))
names(geneList_cluster_genoP_UpInHighSal) = geneUniverse
str(geneList_cluster_genoP_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoP_UpInHighSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoP_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoP_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoP_UpInHighSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoP_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoP_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoP_UpInHighSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoP_UpInHighSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoP_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoP_UpInHighSal = runTest(GOdata_BP_cluster_genoP_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoP_UpInHighSal
resultFisher_MF_cluster_genoP_UpInHighSal = runTest(GOdata_MF_cluster_genoP_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoP_UpInHighSal
resultFisher_CC_cluster_genoP_UpInHighSal = runTest(GOdata_CC_cluster_genoP_UpInHighSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoP_UpInHighSal
####extract the 100 top-ranked GO terms and their P-values
allRes_BP_cluster_genoP_UpInHighSal = GenTable(GOdata_BP_cluster_genoP_UpInHighSal,
classic = resultFisher_BP_cluster_genoP_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoP_UpInHighSal
allRes_MF_cluster_genoP_UpInHighSal = GenTable(GOdata_MF_cluster_genoP_UpInHighSal,
classic = resultFisher_MF_cluster_genoP_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoP_UpInHighSal
allRes_CC_cluster_genoP_UpInHighSal = GenTable(GOdata_CC_cluster_genoP_UpInHighSal,
classic = resultFisher_CC_cluster_genoP_UpInHighSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoP_UpInHighSal
###genotype P Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoP_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoP_UpInLowSal))
names(geneList_cluster_genoP_UpInLowSal) = geneUniverse
str(geneList_cluster_genoP_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoP_UpInLowSal = new("topGOdata", ontology="BP",
allGenes=geneList_cluster_genoP_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoP_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoP_UpInLowSal = new("topGOdata", ontology="MF",
allGenes=geneList_cluster_genoP_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoP_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoP_UpInLowSal = new("topGOdata", ontology="CC",
allGenes=geneList_cluster_genoP_UpInLowSal,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoP_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoP_UpInLowSal = runTest(GOdata_BP_cluster_genoP_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoP_UpInLowSal
resultFisher_MF_cluster_genoP_UpInLowSal = runTest(GOdata_MF_cluster_genoP_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoP_UpInLowSal
resultFisher_CC_cluster_genoP_UpInLowSal = runTest(GOdata_CC_cluster_genoP_UpInLowSal,
algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoP_UpInLowSal
####extract the 100 top-ranked GO terms and their P-values
allRes_BP_cluster_genoP_UpInLowSal = GenTable(GOdata_BP_cluster_genoP_UpInLowSal,
classic = resultFisher_BP_cluster_genoP_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoP_UpInLowSal
allRes_MF_cluster_genoP_UpInLowSal = GenTable(GOdata_MF_cluster_genoP_UpInLowSal,
classic = resultFisher_MF_cluster_genoP_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoP_UpInLowSal
allRes_CC_cluster_genoP_UpInLowSal = GenTable(GOdata_CC_cluster_genoP_UpInLowSal,
classic = resultFisher_CC_cluster_genoP_UpInLowSal,
orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoP_UpInLowSal
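The per-genotype blocks above repeat the same four steps (gene list, topGOdata object, Fisher's exact test with the elim algorithm, GenTable) for every gene set and ontology. As a minimal sketch only, and not the code used for the analysis, the repetition could be wrapped in two helper functions. The names `make_gene_list` and `run_topGO_elim` are hypothetical, `run_topGO_elim` assumes that `library(topGO)` has been loaded, and the toy gene IDs are made up for illustration.

```r
# Hypothetical helper: build the 0/1 factor that topGO expects from a gene
# universe and a set of genes of interest (same logic as the blocks above).
make_gene_list <- function(gene_universe, genes_of_interest) {
  gene_list <- factor(as.integer(gene_universe %in% genes_of_interest))
  names(gene_list) <- gene_universe
  gene_list
}

# Hypothetical wrapper around the topGO calls used above. It is only defined
# here, not called, and assumes library(topGO) has been loaded beforehand.
run_topGO_elim <- function(gene_list, gene2GO, ontology, top_nodes = 100) {
  GOdata <- new("topGOdata", ontology = ontology, allGenes = gene_list,
                annot = annFUN.gene2GO, gene2GO = gene2GO)
  result <- runTest(GOdata, algorithm = "elim", statistic = "fisher")
  GenTable(GOdata, classic = result,
           orderBy = "elim", ranksOf = "elim", topNodes = top_nodes)
}

# Toy example (made-up gene IDs) for the gene list step
toy_universe <- c("gene1", "gene2", "gene3", "gene4")
toy_interest <- c("gene2", "gene4")
toy_list <- make_gene_list(toy_universe, toy_interest)
table(toy_list)
```

With these helpers, one block would reduce to a call such as `run_topGO_elim(make_gene_list(geneUniverse, RQ1e2_genoJ_UpInHighSal), geneID2GO, "BP")`.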
Next, we performed GSEA GO enrichment using CAMERA.
We first separated the GO terms from InterProScan into three categories: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). This allowed us to run the GSEA on each category separately. To do this, we used the topGO package. topGO adds additional GO terms to those provided by InterProScan, but we only wanted to include the GO terms assigned by InterProScan. Therefore, we first used topGO to obtain the full list of GO terms associated with the DE genes in our dataset, then separated these GO terms into three lists (BP, MF and CC), and finally retained only the GO terms detected by InterProScan:
# turn geneID2GO into GO2geneID
GO_terms_geneID2GO = unique(unlist(geneID2GO, use.names = FALSE))
GO2geneID_InterPro = unstack(subset(stack(geneID2GO), values%in%GO_terms_geneID2GO), ind~values)
## the GO2geneID_InterPro object contains, for each GO term, the list of genes annotated with that term by InterProScan
# create gene list
genesOfInterest_allSign = rownames(subset(OnlySignGenes_RQ1e2_ConStage,
rownames(OnlySignGenes_RQ1e2_ConStage)%in%geneUniverse))
geneList_allDE = factor(as.integer(geneUniverse %in% genesOfInterest_allSign))
names(geneList_allDE) = geneUniverse
str(geneList_allDE)
# subset GOs into BP, MF and CC processes
GOdata_BP_allDE = new("topGOdata", ontology="BP", allGenes=geneList_allDE,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_allDE = new("topGOdata", ontology="MF", allGenes=geneList_allDE,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_allDE = new("topGOdata", ontology="CC", allGenes=geneList_allDE,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
# extract all the GO terms for a given set
ug_BP = usedGO(GOdata_BP_allDE)
ug_MF = usedGO(GOdata_MF_allDE)
ug_CC = usedGO(GOdata_CC_allDE)
# create a list of lists which contains the genes that are associated with a given GO term
## BP GO terms
GO2geneID_InterPro_BP = list()
for (GO in ug_BP){
selection = unlist(GO2geneID_InterPro[names(GO2geneID_InterPro) %in% GO == TRUE], use.names = FALSE)
GO2geneID_InterPro_BP[[GO]] = selection
}
## MF GO terms
GO2geneID_InterPro_MF = list()
for (GO in ug_MF){
selection = unlist(GO2geneID_InterPro[names(GO2geneID_InterPro) %in% GO == TRUE], use.names = FALSE)
GO2geneID_InterPro_MF[[GO]] = selection
}
## CC GO terms
GO2geneID_InterPro_CC= list()
for (GO in ug_CC){
selection = unlist(GO2geneID_InterPro[names(GO2geneID_InterPro) %in% GO == TRUE], use.names = FALSE)
GO2geneID_InterPro_CC[[GO]] = selection
}
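The inversion and subsetting steps above can be illustrated on toy data. This is a sketch using base R only: the gene and GO identifiers are made up, and the helper `subset_GO2gene` is a hypothetical name that is intended to give the same result as the three `for` loops (GO terms absent from the InterProScan mapping are dropped).

```r
# Toy gene-to-GO mapping (made-up identifiers), same structure as geneID2GO
geneID2GO_toy <- list(
  gene1 = c("GO:0000001", "GO:0000002"),
  gene2 = "GO:0000001",
  gene3 = c("GO:0000001", "GO:0000003")
)

# Invert to a GO-to-gene mapping, as done above with stack() and unstack()
stacked <- stack(geneID2GO_toy)   # columns: values (GO ID), ind (gene ID)
GO2geneID_toy <- unstack(stacked, ind ~ values)
# Note: unstack() returns a data frame instead of a list when all GO terms
# have the same number of genes; this toy mapping has unequal group sizes.

# Hypothetical helper replacing the three for loops: keep the requested GO
# terms that occur in the mapping, in the order in which they were requested
subset_GO2gene <- function(GO2gene, go_ids) {
  GO2gene[intersect(go_ids, names(GO2gene))]
}

# Pretend topGO reported these GO terms for one ontology
# (GO:0000009 does not occur in the toy mapping and is dropped)
used_toy <- c("GO:0000003", "GO:0000009", "GO:0000001")
GO2geneID_toy_subset <- subset_GO2gene(GO2geneID_toy, used_toy)
str(GO2geneID_toy_subset)
```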
Next, we ran CAMERA for the average contrasts:
#CAMERA on biological process GOs for each average contrast
CAMERA_InterPro_BP_avg8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP, design=design,
contrast=C_RQ1e2[,25])
CAMERA_InterPro_BP_avg16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP, design=design,
contrast=C_RQ1e2[,26])
CAMERA_InterPro_BP_avg8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP, design=design,
contrast=C_RQ1e2[,27])
#CAMERA on molecular function GOs for each average contrast
CAMERA_InterPro_MF_avg8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF,
design=design, contrast=C_RQ1e2[,25])
CAMERA_InterPro_MF_avg16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF,
design=design, contrast=C_RQ1e2[,26])
CAMERA_InterPro_MF_avg8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF,
design=design, contrast=C_RQ1e2[,27])
#CAMERA on cellular component GOs for each average contrast
CAMERA_InterPro_CC_avg8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC,
design=design, contrast=C_RQ1e2[,25])
CAMERA_InterPro_CC_avg16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC,
design=design, contrast=C_RQ1e2[,26])
CAMERA_InterPro_CC_avg8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC,
design=design, contrast=C_RQ1e2[,27])
We also ran CAMERA for each genotype separately (code only shown for genotype A). To rerun the entire analysis in this document, apply the code below to the other genotypes as well, because the output for all genotypes is used further downstream.
# genotype A
## CAMERA on biological process GOs for each genotype A contrast
CAMERA_InterPro_BP_genoA_8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP,
design=design, contrast=C_RQ1e2[,1])
CAMERA_InterPro_BP_genoA_16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP,
design=design, contrast=C_RQ1e2[,2])
CAMERA_InterPro_BP_genoA_8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP,
design=design, contrast=C_RQ1e2[,3])
## CAMERA on molecular function GOs for each genotype A contrast
CAMERA_InterPro_MF_genoA_8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF,
design=design, contrast=C_RQ1e2[,1])
CAMERA_InterPro_MF_genoA_16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF,
design=design, contrast=C_RQ1e2[,2])
CAMERA_InterPro_MF_genoA_8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF,
design=design, contrast=C_RQ1e2[,3])
## CAMERA on cellular component GOs for each genotype A contrast
CAMERA_InterPro_CC_genoA_8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC,
design=design, contrast=C_RQ1e2[,1])
CAMERA_InterPro_CC_genoA_16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC,
design=design, contrast=C_RQ1e2[,2])
CAMERA_InterPro_CC_genoA_8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC,
design=design, contrast=C_RQ1e2[,3])
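Because only genotype A is shown, the remaining genotypes have to be added by hand. The following sketch shows one way to loop over genotypes instead. It ASSUMES that columns 1 to 24 of `C_RQ1e2` hold the per-genotype contrasts in the order A, B, D, F, I, J, K, P, with three columns each (8vs16, 16vs24, 8vs24). This document only confirms columns 1 to 3 (genotype A) and 25 to 27 (averages), so check `colnames(C_RQ1e2)` before relying on this mapping. `contrast_map` and `run_camera_all` are hypothetical names.

```r
# ASSUMPTION: contrast columns 1-24 are ordered by genotype (A, B, D, F, I, J, K, P),
# each with three salinity contrasts in the order 8vs16, 16vs24, 8vs24.
genotypes <- c("A", "B", "D", "F", "I", "J", "K", "P")
salinity_contrasts <- c("8vs16", "16vs24", "8vs24")

# expand.grid() varies its first argument fastest, so rows run
# genoA_8vs16, genoA_16vs24, genoA_8vs24, genoB_8vs16, ...
contrast_map <- expand.grid(contrast = salinity_contrasts, genotype = genotypes,
                            stringsAsFactors = FALSE)
contrast_map$column <- seq_len(nrow(contrast_map))
contrast_map$name <- paste0("geno", contrast_map$genotype, "_", contrast_map$contrast)

# Hypothetical wrapper: run CAMERA for every mapped contrast and one ontology.
# Only defined here; calling it requires edgeR and the objects y, design and C_RQ1e2.
run_camera_all <- function(y, design, contrasts, index, ontology, map = contrast_map) {
  out <- lapply(map$column, function(j) {
    edgeR::camera.DGEList(y = y, index = index, design = design,
                          contrast = contrasts[, j])
  })
  names(out) <- paste0("CAMERA_InterPro_", ontology, "_", map$name)
  out
}

head(contrast_map, 4)
```

A call such as `run_camera_all(y, design, C_RQ1e2, GO2geneID_InterPro_BP, "BP")` would then return a named list with one CAMERA table per genotype and contrast.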
Next, we reduced the list of significant GO terms using the online application REVIGO. For REVIGO, we used the output of the Fisher’s Exact Test (only including GO terms with a P-value <= 0.05) and a similarity threshold of 0.5 with the SimRel algorithm. P-values from the Fisher’s Exact Test were included in the REVIGO input. These were the same settings as for the REVIGO runs on the topGO results.
REVIGO for the above analyses was accessed on July 6th 2021, and used the Gene Ontology database of May 1st 2021 and the UniProt-to-GO mapping database from April 9th 2021.
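Preparing the REVIGO input can be sketched as follows. This is an illustration on toy values, not the code used for the analysis. It assumes the P-values are available as a named numeric vector (for topGO, `score()` on a result object returns such a vector), keeps the terms with P <= 0.05, and writes a two-column, tab-separated table of GO IDs and P-values that can be pasted into REVIGO. The function name `make_revigo_input`, the GO IDs and the P-values are made up.

```r
# Hypothetical helper: keep GO terms with P <= cutoff and return a two-column table
make_revigo_input <- function(pvalues, cutoff = 0.05) {
  keep <- pvalues[!is.na(pvalues) & pvalues <= cutoff]
  keep <- sort(keep)   # most significant terms first; names are preserved
  data.frame(GO.ID = names(keep), P.value = unname(keep),
             stringsAsFactors = FALSE)
}

# Toy P-values (made up), named by GO ID
toy_pvalues <- c("GO:0000001" = 0.001, "GO:0000002" = 0.2,
                 "GO:0000003" = 0.05, "GO:0000004" = NA)
revigo_input <- make_revigo_input(toy_pvalues)

# Write without header or quotes, one "GO ID <tab> P-value" pair per line
out_file <- tempfile(fileext = ".txt")
write.table(revigo_input, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(out_file)
```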
Next, we created an overview of the GO enrichment results by plotting barplots.
First, topGO results:
# calculate number of significantly enriched GO terms ORA (topGO)
ORA_BP_UpInHighSal_sign = length(which(lapply(allRes_BP_cluster_UpInHighSal_elim$classic,as.numeric) < 0.05))
ORA_BP_UpInLowSal_sign = length(which(lapply(allRes_BP_cluster_UpInLowSal_elim$classic,as.numeric) < 0.05))
ORA_MF_UpInHighSal_sign = length(which(lapply(allRes_MF_cluster_UpInHighSal_elim$classic,as.numeric) < 0.05))
ORA_MF_UpInLowSal_sign = length(which(lapply(allRes_MF_cluster_UpInLowSal_elim$classic,as.numeric) < 0.05))
ORA_CC_UpInHighSal_sign = length(which(lapply(allRes_CC_cluster_UpInHighSal_elim$classic,as.numeric) < 0.05))
ORA_CC_UpInLowSal_sign = length(which(lapply(allRes_CC_cluster_UpInLowSal_elim$classic,as.numeric) < 0.05))
ORA_BP_UpInHighSal_genoA_sign = length(which(lapply(allRes_BP_cluster_genoA_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoA_sign = length(which(lapply(allRes_BP_cluster_genoA_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoA_sign = length(which(lapply(allRes_MF_cluster_genoA_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoA_sign = length(which(lapply(allRes_MF_cluster_genoA_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoA_sign = length(which(lapply(allRes_CC_cluster_genoA_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoA_sign = length(which(lapply(allRes_CC_cluster_genoA_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInHighSal_genoB_sign = length(which(lapply(allRes_BP_cluster_genoB_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoB_sign = length(which(lapply(allRes_BP_cluster_genoB_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoB_sign = length(which(lapply(allRes_MF_cluster_genoB_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoB_sign = length(which(lapply(allRes_MF_cluster_genoB_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoB_sign = length(which(lapply(allRes_CC_cluster_genoB_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoB_sign = length(which(lapply(allRes_CC_cluster_genoB_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInHighSal_genoD_sign = length(which(lapply(allRes_BP_cluster_genoD_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoD_sign = length(which(lapply(allRes_BP_cluster_genoD_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoD_sign = length(which(lapply(allRes_MF_cluster_genoD_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoD_sign = length(which(lapply(allRes_MF_cluster_genoD_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoD_sign = length(which(lapply(allRes_CC_cluster_genoD_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoD_sign = length(which(lapply(allRes_CC_cluster_genoD_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInHighSal_genoF_sign = length(which(lapply(allRes_BP_cluster_genoF_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoF_sign = length(which(lapply(allRes_BP_cluster_genoF_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoF_sign = length(which(lapply(allRes_MF_cluster_genoF_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoF_sign = length(which(lapply(allRes_MF_cluster_genoF_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoF_sign = length(which(lapply(allRes_CC_cluster_genoF_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoF_sign = length(which(lapply(allRes_CC_cluster_genoF_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInHighSal_genoI_sign = length(which(lapply(allRes_BP_cluster_genoI_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoI_sign = length(which(lapply(allRes_BP_cluster_genoI_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoI_sign = length(which(lapply(allRes_MF_cluster_genoI_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoI_sign = length(which(lapply(allRes_MF_cluster_genoI_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoI_sign = length(which(lapply(allRes_CC_cluster_genoI_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoI_sign = length(which(lapply(allRes_CC_cluster_genoI_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInHighSal_genoJ_sign = length(which(lapply(allRes_BP_cluster_genoJ_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoJ_sign = length(which(lapply(allRes_BP_cluster_genoJ_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoJ_sign = length(which(lapply(allRes_MF_cluster_genoJ_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoJ_sign = length(which(lapply(allRes_MF_cluster_genoJ_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoJ_sign = length(which(lapply(allRes_CC_cluster_genoJ_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoJ_sign = length(which(lapply(allRes_CC_cluster_genoJ_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInHighSal_genoK_sign = length(which(lapply(allRes_BP_cluster_genoK_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoK_sign = length(which(lapply(allRes_BP_cluster_genoK_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoK_sign = length(which(lapply(allRes_MF_cluster_genoK_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoK_sign = length(which(lapply(allRes_MF_cluster_genoK_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoK_sign = length(which(lapply(allRes_CC_cluster_genoK_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoK_sign = length(which(lapply(allRes_CC_cluster_genoK_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInHighSal_genoP_sign = length(which(lapply(allRes_BP_cluster_genoP_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoP_sign = length(which(lapply(allRes_BP_cluster_genoP_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoP_sign = length(which(lapply(allRes_MF_cluster_genoP_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoP_sign = length(which(lapply(allRes_MF_cluster_genoP_UpInLowSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoP_sign = length(which(lapply(allRes_CC_cluster_genoP_UpInHighSal$classic,
as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoP_sign = length(which(lapply(allRes_CC_cluster_genoP_UpInLowSal$classic,
as.numeric) < 0.05))
# combine data
ORA_UpInHighSal_SignNum = -sum(ORA_BP_UpInHighSal_sign,ORA_MF_UpInHighSal_sign,ORA_CC_UpInHighSal_sign)
ORA_UpInLowSal_SignNum = sum(ORA_BP_UpInLowSal_sign,ORA_MF_UpInLowSal_sign,ORA_CC_UpInLowSal_sign)
ORA_UpInHighSal_genoA_SignNum = -sum(ORA_BP_UpInHighSal_genoA_sign,ORA_MF_UpInHighSal_genoA_sign,
ORA_CC_UpInHighSal_genoA_sign)
ORA_UpInLowSal_genoA_SignNum = sum(ORA_BP_UpInLowSal_genoA_sign,ORA_MF_UpInLowSal_genoA_sign,
ORA_CC_UpInLowSal_genoA_sign)
ORA_UpInHighSal_genoB_SignNum = -sum(ORA_BP_UpInHighSal_genoB_sign,ORA_MF_UpInHighSal_genoB_sign,
ORA_CC_UpInHighSal_genoB_sign)
ORA_UpInLowSal_genoB_SignNum = sum(ORA_BP_UpInLowSal_genoB_sign,ORA_MF_UpInLowSal_genoB_sign,
ORA_CC_UpInLowSal_genoB_sign)
ORA_UpInHighSal_genoD_SignNum = -sum(ORA_BP_UpInHighSal_genoD_sign,ORA_MF_UpInHighSal_genoD_sign,
ORA_CC_UpInHighSal_genoD_sign)
ORA_UpInLowSal_genoD_SignNum = sum(ORA_BP_UpInLowSal_genoD_sign,ORA_MF_UpInLowSal_genoD_sign,
ORA_CC_UpInLowSal_genoD_sign)
ORA_UpInHighSal_genoF_SignNum = -sum(ORA_BP_UpInHighSal_genoF_sign,ORA_MF_UpInHighSal_genoF_sign,
ORA_CC_UpInHighSal_genoF_sign)
ORA_UpInLowSal_genoF_SignNum = sum(ORA_BP_UpInLowSal_genoF_sign,ORA_MF_UpInLowSal_genoF_sign,
ORA_CC_UpInLowSal_genoF_sign)
ORA_UpInHighSal_genoI_SignNum = -sum(ORA_BP_UpInHighSal_genoI_sign,ORA_MF_UpInHighSal_genoI_sign,
ORA_CC_UpInHighSal_genoI_sign)
ORA_UpInLowSal_genoI_SignNum = sum(ORA_BP_UpInLowSal_genoI_sign,ORA_MF_UpInLowSal_genoI_sign,
ORA_CC_UpInLowSal_genoI_sign)
ORA_UpInHighSal_genoJ_SignNum = -sum(ORA_BP_UpInHighSal_genoJ_sign,ORA_MF_UpInHighSal_genoJ_sign,
ORA_CC_UpInHighSal_genoJ_sign)
ORA_UpInLowSal_genoJ_SignNum = sum(ORA_BP_UpInLowSal_genoJ_sign,ORA_MF_UpInLowSal_genoJ_sign,
ORA_CC_UpInLowSal_genoJ_sign)
ORA_UpInHighSal_genoK_SignNum = -sum(ORA_BP_UpInHighSal_genoK_sign,ORA_MF_UpInHighSal_genoK_sign,
ORA_CC_UpInHighSal_genoK_sign)
ORA_UpInLowSal_genoK_SignNum = sum(ORA_BP_UpInLowSal_genoK_sign,ORA_MF_UpInLowSal_genoK_sign,
ORA_CC_UpInLowSal_genoK_sign)
ORA_UpInHighSal_genoP_SignNum = -sum(ORA_BP_UpInHighSal_genoP_sign,ORA_MF_UpInHighSal_genoP_sign,
ORA_CC_UpInHighSal_genoP_sign)
ORA_UpInLowSal_genoP_SignNum = sum(ORA_BP_UpInLowSal_genoP_sign,ORA_MF_UpInLowSal_genoP_sign,
ORA_CC_UpInLowSal_genoP_sign)
# create data frame for ggplot
values = c(ORA_UpInHighSal_SignNum,ORA_UpInHighSal_genoA_SignNum,
ORA_UpInHighSal_genoB_SignNum,ORA_UpInHighSal_genoD_SignNum,
ORA_UpInHighSal_genoF_SignNum,ORA_UpInHighSal_genoI_SignNum,
ORA_UpInHighSal_genoJ_SignNum,ORA_UpInHighSal_genoK_SignNum,
ORA_UpInHighSal_genoP_SignNum,
ORA_UpInLowSal_SignNum,ORA_UpInLowSal_genoA_SignNum,
ORA_UpInLowSal_genoB_SignNum,ORA_UpInLowSal_genoD_SignNum,
ORA_UpInLowSal_genoF_SignNum,ORA_UpInLowSal_genoI_SignNum,
ORA_UpInLowSal_genoJ_SignNum,ORA_UpInLowSal_genoK_SignNum,
ORA_UpInLowSal_genoP_SignNum)
meta = c('average','genotype A', 'genotype B', 'genotype D', 'genotype F',
'genotype I', 'genotype J', 'genotype K', 'genotype P')
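# meta holds 9 labels and is recycled by cbind() to match the 18 values (9 high-salinity counts followed by 9 low-salinity counts)
df_ORA = as.data.frame(cbind(values,meta))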
#reorder data for plotting
df_ORA$meta = factor(df_ORA$meta, levels = c('average','genotype P','genotype K','genotype J',
'genotype I','genotype F','genotype D','genotype B','genotype A'))
# plot barplot
g = ggplot(df_ORA, aes(x = meta, y = as.numeric(values))) +
geom_bar(stat = "identity", position = "identity", fill = 'black',
color = "white") + coord_flip() +
scale_x_discrete(limits = rev(levels(df_ORA$meta))) +
ylab("Number of GO terms") +
theme_test()
g
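The counting step above converts the `classic` column of each GenTable output with `as.numeric()`. GenTable returns P-values as formatted character strings, and very small values may be printed with a leading `<` (for example `< 1e-30`); such strings become `NA` in `as.numeric()` and would not be counted. The sketch below shows a counting helper that treats these strings as significant and builds the plotting data frame with numeric values directly. The helper name `count_sign` and the toy tables are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: count terms with P < cutoff in a GenTable-style column
count_sign <- function(pval_strings, cutoff = 0.05) {
  cleaned <- sub("^\\s*<\\s*", "", pval_strings)   # "< 1e-30" -> "1e-30"
  sum(as.numeric(cleaned) < cutoff, na.rm = TRUE)
}

# Toy GenTable-style outputs (made-up values)
toy_tables <- list(
  average_UpInHighSal = data.frame(classic = c("< 1e-30", "0.0012", "0.2"),
                                   stringsAsFactors = FALSE),
  average_UpInLowSal  = data.frame(classic = c("0.03", "0.7"),
                                   stringsAsFactors = FALSE)
)

counts <- vapply(toy_tables, function(tab) count_sign(tab$classic), numeric(1))

# High-salinity counts are negated so that the bars point in opposite directions
df_toy <- data.frame(
  meta = c("average", "average"),
  direction = c("UpInHighSal", "UpInLowSal"),
  values = c(-counts[["average_UpInHighSal"]], counts[["average_UpInLowSal"]])
)
df_toy
```

Building the data frame with `data.frame()` keeps `values` numeric, so no `as.numeric()` conversion is needed inside `aes()`.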
Next, CAMERA results:
# calculate the number of significantly enriched GO terms in the CAMERA analysis of the InterProScan GO terms (distinguishing between up- and downregulated GO terms)
## combine relevant data frames in data frame list
df_summary = list(CAMERA_InterPro_BP_avg8vs16, CAMERA_InterPro_BP_avg16vs24, CAMERA_InterPro_BP_avg8vs24,
CAMERA_InterPro_MF_avg8vs16, CAMERA_InterPro_MF_avg16vs24, CAMERA_InterPro_MF_avg8vs24,
CAMERA_InterPro_CC_avg8vs16, CAMERA_InterPro_CC_avg16vs24, CAMERA_InterPro_CC_avg8vs24,
CAMERA_InterPro_BP_genoA_8vs16, CAMERA_InterPro_BP_genoA_16vs24,
CAMERA_InterPro_BP_genoA_8vs24, CAMERA_InterPro_MF_genoA_8vs16,
CAMERA_InterPro_MF_genoA_16vs24, CAMERA_InterPro_MF_genoA_8vs24,
CAMERA_InterPro_CC_genoA_8vs16, CAMERA_InterPro_CC_genoA_16vs24,
CAMERA_InterPro_CC_genoA_8vs24, CAMERA_InterPro_BP_genoB_8vs16,
CAMERA_InterPro_BP_genoB_16vs24, CAMERA_InterPro_BP_genoB_8vs24,
CAMERA_InterPro_MF_genoB_8vs16, CAMERA_InterPro_MF_genoB_16vs24,
CAMERA_InterPro_MF_genoB_8vs24, CAMERA_InterPro_CC_genoB_8vs16,
CAMERA_InterPro_CC_genoB_16vs24, CAMERA_InterPro_CC_genoB_8vs24,
CAMERA_InterPro_BP_genoD_8vs16, CAMERA_InterPro_BP_genoD_16vs24,
CAMERA_InterPro_BP_genoD_8vs24, CAMERA_InterPro_MF_genoD_8vs16,
CAMERA_InterPro_MF_genoD_16vs24, CAMERA_InterPro_MF_genoD_8vs24,
CAMERA_InterPro_CC_genoD_8vs16, CAMERA_InterPro_CC_genoD_16vs24,
CAMERA_InterPro_CC_genoD_8vs24, CAMERA_InterPro_BP_genoF_8vs16,
CAMERA_InterPro_BP_genoF_16vs24, CAMERA_InterPro_BP_genoF_8vs24,
CAMERA_InterPro_MF_genoF_8vs16, CAMERA_InterPro_MF_genoF_16vs24,
CAMERA_InterPro_MF_genoF_8vs24, CAMERA_InterPro_CC_genoF_8vs16,
CAMERA_InterPro_CC_genoF_16vs24, CAMERA_InterPro_CC_genoF_8vs24,
CAMERA_InterPro_BP_genoI_8vs16, CAMERA_InterPro_BP_genoI_16vs24,
CAMERA_InterPro_BP_genoI_8vs24, CAMERA_InterPro_MF_genoI_8vs16,
CAMERA_InterPro_MF_genoI_16vs24, CAMERA_InterPro_MF_genoI_8vs24,
CAMERA_InterPro_CC_genoI_8vs16, CAMERA_InterPro_CC_genoI_16vs24,
CAMERA_InterPro_CC_genoI_8vs24, CAMERA_InterPro_BP_genoJ_8vs16,
CAMERA_InterPro_BP_genoJ_16vs24, CAMERA_InterPro_BP_genoJ_8vs24,
CAMERA_InterPro_MF_genoJ_8vs16, CAMERA_InterPro_MF_genoJ_16vs24,
CAMERA_InterPro_MF_genoJ_8vs24, CAMERA_InterPro_CC_genoJ_8vs16,
CAMERA_InterPro_CC_genoJ_16vs24, CAMERA_InterPro_CC_genoJ_8vs24,
CAMERA_InterPro_BP_genoK_8vs16, CAMERA_InterPro_BP_genoK_16vs24,
CAMERA_InterPro_BP_genoK_8vs24, CAMERA_InterPro_MF_genoK_8vs16,
CAMERA_InterPro_MF_genoK_16vs24, CAMERA_InterPro_MF_genoK_8vs24,
CAMERA_InterPro_CC_genoK_8vs16, CAMERA_InterPro_CC_genoK_16vs24,
CAMERA_InterPro_CC_genoK_8vs24, CAMERA_InterPro_BP_genoP_8vs16,
CAMERA_InterPro_BP_genoP_16vs24, CAMERA_InterPro_BP_genoP_8vs24,
CAMERA_InterPro_MF_genoP_8vs16, CAMERA_InterPro_MF_genoP_16vs24,
CAMERA_InterPro_MF_genoP_8vs24, CAMERA_InterPro_CC_genoP_8vs16,
CAMERA_InterPro_CC_genoP_16vs24, CAMERA_InterPro_CC_genoP_8vs24)
# give names to the data frames in the list
names(df_summary) = c('BP_avg8vs16', 'BP_avg16vs24', 'BP_avg8vs24',
'MF_avg8vs16', 'MF_avg16vs24', 'MF_avg8vs24',
'CC_avg8vs16', 'CC_avg16vs24', 'CC_avg8vs24',
'BP_genoA_8vs16', 'BP_genoA_16vs24', 'BP_genoA_8vs24',
'MF_genoA_8vs16', 'MF_genoA_16vs24', 'MF_genoA_8vs24',
'CC_genoA_8vs16', 'CC_genoA_16vs24', 'CC_genoA_8vs24',
'BP_genoB_8vs16', 'BP_genoB_16vs24', 'BP_genoB_8vs24',
'MF_genoB_8vs16', 'MF_genoB_16vs24', 'MF_genoB_8vs24',
'CC_genoB_8vs16', 'CC_genoB_16vs24', 'CC_genoB_8vs24',
'BP_genoD_8vs16', 'BP_genoD_16vs24', 'BP_genoD_8vs24',
'MF_genoD_8vs16', 'MF_genoD_16vs24', 'MF_genoD_8vs24',
'CC_genoD_8vs16', 'CC_genoD_16vs24', 'CC_genoD_8vs24',
'BP_genoF_8vs16', 'BP_genoF_16vs24', 'BP_genoF_8vs24',
'MF_genoF_8vs16', 'MF_genoF_16vs24', 'MF_genoF_8vs24',
'CC_genoF_8vs16', 'CC_genoF_16vs24', 'CC_genoF_8vs24',
'BP_genoI_8vs16', 'BP_genoI_16vs24', 'BP_genoI_8vs24',
'MF_genoI_8vs16', 'MF_genoI_16vs24', 'MF_genoI_8vs24',
'CC_genoI_8vs16', 'CC_genoI_16vs24', 'CC_genoI_8vs24',
'BP_genoJ_8vs16', 'BP_genoJ_16vs24', 'BP_genoJ_8vs24',
'MF_genoJ_8vs16', 'MF_genoJ_16vs24', 'MF_genoJ_8vs24',
'CC_genoJ_8vs16', 'CC_genoJ_16vs24', 'CC_genoJ_8vs24',
'BP_genoK_8vs16', 'BP_genoK_16vs24', 'BP_genoK_8vs24',
'MF_genoK_8vs16', 'MF_genoK_16vs24', 'MF_genoK_8vs24',
'CC_genoK_8vs16', 'CC_genoK_16vs24', 'CC_genoK_8vs24',
'BP_genoP_8vs16', 'BP_genoP_16vs24', 'BP_genoP_8vs24',
'MF_genoP_8vs16', 'MF_genoP_16vs24', 'MF_genoP_8vs24',
'CC_genoP_8vs16', 'CC_genoP_16vs24', 'CC_genoP_8vs24')
# loop over the data frame list to select number of up- and downregulated GO terms in different contrasts
df3 = data.frame()
for (df in df_summary){
df_down = df[df$Direction == 'Down',]
df_up = df[df$Direction == 'Up',]
df_down_sign = length(which(df_down$FDR < 0.05))
df_up_sign = length(which(df_up$FDR < 0.05))
df3 = rbind.data.frame(df3, c(df_up_sign,df_down_sign))
}
# add column with metadata grouping sets of similar GO terms together
meta = c('avg8vs16', 'avg16vs24', 'avg8vs24',
'avg8vs16', 'avg16vs24', 'avg8vs24',
'avg8vs16', 'avg16vs24', 'avg8vs24',
'genoA_8vs16', 'genoA_16vs24', 'genoA_8vs24',
'genoA_8vs16', 'genoA_16vs24', 'genoA_8vs24',
'genoA_8vs16', 'genoA_16vs24', 'genoA_8vs24',
'genoB_8vs16', 'genoB_16vs24', 'genoB_8vs24',
'genoB_8vs16', 'genoB_16vs24', 'genoB_8vs24',
'genoB_8vs16', 'genoB_16vs24', 'genoB_8vs24',
'genoD_8vs16', 'genoD_16vs24', 'genoD_8vs24',
'genoD_8vs16', 'genoD_16vs24', 'genoD_8vs24',
'genoD_8vs16', 'genoD_16vs24', 'genoD_8vs24',
'genoF_8vs16', 'genoF_16vs24', 'genoF_8vs24',
'genoF_8vs16', 'genoF_16vs24', 'genoF_8vs24',
'genoF_8vs16', 'genoF_16vs24', 'genoF_8vs24',
'genoI_8vs16', 'genoI_16vs24', 'genoI_8vs24',
'genoI_8vs16', 'genoI_16vs24', 'genoI_8vs24',
'genoI_8vs16', 'genoI_16vs24', 'genoI_8vs24',
'genoJ_8vs16', 'genoJ_16vs24', 'genoJ_8vs24',
'genoJ_8vs16', 'genoJ_16vs24', 'genoJ_8vs24',
'genoJ_8vs16', 'genoJ_16vs24', 'genoJ_8vs24',
'genoK_8vs16', 'genoK_16vs24', 'genoK_8vs24',
'genoK_8vs16', 'genoK_16vs24', 'genoK_8vs24',
'genoK_8vs16', 'genoK_16vs24', 'genoK_8vs24',
'genoP_8vs16', 'genoP_16vs24', 'genoP_8vs24',
'genoP_8vs16', 'genoP_16vs24', 'genoP_8vs24',
'genoP_8vs16', 'genoP_16vs24', 'genoP_8vs24')
df3bis = cbind(df3,meta)
# add column names and row names
rownames(df3bis) = c('BP_avg8vs16', 'BP_avg16vs24', 'BP_avg8vs24',
'MF_avg8vs16', 'MF_avg16vs24', 'MF_avg8vs24',
'CC_avg8vs16', 'CC_avg16vs24', 'CC_avg8vs24',
'BP_genoA_8vs16', 'BP_genoA_16vs24', 'BP_genoA_8vs24',
'MF_genoA_8vs16', 'MF_genoA_16vs24', 'MF_genoA_8vs24',
'CC_genoA_8vs16', 'CC_genoA_16vs24', 'CC_genoA_8vs24',
'BP_genoB_8vs16', 'BP_genoB_16vs24', 'BP_genoB_8vs24',
'MF_genoB_8vs16', 'MF_genoB_16vs24', 'MF_genoB_8vs24',
'CC_genoB_8vs16', 'CC_genoB_16vs24', 'CC_genoB_8vs24',
'BP_genoD_8vs16', 'BP_genoD_16vs24', 'BP_genoD_8vs24',
'MF_genoD_8vs16', 'MF_genoD_16vs24', 'MF_genoD_8vs24',
'CC_genoD_8vs16', 'CC_genoD_16vs24', 'CC_genoD_8vs24',
'BP_genoF_8vs16', 'BP_genoF_16vs24', 'BP_genoF_8vs24',
'MF_genoF_8vs16', 'MF_genoF_16vs24', 'MF_genoF_8vs24',
'CC_genoF_8vs16', 'CC_genoF_16vs24', 'CC_genoF_8vs24',
'BP_genoI_8vs16', 'BP_genoI_16vs24', 'BP_genoI_8vs24',
'MF_genoI_8vs16', 'MF_genoI_16vs24', 'MF_genoI_8vs24',
'CC_genoI_8vs16', 'CC_genoI_16vs24', 'CC_genoI_8vs24',
'BP_genoJ_8vs16', 'BP_genoJ_16vs24', 'BP_genoJ_8vs24',
'MF_genoJ_8vs16', 'MF_genoJ_16vs24', 'MF_genoJ_8vs24',
'CC_genoJ_8vs16', 'CC_genoJ_16vs24', 'CC_genoJ_8vs24',
'BP_genoK_8vs16', 'BP_genoK_16vs24', 'BP_genoK_8vs24',
'MF_genoK_8vs16', 'MF_genoK_16vs24', 'MF_genoK_8vs24',
'CC_genoK_8vs16', 'CC_genoK_16vs24', 'CC_genoK_8vs24',
'BP_genoP_8vs16', 'BP_genoP_16vs24', 'BP_genoP_8vs24',
'MF_genoP_8vs16', 'MF_genoP_16vs24', 'MF_genoP_8vs24',
'CC_genoP_8vs16', 'CC_genoP_16vs24', 'CC_genoP_8vs24')
colnames(df3bis) = c('upregulated', 'downregulated', 'metadata')
# sum over the different GO classes
df3_sum = ddply(df3bis, "metadata", numcolwise(sum))
# turn downregulated values negative for plotting in ggplot
df3_sum[,3] = -df3_sum[,3]
# stack data
df3_stack = stack(df3_sum[,2:3])
# add metadata
meta = c('avg16-avg24', 'avg8-avg16', 'avg8-avg24',
'A16-A24', 'A8-A16', 'A8-A24',
'B16-B24', 'B8-B16', 'B8-B24',
'D16-D24', 'D8-D16', 'D8-D24',
'F16-F24', 'F8-F16', 'F8-F24',
'I16-I24', 'I8-I16', 'I8-I24',
'J16-J24', 'J8-J16', 'J8-J24',
'K16-K24', 'K8-K16', 'K8-K24',
'P16-P24', 'P8-P16', 'P8-P24')
category = c('16vs24', '8vs16', '8vs24')
df3_stack2 = cbind(df3_stack,meta,category)
# reorder data for plotting
df3_stack2$meta = factor(df3_stack2$meta, levels = c('avg8-avg24', 'avg8-avg16', 'avg16-avg24',
'P8-P24', 'P8-P16', 'P16-P24',
'K8-K24', 'K8-K16', 'K16-K24',
'J8-J24', 'J8-J16', 'J16-J24',
'I8-I24', 'I8-I16', 'I16-I24',
'F8-F24', 'F8-F16', 'F16-F24',
'D8-D24', 'D8-D16', 'D16-D24',
'B8-B24', 'B8-B16', 'B16-B24',
'A8-A24', 'A8-A16', 'A16-A24'))
# plot barplot
g = ggplot(df3_stack2, aes(x = meta, y = values, fill = category)) +
geom_bar(stat = "identity", position = "identity",
color = "white") + coord_flip() +
scale_fill_manual("legend", values = c('8vs16' = "#3690C0",
'16vs24' = "#A6BDDB",
'8vs24' = "#023858")) +
scale_x_discrete(limits = rev(levels(df3_stack2$meta))) +
theme_test()
g
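The counting, summing and sign-flipping steps above can be sketched on toy data. This is only an illustration under stated assumptions: the three small tables and their values are hypothetical, the helper `count_sig` is not part of the original pipeline, and base R `aggregate` is swapped in for `ddply` with `numcolwise(sum)`. The contrast label is derived from the list names instead of being typed out by hand.

```r
# Toy CAMERA-style result tables (hypothetical values, not from the analysis)
toy_results <- list(
  BP_avg8vs16 = data.frame(Direction = c("Up", "Down", "Up"),   FDR = c(0.01, 0.20, 0.03)),
  MF_avg8vs16 = data.frame(Direction = c("Down", "Down", "Up"), FDR = c(0.04, 0.01, 0.50)),
  BP_avg8vs24 = data.frame(Direction = c("Up", "Down"),         FDR = c(0.02, 0.03))
)

# Hypothetical helper: count significant up- and downregulated terms in one table
count_sig <- function(df, alpha = 0.05) {
  c(upregulated   = sum(df$Direction == "Up"   & df$FDR < alpha),
    downregulated = sum(df$Direction == "Down" & df$FDR < alpha))
}
counts <- as.data.frame(t(sapply(toy_results, count_sig)))

# Derive the contrast label by stripping the GO-class prefix (BP_, MF_, CC_)
counts$metadata <- sub("^(BP|MF|CC)_", "", rownames(counts))

# Sum over GO classes per contrast (base-R stand-in for ddply + numcolwise(sum))
counts_sum <- aggregate(cbind(upregulated, downregulated) ~ metadata,
                        data = counts, FUN = sum)

# Negate the downregulated counts so they plot on the opposite side of zero
counts_sum$downregulated <- -counts_sum$downregulated
counts_sum
```

Deriving the labels from the names keeps the counts and their metadata in sync if contrasts are added or reordered.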
Finally, we combined the topGO (ORA) and CAMERA results in a single plot:
# combine CAMERA and ORA data in one ggplot
## drop ind column which is not present in the ORA data frame
df3_stack2 = df3_stack2[-c(2)]
## add category column to the ORA data frame for coloring
category = rep(c('ORA'),18)
df_ORA2 = cbind(df_ORA,category)
## combine data frames
GO_all = rbind(df3_stack2,df_ORA2)
# plot barplot
g = ggplot(GO_all, aes(x = meta, y = as.numeric(values), fill = category)) +
geom_bar(stat = "identity", position = "identity",
color = "white") + coord_flip() +
scale_fill_manual("legend", values = c('8vs16' = "#3690C0",
'16vs24' = "#A6BDDB",
'8vs24' = "#023858",
'ORA' = 'gray20')) +
scale_x_discrete(limits = rev(levels(GO_all$meta))) +
theme_test()
g
In this omnibus test, we tested for interaction effects between genotypes for a given salinity contrast. Interaction effects indicate how genotypes differ in their response to changing salinity.
In a first step, we defined all the contrasts to be tested: 84 in total (28 pairwise genotype comparisons for each of the three salinity contrasts):
# define all contrasts to test
C_RQ3=matrix(0,nrow=ncol(fit_group_model$coefficients),ncol=84)
rownames(C_RQ3)=colnames(fit_group_model$coefficients)
colnames(C_RQ3)=c("8vs16_A-8vs16_B","8vs16_A-8vs16_D","8vs16_A-8vs16_F","8vs16_A-8vs16_I",
"8vs16_A-8vs16_J","8vs16_A-8vs16_K","8vs16_A-8vs16_P","8vs16_B-8vs16_D",
"8vs16_B-8vs16_F","8vs16_B-8vs16_I","8vs16_B-8vs16_J","8vs16_B-8vs16_K",
"8vs16_B-8vs16_P","8vs16_D-8vs16_F","8vs16_D-8vs16_I","8vs16_D-8vs16_J",
"8vs16_D-8vs16_K","8vs16_D-8vs16_P","8vs16_F-8vs16_I","8vs16_F-8vs16_J",
"8vs16_F-8vs16_K","8vs16_F-8vs16_P","8vs16_I-8vs16_J","8vs16_I-8vs16_K",
"8vs16_I-8vs16_P","8vs16_J-8vs16_K","8vs16_J-8vs16_P","8vs16_K-8vs16_P",
"8vs24_A-8vs24_B","8vs24_A-8vs24_D","8vs24_A-8vs24_F","8vs24_A-8vs24_I",
"8vs24_A-8vs24_J","8vs24_A-8vs24_K","8vs24_A-8vs24_P","8vs24_B-8vs24_D",
"8vs24_B-8vs24_F","8vs24_B-8vs24_I","8vs24_B-8vs24_J","8vs24_B-8vs24_K",
"8vs24_B-8vs24_P","8vs24_D-8vs24_F","8vs24_D-8vs24_I","8vs24_D-8vs24_J",
"8vs24_D-8vs24_K","8vs24_D-8vs24_P","8vs24_F-8vs24_I","8vs24_F-8vs24_J",
"8vs24_F-8vs24_K","8vs24_F-8vs24_P","8vs24_I-8vs24_J","8vs24_I-8vs24_K",
"8vs24_I-8vs24_P","8vs24_J-8vs24_K","8vs24_J-8vs24_P","8vs24_K-8vs24_P",
"16vs24_A-16vs24_B","16vs24_A-16vs24_D","16vs24_A-16vs24_F","16vs24_A-16vs24_I",
"16vs24_A-16vs24_J","16vs24_A-16vs24_K","16vs24_A-16vs24_P","16vs24_B-16vs24_D",
"16vs24_B-16vs24_F","16vs24_B-16vs24_I","16vs24_B-16vs24_J","16vs24_B-16vs24_K",
"16vs24_B-16vs24_P","16vs24_D-16vs24_F","16vs24_D-16vs24_I","16vs24_D-16vs24_J",
"16vs24_D-16vs24_K","16vs24_D-16vs24_P","16vs24_F-16vs24_I","16vs24_F-16vs24_J",
"16vs24_F-16vs24_K","16vs24_F-16vs24_P","16vs24_I-16vs24_J","16vs24_I-16vs24_K",
"16vs24_I-16vs24_P","16vs24_J-16vs24_K","16vs24_J-16vs24_P","16vs24_K-16vs24_P")
# 8vs16 contrast
C_RQ3[c("A.16ppt","A.8ppt","B.16ppt","B.8ppt"),"8vs16_A-8vs16_B"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","D.16ppt","D.8ppt"),"8vs16_A-8vs16_D"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","F.16ppt","F.8ppt"),"8vs16_A-8vs16_F"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","I.16ppt","I.8ppt"),"8vs16_A-8vs16_I"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","J.16ppt","J.8ppt"),"8vs16_A-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","K.16ppt","K.8ppt"),"8vs16_A-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","P.16ppt","P.8ppt"),"8vs16_A-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","D.16ppt","D.8ppt"),"8vs16_B-8vs16_D"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","F.16ppt","F.8ppt"),"8vs16_B-8vs16_F"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","I.16ppt","I.8ppt"),"8vs16_B-8vs16_I"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","J.16ppt","J.8ppt"),"8vs16_B-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","K.16ppt","K.8ppt"),"8vs16_B-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","P.16ppt","P.8ppt"),"8vs16_B-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","F.16ppt","F.8ppt"),"8vs16_D-8vs16_F"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","I.16ppt","I.8ppt"),"8vs16_D-8vs16_I"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","J.16ppt","J.8ppt"),"8vs16_D-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","K.16ppt","K.8ppt"),"8vs16_D-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","P.16ppt","P.8ppt"),"8vs16_D-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("F.16ppt","F.8ppt","I.16ppt","I.8ppt"),"8vs16_F-8vs16_I"]=c(-1,1,1,-1)
C_RQ3[c("F.16ppt","F.8ppt","J.16ppt","J.8ppt"),"8vs16_F-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("F.16ppt","F.8ppt","K.16ppt","K.8ppt"),"8vs16_F-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("F.16ppt","F.8ppt","P.16ppt","P.8ppt"),"8vs16_F-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("I.16ppt","I.8ppt","J.16ppt","J.8ppt"),"8vs16_I-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("I.16ppt","I.8ppt","K.16ppt","K.8ppt"),"8vs16_I-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("I.16ppt","I.8ppt","P.16ppt","P.8ppt"),"8vs16_I-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("J.16ppt","J.8ppt","K.16ppt","K.8ppt"),"8vs16_J-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("J.16ppt","J.8ppt","P.16ppt","P.8ppt"),"8vs16_J-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("K.16ppt","K.8ppt","P.16ppt","P.8ppt"),"8vs16_K-8vs16_P"]=c(-1,1,1,-1)
# 16vs24 contrast
C_RQ3[c("A.24ppt","A.16ppt","B.24ppt","B.16ppt"),"16vs24_A-16vs24_B"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","D.24ppt","D.16ppt"),"16vs24_A-16vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","F.24ppt","F.16ppt"),"16vs24_A-16vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","I.24ppt","I.16ppt"),"16vs24_A-16vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","J.24ppt","J.16ppt"),"16vs24_A-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","K.24ppt","K.16ppt"),"16vs24_A-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","P.24ppt","P.16ppt"),"16vs24_A-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","D.24ppt","D.16ppt"),"16vs24_B-16vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","F.24ppt","F.16ppt"),"16vs24_B-16vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","I.24ppt","I.16ppt"),"16vs24_B-16vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","J.24ppt","J.16ppt"),"16vs24_B-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","K.24ppt","K.16ppt"),"16vs24_B-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","P.24ppt","P.16ppt"),"16vs24_B-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","F.24ppt","F.16ppt"),"16vs24_D-16vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","I.24ppt","I.16ppt"),"16vs24_D-16vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","J.24ppt","J.16ppt"),"16vs24_D-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","K.24ppt","K.16ppt"),"16vs24_D-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","P.24ppt","P.16ppt"),"16vs24_D-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.16ppt","I.24ppt","I.16ppt"),"16vs24_F-16vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.16ppt","J.24ppt","J.16ppt"),"16vs24_F-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.16ppt","K.24ppt","K.16ppt"),"16vs24_F-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.16ppt","P.24ppt","P.16ppt"),"16vs24_F-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.16ppt","J.24ppt","J.16ppt"),"16vs24_I-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.16ppt","K.24ppt","K.16ppt"),"16vs24_I-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.16ppt","P.24ppt","P.16ppt"),"16vs24_I-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.16ppt","K.24ppt","K.16ppt"),"16vs24_J-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.16ppt","P.24ppt","P.16ppt"),"16vs24_J-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("K.24ppt","K.16ppt","P.24ppt","P.16ppt"),"16vs24_K-16vs24_P"]=c(-1,1,1,-1)
# 8vs24 contrast
C_RQ3[c("A.24ppt","A.8ppt","B.24ppt","B.8ppt"),"8vs24_A-8vs24_B"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","D.24ppt","D.8ppt"),"8vs24_A-8vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","F.24ppt","F.8ppt"),"8vs24_A-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","I.24ppt","I.8ppt"),"8vs24_A-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","J.24ppt","J.8ppt"),"8vs24_A-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","K.24ppt","K.8ppt"),"8vs24_A-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","P.24ppt","P.8ppt"),"8vs24_A-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","D.24ppt","D.8ppt"),"8vs24_B-8vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","F.24ppt","F.8ppt"),"8vs24_B-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","I.24ppt","I.8ppt"),"8vs24_B-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","J.24ppt","J.8ppt"),"8vs24_B-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","K.24ppt","K.8ppt"),"8vs24_B-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","P.24ppt","P.8ppt"),"8vs24_B-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","F.24ppt","F.8ppt"),"8vs24_D-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","I.24ppt","I.8ppt"),"8vs24_D-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","J.24ppt","J.8ppt"),"8vs24_D-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","K.24ppt","K.8ppt"),"8vs24_D-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","P.24ppt","P.8ppt"),"8vs24_D-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","I.24ppt","I.8ppt"),"8vs24_F-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","J.24ppt","J.8ppt"),"8vs24_F-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","K.24ppt","K.8ppt"),"8vs24_F-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","P.24ppt","P.8ppt"),"8vs24_F-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","J.24ppt","J.8ppt"),"8vs24_I-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","K.24ppt","K.8ppt"),"8vs24_I-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","P.24ppt","P.8ppt"),"8vs24_I-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.8ppt","K.24ppt","K.8ppt"),"8vs24_J-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.8ppt","P.24ppt","P.8ppt"),"8vs24_J-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("K.24ppt","K.8ppt","P.24ppt","P.8ppt"),"8vs24_K-8vs24_P"]=c(-1,1,1,-1)
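The 84 assignments above all follow one pattern, (genotype 1 low - genotype 1 high) - (genotype 2 low - genotype 2 high), so the same matrix could in principle be generated in a loop. The sketch below is an illustration, not the code used for the analysis. It assumes that the model coefficients are named `<genotype>.<salinity>ppt` for the eight genotypes and three salinities, and it builds a stand-alone matrix called `C_sketch` (a hypothetical name). In the real pipeline the row names would come from `colnames(fit_group_model$coefficients)`.

```r
# Assumed coefficient names: one per genotype-salinity group
genotypes  <- c("A", "B", "D", "F", "I", "J", "K", "P")
coef_names <- as.vector(outer(genotypes, c("8ppt", "16ppt", "24ppt"), paste, sep = "."))

# Salinity contrasts (low, high), in the same order as the column names above
sal_contrasts <- list("8vs16" = c(8, 16), "8vs24" = c(8, 24), "16vs24" = c(16, 24))

# All 28 unordered genotype pairs, in the order A-B, A-D, ..., K-P
pairs <- combn(genotypes, 2)

# Column names such as "8vs16_A-8vs16_B"
col_names <- unlist(lapply(names(sal_contrasts), function(sal)
  sprintf("%s_%s-%s_%s", sal, pairs[1, ], sal, pairs[2, ])))

C_sketch <- matrix(0, nrow = length(coef_names), ncol = length(col_names),
                   dimnames = list(coef_names, col_names))

for (sal in names(sal_contrasts)) {
  low  <- paste0(sal_contrasts[[sal]][1], "ppt")
  high <- paste0(sal_contrasts[[sal]][2], "ppt")
  for (k in seq_len(ncol(pairs))) {
    g1  <- pairs[1, k]
    g2  <- pairs[2, k]
    col <- sprintf("%s_%s-%s_%s", sal, g1, sal, g2)
    # (g1.low - g1.high) - (g2.low - g2.high), same weights as the manual code
    rows <- paste(c(g1, g1, g2, g2), c(high, low, high, low), sep = ".")
    C_sketch[rows, col] <- c(-1, 1, 1, -1)
  }
}
dim(C_sketch)
```

Each column has exactly four non-zero weights that sum to zero, which is a quick sanity check for a manually typed contrast matrix as well.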
We followed the same procedure as in section 3.3.2.
# screening stage
alpha=0.05
screenTest_RQ3 = glmQLFTest(fit_group_model, contrast=C_RQ3)
pScreen_RQ3 = screenTest_RQ3$table$PValue
names(pScreen_RQ3) = rownames(screenTest_RQ3$table)
# confirmation stage
confirmationResults_RQ3 = sapply(1:ncol(C_RQ3),function(i) glmQLFTest(fit_group_model,
contrast = C_RQ3[,i]), simplify=FALSE)
confirmationPList_RQ3 = lapply(confirmationResults_RQ3, function(x) x$table$PValue)
confirmationP_RQ3 = as.matrix(Reduce(f=cbind,confirmationPList_RQ3))
rownames(confirmationP_RQ3) = rownames(confirmationResults_RQ3[[1]]$table)
colnames(confirmationP_RQ3) = colnames(C_RQ3)
stageRObj_RQ3 = stageR(pScreen=pScreen_RQ3, pConfirmation=confirmationP_RQ3)
stageRAdj_RQ3 = stageWiseAdjustment(object=stageRObj_RQ3, method="holm", alpha=0.05)
resRQ3 = getResults(stageRAdj_RQ3)
## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
# get the significant genes for each contrast
SignifGenesRQ3 = colSums(resRQ3)
SignifGenesRQ3
## padjScreen 8vs16_A-8vs16_B 8vs16_A-8vs16_D 8vs16_A-8vs16_F
## 5099 104 440 268
## 8vs16_A-8vs16_I 8vs16_A-8vs16_J 8vs16_A-8vs16_K 8vs16_A-8vs16_P
## 46 150 141 248
## 8vs16_B-8vs16_D 8vs16_B-8vs16_F 8vs16_B-8vs16_I 8vs16_B-8vs16_J
## 399 308 54 80
## 8vs16_B-8vs16_K 8vs16_B-8vs16_P 8vs16_D-8vs16_F 8vs16_D-8vs16_I
## 68 193 729 305
## 8vs16_D-8vs16_J 8vs16_D-8vs16_K 8vs16_D-8vs16_P 8vs16_F-8vs16_I
## 444 209 966 158
## 8vs16_F-8vs16_J 8vs16_F-8vs16_K 8vs16_F-8vs16_P 8vs16_I-8vs16_J
## 142 118 251 29
## 8vs16_I-8vs16_K 8vs16_I-8vs16_P 8vs16_J-8vs16_K 8vs16_J-8vs16_P
## 39 175 58 102
## 8vs16_K-8vs16_P 8vs24_A-8vs24_B 8vs24_A-8vs24_D 8vs24_A-8vs24_F
## 238 506 441 272
## 8vs24_A-8vs24_I 8vs24_A-8vs24_J 8vs24_A-8vs24_K 8vs24_A-8vs24_P
## 181 188 350 488
## 8vs24_B-8vs24_D 8vs24_B-8vs24_F 8vs24_B-8vs24_I 8vs24_B-8vs24_J
## 595 325 379 218
## 8vs24_B-8vs24_K 8vs24_B-8vs24_P 8vs24_D-8vs24_F 8vs24_D-8vs24_I
## 232 260 303 114
## 8vs24_D-8vs24_J 8vs24_D-8vs24_K 8vs24_D-8vs24_P 8vs24_F-8vs24_I
## 125 411 565 172
## 8vs24_F-8vs24_J 8vs24_F-8vs24_K 8vs24_F-8vs24_P 8vs24_I-8vs24_J
## 98 167 197 52
## 8vs24_I-8vs24_K 8vs24_I-8vs24_P 8vs24_J-8vs24_K 8vs24_J-8vs24_P
## 175 494 141 105
## 8vs24_K-8vs24_P 16vs24_A-16vs24_B 16vs24_A-16vs24_D 16vs24_A-16vs24_F
## 118 113 114 88
## 16vs24_A-16vs24_I 16vs24_A-16vs24_J 16vs24_A-16vs24_K 16vs24_A-16vs24_P
## 25 29 115 37
## 16vs24_B-16vs24_D 16vs24_B-16vs24_F 16vs24_B-16vs24_I 16vs24_B-16vs24_J
## 41 390 180 26
## 16vs24_B-16vs24_K 16vs24_B-16vs24_P 16vs24_D-16vs24_F 16vs24_D-16vs24_I
## 42 132 159 112
## 16vs24_D-16vs24_J 16vs24_D-16vs24_K 16vs24_D-16vs24_P 16vs24_F-16vs24_I
## 65 48 195 119
## 16vs24_F-16vs24_J 16vs24_F-16vs24_K 16vs24_F-16vs24_P 16vs24_I-16vs24_J
## 107 271 200 39
## 16vs24_I-16vs24_K 16vs24_I-16vs24_P 16vs24_J-16vs24_K 16vs24_J-16vs24_P
## 163 49 123 45
## 16vs24_K-16vs24_P
## 134
adjusted_p_RQ3 = getAdjustedPValues(stageRAdj_RQ3, onlySignificantGenes = FALSE, order = FALSE)
## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
# visualize number of significant genes
resRQ3_df = as.data.frame(resRQ3)
resRQ3_df2 = resRQ3_df
resRQ3_df2$gene = rownames(resRQ3_df2)
OnlySignGenes_RQ3 = resRQ3_df[resRQ3_df$padjScreen == 1,]
dim(OnlySignGenes_RQ3)
## [1] 5099 85
# select genes that were significant after the screening stage, but not the confirmation stage
genesSI_RQ3 = rownames(adjusted_p_RQ3)[adjusted_p_RQ3[,"padjScreen"]<=0.05]
genesNotFoundStageII_RQ3 = genesSI_RQ3[genesSI_RQ3 %in% rownames(resRQ3)[rowSums(resRQ3==0)==84]]
length(genesNotFoundStageII_RQ3)
## [1] 1141
# select genes that were significant after the confirmation stage
OnlySignGenes_RQ3_ConStage = OnlySignGenes_RQ3 [!rownames(OnlySignGenes_RQ3 ) %in% genesNotFoundStageII_RQ3, ]
nrow(OnlySignGenes_RQ3_ConStage)
## [1] 3958
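3958 genes are significant after the confirmation stage (the 5099 genes that passed the screening stage minus the 1141 genes that were not confirmed in any contrast).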
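The bookkeeping between the two stages can be illustrated with a small 0/1 matrix laid out like the `getResults()` output: one screening column followed by one column per contrast. The genes, contrasts and values below are hypothetical, and the sketch splits the genes with a direct row sum over the contrast columns, which is a simplification of the `rowSums(resRQ3==0)==84` test used above.

```r
# Toy significance matrix (1 = significant, 0 = not significant)
res_toy <- matrix(c(1, 1, 0,
                    1, 0, 0,
                    0, 0, 0,
                    1, 0, 1),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(paste0("gene", 1:4),
                                  c("padjScreen", "c1", "c2")))

# Genes that pass the screening stage
passed_screen <- res_toy[, "padjScreen"] == 1

# Number of contrasts confirmed per gene (all columns except the screening column)
n_confirmed <- rowSums(res_toy[, -1, drop = FALSE])

# Passed screening but no contrast confirmed
screen_only <- rownames(res_toy)[passed_screen & n_confirmed == 0]

# Passed screening and at least one contrast confirmed
confirmed <- rownames(res_toy)[passed_screen & n_confirmed > 0]

c(screened = sum(passed_screen),
  screen_only = length(screen_only),
  confirmed = length(confirmed))
```

The two groups always add up to the number of genes that passed screening, which is the relationship between the three counts reported above.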
Before we continued with the downstream analyses, we created a single data object that contains key information from the statistical pipeline outlined above. This included the logFC, logCPM and P-values of each gene for each contrast.
# select the adjusted P-values for each contrast
adjusted_p_RQ3 = getAdjustedPValues(stageRAdj_RQ3, onlySignificantGenes = FALSE, order = FALSE)
## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
# rename column headers in adjusted_p_RQ3
colnames(adjusted_p_RQ3) = c("padjScreen","8vs16_A-8vs16_B_Padj","8vs16_A-8vs16_D_Padj","8vs16_A-8vs16_F_Padj",
"8vs16_A-8vs16_I_Padj","8vs16_A-8vs16_J_Padj","8vs16_A-8vs16_K_Padj",
"8vs16_A-8vs16_P_Padj","8vs16_B-8vs16_D_Padj","8vs16_B-8vs16_F_Padj",
"8vs16_B-8vs16_I_Padj","8vs16_B-8vs16_J_Padj","8vs16_B-8vs16_K_Padj",
"8vs16_B-8vs16_P_Padj","8vs16_D-8vs16_F_Padj","8vs16_D-8vs16_I_Padj",
"8vs16_D-8vs16_J_Padj","8vs16_D-8vs16_K_Padj","8vs16_D-8vs16_P_Padj",
"8vs16_F-8vs16_I_Padj","8vs16_F-8vs16_J_Padj","8vs16_F-8vs16_K_Padj",
"8vs16_F-8vs16_P_Padj","8vs16_I-8vs16_J_Padj","8vs16_I-8vs16_K_Padj",
"8vs16_I-8vs16_P_Padj","8vs16_J-8vs16_K_Padj","8vs16_J-8vs16_P_Padj",
"8vs16_K-8vs16_P_Padj","8vs24_A-8vs24_B_Padj","8vs24_A-8vs24_D_Padj",
"8vs24_A-8vs24_F_Padj","8vs24_A-8vs24_I_Padj","8vs24_A-8vs24_J_Padj",
"8vs24_A-8vs24_K_Padj","8vs24_A-8vs24_P_Padj","8vs24_B-8vs24_D_Padj",
"8vs24_B-8vs24_F_Padj","8vs24_B-8vs24_I_Padj","8vs24_B-8vs24_J_Padj",
"8vs24_B-8vs24_K_Padj","8vs24_B-8vs24_P_Padj","8vs24_D-8vs24_F_Padj",
"8vs24_D-8vs24_I_Padj","8vs24_D-8vs24_J_Padj","8vs24_D-8vs24_K_Padj",
"8vs24_D-8vs24_P_Padj","8vs24_F-8vs24_I_Padj","8vs24_F-8vs24_J_Padj",
"8vs24_F-8vs24_K_Padj","8vs24_F-8vs24_P_Padj","8vs24_I-8vs24_J_Padj",
"8vs24_I-8vs24_K_Padj","8vs24_I-8vs24_P_Padj","8vs24_J-8vs24_K_Padj",
"8vs24_J-8vs24_P_Padj","8vs24_K-8vs24_P_Padj","16vs24_A-16vs24_B_Padj",
"16vs24_A-16vs24_D_Padj","16vs24_A-16vs24_F_Padj","16vs24_A-16vs24_I_Padj",
"16vs24_A-16vs24_J_Padj","16vs24_A-16vs24_K_Padj","16vs24_A-16vs24_P_Padj",
"16vs24_B-16vs24_D_Padj","16vs24_B-16vs24_F_Padj","16vs24_B-16vs24_I_Padj",
"16vs24_B-16vs24_J_Padj","16vs24_B-16vs24_K_Padj","16vs24_B-16vs24_P_Padj",
"16vs24_D-16vs24_F_Padj","16vs24_D-16vs24_I_Padj","16vs24_D-16vs24_J_Padj",
"16vs24_D-16vs24_K_Padj","16vs24_D-16vs24_P_Padj","16vs24_F-16vs24_I_Padj",
"16vs24_F-16vs24_J_Padj","16vs24_F-16vs24_K_Padj","16vs24_F-16vs24_P_Padj",
"16vs24_I-16vs24_J_Padj","16vs24_I-16vs24_K_Padj","16vs24_I-16vs24_P_Padj",
"16vs24_J-16vs24_K_Padj","16vs24_J-16vs24_P_Padj","16vs24_K-16vs24_P_Padj")
# create empty list to hold the data values
datalist_RQ3 = list()
# loop over the confirmationResults_RQ3 object to obtain the relevant information (table)
for (contrast in c(1:84)){
table = confirmationResults_RQ3[[contrast]]$table
datalist_RQ3[[contrast]] = table
}
# turn list into data frame
confirmationResults_RQ3_total_dataset = data.frame(datalist_RQ3)
# rename column names for tractability
# (four statistics per contrast, in the column order of C_RQ3)
colnames(confirmationResults_RQ3_total_dataset) = paste(rep(colnames(C_RQ3), each = 4),
c("logFC", "logCPM", "F", "nonadjPValue"), sep = "_")
# merge the data frames
table = merge(confirmationResults_RQ3_total_dataset,adjusted_p_RQ3, by = 0, all = TRUE)
# use the first column (gene names) for the row names
all_results_RQ3 = table[,-1]
rownames(all_results_RQ3) = table[,1]
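The merge step relies on `by = 0`, which tells `merge()` to match on row names and to return them in a first column called `Row.names`. A minimal sketch with hypothetical toy tables (the gene names and values are made up) shows why the first column is then moved back into the row names:

```r
# Toy per-gene statistics and adjusted P-values, keyed by row name
stats_toy <- data.frame(logFC = c(1.2, -0.4, 0.1),
                        row.names = c("g1", "g2", "g3"))
padj_toy  <- data.frame(padjScreen = c(0.01, 0.30),
                        row.names = c("g2", "g1"))

# by = 0 merges on row names; all = TRUE keeps genes missing from either table
merged <- merge(stats_toy, padj_toy, by = 0, all = TRUE)
names(merged)

# Move the "Row.names" column back into the row names
results <- merged[, -1]
rownames(results) <- merged[, 1]
results
```

With `all = TRUE`, genes that are absent from one of the inputs are kept and receive `NA` in the columns of that input, as `g3` does here.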
In this section, we visualized the top 100 interaction-effect genes. These 100 genes were selected based on stageR’s FDR-adjusted P-value of the global null hypothesis (padjScreen).
We started by selecting the top 100 genes with interaction effects:
# rank genes based on P-value
Padjscreen_sorted_RQ3 = all_results_RQ3[with(all_results_RQ3, order(all_results_RQ3$padjScreen)),]
# select top 100 genes based on Padjscreen
Padjscreen_sorted_RQ3_top100 = rownames(Padjscreen_sorted_RQ3[1:100 ,])
OnlySignGenes_RQ3_ConStage_top100 = subset(OnlySignGenes_RQ3_ConStage, rownames(OnlySignGenes_RQ3_ConStage)%in%Padjscreen_sorted_RQ3_top100)
Next, we selected the same genes in the summary data frame of RQ1e2 (all_results_RQ1e2 from section 3.3):
# select top 100 interaction-effect genes in the response for each genotype (RQ2 - section 3.3)
all_results_RQ1e2_top100_RQ3 = subset(all_results_RQ1e2,
rownames(all_results_RQ1e2)%in%rownames(OnlySignGenes_RQ3_ConStage_top100))
# select columns with logFC values
all_results_RQ1e2_top100_RQ3_logFC = all_results_RQ1e2_top100_RQ3[,grepl("logFC",
colnames(all_results_RQ1e2_top100_RQ3))]
# remove columns of the average effect
all_results_RQ1e2_top100_RQ3_logFC = all_results_RQ1e2_top100_RQ3_logFC[-c(25,26,27)]
# select the top 100 genes in the OnlySignGenes_RQ1e2_ConStage object
OnlySignGenes_RQ1e2_ConStage_RQ3_top100 = subset(OnlySignGenes_RQ1e2_ConStage, rownames(OnlySignGenes_RQ1e2_ConStage)%in%rownames(OnlySignGenes_RQ3_ConStage_top100))
# remove the padjScreen column and the columns of the average effect
OnlySignGenes_RQ1e2_ConStage_RQ3_top100_sel = OnlySignGenes_RQ1e2_ConStage_RQ3_top100[-c(1,26,27,28)]
# combine both data frames to set all non-significant logFC values to 0
all_results_RQ1e2_top100_RQ3_logFC = type.convert(all_results_RQ1e2_top100_RQ3_logFC, as.is = TRUE)
all_results_RQ1e2_top100_RQ3_logFC_OnlySign = (0^(OnlySignGenes_RQ1e2_ConStage_RQ3_top100_sel == 0)) * all_results_RQ1e2_top100_RQ3_logFC
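The masking step above relies on R's rule that 0^0 equals 1 while 0^1 equals 0, so cells flagged as non-significant are multiplied by 0 and all other cells by 1. A minimal sketch with made-up values (the objects `sig` and `lfc` are hypothetical stand-ins for the significance and logFC data frames):

```r
# toy significance calls (1 = significant, 0 = not significant); values are made up
sig = data.frame(A8vsA24 = c(1, 0, 1), B8vsB24 = c(0, 0, 1),
                 row.names = c("gene1", "gene2", "gene3"))
# toy logFC values for the same genes and contrasts
lfc = data.frame(A8vsA24 = c(2.5, -0.3, -1.2), B8vsB24 = c(0.4, 0.1, 1.8),
                 row.names = c("gene1", "gene2", "gene3"))
# (sig == 0) is TRUE for non-significant cells; 0^TRUE = 0 and 0^FALSE = 1
mask = 0^(sig == 0)
# cell-wise product: significant logFC values are kept, all others become 0
lfc_sign = mask * lfc
lfc_sign
```

Setting non-significant cells to 0 rather than NA keeps later calls to rowSums() free of missing values.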
Next, we subdivided the top 100 interaction-effect genes into two categories: genes that differ in the direction of their response between genotypes, and genes that differ in the magnitude of their response between genotypes. We did this using the output of the code above, which combines the information on significance and logFC from RQ1e2 (section 3.3):
# select genes that are up- or downregulated in different directions regardless of logFC
RQ3_top100_DiffDir = subset(all_results_RQ1e2_top100_RQ3_logFC_OnlySign,
(rowSums(all_results_RQ1e2_top100_RQ3_logFC_OnlySign < 0) > 0) &
(rowSums(all_results_RQ1e2_top100_RQ3_logFC_OnlySign > 0) > 0))
length(rownames(RQ3_top100_DiffDir))
## [1] 91
# select genes that show effects in the same direction
RQ3_top100_SameDir = setdiff(rownames(OnlySignGenes_RQ3_ConStage_top100), rownames(RQ3_top100_DiffDir))
length(RQ3_top100_SameDir)
## [1] 9
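The classification above can be illustrated on a small example. This is a sketch with made-up values, in which `lfc_sign` is a hypothetical data frame of logFC values where non-significant cells were already set to 0:

```r
# toy masked logFC values (0 = not significant); values are made up
lfc_sign = data.frame(A8vsA24 = c(2.5, 1.1, 0), B8vsB24 = c(-1.4, 0.6, 0),
                      row.names = c("gene1", "gene2", "gene3"))
# a gene differs in direction when it has at least one significant negative
# and at least one significant positive logFC
has_down = rowSums(lfc_sign < 0) > 0
has_up = rowSums(lfc_sign > 0) > 0
diff_dir = rownames(lfc_sign)[has_down & has_up]
# all remaining genes fall in the second category, including genes
# without any significant logFC (gene3)
same_dir = setdiff(rownames(lfc_sign), diff_dir)
```

Because the second category is defined as the complement of the first, it also contains genes without any significant contrast in RQ1e2.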
Then, we checked whether the set of genes that differ in the direction of their response contains any genes that are significant in one genotype only:
# check for genes only DE in one genotype
RQ3_top100_DiffDir_uniqueDE = subset(RQ3_top100_DiffDir, rownames(RQ3_top100_DiffDir)%in%RQ1e2_uniqueDE)
RQ3_top100_DiffDir_uniqueDE
## [1] A8vsA16_logFC A16vsA24_logFC A8vsA24_logFC B8vsB16_logFC B16vsB24_logFC
## [6] B8vsB24_logFC D8vsD16_logFC D16vsD24_logFC D8vsD24_logFC F8vsF16_logFC
## [11] F16vsF24_logFC F8vsF24_logFC I8vsI16_logFC I16vsI24_logFC I8vsI24_logFC
## [16] J8vsJ16_logFC J16vsJ24_logFC J8vsJ24_logFC K8vsK16_logFC K16vsK24_logFC
## [21] K8vsK24_logFC P8vsP16_logFC P16vsP24_logFC P8vsP24_logFC
## <0 rows> (or 0-length row.names)
In this section, we performed GO enrichment using Fisher’s Exact test in topGO. This GO enrichment was done on two sets of interaction-effect genes: genes that differ in the direction of their response between genotypes, and genes that differ in the magnitude of their response between genotypes.
We assigned all interaction-effect genes to these two categories using the same criteria as in section 3.4.4:
# create object with names of all genes that show interaction-effects
genes_OnlySign_Constage_RQ3 = rownames(OnlySignGenes_RQ3_ConStage)
# select columns with logFC values in the RQ1e2 summary data frame
all_results_RQ1e2_logFC = all_results_RQ1e2[,grepl("logFC", colnames(all_results_RQ1e2))]
all_results_RQ1e2_logFC = all_results_RQ1e2_logFC[-c(25,26,27)]
# select only significant genes in RQ3
OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign = subset(OnlySignGenes_RQ1e2_ConStage,rownames(OnlySignGenes_RQ1e2_ConStage)%in%genes_OnlySign_Constage_RQ3)
OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign = OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign[-c(1,26,27,28)]
all_results_RQ1e2_all_logFC_RQ3_OnlySign = subset(all_results_RQ1e2_logFC,
rownames(all_results_RQ1e2_logFC)%in%genes_OnlySign_Constage_RQ3)
# remove genes that are not significant in RQ1e2
intersect_genes = intersect(rownames(OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign),
rownames(all_results_RQ1e2_all_logFC_RQ3_OnlySign))
all_results_RQ1e2_all_logFC_RQ3_OnlySign_sub = subset(all_results_RQ1e2_all_logFC_RQ3_OnlySign,
rownames(all_results_RQ1e2_all_logFC_RQ3_OnlySign)
%in%intersect_genes)
# combine both data frames to set all non-significant logFC values to 0
all_results_RQ1e2_all_logFC_RQ3_OnlySign = type.convert(all_results_RQ1e2_all_logFC_RQ3_OnlySign_sub,
as.is = TRUE)
all_results_RQ1e2_all_RQ3_logFC_OnlySign = (0^(OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign == 0)) * all_results_RQ1e2_all_logFC_RQ3_OnlySign_sub
# select genes that are up- or downregulated in different directions regardless of logFC
RQ3_all_DiffDir = subset(all_results_RQ1e2_all_RQ3_logFC_OnlySign,
(rowSums(all_results_RQ1e2_all_RQ3_logFC_OnlySign < 0) > 0) &
(rowSums(all_results_RQ1e2_all_RQ3_logFC_OnlySign > 0) > 0))
nrow(RQ3_all_DiffDir)
## [1] 1178
# select genes that are up- or downregulated in the same directions, but to a different extent or that are not significant in RQ1e2
RQ3_all_SameDir = setdiff(rownames(OnlySignGenes_RQ3_ConStage), rownames(RQ3_all_DiffDir))
length(RQ3_all_SameDir)
## [1] 2780
# check the DiffDir object for genes that are uniquely DE in one genotype: these do not represent differences in direction between genotypes, but differences in magnitude. We did this using the RQ1e2_uniqueDE object created in section 3.3.4:
RQ3_all_DiffDir_uniqueDE = subset(RQ3_all_DiffDir, rownames(RQ3_all_DiffDir)%in%RQ1e2_uniqueDE)
nrow(RQ3_all_DiffDir_uniqueDE) # these special genes will be added to the SameDir object and removed from the DiffDir object
## [1] 39
# add the special genes identified above to the SameDir object
RQ3_all_SameDir2 = c(RQ3_all_SameDir,rownames(RQ3_all_DiffDir_uniqueDE))
length(RQ3_all_SameDir2)
## [1] 2819
# remove special genes from the DiffDir object
RQ3_all_DiffDir2 = RQ3_all_DiffDir[ ! rownames(RQ3_all_DiffDir) %in% rownames(RQ3_all_DiffDir_uniqueDE), ]
nrow(RQ3_all_DiffDir2)
## [1] 1139
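Moving genes between the two sets should leave the sets disjoint and the total unchanged (here 2819 + 1139 = 1178 + 2780 = 3958). A minimal sketch of such a check, using made-up gene names and hypothetical object names:

```r
# toy gene sets; names are made up
diff_dir = c("gene1", "gene2", "gene3")
same_dir = c("gene4", "gene5")
unique_de = c("gene2", "gene9")   # genes DE in a single genotype only
# genes to move from the direction set to the magnitude set
to_move = intersect(diff_dir, unique_de)
same_dir2 = c(same_dir, to_move)
diff_dir2 = setdiff(diff_dir, to_move)
# the two sets should stay disjoint and the total should not change
stopifnot(length(intersect(same_dir2, diff_dir2)) == 0,
          length(same_dir2) + length(diff_dir2) == length(same_dir) + length(diff_dir))
```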
Next, we performed GO enrichment on both sets separately.
First, the set of genes that differ in the direction of their response between genotypes:
# select the set of significant DE genes with GO terms
genesOfInterest_RQ3_all_DiffDir = rownames(subset(RQ3_all_DiffDir2, rownames(RQ3_all_DiffDir2) %in%geneUniverse))
length(genesOfInterest_RQ3_all_DiffDir)
# create gene list for input in topGO
geneList_RQ3_all_DiffDir = factor(as.integer(geneUniverse %in% genesOfInterest_RQ3_all_DiffDir))
names(geneList_RQ3_all_DiffDir) = geneUniverse
# create a topGO object (for biological process GOs)
GOdata_BP_RQ3_all_DiffDir = new("topGOdata", ontology="BP", allGenes=geneList_RQ3_all_DiffDir,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
# create a topGO object (for molecular function GOs)
GOdata_MF_RQ3_all_DiffDir = new("topGOdata", ontology="MF", allGenes=geneList_RQ3_all_DiffDir,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
# create a topGO object (for cellular component GOs)
GOdata_CC_RQ3_all_DiffDir = new("topGOdata", ontology="CC", allGenes=geneList_RQ3_all_DiffDir,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
# run Fisher's exact test
resultFisher_BP_RQ3_all_DiffDir = runTest(GOdata_BP_RQ3_all_DiffDir, algorithm = "elim", statistic = "fisher")
resultFisher_MF_RQ3_all_DiffDir = runTest(GOdata_MF_RQ3_all_DiffDir, algorithm = "elim", statistic = "fisher")
resultFisher_CC_RQ3_all_DiffDir = runTest(GOdata_CC_RQ3_all_DiffDir, algorithm = "elim", statistic = "fisher")
# extract the significant GO terms
allRes_BP_allDE_elim_RQ3_all_DiffDir = GenTable(GOdata_BP_RQ3_all_DiffDir,
classic = resultFisher_BP_RQ3_all_DiffDir ,
orderBy = "elim", ranksOf = "elim", topNodes = 45,
numChar=1000)
allRes_MF_allDE_elim_RQ3_all_DiffDir = GenTable(GOdata_MF_RQ3_all_DiffDir,
classic = resultFisher_MF_RQ3_all_DiffDir ,
orderBy = "elim", ranksOf = "elim", topNodes = 20,
numChar=1000)
allRes_CC_allDE_elim_RQ3_all_DiffDir = GenTable(GOdata_CC_RQ3_all_DiffDir,
classic = resultFisher_CC_RQ3_all_DiffDir ,
orderBy = "elim", ranksOf = "elim", topNodes = 10,
numChar=1000)
Next, the set of genes that differ in the magnitude of their response between genotypes:
# select the set of significant DE genes with GO terms
genesOfInterest_RQ3_all_SameDir = intersect(RQ3_all_SameDir2, geneUniverse)
length(genesOfInterest_RQ3_all_SameDir)
# create gene list for input in topGO
geneList_RQ3_all_SameDir = factor(as.integer(geneUniverse %in% genesOfInterest_RQ3_all_SameDir))
names(geneList_RQ3_all_SameDir) = geneUniverse
# create a topGO object (for biological process GOs)
GOdata_BP_RQ3_all_SameDir = new("topGOdata", ontology="BP", allGenes=geneList_RQ3_all_SameDir,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
# create a topGO object (for molecular function GOs)
GOdata_MF_RQ3_all_SameDir = new("topGOdata", ontology="MF", allGenes=geneList_RQ3_all_SameDir,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
# create a topGO object (for cellular component GOs)
GOdata_CC_RQ3_all_SameDir = new("topGOdata", ontology="CC", allGenes=geneList_RQ3_all_SameDir,
annot = annFUN.gene2GO, gene2GO = geneID2GO)
# run Fisher's exact test
resultFisher_BP_RQ3_all_SameDir = runTest(GOdata_BP_RQ3_all_SameDir, algorithm = "elim", statistic = "fisher")
resultFisher_MF_RQ3_all_SameDir = runTest(GOdata_MF_RQ3_all_SameDir, algorithm = "elim", statistic = "fisher")
resultFisher_CC_RQ3_all_SameDir = runTest(GOdata_CC_RQ3_all_SameDir, algorithm = "elim", statistic = "fisher")
# extract the significant GO terms
allRes_BP_allDE_elim_RQ3_all_SameDir = GenTable(GOdata_BP_RQ3_all_SameDir,
classic = resultFisher_BP_RQ3_all_SameDir ,
orderBy = "elim", ranksOf = "elim", topNodes = 75,
numChar=1000)
allRes_MF_allDE_elim_RQ3_all_SameDir = GenTable(GOdata_MF_RQ3_all_SameDir,
classic = resultFisher_MF_RQ3_all_SameDir ,
orderBy = "elim", ranksOf = "elim", topNodes = 60,
numChar=1000)
allRes_CC_allDE_elim_RQ3_all_SameDir = GenTable(GOdata_CC_RQ3_all_SameDir,
classic = resultFisher_CC_RQ3_all_SameDir ,
orderBy = "elim", ranksOf = "elim", topNodes = 10,
numChar=1000)
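To make the input and the test statistic more concrete, the sketch below builds a 0/1 gene list in the same way as above and runs a one-sided Fisher's exact test for a single GO term with base R. All gene names and the term membership are made up. The sketch only shows the per-term test: the "elim" algorithm of topGO additionally removes genes annotated to significant child terms before testing their parent terms, which is not reproduced here.

```r
# toy gene universe and genes of interest; names are made up
gene_universe = paste0("gene", 1:20)
genes_of_interest = paste0("gene", 1:6)
# 0/1 factor over the universe, named by gene, as passed to allGenes
gene_list = factor(as.integer(gene_universe %in% genes_of_interest))
names(gene_list) = gene_universe
# hypothetical set of genes annotated with one GO term
genes_in_term = paste0("gene", c(1, 2, 3, 4, 10))
# 2x2 table: gene of interest (yes/no) x annotated with the term (yes/no)
tab = table(interest = gene_universe %in% genes_of_interest,
            in_term = gene_universe %in% genes_in_term)
# one-sided test for over-representation of the term among the genes of interest
p_term = fisher.test(tab, alternative = "greater")$p.value
p_term
```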
Next, we reduced the list of significant GO terms using the online application REVIGO. As input for REVIGO, we used the output of the Fisher’s Exact test (only the GO terms with a P-value <= 0.05, together with their P-values), with a 0.5 similarity threshold and the SimRel algorithm.
REVIGO for the above analyses was accessed on August 18th 2021, and used the Gene Ontology database of July 2nd 2021 and the UniProt-to-GO mapping database from June 17th 2021.
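A REVIGO input file lists one GO term and its P-value per line. The sketch below shows one way to prepare such a file, assuming a GenTable-like data frame with the columns `GO.ID` and `classic`. The toy table, its P-values, and the output file name are made up for illustration.

```r
# toy table mimicking the GO.ID and P-value columns of a GenTable result;
# the GO identifiers are examples and the P-values are made up
go_table = data.frame(GO.ID = c("GO:0006412", "GO:0015979", "GO:0006810"),
                      classic = c("0.0004", "0.0310", "0.2100"),
                      stringsAsFactors = FALSE)
# GenTable typically reports P-values as text, so convert before filtering
go_table$classic = as.numeric(go_table$classic)
# keep only the terms with a P-value <= 0.05
revigo_input = go_table[go_table$classic <= 0.05, c("GO.ID", "classic")]
# tab-separated, no header or quotes: one GO term and its P-value per line
write.table(revigo_input, file = "revigo_input_example.txt", sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Very small P-values may be reported in a form such as "< 1e-30", which as.numeric() would turn into NA, so those entries would need separate handling.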
In a last step, we reran the RQ3 model, but now only for the 8-24 contrast. We did this because we are interested in these logFC values for comparison with allele frequencies. The model was rerun to decrease the FDR penalty imposed by multiple testing correction for this set of genes:
# define contrasts to test
C_RQ3=matrix(0,nrow=ncol(fit_group_model$coefficients),ncol=28)
rownames(C_RQ3)=colnames(fit_group_model$coefficients)
colnames(C_RQ3)=c("8vs24_A-8vs24_B","8vs24_A-8vs24_D","8vs24_A-8vs24_F","8vs24_A-8vs24_I",
"8vs24_A-8vs24_J","8vs24_A-8vs24_K","8vs24_A-8vs24_P","8vs24_B-8vs24_D",
"8vs24_B-8vs24_F","8vs24_B-8vs24_I","8vs24_B-8vs24_J","8vs24_B-8vs24_K",
"8vs24_B-8vs24_P","8vs24_D-8vs24_F","8vs24_D-8vs24_I","8vs24_D-8vs24_J",
"8vs24_D-8vs24_K","8vs24_D-8vs24_P","8vs24_F-8vs24_I","8vs24_F-8vs24_J",
"8vs24_F-8vs24_K","8vs24_F-8vs24_P","8vs24_I-8vs24_J","8vs24_I-8vs24_K",
"8vs24_I-8vs24_P","8vs24_J-8vs24_K","8vs24_J-8vs24_P","8vs24_K-8vs24_P")
# 8vs24 contrast
C_RQ3[c("A.24ppt","A.8ppt","B.24ppt","B.8ppt"),"8vs24_A-8vs24_B"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","D.24ppt","D.8ppt"),"8vs24_A-8vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","F.24ppt","F.8ppt"),"8vs24_A-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","I.24ppt","I.8ppt"),"8vs24_A-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","J.24ppt","J.8ppt"),"8vs24_A-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","K.24ppt","K.8ppt"),"8vs24_A-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","P.24ppt","P.8ppt"),"8vs24_A-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","D.24ppt","D.8ppt"),"8vs24_B-8vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","F.24ppt","F.8ppt"),"8vs24_B-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","I.24ppt","I.8ppt"),"8vs24_B-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","J.24ppt","J.8ppt"),"8vs24_B-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","K.24ppt","K.8ppt"),"8vs24_B-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","P.24ppt","P.8ppt"),"8vs24_B-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","F.24ppt","F.8ppt"),"8vs24_D-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","I.24ppt","I.8ppt"),"8vs24_D-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","J.24ppt","J.8ppt"),"8vs24_D-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","K.24ppt","K.8ppt"),"8vs24_D-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","P.24ppt","P.8ppt"),"8vs24_D-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","I.24ppt","I.8ppt"),"8vs24_F-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","J.24ppt","J.8ppt"),"8vs24_F-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","K.24ppt","K.8ppt"),"8vs24_F-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","P.24ppt","P.8ppt"),"8vs24_F-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","J.24ppt","J.8ppt"),"8vs24_I-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","K.24ppt","K.8ppt"),"8vs24_I-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","P.24ppt","P.8ppt"),"8vs24_I-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.8ppt","K.24ppt","K.8ppt"),"8vs24_J-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.8ppt","P.24ppt","P.8ppt"),"8vs24_J-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("K.24ppt","K.8ppt","P.24ppt","P.8ppt"),"8vs24_K-8vs24_P"]=c(-1,1,1,-1)
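Each column of the contrast matrix encodes a difference of differences, (g1.8ppt - g1.24ppt) - (g2.8ppt - g2.24ppt), for one pair of genotypes. The 28 assignments above could also be generated in a loop. This is a sketch only: the coefficient names in `coef_names` are assumed to follow the "genotype.salinity" pattern used above, and in the actual analysis they come from colnames(fit_group_model$coefficients).

```r
genotypes = c("A", "B", "D", "F", "I", "J", "K", "P")
# hypothetical coefficient names; in the analysis these come from
# colnames(fit_group_model$coefficients)
coef_names = paste(rep(genotypes, each = 3), c("8ppt", "16ppt", "24ppt"), sep = ".")
# all 28 unordered genotype pairs, in the same order as the contrasts above
pairs = combn(genotypes, 2)
C_sketch = matrix(0, nrow = length(coef_names), ncol = ncol(pairs),
                  dimnames = list(coef_names,
                                  paste0("8vs24_", pairs[1, ], "-8vs24_", pairs[2, ])))
for (i in seq_len(ncol(pairs))) {
  g1 = pairs[1, i]
  g2 = pairs[2, i]
  # (g1.8ppt - g1.24ppt) - (g2.8ppt - g2.24ppt)
  C_sketch[paste0(g1, c(".24ppt", ".8ppt")), i] = c(-1, 1)
  C_sketch[paste0(g2, c(".24ppt", ".8ppt")), i] = c(1, -1)
}
```

Every column sums to zero, which is a quick check that each contrast compares two 8-versus-24 responses and nothing else.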
We followed the same procedure as in section 3.3.2.
# screening stage
alpha=0.05
screenTest_RQ3 = glmQLFTest(fit_group_model, contrast=C_RQ3)
pScreen_RQ3 = screenTest_RQ3$table$PValue
names(pScreen_RQ3) = rownames(screenTest_RQ3$table)
# confirmation stage
confirmationResults_RQ3 = sapply(1:ncol(C_RQ3),function(i) glmQLFTest(fit_group_model,
contrast = C_RQ3[,i]), simplify=FALSE)
confirmationPList_RQ3 = lapply(confirmationResults_RQ3, function(x) x$table$PValue)
confirmationP_RQ3 = as.matrix(Reduce(f=cbind,confirmationPList_RQ3))
rownames(confirmationP_RQ3) = rownames(confirmationResults_RQ3[[1]]$table)
colnames(confirmationP_RQ3) = colnames(C_RQ3)
stageRObj_RQ3 = stageR(pScreen=pScreen_RQ3, pConfirmation=confirmationP_RQ3)
stageRAdj_RQ3 = stageWiseAdjustment(object=stageRObj_RQ3, method="holm", alpha=0.05)
resRQ3 = getResults(stageRAdj_RQ3)
## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
# get the significant genes for each contrast
SignifGenesRQ3 = colSums(resRQ3)
SignifGenesRQ3
## padjScreen 8vs24_A-8vs24_B 8vs24_A-8vs24_D 8vs24_A-8vs24_F 8vs24_A-8vs24_I
## 4472 725 636 417 273
## 8vs24_A-8vs24_J 8vs24_A-8vs24_K 8vs24_A-8vs24_P 8vs24_B-8vs24_D 8vs24_B-8vs24_F
## 318 563 690 742 481
## 8vs24_B-8vs24_I 8vs24_B-8vs24_J 8vs24_B-8vs24_K 8vs24_B-8vs24_P 8vs24_D-8vs24_F
## 539 323 374 390 450
## 8vs24_D-8vs24_I 8vs24_D-8vs24_J 8vs24_D-8vs24_K 8vs24_D-8vs24_P 8vs24_F-8vs24_I
## 186 208 559 753 264
## 8vs24_F-8vs24_J 8vs24_F-8vs24_K 8vs24_F-8vs24_P 8vs24_I-8vs24_J 8vs24_I-8vs24_K
## 145 252 298 105 260
## 8vs24_I-8vs24_P 8vs24_J-8vs24_K 8vs24_J-8vs24_P 8vs24_K-8vs24_P
## 649 206 168 199
adjusted_p_RQ3 = getAdjustedPValues(stageRAdj_RQ3, onlySignificantGenes = FALSE, order = FALSE)
## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
# store the significance calls in a data frame and select the genes that passed the screening stage
resRQ3_df = as.data.frame(resRQ3)
resRQ3_df2 = resRQ3_df
resRQ3_df2$gene = rownames(resRQ3_df2)
OnlySignGenes_RQ3 = resRQ3_df[resRQ3_df$padjScreen == 1,]
dim(OnlySignGenes_RQ3)
## [1] 4472 29
# select genes that were significant after the screening stage, but not the confirmation stage
genesSI_RQ3 = rownames(adjusted_p_RQ3)[adjusted_p_RQ3[,"padjScreen"]<=0.05]
genesNotFoundStageII_RQ3 = genesSI_RQ3[genesSI_RQ3 %in% rownames(resRQ3)[rowSums(resRQ3==0)==28]]
length(genesNotFoundStageII_RQ3)
## [1] 847
# select genes that were significant after the confirmation stage
OnlySignGenes_RQ3_ConStage = OnlySignGenes_RQ3 [!rownames(OnlySignGenes_RQ3 ) %in% genesNotFoundStageII_RQ3, ]
nrow(OnlySignGenes_RQ3_ConStage)
## [1] 3625
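The bookkeeping above (4472 screened genes, of which 847 have no significant contrast, leaving 3625) can be illustrated on a small matrix. This is a sketch with made-up values, where `res` is a hypothetical stand-in for the 0/1 matrix returned by getResults():

```r
# toy stageR-style result matrix: 1 = significant, 0 = not significant; values are made up
res = matrix(c(1, 1, 0,
               1, 0, 0,
               0, 0, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("gene1", "gene2", "gene3"),
                             c("padjScreen", "contrast1", "contrast2")))
# genes that pass the screening stage
screened = rownames(res)[res[, "padjScreen"] == 1]
# of those, genes without a single significant contrast in the confirmation stage
n_contrasts = ncol(res) - 1
not_confirmed = screened[rowSums(res[screened, -1, drop = FALSE] == 0) == n_contrasts]
# genes with at least one significant contrast
confirmed = setdiff(screened, not_confirmed)
```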
Before we continued with the downstream analyses, we created a single data object that contains key information from the statistical pipeline outlined above. This included the logFC, logCPM and P-values of each gene for each contrast:
# select the adjusted P-values for each contrast
adjusted_p_RQ3 = getAdjustedPValues(stageRAdj_RQ3, onlySignificantGenes = FALSE, order = FALSE)
## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
# rename column headers in adjusted_p_RQ3
colnames(adjusted_p_RQ3) = c("padjScreen","8vs24_A-8vs24_B_Padj","8vs24_A-8vs24_D_Padj",
"8vs24_A-8vs24_F_Padj","8vs24_A-8vs24_I_Padj","8vs24_A-8vs24_J_Padj",
"8vs24_A-8vs24_K_Padj","8vs24_A-8vs24_P_Padj","8vs24_B-8vs24_D_Padj",
"8vs24_B-8vs24_F_Padj","8vs24_B-8vs24_I_Padj","8vs24_B-8vs24_J_Padj",
"8vs24_B-8vs24_K_Padj","8vs24_B-8vs24_P_Padj","8vs24_D-8vs24_F_Padj",
"8vs24_D-8vs24_I_Padj","8vs24_D-8vs24_J_Padj","8vs24_D-8vs24_K_Padj",
"8vs24_D-8vs24_P_Padj","8vs24_F-8vs24_I_Padj","8vs24_F-8vs24_J_Padj",
"8vs24_F-8vs24_K_Padj","8vs24_F-8vs24_P_Padj","8vs24_I-8vs24_J_Padj",
"8vs24_I-8vs24_K_Padj","8vs24_I-8vs24_P_Padj","8vs24_J-8vs24_K_Padj",
"8vs24_J-8vs24_P_Padj","8vs24_K-8vs24_P_Padj")
# create empty list to hold the data values
datalist_RQ3 = list()
# loop over the confirmationResults_RQ3 object to obtain the relevant information (table)
for (contrast in c(1:28)){
table = confirmationResults_RQ3[[contrast]]$table
datalist_RQ3[[contrast]] = table
}
# turn list into data frame
confirmationResults_RQ3_total_dataset = data.frame(datalist_RQ3)
# rename column names for tractability
# build the names from the contrast names: four columns (logFC, logCPM, F, nonadjPValue) per contrast
colnames(confirmationResults_RQ3_total_dataset) = paste(rep(colnames(C_RQ3), each = 4),
c("logFC", "logCPM", "F", "nonadjPValue"), sep = "_")
# merge the data frames
table = merge(confirmationResults_RQ3_total_dataset,adjusted_p_RQ3, by = 0, all = TRUE)
# use the first column (gene names) for the row names
all_results_RQ3 = table[,-1]
rownames(all_results_RQ3) = table[,1]
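Merging with by = 0 joins two data frames on their row names and returns those names in a first column called "Row.names", which is why that column is moved back into the row names afterwards. A minimal sketch with made-up values (the object names are hypothetical):

```r
# toy tables sharing gene names as row names; values are made up
stats = data.frame(logFC = c(1.2, -0.7), row.names = c("gene2", "gene1"))
padj = data.frame(padjScreen = c(0.01, 0.20), row.names = c("gene1", "gene2"))
# by = 0 merges on row names and returns them in a first column called "Row.names"
merged = merge(stats, padj, by = 0, all = TRUE)
# move that column back into the row names
result = merged[, -1]
rownames(result) = merged[, 1]
```

By default merge() sorts the rows by the merge key, so the merged table can have a different row order than the input tables.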
Next, we made a data frame of the significant logFC values:
# grep all the columns with logFC data
confirmationResults_RQ3_logFC = confirmationResults_RQ3_total_dataset[ , grepl("logFC", names(confirmationResults_RQ3_total_dataset))]
# grep for significant genes
confirmationResults_RQ3_logFC_sign = confirmationResults_RQ3_logFC[rownames(confirmationResults_RQ3_logFC) %in% rownames(OnlySignGenes_RQ3_ConStage), ]
# set logFC values that are not significant to NA
OnlySignGenes_RQ3_ConStage.sub = OnlySignGenes_RQ3_ConStage[,-1]
dim(OnlySignGenes_RQ3_ConStage.sub)
## [1] 3625 28
dim(confirmationResults_RQ3_logFC_sign)
## [1] 3625 28
colnames(OnlySignGenes_RQ3_ConStage.sub)
## [1] "8vs24_A-8vs24_B" "8vs24_A-8vs24_D" "8vs24_A-8vs24_F" "8vs24_A-8vs24_I"
## [5] "8vs24_A-8vs24_J" "8vs24_A-8vs24_K" "8vs24_A-8vs24_P" "8vs24_B-8vs24_D"
## [9] "8vs24_B-8vs24_F" "8vs24_B-8vs24_I" "8vs24_B-8vs24_J" "8vs24_B-8vs24_K"
## [13] "8vs24_B-8vs24_P" "8vs24_D-8vs24_F" "8vs24_D-8vs24_I" "8vs24_D-8vs24_J"
## [17] "8vs24_D-8vs24_K" "8vs24_D-8vs24_P" "8vs24_F-8vs24_I" "8vs24_F-8vs24_J"
## [21] "8vs24_F-8vs24_K" "8vs24_F-8vs24_P" "8vs24_I-8vs24_J" "8vs24_I-8vs24_K"
## [25] "8vs24_I-8vs24_P" "8vs24_J-8vs24_K" "8vs24_J-8vs24_P" "8vs24_K-8vs24_P"
colnames(confirmationResults_RQ3_logFC_sign)
## [1] "8vs24_A-8vs24_B_logFC" "8vs24_A-8vs24_D_logFC" "8vs24_A-8vs24_F_logFC"
## [4] "8vs24_A-8vs24_I_logFC" "8vs24_A-8vs24_J_logFC" "8vs24_A-8vs24_K_logFC"
## [7] "8vs24_A-8vs24_P_logFC" "8vs24_B-8vs24_D_logFC" "8vs24_B-8vs24_F_logFC"
## [10] "8vs24_B-8vs24_I_logFC" "8vs24_B-8vs24_J_logFC" "8vs24_B-8vs24_K_logFC"
## [13] "8vs24_B-8vs24_P_logFC" "8vs24_D-8vs24_F_logFC" "8vs24_D-8vs24_I_logFC"
## [16] "8vs24_D-8vs24_J_logFC" "8vs24_D-8vs24_K_logFC" "8vs24_D-8vs24_P_logFC"
## [19] "8vs24_F-8vs24_I_logFC" "8vs24_F-8vs24_J_logFC" "8vs24_F-8vs24_K_logFC"
## [22] "8vs24_F-8vs24_P_logFC" "8vs24_I-8vs24_J_logFC" "8vs24_I-8vs24_K_logFC"
## [25] "8vs24_I-8vs24_P_logFC" "8vs24_J-8vs24_K_logFC" "8vs24_J-8vs24_P_logFC"
## [28] "8vs24_K-8vs24_P_logFC"
confirmationResults_RQ3_logFC_sign = type.convert(confirmationResults_RQ3_logFC_sign , as.is = TRUE)
confirmationResults_RQ3_logFC_sign = (NA^(OnlySignGenes_RQ3_ConStage.sub == 0)) * confirmationResults_RQ3_logFC_sign
# export data
#write.csv(confirmationResults_RQ3_logFC_sign, file = "Skmarinoi8x3_RQ3_logFC_significant_reduced_set.csv")
#write.csv(confirmationResults_RQ3_logFC, file = "Skmarinoi8x3_RQ3_logFC_all_reduced_set.csv")
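The NA mask used above is applied cell by cell, so it only gives the intended result when both data frames list the same genes and contrasts in the same order. The dim() and colnames() output above serves that purpose. An explicit check could look like the sketch below, which uses made-up values and hypothetical object names:

```r
# toy significance calls and logFC values; values are made up
sig = data.frame(c1 = c(1, 0), c2 = c(0, 1), row.names = c("gene1", "gene2"))
lfc = data.frame(c1_logFC = c(1.5, -0.2), c2_logFC = c(0.3, -2.0),
                 row.names = c("gene1", "gene2"))
# the mask is applied cell by cell, so rows and columns must be in the same order
stopifnot(identical(rownames(sig), rownames(lfc)),
          identical(paste0(colnames(sig), "_logFC"), colnames(lfc)))
# NA^TRUE = NA and NA^FALSE = NA^0 = 1, so non-significant cells become NA
lfc_sign = (NA^(sig == 0)) * lfc
lfc_sign
```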