1. Bioinformatics analysis of RNA-seq reads
- 1.1 STAR mapping
- 1.2 Read quantification with HTSeq on gene-level
2. Functional annotation of the Skeletonema marinoi genome Ref1.1.2
3. Statistical analysis in R

Document prepared by Eveline Pinseel, March 2023.

This document gives an overview of the reanalysis of the RNA-seq data published in Pinseel et al. 2022, ISME. The pipeline follows the exact steps described in Pinseel et al. 2022, but uses version 1.1.2 of the S. marinoi reference genome (Ref v1.1.2). I therefore start this analysis from the quality-controlled, trimmed reads obtained by Pinseel et al. 2022, and proceed directly to mapping against the genome with STAR.

1. Bioinformatics analysis of RNA-seq reads

1.1 STAR mapping

We made the STAR index as follows:

# extract nuclear genome (= remove plastid and mitochondrial genome from GFF)
head -n 84053 Sm_ManualCuration.v1.1.2.gff > Sm_ManualCuration.v1.1.2_nuclear.gff

# create output directory
mkdir STAR_index_STAR2.7.10_100bpread_Ref_v1.1.2

# run STAR v2.7.10.a
module load STAR

STAR \
--runThreadN 10 \
--runMode genomeGenerate \
--genomeDir STAR_index_STAR2.7.10_100bpread_Ref_v1.1.2 \
--genomeFastaFiles Skeletonema_marinoi_Ref_v1.1.2_nuclear.fst \
--sjdbGTFfile Sm_ManualCuration.v1.1.2_nuclear.gff \
--sjdbGTFtagExonParentTranscript Parent \
--sjdbOverhang 99 \
--genomeSAindexNbases 11

## --genomeSAindexNbases was set to 12 in the Ref1.1 analysis. However, STAR complained about this value with the new genome version and suggested using 11 instead
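
The two index parameters above follow rules of thumb from the STAR manual: --sjdbOverhang is the read length minus 1, and for small genomes --genomeSAindexNbases should be scaled down to min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that arithmetic is given below. The function names and the 55 Mb genome size are illustrative assumptions, not values taken from this analysis:

```python
import math


def sjdb_overhang(read_length: int) -> int:
    """STAR manual rule of thumb: read length minus 1."""
    return read_length - 1


def genome_sa_index_nbases(genome_length: int) -> int:
    """STAR manual rule of thumb for small genomes:
    min(14, log2(GenomeLength)/2 - 1), rounded down."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


# 100 bp reads, as used for this index
print(sjdb_overhang(100))  # 99

# Hypothetical genome size (55 Mb), for illustration only
print(genome_sa_index_nbases(55_000_000))  # 11
```

With these assumptions the formula gives 11, the same value STAR suggested for the new genome version.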

Next, I extracted information on intron length from the GFF. To extract the introns, I worked in Python:

#if gffutils is not installed, install first
#pip install gffutils

import gffutils

#create GFF database (you only need to do this once)
gffutils.create_db('Sm_ManualCuration.v1.1.2_nuclear.gff', 
                   'Sm_ManualCuration.v1.1.2_nuclear.db', keep_order=False,
                   merge_strategy='merge',sort_attribute_values=False, id_spec=['ID', 'Name'], force=True)
                 
#import the database
db = gffutils.FeatureDB('Sm_ManualCuration.v1.1.2_nuclear.db', keep_order=True)

#create introns
data = gffutils.FeatureDB.create_introns(db, exon_featuretype='exon', grandparent_featuretype=None, parent_featuretype='mRNA', new_featuretype='intron', merge_attributes=True)

#print all the introns
for intron in data:
    print(intron)
#copy the printed output into a plain-text file for the next step

Introns were stored in: Sm_ManualCuration.v1.1.2_nuclear-introns.txt.

Note that when calculating the introns for Ref1.1, I used grandparent_featuretype='Gene' and parent_featuretype=None. I had to adjust this here because the new GFF does not contain gene features; instead, it seems to mark the positions of complete genes with mRNA features.
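
Conceptually, an intron is the gap between two consecutive exons of the same transcript, which is why the choice of parent feature type matters. The sketch below illustrates that idea on plain coordinate tuples. It is not the gffutils implementation, and the function name and example coordinates are hypothetical:

```python
def introns_from_exons(exons):
    """Return (start, end) of the gaps between consecutive exons.

    exons: list of (start, end) tuples, 1-based and inclusive as in GFF.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # adjacent or overlapping exons leave no gap, so no intron is emitted
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical transcript with three exons, given in arbitrary order
example_exons = [(201, 300), (1, 100), (351, 400)]
print(introns_from_exons(example_exons))  # [(101, 200), (301, 350)]
```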

Then I calculated intron lengths:

import csv

# import the file
GFF = 'Sm_ManualCuration.v1.1.2_nuclear-introns.txt'

# create empty list for the intron lengths
intron_lengths = []

#create a dictionary for the intron lengths and their IDs
dict_intron_length = {}

# read GFF file, line by line
with open(GFF, 'r') as gff_file:
        
    # create a tab-delimited reader (comment lines are not filtered out here)
    reader = csv.reader(gff_file, delimiter="\t")

    for line in reader:            
        # skip blank lines
        if not line:
            continue
                 
        else:
            # extract information from the GFF
            start = int(line[3])
            end   = int(line[4])
            attributes = line[8]
            
            # calculate the length of all the introns
            length = end - start + 1
            
            # create a list of all the intron lengths
            intron_lengths.append(length)
            
            # create a dictionary that links the intron lengths with the intron IDs
            dict_intron_length[length] = attributes

# calculate the min - max intron lengths
min_length = min(intron_lengths)
max_length = max(intron_lengths)

# print the min - max intron lengths
print("The minimum intron length is " + str(min_length))
print("The maximum intron length is " + str(max_length))

# export the intron lengths to a file
with open("Skeletonema_marinoi_intron_lengths.txt", "w") as out_file:
    out_file.write(str(intron_lengths))

#look up the IDs for the minimum and maximum lengths (if several introns share a length, the dictionary only keeps the last one read)
min_intron = dict_intron_length.get(min_length)
max_intron = dict_intron_length.get(max_length)
print("The intron ID of the minimum intron length is: " + min_intron)
print("The intron ID of the maximum intron length is: " + max_intron)

#The minimum intron length is 4
#The maximum intron length is 17105
#The intron ID of the minimum intron length is: Parent=Sm_t00018725-RA
#The intron ID of the maximum intron length is: Parent=Sm_t00004715-RA
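
One caveat of the script above: because the dictionary is keyed by length, introns that share a length overwrite each other, so the reported ID is the last intron read with that length. A sketch of an alternative that keeps each length together with its attributes is shown below, assuming the same nine-column, tab-delimited input. The function name and the two example records are made up for illustration:

```python
import csv
import io


def intron_length_extremes(handle):
    """Return ((min_len, attrs), (max_len, attrs)) from GFF-style intron lines."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        # skip blank lines, comment lines, and rows without nine columns
        if not row or row[0].startswith("#") or len(row) < 9:
            continue
        length = int(row[4]) - int(row[3]) + 1
        records.append((length, row[8]))
    shortest = min(records, key=lambda rec: rec[0])
    longest = max(records, key=lambda rec: rec[0])
    return shortest, longest


# Hypothetical records, for illustration only
example = io.StringIO(
    "chr1\tsrc\tintron\t101\t200\t.\t+\t.\tParent=tx1\n"
    "chr1\tsrc\tintron\t301\t350\t.\t+\t.\tParent=tx2\n"
)
print(intron_length_extremes(example))
```

With a real file, the handle would be the opened Sm_ManualCuration.v1.1.2_nuclear-introns.txt. On ties, min() and max() return the first record encountered.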

We ran STAR:

# create output directory
mkdir STAR_output

# create list of file names
ls ktrim_output/*read1.fq | sed "s/trimmed_Ktrim.read1.fq//" | sed "s,ktrim_output/,," > names_STAR_ktrim.txt

# run STAR v2.7.3.a
for i in $(cat names_STAR_ktrim.txt);do STAR \
--runThreadN 15 \
--genomeDir Skmarinoi_Ref_v1.1.2_2021-12-06/STAR_index_STAR2.7.10_100bpread_Ref_v1.1.2 \
--outSAMtype BAM SortedByCoordinate \
--alignIntronMin 4 \
--alignIntronMax 17105 \
--outReadsUnmapped Fastx \
--readFilesIn ktrim_output/${i}trimmed_Ktrim.read1.fq ktrim_output/${i}trimmed_Ktrim.read2.fq \
--outFileNamePrefix STAR_output/$i; done

Let’s extract information on read mapping:

# grep for mapping information
grep "Uniquely mapped reads %" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_uniquely_mapped_reads.txt

grep "% of reads mapped to multiple loci" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_reads_mapped_to_multiple_loci.txt

grep "% of reads mapped to too many loci" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_reads_mapped_to_too_many_loci.txt

grep "% of reads unmapped: too many mismatches" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_unmapped_too_many_mismatches.txt

grep "% of reads unmapped: too short" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_unmapped_too_short.txt

grep "% of reads unmapped: other" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_unmapped_other.txt

grep "% of chimeric reads" *_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tPercentage\n/" > STAR_chimeric_reads.txt

#the tr command replaces the spaces on each line with underscores (spaces cause problems when reading the files in R)
#the first sed command removes all occurrences of the %-sign on each line: necessary for visualization in R
#the second sed command adds a header line to the file, necessary for downstream analysis in R. Note that this code does not work with the default sed on a Mac.

# change names in files
for i in STAR*.txt;do paste $i short_name_STAR.txt > SN_$i;done
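
The grep/tr/sed chain above depends on GNU sed. A portable alternative is to parse each Log.final.out file directly. The sketch below assumes the usual "metric |<tab>value" layout of that file; the function names and the example text are illustrative:

```python
def parse_star_log(text):
    """Parse the text of a STAR Log.final.out file into a {metric: value} dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers and blank lines carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Return a percentage metric as a float, dropping the trailing % sign."""
    return float(stats[key].rstrip("%"))


# Made-up excerpt in the Log.final.out layout
example = (
    "                          Number of input reads |\t1000\n"
    "                                  UNIQUE READS:\n"
    "                        Uniquely mapped reads % |\t80.50%\n"
)
stats = parse_star_log(example)
print(percent(stats, "Uniquely mapped reads %"))  # 80.5
```

Applied to every *_Log.final.out file, this yields all mapping categories per sample in one pass instead of one grep per category.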

Plot the mapping data:

STAR_uniquely_mapped_reads = read.table("SN_STAR_uniquely_mapped_reads.txt", header = TRUE)
barplot(STAR_uniquely_mapped_reads$Percentage, main="STAR: uniquely mapped reads", ylim = c(0,100), ylab = "percentage (%)", names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_uniquely_mapped_reads$Colour))

STAR_reads_mapped_to_multiple_loci = read.table("SN_STAR_reads_mapped_to_multiple_loci.txt", header = TRUE)
barplot(STAR_reads_mapped_to_multiple_loci$Percentage, main="STAR: reads mapped to multiple loci", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_reads_mapped_to_multiple_loci$Colour))

STAR_reads_mapped_to_too_many_loci = read.table("SN_STAR_reads_mapped_to_too_many_loci.txt", header = TRUE)
barplot(STAR_reads_mapped_to_too_many_loci$Percentage, main="STAR: reads mapped to too many loci", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_reads_mapped_to_too_many_loci$Colour))

STAR_unmapped_too_many_mismatches = read.table("SN_STAR_unmapped_too_many_mismatches.txt", header = TRUE)
barplot(STAR_unmapped_too_many_mismatches$Percentage, main="STAR: unmapped reads - too many mismatches", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_unmapped_too_many_mismatches$Colour))

STAR_unmapped_too_short = read.table("SN_STAR_unmapped_too_short.txt", header = TRUE)
barplot(STAR_unmapped_too_short$Percentage, main="STAR: unmapped reads - too short", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_unmapped_too_short$Colour))

STAR_chimeric_reads = read.table("SN_STAR_chimeric_reads.txt", header = TRUE)
barplot(STAR_chimeric_reads$Percentage, main="STAR: chimeric reads", ylab = "percentage (%)",names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_chimeric_reads$Colour))

STAR_uniquely_mapped_reads = read.table("SN_STAR_uniquely_mapped_reads.txt", header = TRUE)
STAR_reads_mapped_to_multiple_loci = read.table("SN_STAR_reads_mapped_to_multiple_loci.txt", header = TRUE)

# use a name other than "sum" so that the base R function sum() is not masked
mapped_sum = STAR_uniquely_mapped_reads$Percentage + STAR_reads_mapped_to_multiple_loci$Percentage

barplot(mapped_sum, main="STAR: uniquely mapped reads + multimapped reads", ylab = "percentage (%)", ylim = c(0,100), names.arg= as.matrix(STAR_uniquely_mapped_reads$Short_name),cex.names=0.35, las=2, col = as.vector(STAR_uniquely_mapped_reads$Colour))

1.2 Read quantification with HTSeq on gene-level

I used the GFF file of S. marinoi that does not contain sequences and only includes the nuclear genome: Sm_ManualCuration.v1.1.2_nuclear.gff.

We ran HTSeq as outlined below. Note that sorting the BAM files was not necessary because STAR had already sorted the output:

# create list of file names (BAM output STAR)
ls STAR_output/*bam | sed "s/Aligned.sortedByCoord.out.bam//" | sed "s,STAR_output/,," > names_BAMfiles.txt

# create index files for all BAM files (loop over the BAM files directly, so that names_BAMfiles.txt created above is not overwritten)
module load samtools #loads samtools v1.10
for bam in STAR_output/*.bam; do \
samtools index "$bam"; done

# load HTSeq [since July 2022]
module load gcc/11.2.1 mkl/19.0.5 python/3.10-anaconda;source /share/apps/bin/conda-3.10.sh;conda activate htseq-3.10

# run HTSeq (conda environment htseq-3.10) on gene-level
for i in $(cat names_BAMfiles.txt); do \
htseq-count \
--format=bam \
--order=pos \
--stranded=reverse \
--minaqual=10 \
--type=mRNA \
--idattr=ID \
--mode=union \
--nonunique=none \
--samout="$i"HTSeq.gene-level.out \
STAR_output/"$i"Aligned.sortedByCoord.out.bam \
Skmarinoi_Ref_v1.1.2_2021-12-06/Sm_ManualCuration.v1.1.2_nuclear.gff \
>> "$i"HTSeq.gene-level.out.STDOUT 2>> "$i"HTSeq.gene-level.out.STERROR;done

#for --type: use a feature type listed in the third column of the GFF file! Running this with 'gene' did not work because the GFF does not contain any gene features.
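
To see which feature types are available for --type, the third column of the GFF can be tallied. A minimal sketch follows; the function name and the example lines are hypothetical:

```python
from collections import Counter


def feature_type_counts(gff_lines):
    """Count the values in column 3 (feature type) of GFF lines."""
    counts = Counter()
    for line in gff_lines:
        # skip blank lines and comment/directive lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Made-up GFF lines, for illustration only
example = [
    "##gff-version 3\n",
    "chr1\tsrc\tmRNA\t1\t400\t.\t+\t.\tID=tx1\n",
    "chr1\tsrc\texon\t1\t100\t.\t+\t.\tParent=tx1\n",
    "chr1\tsrc\texon\t201\t400\t.\t+\t.\tParent=tx1\n",
]
print(feature_type_counts(example))
```

With the real annotation, the opened Sm_ManualCuration.v1.1.2_nuclear.gff file can be passed instead of the example list.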

The STDOUT file contains information on the number of reads that were counted and those that were not counted. To get a better grasp on these results, we need to extract them from the files:

grep "__no_feature" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_nofeature.txt
grep "__ambiguous" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_ambiguous.txt
grep "__too_low_aQual" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_toolowaQual.txt
grep "__not_aligned" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_not-aligned.txt
grep "__alignment_not_unique" HTSeq_output/*STDOUT | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_alignment_not_unique.txt
#sed introduces a header to the file

The grep commands above only return the reads that were not counted. To get the number of counted reads, we need to sum the counts for all the genes in each file:

#loop over all STDOUT files
ls HTSeq_output/*STDOUT > HTSeq_gene_output_STDOUT_list.txt

for i in $(cat HTSeq_gene_output_STDOUT_list.txt); do head -n 17203 $i | cut -f 2 | awk '{s+=$1}END{print s}';done > HTSeq_gene-level_countedreads.txt 

#add labels to the counts
tail -n 72 HTSeq_gene-level_nofeature.txt | cut -f 1 > labels.txt; paste labels.txt HTSeq_gene-level_countedreads.txt | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_countedreads_labels.txt; mv HTSeq_gene-level_countedreads_labels.txt HTSeq_gene-level_countedreads.txt; rm labels.txt

HTSeq gives absolute read counts, but I’m interested in relative read counts. We therefore need a file that contains the total read counts per sample. This can also be derived from the HTSeq output files:

for i in $(cat HTSeq_gene_output_STDOUT_list.txt); do cut -f 2 $i | awk '{s+=$1}END{print s}';done > HTSeq_gene-level_totalreads.txt

tail -n 72 HTSeq_gene-level_nofeature.txt | cut -f 1 > labels.txt; paste labels.txt HTSeq_gene-level_totalreads.txt | sed "1s/^/File\tNumber\n/" > HTSeq_gene-level_totalreads_labels.txt; mv HTSeq_gene-level_totalreads_labels.txt HTSeq_gene-level_totalreads.txt; rm labels.txt
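
The head -n 17203 step above depends on the exact number of genes in the annotation. Since htseq-count prefixes its special counters with two underscores, counted and total reads can also be separated by that prefix. A sketch under that assumption is shown below; the function name, gene IDs and numbers are made up:

```python
def summarize_htseq(lines):
    """Return (counted, total, special) from htseq-count output lines.

    counted: sum over all feature rows
    total:   counted plus all special counters (rows starting with "__")
    special: dict of the special counters
    """
    counted = 0
    special = {}
    for line in lines:
        if not line.strip():
            continue
        name, value = line.rstrip("\n").split("\t")[:2]
        if name.startswith("__"):
            special[name] = int(value)
        else:
            counted += int(value)
    total = counted + sum(special.values())
    return counted, total, special


# Made-up htseq-count output, for illustration only
example = [
    "geneA\t120\n",
    "geneB\t30\n",
    "__no_feature\t40\n",
    "__ambiguous\t5\n",
    "__too_low_aQual\t0\n",
    "__not_aligned\t0\n",
    "__alignment_not_unique\t55\n",
]
counted, total, special = summarize_htseq(example)
print(counted, total)  # 150 250
```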

Add shorter labels to the count data for visualization in R:

for i in HTSeq_gene-level*.txt;do paste $i STAR_output/short_name_STAR.txt > SN_$i;done

We will also need the total number of reads that were used as input in STAR:

grep "Number of input reads" STAR_output/*_Log.final.out | tr ' ' '_' | sed 's/%//1g' | sed "1s/^/File\tNumber\n/" > STAR_input_reads.txt

paste STAR_input_reads.txt STAR_output/short_name_STAR.txt > SN_STAR_input_reads.txt

Visualization in R:

The code below calculates the percentage of the STAR input reads that were counted by HTSeq:

HTSeq_gene_countedreads=read.table("SN_HTSeq_gene-level_countedreads.txt", header = TRUE)
STAR_input=read.table("SN_STAR_input_reads.txt", header = TRUE)
ratio = (HTSeq_gene_countedreads$Number / STAR_input$Number) * 100
barplot(ratio, main="HTSeq gene-level: counted reads [of STAR input]", ylim=c(0,70), ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_countedreads$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_countedreads$Colour))

The code below calculates the percentage of the reads in the STAR output that were counted by HTSeq. Note that multimapped reads were excluded from the HTSeq analysis:

HTSeq_gene_countedreads=read.table("SN_HTSeq_gene-level_countedreads.txt", header = TRUE)
HTSeq_gene_total=read.table("SN_HTSeq_gene-level_totalreads.txt", header = TRUE)
ratio = (HTSeq_gene_countedreads$Number / HTSeq_gene_total$Number) * 100
barplot(ratio, main="HTSeq gene-level: counted reads [of STAR output]", ylim=c(0,80),ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_countedreads$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_countedreads$Colour))

The code below calculates the percentage of the reads uniquely mapped by STAR that were counted by HTSeq:

HTSeq_gene_countedreads=read.table("SN_HTSeq_gene-level_countedreads.txt", header = TRUE)
HTSeq_gene_total=read.table("SN_HTSeq_gene-level_totalreads.txt", header = TRUE)
HTSeq_gene_notunique=read.table("SN_HTSeq_gene-level_alignment_not_unique.txt", header = TRUE)
ratio = (HTSeq_gene_countedreads$Number / (HTSeq_gene_total$Number - HTSeq_gene_notunique$Number)) * 100
barplot(ratio, main="HTSeq gene-level: counted reads [of STAR uniquely mapped reads]", ylim=c(0,100), ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_countedreads$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_countedreads$Colour))

The code below calculates the percentage of the reads uniquely mapped by STAR that did not overlap any feature:

HTSeq_gene_nofeature=read.table("SN_HTSeq_gene-level_nofeature.txt", header = TRUE)
HTSeq_gene_total=read.table("SN_HTSeq_gene-level_totalreads.txt", header = TRUE)
HTSeq_gene_notunique=read.table("SN_HTSeq_gene-level_alignment_not_unique.txt", header = TRUE)
ratio = (HTSeq_gene_nofeature$Number / (HTSeq_gene_total$Number - HTSeq_gene_notunique$Number)) * 100
barplot(ratio, main="HTSeq gene-level: reads without feature [of STAR uniquely mapped reads]", ylim=c(0,30), ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_nofeature$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_nofeature$Colour))

The code below calculates the percentage of the reads uniquely mapped by STAR that were ambiguous:

HTSeq_gene_ambiguous=read.table("SN_HTSeq_gene-level_ambiguous.txt", header = TRUE)
HTSeq_gene_total=read.table("SN_HTSeq_gene-level_totalreads.txt", header = TRUE)
HTSeq_gene_notunique=read.table("SN_HTSeq_gene-level_alignment_not_unique.txt", header = TRUE)
ratio = (HTSeq_gene_ambiguous$Number / (HTSeq_gene_total$Number - HTSeq_gene_notunique$Number)) * 100
barplot(ratio, main="HTSeq gene-level: ambiguous reads [of STAR uniquely mapped reads]", ylim=c(0,1.2), ylab = "percentage (%)",names.arg= as.matrix(HTSeq_gene_ambiguous$Short_name),cex.names=0.35, las=2, col = as.vector(HTSeq_gene_ambiguous$Colour))
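
The R snippets in this section differ only in their numerator and denominator. The worked example below runs the same arithmetic for a single sample. All numbers are invented, and the function name is hypothetical:

```python
def htseq_percentages(counted, total, not_unique, no_feature, ambiguous, star_input):
    """Express htseq-count categories as percentages of three denominators."""
    # total minus non-unique alignments is used as the uniquely mapped
    # denominator, mirroring the R code
    unique = total - not_unique
    return {
        "counted_of_star_input": 100 * counted / star_input,
        "counted_of_star_output": 100 * counted / total,
        "counted_of_unique": 100 * counted / unique,
        "no_feature_of_unique": 100 * no_feature / unique,
        "ambiguous_of_unique": 100 * ambiguous / unique,
    }


# Invented numbers for one sample
result = htseq_percentages(
    counted=600, total=1000, not_unique=200,
    no_feature=150, ambiguous=50, star_input=1250,
)
for name, value in result.items():
    print(name, value)
```

With these invented numbers, the counted, no-feature and ambiguous percentages of the uniquely mapped reads are 75, 18.75 and 6.25, which sum to 100.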

Finally, I combined all the count data into one file which will be used as input in R:

paste HTSeq_output/*STDOUT | cut -f 1,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,142,144 > Skmarinoi8x3_reanalysis_ref1.1.2_gene-level_counts.txt

cat short_name_header.txt Skmarinoi8x3_reanalysis_ref1.1.2_gene-level_counts.txt > Skmarinoi8x3_reanalysis_ref1.1.2_gene-level_counts_FINAL.txt
#short_name_header.txt contains the short identifiers of all the samples in the order of the STDOUT files
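
The paste/cut approach above assumes that every STDOUT file lists the genes in the same order. A sketch of a merge that joins on gene ID and checks this assumption explicitly is given below; the function name, sample names and counts are hypothetical:

```python
def merge_counts(samples):
    """Merge per-sample {gene: count} dicts into one table keyed by gene ID.

    samples: dict mapping sample name -> dict of gene ID -> count.
    Returns (header, rows); raises ValueError if the gene sets differ.
    """
    names = list(samples)
    genes = list(samples[names[0]])
    for name in names[1:]:
        if set(samples[name]) != set(genes):
            raise ValueError(f"gene IDs differ in sample {name}")
    header = ["gene"] + names
    rows = [[gene] + [samples[name][gene] for name in names] for gene in genes]
    return header, rows


# Made-up counts for two samples; gene order differs between them
example = {
    "S1": {"geneA": 10, "geneB": 0},
    "S2": {"geneB": 3, "geneA": 7},
}
header, rows = merge_counts(example)
print(header)  # ['gene', 'S1', 'S2']
print(rows)    # [['geneA', 10, 7], ['geneB', 0, 3]]
```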

2. Functional annotation of the Skeletonema marinoi genome Ref1.1.2

2.1 Extract proteins from the genome

As a first step, I extracted the protein sequences from the gff3 of the S. marinoi genome using gffread, which is distributed with cufflinks.

# extract the protein sequences from the Maker gff3
/share/apps/bioinformatics/cufflinks/cufflinks-2.2.1.Linux_x86_64/gffread \
Sm_ManualCuration.v1.1.2_nuclear.gff \
-g Skeletonema_marinoi_Ref_v1.1.2_nuclear.fst \
-y Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.fasta

This command creates a file that contains the protein translations of all the CDS regions. To make sure that this file matches the gff, I checked that the number of proteins equals the number of genes. I also removed all the dots from the sequences because they cause problems in later steps:

# calculate number of proteins by grepping for ">"
grep ">" Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.fasta | wc -l
# result = 17203

# calculate number of genes + mRNA by grepping for "Sm_g" (this pattern is unique for lines with genes or mRNA)
grep "Sm_g" Sm_ManualCuration.v1.1.2_nuclear.gff | wc -l
# result = 17203

# remove dots from the input file
sed 's/\.//1g' Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.fasta  > Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta
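
Note that the sed command above removes every dot in the file, including any that occur in header lines. A sketch that restricts the clean-up to sequence lines and counts the records is shown below. The function names and the example records are hypothetical:

```python
def strip_dots(fasta_lines):
    """Remove '.' from sequence lines only; header lines are left untouched."""
    cleaned = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        cleaned.append(line if line.startswith(">") else line.replace(".", ""))
    return cleaned


def count_records(fasta_lines):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


# Made-up records, for illustration only
example = [">tx1 note=v1.1\n", "MKT.\n", ">tx2\n", "MA.L\n"]
print(strip_dots(example))     # ['>tx1 note=v1.1', 'MKT', '>tx2', 'MAL']
print(count_records(example))  # 2
```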

2.2 Swissprot

I used the Swissprot version of June 1, 2020, which I already had available and which I used for Ref 1.1:

# load blast/2.13.0+
module load blast

# run blastp
blastp -db June1_2020/swissprot_db \
-query Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
-out Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.blastp.out \
-evalue 1e-6 \
-outfmt 6 \
-num_alignments 1 \
-seg yes \
-soft_masking true \
-lcase_masking \
-max_hsps 1 \
-num_threads 8 \
> stdout.txt 2> stderror.txt

I then also ran blastp against the latest version of Swissprot, downloaded on September 28th 2022:

module load blast

# make blastp database using the swissprot.fasta as input file
makeblastdb -in swissprot_Sep28_2022.fasta -dbtype prot -out swissprot_db -title swissprot_db

# run blastp
blastp -db September28_2022/swissprot_db \
-query Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
-out Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.blastp.Sep2022.out \
-evalue 1e-6 \
-outfmt 6 \
-num_alignments 1 \
-seg yes \
-soft_masking true \
-lcase_masking \
-max_hsps 1 \
-num_threads 8 \
> stdout.Sep2022.txt 2> stderror.Sep2022.txt

2.3 Uniprot

I will use the local copy of uniprot, downloaded by Wade in July 2019 (/home/wader/databases/ref-proteomes/).

The Uniprot data contains two databases: a diamond database (uniprot_ref_proteomes.dmnd) and an NCBI blast database (.phr, .pin, .psq). I will use diamond, which is much faster than a standard blast search, although it is less accurate than true blast:

# run diamond blast on Uniprot [diamond/2.0.1]
module load diamond

diamond blastp --db /home/wader/databases/ref-proteomes-2020/uniprot_ref_proteomes.dmnd \
--query Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
--outfmt 6 \
--evalue 1e-6 \
--max-target-seqs 1 \
--sensitive \
--max-hsps 1 \
--out Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.uniprot.diamond.out \
--threads 16

# select uniprot IDs from output file
cut -f2 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.uniprot.diamond.out > Smarinoi_Ref1.1.2_uniprot_IDs.txt
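
Both blastp and diamond were run with tabular output (format 6), which by default has 12 tab-separated columns. That is why cut -f2 above returns the subject (Uniprot) ID. A sketch of a reader that names these columns is given below; the function name and the example hit are made up:

```python
import csv
import io

# default column order of tabular format 6
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_blast6(handle):
    """Yield one dict per hit from default 12-column tabular output."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(BLAST6_COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


# Made-up hit, for illustration only
example = io.StringIO("tx1\tsubjectA\t55.0\t200\t90\t0\t1\t200\t5\t204\t1e-50\t180\n")
for hit in read_blast6(example):
    print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```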

Newer versions of Uniprot can be downloaded from the Uniprot website.

The output is a list of protein IDs. More information associated with these protein IDs can be retrieved by inputting the list on the Uniprot website: use the list option.

2.4 KEGG annotations

For KEGG, first remove the space in the header of the protein fasta:

#remove space in headers
sed 's/\s/_/1g' Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta > Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta

KofamKOALA only accepts 5,000 genes per submission. Since S. marinoi has ~17,000 genes in its genome, I needed to split the fasta file of the protein sequences into 4 subsets. To do this, I used the seqkit toolkit, after installing it locally on razor:

#sequences 1-5,000
seqkit head -n 5000 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta > Skmarinoi_Ref1.1.2_proteins_1-5000.fasta

#sequences 5,001-10,000
seqkit range -r 5001:10000 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta > Skmarinoi_Ref1.1.2_proteins_5001-10000.fasta

#sequences 10,001-15,000
seqkit range -r 10001:15000 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta > Skmarinoi_Ref1.1.2_proteins_10001-15000.fasta

#sequences 15,001-17,203
seqkit range -r 15001:20000 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot_header.fasta > Skmarinoi_Ref1.1.2_proteins_15001-17203.fasta

The output files of the code above were submitted separately to the KofamKOALA tool on the koala webserver. The E-value parameter was set to 0.01 (default value). Results are returned by email. I used the KofamKOALA version of 2022-08-01 (KEGG release 103.0).
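
The batch boundaries used in the seqkit commands can be derived rather than typed by hand. A minimal sketch of the chunking logic follows; the function name is hypothetical, and the integers stand in for protein records:

```python
def chunk_records(records, size):
    """Split a list of records into consecutive chunks of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


# 17,203 proteins in batches of at most 5,000
chunks = chunk_records(list(range(1, 17204)), 5000)
print([(chunk[0], chunk[-1]) for chunk in chunks])
# [(1, 5000), (5001, 10000), (10001, 15000), (15001, 17203)]
```

This gives four batches, matching the four seqkit commands.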

Combine the four resulting output files into a single file:

#remove headers of all files, except the first
grep -v "#" Skmarinoi8x3_Ref1.1.2_genes5001-10000_KEGGresults.txt > Skmarinoi8x3_Ref1.1.2_genes5001-10000_KEGGresults_nohead.txt
grep -v "#" Skmarinoi8x3_Ref1.1.2_genes10001-15000_KEGGresults.txt > Skmarinoi8x3_Ref1.1.2_genes10001-15000_KEGGresults_nohead.txt
grep -v "#" Skmarinoi8x3_Ref1.1.2_genes15001-17203_KEGGresults.txt > Skmarinoi8x3_Ref1.1.2_genes15001-17203_KEGGresults_nohead.txt

#combine output files
cat Skmarinoi8x3_Ref1.1.2_genes1-5000_KEGGresults.txt Skmarinoi8x3_Ref1.1.2_genes5001-10000_KEGGresults_nohead.txt Skmarinoi8x3_Ref1.1.2_genes10001-15000_KEGGresults_nohead.txt Skmarinoi8x3_Ref1.1.2_genes15001-17203_KEGGresults_nohead.txt > Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28.txt

#remove redundant files
rm *nohead.txt

How many genes received KEGG annotations?

grep -v "#" Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28.txt | wc -l
#3638

This is far fewer than in the Ref1.1 genome: possibly sections that were considered to be different genes in Ref1.1 were combined into one gene in Ref1.1.2?

Finally, let’s reduce the file to only include columns relevant for combining all data into one database (see below):

# get genes and KEGG numbers
sed 's/* //' Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28.txt | sed 's/# //' | sed 's/#//' | sed '1d' | sed '1d' | sed 's/ /\t/' | sed 's/ /_/g' | sed 's/_[0-9]/\t/' | cut -f 1,2 | sed 's/_//g' | sed 's/gene=/\t/' | cut -f 1,3 | sed 's/Smt/Sm_t/' > KEGG_genes.txt 

# get KEGG info
sed 's/* //' Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28.txt | sed 's/# //' | sed 's/#//' | sed '1d' | sed '1d' | sed 's/ /\t/' | sed 's/ /_/g' | sed 's/[0-9]_/\t/' | cut -f 3 | sed 's/[0-9]_/\t/' | cut -f 2 | sed 's/[0-9]_/\t/' | cut -f 2 | sed 's/[0-9]_/\t/' | cut -f 2 > KEGG_info.txt

# combine both files
paste KEGG_genes.txt KEGG_info.txt > Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28_reduced.txt

2.5 InterProScan

Wade ran InterProScan for me, probably using the following code:

# load the required java version
module load java/openjdk_14.0.1

# run InterProScan (using PBS script on scratch)
interproscan-5.44-79.0/interproscan.sh \
-appl Pfam,PRINTS,PANTHER,SMART,SignalP_EUK,TMHMM \
-iprlookup \
-goterms \
-cpu 8 \
-f tsv \
-i Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
2> InterProScan.stderror

Output file: Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta.tsv.NEW

2.6 Translate gene IDs Ref v1.1 to Ref v1.1.2

I want to know which gene IDs of Ref v1.1 correspond with those of Ref v1.1.2:

# load blast/2.13.0+
module load blast

# run blastp
blastp -db /functional_annotation/blast_Smarinoi/blast_db/S.marinoi_db \
-query ../Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta \
-out Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.ref1.1.blastp.out \
-evalue 1e-6 \
-outfmt 6 \
-num_alignments 1 \
-seg yes \
-soft_masking true \
-lcase_masking \
-max_hsps 1 \
-num_threads 8 \
> stdout.txt 2> stderror.txt

But because there are many more genes in Ref v1.1, I also want to know which genes of Ref v1.1.2 correspond with those of Ref v1.1:

# make blastp database using the Ref v1.1.2 protein fasta as input file
makeblastdb -in Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta -dbtype prot -out S.marinoi_ref1.1.2_db -title S.marinoi_db_ref1.1.2

# run blastp
blastp -db /functional_annotation/blast_db/S.marinoi_ref1.1.2_db \
-query Skeletonema_marinoi_Ref_v1.1_Primary.OnemRNAPerGene.proteins_shortproteinremoved2rmdot.fasta \
-out Skeletonema_marinoi_Ref_v1.1_nuclear.proteins.sprot.ref1.1.2.blastp.out \
-evalue 1e-6 \
-outfmt 6 \
-num_alignments 1 \
-seg yes \
-soft_masking true \
-lcase_masking \
-max_hsps 1 \
-num_threads 8 \
> stdout.txt 2> stderror.txt

2.7 Add functional annotation to the GFF

Now let’s add the InterProScan matches to the GFF of S. marinoi:

# load required modules
module load perl/5.24.0
module load exonerate/2.4.0
module load maker

# add InterProScan info to the GFF
ipr_update_gff Sm_ManualCuration.v1.1.2_nuclear.gff \
Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta.tsv.NEW > \
Sm_ManualCuration.v1.1.2_nuclear.functional_ipr.gff

However, there is an issue with the resulting gff file. If no annotations were present, a whole list of GO terms is added to a gene. We need to remove this:

# get a list of gene IDs in the whole genome
grep 'geneID' Sm_ManualCuration.v1.1.2_nuclear.gff | cut -f 9 | sed 's/=/\t/g' | sed 's/;geneID/\t/g' | cut -f 2 > Smarinoi_geneIDs_all.txt
wc -l Smarinoi_geneIDs_all.txt ##17203

# get a list of genes with annotations
cut -f 1 Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins_rmdot.fasta.tsv.NEW | uniq  > Smarinoi_geneIDs_genes-with-annotations.txt
wc -l Smarinoi_geneIDs_genes-with-annotations.txt ##12358

# get a list of genes without annotations
grep -v -f Smarinoi_geneIDs_genes-with-annotations.txt Smarinoi_geneIDs_all.txt > Smarinoi_geneIDS_genes-without-annotations.txt
wc -l Smarinoi_geneIDS_genes-without-annotations.txt ##4845

# reduce gff to lines with only gene IDs
grep 'geneID' Sm_ManualCuration.v1.1.2_nuclear.functional_ipr.gff > Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff
wc -l Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff ##17203

# grep for the massive list of GO terms added to genes without annotations (e.g. number 9)
grep Sm_g00000009 Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff | sed 's/Ontology_term=/\t/' | cut -f 10 > test

# check whether the number of genes carrying this long list of GO terms matches the number of genes without annotations
grep -f test Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff | wc -l ##10711
## doesn't fit: probably because some genes with annotations also don't have GO terms

# remove the massive list of GO terms from the gff
sed 's/Ontology_term=GO:0000009,GO:0000015,GO:0000027,GO:0000030,GO:0000045,GO:0000049,GO:0000055,GO:0000056,GO:0000062,GO:0000070,GO:0000077,GO:0000079,GO:0000105,GO:0000123,GO:0000124,GO:0000126,GO:0000139,GO:0000145,GO:0000148,GO:0000151,GO:0000154,GO:0000155,GO:0000159,GO:0000160,GO:0000164,GO:0000166,GO:0000172,GO:0000178,GO:0000179,GO:0000184,GO:0000213,GO:0000221,GO:0000225,GO:0000226,GO:0000228,GO:0000244,GO:0000245,GO:0000256,GO:0000275,GO:0000276,GO:0000278,GO:0000287,GO:0000289,GO:0000290,GO:0000338,GO:0000339,GO:0000340,GO:0000347,GO:0000350,GO:0000386,GO:0000387,GO:0000398,GO:0000408,GO:0000413,GO:0000422,GO:0000439,GO:0000462,GO:0000469,GO:0000470,GO:0000493,GO:0000502,GO:0000723,GO:0000724,GO:0000774,GO:0000776,GO:0000784,GO:0000786,GO:0000796,GO:0000808,GO:0000811,GO:0000812,GO:0000813,GO:0000814,GO:0000829,GO:0000906,GO:0000922,GO:0000930,GO:0000938,GO:0000956,GO:0000974,GO:0000981,GO:0000995,GO:0001164,GO:0001188,GO:0001510,GO:0001522,GO:0001671,GO:0001682,GO:0001731,GO:0001735,GO:0001882,GO:0002098,GO:0002100,GO:0002161,GO:0002943,GO:0002949,GO:0002953,GO:0003333,GO:0003341,GO:0003676,GO:0003677,GO:0003678,GO:0003682,GO:0003684,GO:0003688,GO:0003689,GO:0003690,GO:0003697,GO:0003700,GO:0003712,GO:0003713,GO:0003714,GO:0003721,GO:0003723,GO:0003724,GO:0003725,GO:0003729,GO:0003735,GO:0003743,GO:0003746,GO:0003747,GO:0003755,GO:0003756,GO:0003774,GO:0003777,GO:0003779,GO:0003824,GO:0003826,GO:0003830,GO:0003839,GO:0003843,GO:0003847,GO:0003848,GO:0003849,GO:0003852,GO:0003854,GO:0003855,GO:0003857,GO:0003860,GO:0003862,GO:0003863,GO:0003864,GO:0003868,GO:0003872,GO:0003873,GO:0003876,GO:0003879,GO:0003883,GO:0003884,GO:0003885,GO:0003887,GO:0003896,GO:0003899,GO:0003906,GO:0003910,GO:0003911,GO:0003916,GO:0003917,GO:0003918,GO:0003922,GO:0003923,GO:0003924,GO:0003934,GO:0003937,GO:0003951,GO:0003952,GO:0003954,GO:0003964,GO:0003975,GO:0003979,GO:0003980,GO:0003989,GO:0003993,GO:0003994,GO:0003995,GO:0003997,GO:0003998,GO:0004000,GO:0004001,GO:0004014
,GO:0004017,GO:0004019,GO:0004037,GO:0004040,GO:0004042,GO:0004045,GO:0004055,GO:0004056,GO:0004057,GO:0004066,GO:0004070,GO:0004071,GO:0004076,GO:0004089,GO:0004096,GO:0004106,GO:0004107,GO:0004109,GO:0004111,GO:0004112,GO:0004114,GO:0004129,GO:0004140,GO:0004141,GO:0004143,GO:0004144,GO:0004146,GO:0004151,GO:0004161,GO:0004164,GO:0004170,GO:0004174,GO:0004175,GO:0004176,GO:0004177,GO:0004181,GO:0004185,GO:0004190,GO:0004197,GO:0004198,GO:0004222,GO:0004252,GO:0004298,GO:0004315,GO:0004325,GO:0004326,GO:0004329,GO:0004332,GO:0004333,GO:0004334,GO:0004335,GO:0004337,GO:0004340,GO:0004345,GO:0004347,GO:0004348,GO:0004351,GO:0004356,GO:0004358,GO:0004359,GO:0004360,GO:0004362,GO:0004363,GO:0004364,GO:0004366,GO:0004367,GO:0004368,GO:0004371,GO:0004372,GO:0004375,GO:0004378,GO:0004379,GO:0004386,GO:0004392,GO:0004399,GO:0004402,GO:0004407,GO:0004408,GO:0004411,GO:0004418,GO:0004420,GO:0004421,GO:0004424,GO:0004425,GO:0004427,GO:0004435,GO:0004450,GO:0004451,GO:0004455,GO:0004470,GO:0004471,GO:0004474,GO:0004476,GO:0004482,GO:0004483,GO:0004484,GO:0004488,GO:0004489,GO:0004491,GO:0004497,GO:0004499,GO:0004512,GO:0004514,GO:0004517,GO:0004518,GO:0004519,GO:0004521,GO:0004523,GO:0004525,GO:0004527,GO:0004535,GO:0004540,GO:0004550,GO:0004553,GO:0004557,GO:0004559,GO:0004563,GO:0004565,GO:0004568,GO:0004571,GO:0004576,GO:0004584,GO:0004587,GO:0004590,GO:0004591,GO:0004592,GO:0004594,GO:0004595,GO:0004597,GO:0004601,GO:0004602,GO:0004605,GO:0004609,GO:0004612,GO:0004615,GO:0004616,GO:0004618,GO:0004619,GO:0004634,GO:0004635,GO:0004637,GO:0004640,GO:0004641,GO:0004643,GO:0004645,GO:0004651,GO:0004652,GO:0004654,GO:0004655,GO:0004657,GO:0004664,GO:0004665,GO:0004668,GO:0004671,GO:0004672,GO:0004674,GO:0004683,GO:0004712,GO:0004713,GO:0004719,GO:0004721,GO:0004722,GO:0004731,GO:0004739,GO:0004743,GO:0004748,GO:0004749,GO:0004751,GO:0004764,GO:0004781,GO:0004784,GO:0004788,GO:0004799,GO:0004801,GO:0004803,GO:0004806,GO:0004807,GO:0004809,GO:0004812,GO:0004813,GO:0004814,GO:00048
15,GO:0004820,GO:0004822,GO:0004823,GO:0004824,GO:0004825,GO:0004826,GO:0004827,GO:0004828,GO:0004829,GO:0004830,GO:0004832,GO:0004834,GO:0004842,GO:0004843,GO:0004844,GO:0004852,GO:0004853,GO:0004864,GO:0004865,GO:0004867,GO:0004888,GO:0004930,GO:0004965,GO:0005047,GO:0005049,GO:0005085,GO:0005086,GO:0005092,GO:0005096,GO:0005198,GO:0005200,GO:0005216,GO:0005230,GO:0005242,GO:0005246,GO:0005247,GO:0005249,GO:0005262,GO:0005267,GO:0005315,GO:0005319,GO:0005337,GO:0005375,GO:0005381,GO:0005384,GO:0005452,GO:0005457,GO:0005471,GO:0005484,GO:0005506,GO:0005507,GO:0005509,GO:0005515,GO:0005516,GO:0005524,GO:0005525,GO:0005534,GO:0005536,GO:0005542,GO:0005544,GO:0005576,GO:0005615,GO:0005634,GO:0005643,GO:0005655,GO:0005663,GO:0005665,GO:0005666,GO:0005667,GO:0005669,GO:0005673,GO:0005674,GO:0005675,GO:0005680,GO:0005681,GO:0005685,GO:0005694,GO:0005730,GO:0005732,GO:0005737,GO:0005739,GO:0005740,GO:0005741,GO:0005743,GO:0005744,GO:0005747,GO:0005750,GO:0005751,GO:0005758,GO:0005759,GO:0005761,GO:0005764,GO:0005765,GO:0005777,GO:0005778,GO:0005779,GO:0005783,GO:0005784,GO:0005786,GO:0005787,GO:0005788,GO:0005789,GO:0005794,GO:0005801,GO:0005815,GO:0005829,GO:0005832,GO:0005838,GO:0005839,GO:0005840,GO:0005846,GO:0005847,GO:0005852,GO:0005854,GO:0005858,GO:0005869,GO:0005871,GO:0005874,GO:0005886,GO:0005887,GO:0005929,GO:0005930,GO:0005956,GO:0005960,GO:0005965,GO:0005968,GO:0005971,GO:0005975,GO:0005992,GO:0005996,GO:0006000,GO:0006002,GO:0006003,GO:0006006,GO:0006012,GO:0006013,GO:0006021,GO:0006030,GO:0006032,GO:0006071,GO:0006072,GO:0006075,GO:0006078,GO:0006086,GO:0006090,GO:0006094,GO:0006096,GO:0006097,GO:0006098,GO:0006099,GO:0006106,GO:0006108,GO:0006120,GO:0006122,GO:0006139,GO:0006164,GO:0006165,GO:0006166,GO:0006177,GO:0006183,GO:0006189,GO:0006190,GO:0006206,GO:0006207,GO:0006213,GO:0006221,GO:0006226,GO:0006228,GO:0006231,GO:0006241,GO:0006259,GO:0006260,GO:0006261,GO:0006265,GO:0006269,GO:0006270,GO:0006275,GO:0006281,GO:0006283,GO:0006284,GO:0006289,GO:000
6298,GO:0006301,GO:0006302,GO:0006303,GO:0006306,GO:0006307,GO:0006310,GO:0006313,GO:0006325,GO:0006333,GO:0006334,GO:0006338,GO:0006348,GO:0006351,GO:0006352,GO:0006355,GO:0006357,GO:0006360,GO:0006364,GO:0006366,GO:0006367,GO:0006368,GO:0006370,GO:0006376,GO:0006378,GO:0006379,GO:0006383,GO:0006384,GO:0006388,GO:0006396,GO:0006397,GO:0006400,GO:0006401,GO:0006402,GO:0006406,GO:0006412,GO:0006413,GO:0006414,GO:0006415,GO:0006417,GO:0006418,GO:0006419,GO:0006420,GO:0006422,GO:0006426,GO:0006428,GO:0006429,GO:0006430,GO:0006431,GO:0006432,GO:0006433,GO:0006434,GO:0006435,GO:0006436,GO:0006438,GO:0006452,GO:0006457,GO:0006464,GO:0006465,GO:0006468,GO:0006470,GO:0006478,GO:0006479,GO:0006480,GO:0006481,GO:0006486,GO:0006487,GO:0006488,GO:0006490,GO:0006491,GO:0006493,GO:0006499,GO:0006506,GO:0006508,GO:0006511,GO:0006513,GO:0006520,GO:0006525,GO:0006526,GO:0006529,GO:0006535,GO:0006536,GO:0006537,GO:0006542,GO:0006545,GO:0006546,GO:0006555,GO:0006559,GO:0006562,GO:0006568,GO:0006570,GO:0006571,GO:0006596,GO:0006597,GO:0006605,GO:0006606,GO:0006614,GO:0006621,GO:0006623,GO:0006625,GO:0006627,GO:0006629,GO:0006631,GO:0006633,GO:0006635,GO:0006644,GO:0006650,GO:0006656,GO:0006661,GO:0006665,GO:0006694,GO:0006725,GO:0006729,GO:0006741,GO:0006744,GO:0006749,GO:0006750,GO:0006751,GO:0006777,GO:0006779,GO:0006780,GO:0006783,GO:0006784,GO:0006788,GO:0006796,GO:0006801,GO:0006807,GO:0006809,GO:0006811,GO:0006812,GO:0006813,GO:0006814,GO:0006817,GO:0006820,GO:0006821,GO:0006825,GO:0006850,GO:0006862,GO:0006869,GO:0006885,GO:0006886,GO:0006887,GO:0006888,GO:0006890,GO:0006891,GO:0006897,GO:0006904,GO:0006913,GO:0006914,GO:0006952,GO:0006974,GO:0006979,GO:0007005,GO:0007010,GO:0007015,GO:0007017,GO:0007018,GO:0007020,GO:0007021,GO:0007023,GO:0007030,GO:0007031,GO:0007033,GO:0007034,GO:0007062,GO:0007064,GO:0007076,GO:0007093,GO:0007094,GO:0007095,GO:0007131,GO:0007155,GO:0007165,GO:0007186,GO:0007205,GO:0007219,GO:0007224,GO:0007264,GO:0008017,GO:0008022,GO:0008033,GO:0008047,GO:0
008061,GO:0008076,GO:0008080,GO:0008081,GO:0008097,GO:0008113,GO:0008121,GO:0008124,GO:0008131,GO:0008134,GO:0008137,GO:0008138,GO:0008146,GO:0008168,GO:0008171,GO:0008173,GO:0008175,GO:0008176,GO:0008180,GO:0008198,GO:0008199,GO:0008233,GO:0008234,GO:0008235,GO:0008236,GO:0008237,GO:0008251,GO:0008270,GO:0008272,GO:0008278,GO:0008289,GO:0008290,GO:0008295,GO:0008299,GO:0008308,GO:0008312,GO:0008318,GO:0008320,GO:0008324,GO:0008353,GO:0008374,GO:0008375,GO:0008380,GO:0008408,GO:0008409,GO:0008410,GO:0008413,GO:0008417,GO:0008418,GO:0008420,GO:0008444,GO:0008452,GO:0008476,GO:0008478,GO:0008483,GO:0008484,GO:0008495,GO:0008519,GO:0008521,GO:0008531,GO:0008534,GO:0008536,GO:0008537,GO:0008540,GO:0008541,GO:0008559,GO:0008609,GO:0008610,GO:0008612,GO:0008616,GO:0008622,GO:0008641,GO:0008649,GO:0008652,GO:0008654,GO:0008661,GO:0008685,GO:0008686,GO:0008703,GO:0008705,GO:0008734,GO:0008757,GO:0008762,GO:0008767,GO:0008804,GO:0008810,GO:0008818,GO:0008824,GO:0008836,GO:0008837,GO:0008839,GO:0008864,GO:0008883,GO:0008887,GO:0008897,GO:0008914,GO:0008929,GO:0008935,GO:0008939,GO:0008942,GO:0008963,GO:0008964,GO:0008970,GO:0008974,GO:0008977,GO:0008986,GO:0008987,GO:0008990,GO:0009001,GO:0009008,GO:0009039,GO:0009052,GO:0009055,GO:0009058,GO:0009072,GO:0009073,GO:0009082,GO:0009083,GO:0009086,GO:0009089,GO:0009094,GO:0009098,GO:0009102,GO:0009107,GO:0009113,GO:0009116,GO:0009117,GO:0009143,GO:0009165,GO:0009166,GO:0009190,GO:0009228,GO:0009229,GO:0009231,GO:0009234,GO:0009235,GO:0009236,GO:0009247,GO:0009263,GO:0009298,GO:0009308,GO:0009312,GO:0009330,GO:0009331,GO:0009341,GO:0009349,GO:0009376,GO:0009396,GO:0009416,GO:0009435,GO:0009439,GO:0009443,GO:0009446,GO:0009451,GO:0009452,GO:0009496,GO:0009507,GO:0009523,GO:0009535,GO:0009584,GO:0009611,GO:0009642,GO:0009644,GO:0009654,GO:0009678,GO:0009765,GO:0009773,GO:0009877,GO:0009916,GO:0009966,GO:0009976,GO:0009982,GO:0010024,GO:0010038,GO:0010181,GO:0010207,GO:0010212,GO:0010242,GO:0010265,GO:0010277,GO:0010309,GO:0010389,GO
:0010390,GO:0010468,GO:0010485,GO:0010756,GO:0010997,GO:0015031,GO:0015035,GO:0015074,GO:0015075,GO:0015078,GO:0015095,GO:0015097,GO:0015098,GO:0015109,GO:0015114,GO:0015116,GO:0015144,GO:0015165,GO:0015204,GO:0015267,GO:0015276,GO:0015297,GO:0015299,GO:0015321,GO:0015385,GO:0015629,GO:0015689,GO:0015693,GO:0015694,GO:0015696,GO:0015703,GO:0015708,GO:0015914,GO:0015930,GO:0015934,GO:0015935,GO:0015936,GO:0015937,GO:0015940,GO:0015969,GO:0015977,GO:0015979,GO:0015986,GO:0015995,GO:0016020,GO:0016021,GO:0016035,GO:0016036,GO:0016042,GO:0016051,GO:0016070,GO:0016114,GO:0016151,GO:0016192,GO:0016209,GO:0016226,GO:0016255,GO:0016272,GO:0016279,GO:0016301,GO:0016307,GO:0016310,GO:0016311,GO:0016316,GO:0016409,GO:0016422,GO:0016428,GO:0016429,GO:0016435,GO:0016459,GO:0016462,GO:0016471,GO:0016480,GO:0016485,GO:0016491,GO:0016504,GO:0016531,GO:0016538,GO:0016559,GO:0016560,GO:0016567,GO:0016570,GO:0016571,GO:0016572,GO:0016573,GO:0016575,GO:0016578,GO:0016579,GO:0016586,GO:0016592,GO:0016593,GO:0016597,GO:0016598,GO:0016603,GO:0016614,GO:0016615,GO:0016616,GO:0016620,GO:0016624,GO:0016627,GO:0016630,GO:0016636,GO:0016638,GO:0016651,GO:0016661,GO:0016670,GO:0016671,GO:0016679,GO:0016701,GO:0016702,GO:0016705,GO:0016706,GO:0016714,GO:0016715,GO:0016717,GO:0016730,GO:0016740,GO:0016742,GO:0016743,GO:0016746,GO:0016747,GO:0016756,GO:0016757,GO:0016758,GO:0016763,GO:0016765,GO:0016772,GO:0016773,GO:0016779,GO:0016780,GO:0016783,GO:0016785,GO:0016787,GO:0016788,GO:0016791,GO:0016798,GO:0016799,GO:0016805,GO:0016810,GO:0016811,GO:0016817,GO:0016818,GO:0016829,GO:0016831,GO:0016832,GO:0016836,GO:0016844,GO:0016846,GO:0016849,GO:0016851,GO:0016852,GO:0016853,GO:0016857,GO:0016866,GO:0016868,GO:0016872,GO:0016874,GO:0016884,GO:0016887,GO:0016889,GO:0016899,GO:0016903,GO:0016925,GO:0016971,GO:0016972,GO:0016973,GO:0016987,GO:0016992,GO:0016998,GO:0017004,GO:0017009,GO:0017025,GO:0017038,GO:0017056,GO:0017070,GO:0017108,GO:0017112,GO:0017116,GO:0017119,GO:0017121,GO:0017128,GO:0017137,
GO:0017150,GO:0017176,GO:0017183,GO:0017186,GO:0017196,GO:0018024,GO:0018025,GO:0018193,GO:0018298,GO:0018342,GO:0018343,GO:0018344,GO:0018580,GO:0019001,GO:0019005,GO:0019008,GO:0019079,GO:0019205,GO:0019211,GO:0019237,GO:0019239,GO:0019242,GO:0019264,GO:0019288,GO:0019310,GO:0019346,GO:0019432,GO:0019464,GO:0019509,GO:0019538,GO:0019722,GO:0019752,GO:0019773,GO:0019774,GO:0019781,GO:0019789,GO:0019825,GO:0019843,GO:0019856,GO:0019867,GO:0019887,GO:0019888,GO:0019894,GO:0019898,GO:0019901,GO:0019903,GO:0019904,GO:0019915,GO:0019948,GO:0019985,GO:0019988,GO:0020037,GO:0022625,GO:0022857,GO:0022900,GO:0022904,GO:0030001,GO:0030008,GO:0030014,GO:0030015,GO:0030026,GO:0030036,GO:0030042,GO:0030058,GO:0030071,GO:0030091,GO:0030117,GO:0030123,GO:0030126,GO:0030127,GO:0030130,GO:0030131,GO:0030132,GO:0030145,GO:0030150,GO:0030151,GO:0030163,GO:0030170,GO:0030171,GO:0030173,GO:0030176,GO:0030234,GO:0030242,GO:0030246,GO:0030259,GO:0030261,GO:0030286,GO:0030328,GO:0030332,GO:0030337,GO:0030433,GO:0030488,GO:0030515,GO:0030532,GO:0030604,GO:0030623,GO:0030628,GO:0030677,GO:0030686,GO:0030688,GO:0030870,GO:0030880,GO:0030896,GO:0030906,GO:0030915,GO:0030942,GO:0030955,GO:0030975,GO:0030976,GO:0030983,GO:0030992,GO:0031011,GO:0031071,GO:0031083,GO:0031122,GO:0031124,GO:0031145,GO:0031146,GO:0031151,GO:0031167,GO:0031177,GO:0031201,GO:0031204,GO:0031207,GO:0031251,GO:0031262,GO:0031297,GO:0031369,GO:0031390,GO:0031417,GO:0031418,GO:0031419,GO:0031422,GO:0031491,GO:0031514,GO:0031515,GO:0031571,GO:0031588,GO:0031625,GO:0031683,GO:0031902,GO:0031929,GO:0031931,GO:0031932,GO:0032006,GO:0032007,GO:0032008,GO:0032012,GO:0032039,GO:0032040,GO:0032049,GO:0032259,GO:0032264,GO:0032299,GO:0032300,GO:0032366,GO:0032456,GO:0032469,GO:0032509,GO:0032515,GO:0032549,GO:0032574,GO:0032777,GO:0032784,GO:0032786,GO:0032957,GO:0032963,GO:0032968,GO:0032977,GO:0032981,GO:0033014,GO:0033063,GO:0033177,GO:0033178,GO:0033179,GO:0033180,GO:0033384,GO:0033539,GO:0033567,GO:0033573,GO:0033588,GO:003361
7,GO:0033674,GO:0033743,GO:0033897,GO:0034066,GO:0034128,GO:0034198,GO:0034219,GO:0034220,GO:0034450,GO:0034457,GO:0034474,GO:0034477,GO:0034511,GO:0034553,GO:0034729,GO:0034755,GO:0035082,GO:0035091,GO:0035098,GO:0035101,GO:0035194,GO:0035246,GO:0035267,GO:0035299,GO:0035312,GO:0035368,GO:0035434,GO:0035435,GO:0035494,GO:0035515,GO:0035516,GO:0035522,GO:0035552,GO:0035553,GO:0035556,GO:0035591,GO:0035596,GO:0035999,GO:0036085,GO:0036159,GO:0036265,GO:0036297,GO:0036310,GO:0036361,GO:0036374,GO:0036402,GO:0036459,GO:0036524,GO:0040014,GO:0042023,GO:0042026,GO:0042073,GO:0042128,GO:0042132,GO:0042147,GO:0042162,GO:0042176,GO:0042242,GO:0042245,GO:0042254,GO:0042256,GO:0042264,GO:0042273,GO:0042274,GO:0042281,GO:0042283,GO:0042373,GO:0042450,GO:0042549,GO:0042555,GO:0042558,GO:0042578,GO:0042597,GO:0042623,GO:0042626,GO:0042651,GO:0042719,GO:0042720,GO:0042721,GO:0042765,GO:0042803,GO:0042819,GO:0042823,GO:0042908,GO:0042910,GO:0043015,GO:0043022,GO:0043023,GO:0043039,GO:0043043,GO:0043044,GO:0043047,GO:0043066,GO:0043085,GO:0043087,GO:0043130,GO:0043138,GO:0043139,GO:0043154,GO:0043161,GO:0043190,GO:0043231,GO:0043240,GO:0043248,GO:0043399,GO:0043419,GO:0043461,GO:0043486,GO:0043531,GO:0043547,GO:0043564,GO:0043565,GO:0043622,GO:0043625,GO:0043631,GO:0043666,GO:0043752,GO:0043967,GO:0043968,GO:0043998,GO:0044237,GO:0044238,GO:0044341,GO:0044458,GO:0044571,GO:0044666,GO:0044877,GO:0045038,GO:0045039,GO:0045047,GO:0045048,GO:0045116,GO:0045131,GO:0045239,GO:0045261,GO:0045292,GO:0045300,GO:0045337,GO:0045454,GO:0045737,GO:0045859,GO:0045892,GO:0045893,GO:0045900,GO:0045901,GO:0045905,GO:0045910,GO:0046034,GO:0046081,GO:0046168,GO:0046314,GO:0046406,GO:0046416,GO:0046422,GO:0046429,GO:0046488,GO:0046540,GO:0046654,GO:0046677,GO:0046695,GO:0046777,GO:0046835,GO:0046854,GO:0046856,GO:0046872,GO:0046873,GO:0046907,GO:0046912,GO:0046923,GO:0046933,GO:0046938,GO:0046961,GO:0046982,GO:0046983,GO:0047057,GO:0047325,GO:0047429,GO:0047661,GO:0047793,GO:0048015,GO:0048029,GO:0048
034,GO:0048037,GO:0048038,GO:0048188,GO:0048193,GO:0048278,GO:0048472,GO:0048478,GO:0048487,GO:0048500,GO:0048678,GO:0048870,GO:0050080,GO:0050113,GO:0050242,GO:0050290,GO:0050333,GO:0050483,GO:0050660,GO:0050661,GO:0050662,GO:0050897,GO:0050992,GO:0051016,GO:0051028,GO:0051056,GO:0051073,GO:0051082,GO:0051087,GO:0051103,GO:0051156,GO:0051168,GO:0051188,GO:0051213,GO:0051259,GO:0051260,GO:0051276,GO:0051287,GO:0051304,GO:0051315,GO:0051382,GO:0051499,GO:0051536,GO:0051537,GO:0051539,GO:0051603,GO:0051726,GO:0051745,GO:0051879,GO:0051920,GO:0051998,GO:0052725,GO:0052726,GO:0052824,GO:0052855,GO:0052861,GO:0055085,GO:0055087,GO:0055114,GO:0060090,GO:0060271,GO:0061575,GO:0061578,GO:0061608,GO:0061617,GO:0061630,GO:0065003,GO:0070008,GO:0070011,GO:0070072,GO:0070204,GO:0070286,GO:0070402,GO:0070403,GO:0070476,GO:0070481,GO:0070567,GO:0070569,GO:0070577,GO:0070628,GO:0070682,GO:0070772,GO:0070773,GO:0070860,GO:0070897,GO:0070940,GO:0070966,GO:0070985,GO:0070988,GO:0071013,GO:0071025,GO:0071203,GO:0071209,GO:0071586,GO:0071596,GO:0071704,GO:0071805,GO:0071821,GO:0071918,GO:0071949,GO:0071985,GO:0071986,GO:0072321,GO:0072357,GO:0072487,GO:0072546,GO:0080009,GO:0080019,GO:0080085,GO:0089701,GO:0090114,GO:0090481,GO:0090522,GO:0090730,GO:0097027,GO:0097056,GO:0097255,GO:0097361,GO:0097367,GO:0097428,GO:0098519,GO:0098656,GO:0099122,GO:0101005,GO:0106035,GO:0106050,GO:0120009,GO:0120013,GO:0140326,GO:1901135,GO:1901137,GO:1901642,GO:1902412,GO:1902445,GO:1902600,GO:1902979,GO:1904263,GO:1904668,GO:1905775,GO:1990112,GO:1990116,GO:1990316,GO:1990380,GO:1990745;//' Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes.gff > Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes2.gff

Now, I can also add the blastp hits to the gff:

# add swissprot blastp hits to gff
maker_functional_gff swissprot/September28_2022/swissprot_Sep28_2022.fasta \
swissprot/Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.blastp.Sep2022.out \
Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_genes2.gff \
> Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.gff

Let’s do a quality check.

# test line numbers
wc -l Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.gff #17203

Other databases were not added, but can be looked at separately.

2.8 Combine information on functional annotation in one file

2.8.1 Extract information on GO annotations for GO enrichment

Extract GO annotations from the InterProScan file. The output file will be used for the GO enrichment analyses in R:

# grep for lines that contain GO information
grep 'GO:' Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.gff > Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.GO.gff 

# get GO terms
python extract_GOterms.py Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.GO.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_GOterms.txt

This is the script:

#! /usr/bin/python3

##Python script to extract genes names and GO information from a GFF file

import argparse
import csv

#create an argument parser object
parser = argparse.ArgumentParser(description = "This script extracts gene names and GO information from a GFF file. Note that prior to running this script, the GFF needs to be reduced to only contain lines with GO information (e.g. with grep 'GO:').")

#add positional argument for the input GFF file
parser.add_argument("GFF", help="Name of the GFF file")

#parse the arguments
args = parser.parse_args()

#create two empty lists to store the information on gene names and GO terms
gene_names = []
GO_terms = []

#open the GFF file
with open(args.GFF,"r") as gff:

    #create csv.reader object
    reader = csv.reader(gff,delimiter="\t")

    #access data in the GFF file, one line at a time
    for field in reader:

        #skip blank lines, comment lines and lines without GO terms
        if len(field) < 9 or "Ontology_term=" not in field[8]:
            continue

        #get gene names
        gene_field = field[8].split(";")
        gene = gene_field[0].split("ID=")
        gene_names.append(gene[1])

        #get GO terms
        GO_field = field[8].split("Ontology_term=")
        GO = GO_field[1].split(";")
        GO_terms.append(GO[0])

#check the lay-out of the lists
#print(gene_names)
#print(GO_terms)

#check the length of the lists
#print("The length of the gene names list is: ", len(gene_names))
#print("The length of the GO terms list is: ", len(GO_terms))

#create dictionary (note: for gene names that occur more than once, a dictionary keeps only the last GO entry)
zip_iterator = zip(gene_names,GO_terms)
genes_GO_dict = dict(zip_iterator)
#print(genes_GO_dict)

#print the dictionary in table format
for gene, GO in genes_GO_dict.items():
    print('{} {}'.format(gene, GO))

The output of the script looks as follows:

Sm_t00006568-RA GO:0005515
Sm_t00002445-RA GO:0004512,GO:0006021,GO:0008654
Sm_t00001746-RA GO:0003723,GO:0006396,GO:0008173
Sm_t00001746-RA GO:0008168
Sm_t00004904-RA GO:0005515
Sm_t00013811-RA GO:0005515
Sm_t00000656-RA GO:0006629
Sm_t00011110-RA GO:0003924,GO:0005525,GO:0006913
Sm_t00011110-RA GO:0003924,GO:0005525
Sm_t00000013-RA GO:0008061
#etc.
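
Some IDs occur on more than one line with different GO terms (for example Sm_t00001746-RA above). A minimal Python sketch of one way to merge such lines into a single, de-duplicated GO list per ID is given below. The function name `merge_go_terms` and the list-of-pairs input are assumptions made for illustration; this step is not part of the original pipeline.

```python
from collections import defaultdict

def merge_go_terms(rows):
    """Merge (gene_id, 'GO:x,GO:y') pairs into one sorted GO list per gene."""
    merged = defaultdict(set)
    for gene_id, go_field in rows:
        # split the comma-separated GO string and drop empty entries
        merged[gene_id].update(t for t in go_field.split(",") if t)
    return {gene: sorted(terms) for gene, terms in merged.items()}

# example rows copied from the script output shown above
rows = [
    ("Sm_t00001746-RA", "GO:0003723,GO:0006396,GO:0008173"),
    ("Sm_t00001746-RA", "GO:0008168"),
    ("Sm_t00011110-RA", "GO:0003924,GO:0005525,GO:0006913"),
    ("Sm_t00011110-RA", "GO:0003924,GO:0005525"),
]
merged = merge_go_terms(rows)
for gene, terms in merged.items():
    print(gene, ",".join(terms), sep="\t")
```

Using a set per gene means that a term reported on several lines is kept only once.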

2.8.2 Create file with all relevant information on functional annotation

We adapted the Python script above to extract each annotation type (InterPro, PANTHER, PRINTS, Pfam, SMART, Swissprot, SignalP) into its own list:

# prep the file
sed 's/Note=/Swissprot:/' Sm_ManualCuration.v1.1.2_nuclear.functional_ipr_swissprot.gff | sed 's/PANTHER://2g' | sed 's/InterPro://2g' | sed 's/Pfam://2g' | sed 's/SMART://2g' | sed 's/SignalP_EUK://2g' | sed 's/PRINTS://2g' | sed 's/,PANTHER:/;PANTHER:/' |  sed 's/,Pfam:/;Pfam:/' | sed 's/,SMART:/;SMART:/' | sed 's/,SignalP_EUK:/;SignalP_EUK:/' | sed 's/,PRINTS:/;PRINTS:/'> prep.gff

# InterPro
grep 'InterPro:' prep.gff > InterPro.gff; python extract_InterPro.py InterPro.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_InterPro.txt; rm InterPro.gff
# Panther
grep 'PANTHER:' prep.gff > Panther.gff; python extract_Panther.py Panther.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_Panther.txt; rm Panther.gff
# PRINTS
grep 'PRINTS:' prep.gff > Prints.gff; python extract_Prints.py Prints.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_Prints.txt; rm Prints.gff
# Pfam
grep 'Pfam:' prep.gff > Pfam.gff; python extract_Pfam.py Pfam.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_Pfam.txt; rm Pfam.gff
# SMART
grep 'SMART:' prep.gff > SMART.gff; python extract_Smart.py SMART.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_SMART.txt; rm SMART.gff
# Swissprot
grep 'Swissprot:' prep.gff > Swissprot.gff; python extract_Swissprot.py Swissprot.gff | sed 's/ /\t/' | sed 's/ /_/g'  | sort > Smarinoi_Ref1.1.2_Swissprot.txt; rm Swissprot.gff
# SignalP
grep 'SignalP_EUK:' prep.gff > SignalP.gff; python extract_SignalP.py SignalP.gff | sed 's/ /\t/' | sort > Smarinoi_Ref1.1.2_SignalP.txt; rm SignalP.gff
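
The `2g` flag in the sed commands of the prep step is a GNU sed extension: it leaves the first match on each line untouched and replaces the second and all later matches. The same behaviour can be sketched in Python. The helper name `keep_first_prefix` and the example attribute string are made up for illustration and do not come from the actual GFF.

```python
def keep_first_prefix(line, prefix):
    """Mimic GNU sed 's/PREFIX//2g': keep the first PREFIX, delete later ones."""
    head, sep, tail = line.partition(prefix)
    # sep is empty when the prefix does not occur; the line is then unchanged
    return head + sep + tail.replace(prefix, "")

# hypothetical attribute string with two Pfam entries
example = "Dbxref=Pfam:PF00069,Pfam:PF07714"
print(keep_first_prefix(example, "Pfam:"))
```

After this step each database name occurs once per line, followed by a comma-separated list of its accessions.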

Finally, all databases are joined in R:

# import data
GO = read.table("Smarinoi_Ref1.1.2_GOterms.txt", header=FALSE)
InterPro = read.table("Smarinoi_Ref1.1.2_InterPro.txt", header=FALSE)
Panther = read.table("Smarinoi_Ref1.1.2_Panther.txt", header=FALSE)
Pfam = read.table("Smarinoi_Ref1.1.2_Pfam.txt", header=FALSE)
Prints = read.table("Smarinoi_Ref1.1.2_Prints.txt", header=FALSE)
Smart = read.table("Smarinoi_Ref1.1.2_SMART.txt", header=FALSE)
Swissprot = read.table("Smarinoi_Ref1.1.2_Swissprot.txt", header=FALSE)
SignalP = read.table("Smarinoi_Ref1.1.2_SignalP.txt", header=FALSE)
KEGG = read.table("KEGG_kofamkoalaoutput/Skmarinoi_Ref1.1.2_KEGG_full_output_kofamKOALA_2022Sep28_reduced.txt")
Uniprot = read.table("Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.uniprot.diamond.out.sel", header=FALSE)
Uniprot2 = read.csv("uniprot-compressed_true_download_true_fields_accession_2Creviewed_2C-2023.01.30-23.41.23.80.csv", header=TRUE)
blast = read.table("Skeletonema_marinoi_Ref_v1.1.2_nuclear.proteins.sprot.ref1.1.blastp.out", header=FALSE)
  
# column names
colnames(GO) = c('genes', 'GO')
colnames(InterPro) = c('genes', 'InterPro')
colnames(Panther) = c('genes', 'Panther')
colnames(Pfam) = c('genes', 'Pfam')
colnames(Prints) = c('genes', 'Prints')
colnames(Smart) = c('genes', 'Smart')
colnames(Swissprot) = c('genes', 'Swissprot')
colnames(SignalP) = c('genes', 'SignalP')
colnames(KEGG) = c('genes', 'KEGG_ID', 'KEGG_info')
colnames(Uniprot) = c('genes', 'Uniprot_hit')
blast = blast[,c(1:2)]
colnames(blast) = c('genes', 'ID_ref1.1.1')

# merge data
tmp1 = merge(blast, Swissprot, by = c("genes"), all=TRUE)
tmp2 = merge(tmp1, Uniprot, by = c("genes"), all=TRUE)
tmp3 = merge(tmp2, Uniprot2, by = "Uniprot_hit", all.x = TRUE)
tmp4 = merge(tmp3, GO, by = c("genes"), all=TRUE)
tmp5 = merge(tmp4, KEGG, by = c("genes"), all=TRUE)
tmp6 = merge(tmp5, InterPro, by = c("genes"), all=TRUE)
tmp7 = merge(tmp6, Panther, by = c("genes"), all=TRUE)
tmp8 = merge(tmp7, Pfam, by = c("genes"), all=TRUE)
tmp9 = merge(tmp8, Prints, by = c("genes"), all=TRUE)
tmp10 = merge(tmp9, Smart, by = c("genes"), all=TRUE)
tmp11 = merge(tmp10, SignalP, by = c("genes"), all=TRUE)
data = tmp11

# export data
write.csv(data, "Smarinoi_Ref1.1.2_full-annotation.csv")

Note that for some genes, there are multiple output lines. This happened for genes with multiple KEGG hits.
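
The `merge(..., all=TRUE)` calls are full outer joins on the gene ID, which is why a gene with several rows in one table ends up with several rows in the joined file. A small Python sketch of that behaviour follows. The function `full_outer_join`, the gene names, and the KEGG and GO values are placeholders invented for the example.

```python
def full_outer_join(left, right):
    """Full outer join of two lists of (gene, value) pairs on the gene ID.

    Genes missing from one table get None; a gene with several rows in one
    table yields one output row per matching pair of rows.
    """
    def group(pairs):
        grouped = {}
        for gene, value in pairs:
            grouped.setdefault(gene, []).append(value)
        return grouped

    l, r = group(left), group(right)
    joined = []
    for gene in sorted(set(l) | set(r)):
        for lv in l.get(gene, [None]):
            for rv in r.get(gene, [None]):
                joined.append((gene, lv, rv))
    return joined

# placeholder data: gene1 has two KEGG hits, gene2 has none, gene3 has no GO term
go = [("gene1", "GO:0005515"), ("gene2", "GO:0006629")]
kegg = [("gene1", "K00001"), ("gene1", "K00002"), ("gene3", "K00003")]
joined = full_outer_join(go, kegg)
for row in joined:
    print(row)
```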

3. Statistic analysis in R

We used R v4.0.2 for our analyses.

The required R-packages:

library("edgeR")

## Warning: package 'edgeR' was built under R version 4.1.1

## Warning: package 'limma' was built under R version 4.1.3

library("stageR")

## Warning: package 'stageR' was built under R version 4.1.1

## Warning: package 'SummarizedExperiment' was built under R version 4.1.1

## Warning: package 'MatrixGenerics' was built under R version 4.1.1

## Warning: package 'matrixStats' was built under R version 4.1.2

## Warning: package 'GenomicRanges' was built under R version 4.1.2

## Warning: package 'BiocGenerics' was built under R version 4.1.1

## Warning: package 'IRanges' was built under R version 4.1.1

## Warning: package 'GenomeInfoDb' was built under R version 4.1.2

## Warning: package 'Biobase' was built under R version 4.1.1

library("limma")
library("topGO")

## Warning: package 'topGO' was built under R version 4.1.1

## Warning: package 'graph' was built under R version 4.1.1

## Warning: package 'AnnotationDbi' was built under R version 4.1.2

library("GO.db") 
library("topconfects")

## Warning: package 'topconfects' was built under R version 4.1.1

library("UpSetR")
library("PoiClaClu")
library("RColorBrewer")

## Warning: package 'RColorBrewer' was built under R version 4.1.2

library("pheatmap")
library("ggplot2")

## Warning: package 'ggplot2' was built under R version 4.1.2

library("ComplexHeatmap")

## Warning: package 'ComplexHeatmap' was built under R version 4.1.1

library("VennDiagram")

## Warning: package 'VennDiagram' was built under R version 4.1.2

library("tidyr")

## Warning: package 'tidyr' was built under R version 4.1.2

library("plyr")

## Warning: package 'plyr' was built under R version 4.1.2

Used package versions:

# check package versions of all used packages
packageVersion("edgeR")

## [1] '3.36.0'

packageVersion("stageR")

## [1] '1.16.0'

packageVersion("limma")

## [1] '3.50.3'

packageVersion("topGO")

## [1] '2.46.0'

packageVersion("GO.db")

## [1] '3.14.0'

packageVersion("topconfects")

## [1] '1.10.0'

packageVersion("UpSetR")

## [1] '1.4.0'

packageVersion("PoiClaClu")

## [1] '1.0.2.1'

packageVersion("RColorBrewer")

## [1] '1.1.3'

packageVersion("pheatmap")

## [1] '1.0.12'

packageVersion("ggplot2")

## [1] '3.4.1'

packageVersion("ComplexHeatmap")

## [1] '2.10.0'

packageVersion("VennDiagram")

## [1] '1.7.3'

packageVersion("tidyr")

## [1] '1.3.0'

packageVersion("plyr")

## [1] '1.8.8'

3.1 Model fitting in EdgeR

3.1.1 Import dataset in R

We imported the dataset (the HTSeq output) as follows:

# import the count data
x = read.table("02.Skmarinoi8x3_rna-seq_reanalysis/Skmarinoi8x3_reanalysis_ref1.1.2_gene-level_counts_FINAL.txt",header=TRUE)
x = x[-c(17204:17208), ] #drop the last lines that do not contain information on gene counts
total_gene_number = nrow(x) #total number of genes in the analysis
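
The rows dropped above are the summary counters that htseq-count appends to its output; their names start with a double underscore (for example `__no_feature` and `__ambiguous`). As an alternative to hard-coded row indices, the rows can be filtered on that prefix. The sketch below is in Python with made-up counts; the function name `drop_htseq_summary` is an assumption for illustration.

```python
def drop_htseq_summary(rows):
    """Remove htseq-count summary rows (feature names starting with '__')."""
    return [(name, counts) for name, counts in rows if not name.startswith("__")]

# toy table: two genes followed by the five htseq-count summary rows
rows = [
    ("geneA", [10, 0, 3]),
    ("geneB", [7, 2, 5]),
    ("__no_feature", [100, 90, 80]),
    ("__ambiguous", [5, 4, 6]),
    ("__too_low_aQual", [0, 0, 0]),
    ("__not_aligned", [0, 0, 0]),
    ("__alignment_not_unique", [12, 9, 11]),
]
genes = drop_htseq_summary(rows)
print(len(genes))
```

Filtering on the prefix keeps working if the number of genes in the annotation changes.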

edgeR works with objects of the DGEList class. We created one from the count data and a group vector that assigns each sample to its genotype-salinity group:

# create a DGEList data class
group = c("A.16ppt","A.16ppt","A.16ppt","A.24ppt","A.24ppt","A.24ppt","A.8ppt","A.8ppt","A.8ppt","B.16ppt","B.16ppt","B.16ppt","B.24ppt","B.24ppt","B.24ppt","B.8ppt","B.8ppt","B.8ppt","D.16ppt","D.16ppt","D.16ppt","D.24ppt","D.24ppt","D.24ppt","D.8ppt","D.8ppt","D.8ppt","F.16ppt","F.16ppt","F.16ppt","F.24ppt","F.24ppt","F.24ppt","F.8ppt","F.8ppt","F.8ppt","I.16ppt","I.16ppt","I.16ppt","I.24ppt","I.24ppt","I.24ppt","I.8ppt","I.8ppt","I.8ppt","J.16ppt","J.16ppt","J.16ppt","J.24ppt","J.24ppt","J.24ppt","J.8ppt","J.8ppt","J.8ppt","K.16ppt","K.16ppt","K.16ppt","K.24ppt","K.24ppt","K.24ppt","K.8ppt","K.8ppt","K.8ppt","P.16ppt","P.16ppt","P.16ppt","P.24ppt","P.24ppt","P.24ppt","P.8ppt","P.8ppt","P.8ppt")

y = DGEList(counts=x, group=group)
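
The group vector follows a regular pattern: eight genotypes, three salinities per genotype (in the order 16, 24 and 8 ppt), and three replicates per combination. A short Python sketch that generates the same 72 labels, given only to make that layout explicit, assuming the samples are ordered as in the vector above:

```python
genotypes = ["A", "B", "D", "F", "I", "J", "K", "P"]
salinities = ["16ppt", "24ppt", "8ppt"]  # order used in the group vector
replicates = 3

# one label per sample: genotype, then salinity, then replicate
group = [f"{g}.{s}" for g in genotypes for s in salinities for _ in range(replicates)]
print(len(group), group[:4])
```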

3.1.2 Data filtering

In a next step, genes with very low counts across all libraries were removed. Filtering was based on counts per million (CPM). Here, we retained all genes with a CPM above 1 in at least three samples:

# filter out lowly expressed genes
keep = rowSums(cpm(y)>1)>=3 #keep genes with a CPM above 1 in at least three samples
y = y[keep,]
y$samples$lib.size = colSums(y$counts)
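
To make the filtering rule concrete, the sketch below recomputes it in plain Python on a toy count table. It assumes CPM is calculated from the raw column totals (no normalization factors), and the names `cpm` and `keep_expressed` as well as the counts are invented for the example.

```python
def cpm(counts):
    """Counts per million per sample; counts is a list of gene rows x samples."""
    lib_sizes = [sum(col) for col in zip(*counts)]
    return [[1e6 * c / n for c, n in zip(row, lib_sizes)] for row in counts]

def keep_expressed(counts, min_cpm=1.0, min_samples=3):
    """True for genes with CPM > min_cpm in at least min_samples samples."""
    return [sum(v > min_cpm for v in row) >= min_samples for row in cpm(counts)]

# toy table: every column sums to exactly one million, so CPM equals the count
counts = [
    [500_000, 400_000, 600_000, 500_000],  # highly expressed
    [0, 2, 0, 0],                          # CPM > 1 in one sample only
    [499_999, 599_998, 399_999, 500_000],  # highly expressed
    [1, 0, 1, 0],                          # CPM of exactly 1 is not above 1
]
keep = keep_expressed(counts)
print(keep)
```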

3.1.3 Data normalization

Next, we calculated a set of normalization factors (one for each sample) to eliminate composition biases between libraries:

# calculate normalization factors
y = calcNormFactors(y, method = 'TMM') #normalizes for RNA composition (highly expressed genes)

3.1.4 MD plots

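In a next step, we generated mean-difference (MD) plots for each sample. An MD plot allows exploring the expression profile of an individual sample more closely. For each gene, it shows the library size-adjusted log-fold change between two libraries (the difference) against the average log-expression across those libraries (the mean). When plotMD is called on a DGEList with a column argument, the selected sample is compared with an artificial reference library constructed from the average of all other samples.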

# plot MD plots per genotype: 9 libraries each, one 3x3 panel per genotype
genotypes = c("A","B","D","F","I","J","K","P")
for (g in seq_along(genotypes)){
  par(mfrow=c(3,3))
  for (lib in ((g-1)*9+1):(g*9)){
    plotMD(y, column = lib)
    abline(h=0, col="red",lty=2,lwd=2)}}

3.1.5 Design matrix

Next, we created a design matrix without an intercept, so that each column corresponds to one genotype-salinity group. This allows pairwise comparisons between groups to be specified as contrasts in the differential expression analyses:

# create a design matrix without intercept
design = model.matrix(~0+group, data=y$samples)
colnames(design) = levels(y$samples$group)
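
A design matrix without intercept is a one-hot encoding of the group labels: one column per group, with a 1 in the column of the group each sample belongs to. The Python sketch below illustrates this on five made-up samples. The function name `design_no_intercept` is an assumption, and the alphabetical ordering of the levels only approximates the default ordering of factor levels in R.

```python
def design_no_intercept(groups):
    """One-hot design matrix: one column per group level, no intercept column."""
    levels = sorted(set(groups))  # alphabetical, similar to default R factor levels
    matrix = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, matrix

# five toy samples from four groups
groups = ["A.16ppt", "A.16ppt", "A.24ppt", "A.8ppt", "B.16ppt"]
levels, design = design_no_intercept(groups)
print(levels)
for row in design:
    print(row)
```

Because every row contains exactly one 1, each model coefficient is the mean log-expression of one group, and differences between groups are obtained afterwards through contrasts.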

3.1.6 Dispersion estimation

Prior to differential expression analysis, dispersion needed to be estimated:

# estimate dispersion
y = estimateDisp(y, design)

# visualize dispersion
plotBCV(y)

3.1.7 Model fitting

We fitted the actual model. We used the glmQLFit function, which is a quasi-likelihood (QL) method that accounts for gene-specific variability from both biological and technical sources:

# model fitting
fit_group_model = glmQLFit(y, design, robust=TRUE)
plotQLDisp(fit_group_model)

3.2 Explore general characteristics of the dataset

3.2.1 MDS plot

To explore the dataset, we plotted an MDS plot.

In an MDS plot, the distance between each pair of samples can be interpreted as the leading log-fold change between the samples for the genes that best distinguish that pair of samples. By default, leading fold-change is defined as the root-mean-square of the largest 500 log2-fold changes between that pair of samples.

# define colors for salinities
high = '#433E85FF'
med = '#1E9B8AFF'
low = "#C2DF23FF"
colors = rep(c(med,high,low),8)

# define symbols for genotypes
pch = c(21,21,21,24,24,24,10,10,10,22,22,22,25,25,25,8,8,8,23,23,23,12,12,12)

# plot MDS plot
plotMDS(y, top = 500, main = "MDS plot for top 500 genes", bg = colors[(y$samples$group)], 
        col = colors[(y$samples$group)], pch=pch[(y$samples$group)],cex = 1.25)
legend("bottomright", inset = c(0, 0), legend=levels(y$samples$group), 
       pch=pch, col='black', pt.bg=colors, ncol=8, cex = 0.55)
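
The leading log-fold change described above can be written out as a short calculation. The Python sketch below takes two vectors of log2 expression values and returns the root-mean-square of the largest absolute differences. The function name `leading_log_fc` and the toy values are assumptions; this illustrates the definition only and is not the limma implementation.

```python
import math

def leading_log_fc(sample_a, sample_b, top=500):
    """Root-mean-square of the `top` largest absolute log2 fold changes."""
    diffs = sorted((abs(a - b) for a, b in zip(sample_a, sample_b)), reverse=True)
    largest = diffs[:top]
    return math.sqrt(sum(d * d for d in largest) / len(largest))

# toy log2 expression values for five genes in two samples
s1 = [5.0, 3.0, 8.0, 1.0, 6.0]
s2 = [2.0, 3.0, 4.0, 1.0, 6.0]
dist = leading_log_fc(s1, s2, top=2)
print(dist)
```

With `top=2`, only the two genes that differ most between the samples (differences of 4 and 3) contribute to the distance.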

3.2.2 Poisson distance heatmap

Similarity between samples was also explored by means of a heatmap:

# calculate Poisson distances from the filtered counts (samples in rows)
poisd = PoissonDistance(t(y$counts))

# create a list with sample names to be used in the plot
names = c("A.16ppt","A.16ppt","A.16ppt","A.24ppt","A.24ppt","A.24ppt","A.8ppt","A.8ppt","A.8ppt","B.16ppt","B.16ppt","B.16ppt","B.24ppt","B.24ppt","B.24ppt","B.8ppt","B.8ppt","B.8ppt","D.16ppt","D.16ppt","D.16ppt","D.24ppt","D.24ppt","D.24ppt","D.8ppt","D.8ppt","D.8ppt","F.16ppt","F.16ppt","F.16ppt","F.24ppt","F.24ppt","F.24ppt","F.8ppt","F.8ppt","F.8ppt","I.16ppt","I.16ppt","I.16ppt","I.24ppt","I.24ppt","I.24ppt","I.8ppt","I.8ppt","I.8ppt","J.16ppt","J.16ppt","J.16ppt","J.24ppt","J.24ppt","J.24ppt","J.8ppt","J.8ppt","J.8ppt","K.16ppt","K.16ppt","K.16ppt","K.24ppt","K.24ppt","K.24ppt","K.8ppt","K.8ppt","K.8ppt","P.16ppt","P.16ppt","P.16ppt","P.24ppt","P.24ppt","P.24ppt","P.8ppt","P.8ppt","P.8ppt")

# define colors to be used in the plot
colors_heatmap = colorRampPalette(rev(brewer.pal(9,"Purples")))(255)
                        
# plot heatmap
samplePoisDistMatrix = as.matrix(poisd$dd)
rownames(samplePoisDistMatrix) = paste(names)
colnames(samplePoisDistMatrix) = paste(names)
pheatmap(samplePoisDistMatrix,
         clustering_distance_rows=poisd$dd,
         clustering_distance_cols=poisd$dd,
         col=colors_heatmap)

3.3 Testing for differential expression using stage-wise analysis [omnibus test]: average salinity effect (RQ1) & individual genotypes (RQ2)

In this omnibus test we combined the tests for the average salinity effect with the tests for the responses of each individual genotype.

3.3.1 Defining contrasts to test

In a first step, we defined all the contrasts that need to be tested: 27 in total (24 for the genotypes and 3 for the average effect):

# define all contrasts to test
C_RQ1e2=matrix(0,nrow=ncol(fit_group_model$coefficients),ncol=27)
rownames(C_RQ1e2)=colnames(fit_group_model$coefficients)
colnames(C_RQ1e2)=c("A8-A16","A16-A24","A8-A24",
                    "B8-B16","B16-B24","B8-B24",
                    "D8-D16","D16-D24","D8-D24",
                    "F8-F16","F16-F24","F8-F24",
                    "I8-I16","I16-I24","I8-I24",
                    "J8-J16","J16-J24","J8-J24",
                    "K8-K16","K16-K24","K8-K24",
                    "P8-P16","P16-P24","P8-P24",
                    "avg8-16", "avg16-24","avg8-24")

# genotype A (salinity effect)
C_RQ1e2[c("A.8ppt","A.16ppt"),"A8-A16"]=c(1,-1)
C_RQ1e2[c("A.8ppt","A.24ppt"),"A8-A24"]=c(1,-1)
C_RQ1e2[c("A.16ppt","A.24ppt"),"A16-A24"]=c(1,-1)
# genotype B (salinity effect)
C_RQ1e2[c("B.8ppt","B.16ppt"),"B8-B16"]=c(1,-1)
C_RQ1e2[c("B.8ppt","B.24ppt"),"B8-B24"]=c(1,-1)
C_RQ1e2[c("B.16ppt","B.24ppt"),"B16-B24"]=c(1,-1)
# genotype D (salinity effect)
C_RQ1e2[c("D.8ppt","D.16ppt"),"D8-D16"]=c(1,-1)
C_RQ1e2[c("D.8ppt","D.24ppt"),"D8-D24"]=c(1,-1)
C_RQ1e2[c("D.16ppt","D.24ppt"),"D16-D24"]=c(1,-1)
# genotype F (salinity effect)
C_RQ1e2[c("F.8ppt","F.16ppt"),"F8-F16"]=c(1,-1)
C_RQ1e2[c("F.8ppt","F.24ppt"),"F8-F24"]=c(1,-1)
C_RQ1e2[c("F.16ppt","F.24ppt"),"F16-F24"]=c(1,-1)
# genotype I (salinity effect)
C_RQ1e2[c("I.8ppt","I.16ppt"),"I8-I16"]=c(1,-1)
C_RQ1e2[c("I.8ppt","I.24ppt"),"I8-I24"]=c(1,-1)
C_RQ1e2[c("I.16ppt","I.24ppt"),"I16-I24"]=c(1,-1)
# genotype J (salinity effect)
C_RQ1e2[c("J.8ppt","J.16ppt"),"J8-J16"]=c(1,-1)
C_RQ1e2[c("J.8ppt","J.24ppt"),"J8-J24"]=c(1,-1)
C_RQ1e2[c("J.16ppt","J.24ppt"),"J16-J24"]=c(1,-1)
# genotype K (salinity effect)
C_RQ1e2[c("K.8ppt","K.16ppt"),"K8-K16"]=c(1,-1)
C_RQ1e2[c("K.8ppt","K.24ppt"),"K8-K24"]=c(1,-1)
C_RQ1e2[c("K.16ppt","K.24ppt"),"K16-K24"]=c(1,-1)
# genotype P (salinity effect)
C_RQ1e2[c("P.8ppt","P.16ppt"),"P8-P16"]=c(1,-1)
C_RQ1e2[c("P.8ppt","P.24ppt"),"P8-P24"]=c(1,-1)
C_RQ1e2[c("P.16ppt","P.24ppt"),"P16-P24"]=c(1,-1)
# average salinity effect 
C_RQ1e2[c("A.8ppt", "B.8ppt", "D.8ppt","F.8ppt", "I.8ppt", "J.8ppt", "K.8ppt", "P.8ppt",
          "A.16ppt", "B.16ppt", "D.16ppt","F.16ppt", "I.16ppt", "J.16ppt", "K.16ppt", "P.16ppt"),
        "avg8-16"]=c(1/8,1/8,1/8,1/8,1/8,1/8,1/8,1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8)
C_RQ1e2[c("A.8ppt", "B.8ppt", "D.8ppt","F.8ppt", "I.8ppt", "J.8ppt", "K.8ppt", "P.8ppt",
          "A.24ppt", "B.24ppt", "D.24ppt","F.24ppt", "I.24ppt", "J.24ppt", "K.24ppt", "P.24ppt"),
        "avg8-24"]=c(1/8,1/8,1/8,1/8,1/8,1/8,1/8,1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8)
C_RQ1e2[c("A.16ppt", "B.16ppt", "D.16ppt","F.16ppt", "I.16ppt", "J.16ppt", "K.16ppt", "P.16ppt",
          "A.24ppt", "B.24ppt", "D.24ppt","F.24ppt", "I.24ppt", "J.24ppt", "K.24ppt", "P.24ppt"),
        "avg16-24"]=c(1/8,1/8,1/8,1/8,1/8,1/8,1/8,1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8,-1/8)

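The contrast matrix above can also be built programmatically, which reduces the risk of typing errors across 27 columns. The following is a minimal sketch in base R, assuming the coefficient names follow the genotype.salinityppt pattern used above. Here coef_names is a stand-in for colnames(fit_group_model$coefficients), and build_contrasts is a hypothetical helper that is not part of the original analysis.

```r
# Stand-in for colnames(fit_group_model$coefficients); the naming pattern is assumed
genotypes <- c("A", "B", "D", "F", "I", "J", "K", "P")
salinities <- c(8, 16, 24)
coef_names <- as.vector(outer(genotypes, salinities,
                              function(g, s) paste0(g, ".", s, "ppt")))

# Salinity pairs, in the same order as the hand-written matrix
pairs <- list(c(8, 16), c(16, 24), c(8, 24))

build_contrasts <- function(coef_names, genotypes, pairs) {
  col_names <- c(
    unlist(lapply(genotypes, function(g)
      sapply(pairs, function(p) paste0(g, p[1], "-", g, p[2])))),
    sapply(pairs, function(p) paste0("avg", p[1], "-", p[2])))
  C <- matrix(0, nrow = length(coef_names), ncol = length(col_names),
              dimnames = list(coef_names, col_names))
  for (p in pairs) {
    lo <- paste0(genotypes, ".", p[1], "ppt")
    hi <- paste0(genotypes, ".", p[2], "ppt")
    # genotype-specific contrasts: +1 for the lower salinity, -1 for the higher
    for (i in seq_along(genotypes)) {
      g <- genotypes[i]
      C[c(lo[i], hi[i]), paste0(g, p[1], "-", g, p[2])] <- c(1, -1)
    }
    # average contrast: every genotype contributes with weight 1/8
    avg <- paste0("avg", p[1], "-", p[2])
    C[lo, avg] <- 1 / length(genotypes)
    C[hi, avg] <- -1 / length(genotypes)
  }
  C
}

C_check <- build_contrasts(coef_names, genotypes, pairs)
dim(C_check)
```

Because rows are matched by name, a matrix built this way could be compared against the hand-written one with all.equal(C_check[rownames(C_RQ1e2), ], C_RQ1e2), provided the real coefficient names match the assumed pattern.
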
3.3.2 Stage-wise testing

We performed the stage-wise testing procedure in stageR. StageR allows for simultaneous FDR control in all the contrasts, and consists of two steps: the screening stage, and the confirmation stage.

The screening stage tested, for each gene, whether any of the 27 contrasts was significant, i.e. whether salinity had any effect in at least one genotype or in the average response. The screening stage gives P-values as output, but these are not yet FDR-controlled and should not be used in downstream analyses.

# screening stage
alpha = 0.05
screenTest_RQ1e2 = glmQLFTest(fit_group_model, contrast=C_RQ1e2)
pScreen_RQ1e2 = screenTest_RQ1e2$table$PValue
names(pScreen_RQ1e2) = rownames(screenTest_RQ1e2$table)

The screening stage was followed by the confirmation stage, in which every contrast was assessed separately. The confirmation-stage P-values were adjusted to control the FWER across the hypotheses within a gene and were subsequently corrected to the BH-adjusted significance level of the screening stage. As a result, the adjusted P-values of both the screening and the confirmation stage can be compared directly to the provided significance level alpha. Here, we used the Holm method for correction of the P-values.

# confirmation stage
confirmationResults_RQ1e2 = sapply(1:ncol(C_RQ1e2),function(i) glmQLFTest(fit_group_model, contrast = C_RQ1e2[,i]), simplify=FALSE) #calculates Ftest for each contrast
confirmationPList_RQ1e2 = lapply(confirmationResults_RQ1e2, function(x) x$table$PValue) # takes the P-values from all genes for each contrast and puts them in a list
confirmationP_RQ1e2 = as.matrix(Reduce(f=cbind,confirmationPList_RQ1e2)) 
rownames(confirmationP_RQ1e2) = rownames(confirmationResults_RQ1e2[[1]]$table)
colnames(confirmationP_RQ1e2) = colnames(C_RQ1e2)
stageRObj_RQ1e2 = stageR(pScreen=pScreen_RQ1e2, pConfirmation=confirmationP_RQ1e2) # constructs an object
stageRAdj_RQ1e2 = stageWiseAdjustment(object=stageRObj_RQ1e2, method="holm", alpha=0.05) # adjusts the P-values using FWER correction using the holm method
resRQ1e2 = getResults(stageRAdj_RQ1e2)

## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.
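
The logic of the two stages can be illustrated with base R on toy data. This is a simplified sketch, not stageR's actual implementation: it applies a plain Holm correction within each gene and compares the result to the BH-adjusted level alpha * (genes passing screening) / (genes tested), whereas stageR applies further refinements and rescales the P-values so they can be compared to alpha directly. All gene names and P-values below are made up for illustration.

```r
alpha <- 0.05

# Toy screening-stage P-values for four hypothetical genes
p_screen <- c(g1 = 0.0001, g2 = 0.03, g3 = 0.6, g4 = 0.002)

# Toy confirmation-stage P-values: one row per gene, one column per contrast
p_conf <- rbind(g1 = c(0.0002, 0.5, 0.001),
                g2 = c(0.04, 0.3, 0.9),
                g3 = c(0.5, 0.7, 0.8),
                g4 = c(0.001, 0.02, 0.6))
colnames(p_conf) <- c("8-16", "16-24", "8-24")

# Stage I: BH correction across genes
padj_screen <- p.adjust(p_screen, method = "BH")
passed <- padj_screen <= alpha

# Stage II: Holm correction within each gene that passed stage I
holm <- t(apply(p_conf[passed, , drop = FALSE], 1, p.adjust, method = "holm"))

# Compare against the BH-adjusted significance level
alpha_adj <- alpha * sum(passed) / length(p_screen)
significant <- holm <= alpha_adj
significant
```

In this toy example, gene g2 passes the screening stage but has no significant contrast in the confirmation stage, which is the situation examined for the real data further down.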

# number of DE genes in every contrast
SignifGenesRQ1e2 = colSums(resRQ1e2) 
SignifGenesRQ1e2

## padjScreen     A8-A16    A16-A24     A8-A24     B8-B16    B16-B24     B8-B24 
##       8820        815        385       1649        469        443       1750 
##     D8-D16    D16-D24     D8-D24     F8-F16    F16-F24     F8-F24     I8-I16 
##       1026        451       1224       1159        867       1428        285 
##    I16-I24     I8-I24     J8-J16    J16-J24     J8-J24     K8-K16    K16-K24 
##        297        502        751        215       1042        197        638 
##     K8-K24     P8-P16    P16-P24     P8-P24    avg8-16   avg16-24    avg8-24 
##       1102       1378        648       1810       2586       2014       4276

# get adjusted P-values
adjusted_p_RQ1e2 = getAdjustedPValues(stageRAdj_RQ1e2, onlySignificantGenes = FALSE, order = FALSE)

## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.

Upon finishing the stage-wise testing procedure, we checked the number of significant genes:

# visualize number of significant genes in each contrast
resRQ1e2_df = as.data.frame(resRQ1e2) 
resRQ1e2_df2 = resRQ1e2_df
resRQ1e2_df2$gene = rownames(resRQ1e2_df2)
OnlySignGenes_RQ1e2 = resRQ1e2_df[resRQ1e2_df$padjScreen == 1,] # removes rows for which global test was non significant
dim(OnlySignGenes_RQ1e2) # still includes genes for which all posthoc tests were 0

## [1] 8820   28

Were there any genes that were significant in the screening stage but not in the confirmation stage?

# select genes that were only significant in the screening stage
genesSI_RQ1e2 = rownames(adjusted_p_RQ1e2)[adjusted_p_RQ1e2[,"padjScreen"]<=0.05]
genesNotFoundStageII_RQ1e2 = genesSI_RQ1e2[genesSI_RQ1e2 %in% rownames(resRQ1e2)[rowSums(resRQ1e2==0)==27]]
length(genesNotFoundStageII_RQ1e2) #stage I only genes

## [1] 1144

1144 genes were significant in the screening stage but in none of the 27 contrasts of the confirmation stage.

We removed the genes that were not significant after the confirmation stage:

# create object that only contains genes that are significant after the confirmation stage
OnlySignGenes_RQ1e2_ConStage = OnlySignGenes_RQ1e2 [!rownames(OnlySignGenes_RQ1e2 ) %in% genesNotFoundStageII_RQ1e2, ]
nrow(OnlySignGenes_RQ1e2_ConStage)

## [1] 7676

7676 genes were significant after the confirmation stage. These are the genes we continued our analyses with.
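
The bookkeeping behind this filter can be sketched on a toy indicator matrix in base R. The matrix res_toy below is hypothetical and only mimics the layout returned by getResults (first column padjScreen, followed by one 0/1 column per contrast).

```r
# Toy stageR-style indicator matrix: 1 = significant, 0 = not significant
res_toy <- rbind(g1 = c(1, 1, 0, 1),
                 g2 = c(1, 0, 0, 0),   # passes screening, no contrast confirmed
                 g3 = c(0, 0, 0, 0),   # fails screening
                 g4 = c(1, 1, 0, 0))
colnames(res_toy) <- c("padjScreen", "8-16", "16-24", "8-24")

# genes that pass the screening stage
screened <- rownames(res_toy)[res_toy[, "padjScreen"] == 1]

# of those, genes without a single significant contrast (stage I only)
stage1_only <- screened[rowSums(res_toy[screened, -1, drop = FALSE]) == 0]

# genes retained for downstream analyses
confirmed <- setdiff(screened, stage1_only)
confirmed
```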

3.3.3 Summarize the results for downstream analyses

Before we continued with the downstream analyses, we created a single data object that contains key information from the statistical pipeline outlined above. This included logFC, logCPM and P-values for each gene in each contrast.

First, we extracted the stage-wise adjusted P-values (the screening-stage value plus one value per contrast) from the stageR output:

# select the adjusted P-values for each contrast
adjusted_p_RQ1e2 = getAdjustedPValues(stageRAdj_RQ1e2, onlySignificantGenes = FALSE, order = FALSE)

## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.

# rename column headers in adjusted_p_RQ1e2
colnames(adjusted_p_RQ1e2)=c("padjScreen","A8vsA16_Padj","A16vsA24_Padj","A8vsA24_Padj","B8vsB16_Padj","B16vsB24_Padj","B8vsB24_Padj","D8vsD16_Padj","D16vsD24_Padj","D8vsD24_Padj","F8vsF16_Padj","F16vsF24_Padj","F8vsF24_Padj","I8vsI16_Padj","I16vsI24_Padj","I8vsI24_Padj","J8vsJ16_Padj","J16vsJ24_Padj","J8vsJ24_Padj","K8vsK16_Padj","K16vsK24_Padj","K8vsK24_Padj","P8vsP16_Padj","P16vsP24_Padj","P8vsP24_Padj","avg8vs16_Padj","avg16vs24_Padj","avg8vs24_Padj")

Second, we extracted the information on logFC, logCPM, F value and non-adjusted P-values from the confirmationResults_RQ1e2 object:

# create empty list to hold the data values
datalist = list()

# loop over the confirmationResults_RQ1e2 object to obtain the relevant information (table)
for (contrast in c(1:27)){
  table = confirmationResults_RQ1e2[[contrast]]$table
  datalist[[contrast]] = table
}

# turn list into data frame
confirmationResults_RQ1e2_total_dataset = data.frame(datalist)

# rename column names for tractability
colnames(confirmationResults_RQ1e2_total_dataset)=c("A8vsA16_logFC","A8vsA16_logCPM","A8vsA16_F","A8vsA16_nonadj_PValue","A16vsA24_logFC","A16vsA24_logCPM","A16vsA24_F","A16vsA24_nonadj_PValue","A8vsA24_logFC","A8vsA24_logCPM","A8vsA24_F","A8vsA24_nonadj_PValue","B8vsB16_logFC","B8vsB16_logCPM","B8vsB16_F","B8vsB16_nonadj_PValue","B16vsB24_logFC","B16vsB24_logCPM","B16vsB24_F","B16vsB24_nonadj_PValue","B8vsB24_logFC","B8vsB24_logCPM","B8vsB24_F","B8vsB24_nonadj_PValue","D8vsD16_logFC","D8vsD16_logCPM","D8vsD16_F","D8vsD16_nonadj_PValue","D16vsD24_logFC","D16vsD24_logCPM","D16vsD24_F","D16vsD24_nonadj_PValue","D8vsD24_logFC","D8vsD24_logCPM","D8vsD24_F","D8vsD24_nonadj_PValue","F8vsF16_logFC","F8vsF16_logCPM","F8vsF16_F","F8vsF16_nonadj_PValue","F16vsF24_logFC","F16vsF24_logCPM","F16vsF24_F","F16vsF24_nonadj_PValue","F8vsF24_logFC","F8vsF24_logCPM","F8vsF24_F","F8vsF24_nonadj_PValue","I8vsI16_logFC","I8vsI16_logCPM","I8vsI16_F","I8vsI16_nonadj_PValue","I16vsI24_logFC","I16vsI24_logCPM","I16vsI24_F","I16vsI24_nonadj_PValue","I8vsI24_logFC","I8vsI24_logCPM","I8vsI24_F","I8vsI24_nonadj_PValue","J8vsJ16_logFC","J8vsJ16_logCPM","J8vsJ16_F","J8vsJ16_nonadj_PValue","J16vsJ24_logFC","J16vsJ24_logCPM","J16vsJ24_F","J16vsJ24_nonadj_PValue","J8vsJ24_logFC","J8vsJ24_logCPM","J8vsJ24_F","J8vsJ24_nonadj_PValue","K8vsK16_logFC","K8vsK16_logCPM","K8vsK16_F","K8vsK16_nonadj_PValue","K16vsK24_logFC","K16vsK24_logCPM","K16vsK24_F","K16vsK24_nonadj_PValue","K8vsK24_logFC","K8vsK24_logCPM","K8vsK24_F","K8vsK24_nonadj_PValue","P8vsP16_logFC","P8vsP16_logCPM","P8vsP16_F","P8vsP16_nonadj_PValue","P16vsP24_logFC","P16vsP24_logCPM","P16vsP24_F","P16vsP24_nonadj_PValue","P8vsP24_logFC","P8vsP24_logCPM","P8vsP24_F","P8vsP24_nonadj_PValue","avg8vs16_logFC","avg8vs16_logCPM","avg8vs16_F","avg8vs16_nonadj_PValue","avg16vs24_logFC","avg16vs24_logCPM","avg16vs24_F","avg16vs24_nonadj_PValue","avg8vs24_logFC","avg8vs24_logCPM","avg8vs24_F","avg8vs24_nonadj_PValue")

Then we combined the FDR-adjusted P-values and the table with information on logFC etc. into a single data frame:

# merge the data frames
table = merge(confirmationResults_RQ1e2_total_dataset,adjusted_p_RQ1e2, by = 0, all = TRUE)

# use the first column (gene names) for the row names
all_results_RQ1e2 = table[,-1]
rownames(all_results_RQ1e2) = table[,1]

The resulting data frame all_results_RQ1e2 was used in multiple analyses downstream to access basic statistical information on each gene for each contrast.
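
The long vectors of column names used above could also be generated rather than typed out. The following base R sketch is an illustration, not the code used in the analysis, and assumes the same contrast order as in C_RQ1e2. The object names contrast_labels, stat_colnames and padj_colnames are introduced here for illustration only.

```r
genotypes <- c("A", "B", "D", "F", "I", "J", "K", "P")
pairs <- list(c(8, 16), c(16, 24), c(8, 24))

# contrast labels in the same order as the columns of the contrast matrix
contrast_labels <- c(
  unlist(lapply(genotypes, function(g)
    sapply(pairs, function(p) paste0(g, p[1], "vs", g, p[2])))),
  sapply(pairs, function(p) paste0("avg", p[1], "vs", p[2])))

# four statistics per contrast, contrast-major order (as produced by data.frame(datalist))
stats <- c("logFC", "logCPM", "F", "nonadj_PValue")
stat_colnames <- as.vector(t(outer(contrast_labels, stats, paste, sep = "_")))

# adjusted P-value columns: screening stage first, then one per contrast
padj_colnames <- c("padjScreen", paste0(contrast_labels, "_Padj"))

head(stat_colnames, 8)
```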

3.3.4 Unique and shared DE between genotypes

In this section, we visualized the percentage of DE genes in the total dataset and in different contrasts for both the average salinity response and all individual genotypes.

3.3.4.1 General overview of DE total dataset

First, we visualized the total number of DE genes per contrast:

# calculate statistics
colsums = colSums(resRQ1e2)
par(las=2)
par(mar = c(6, 5, 3, 1), xpd = TRUE)

# plot barplot
barplot(colsums,col = "black",
        main = "Total number of DE genes per contrast [padjScreen = total dataset]", 
        ylab = "Total number of DE genes", 
        border=NA, cex.axis=0.9, cex.names = 0.9, 
        cex.lab=1, font.lab=2, ylim=c(0,10000))

3.3.4.2 Barplot with up- and downregulated DE genes [total dataset]

Here, we created a barplot with the number of up- and downregulated DE genes in each contrast:

# create table with only logFC values of significant genes
all_results_RQ1e2_OnlySig = subset(all_results_RQ1e2, 
                                   rownames(all_results_RQ1e2)%in%rownames(OnlySignGenes_RQ1e2_ConStage))
all_results_RQ1e2_OnlySig_logFC = all_results_RQ1e2_OnlySig[,grepl("logFC", colnames(all_results_RQ1e2_OnlySig))]

# remove non-significant logFC values
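OnlySignGenes_RQ1e2_ConStage2 = OnlySignGenes_RQ1e2_ConStage[rownames(all_results_RQ1e2_OnlySig_logFC), -1] # drop padjScreen and match the row order of the logFC table (merge() sorts rows by gene name)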
all_results_RQ1e2_OnlySig_logFC_con = type.convert(all_results_RQ1e2_OnlySig_logFC, as.is = TRUE)
all_results_RQ1e2_OnlySig_logFC_OnlySign = (0^(OnlySignGenes_RQ1e2_ConStage2 == 0)) * all_results_RQ1e2_OnlySig_logFC_con

# count number of up- and downregulated genes
calcStats = function(x) {
  pos = sum(all_results_RQ1e2_OnlySig_logFC_OnlySign[, x] > 0)
  neg = sum(all_results_RQ1e2_OnlySig_logFC_OnlySign[, x] < 0)
  c("upregulated" = pos, "downregulated" = -neg)
}

RQ1e2_DirGenes = as.data.frame(Map(calcStats, colnames(all_results_RQ1e2_OnlySig_logFC_OnlySign)))

colnames(RQ1e2_DirGenes) = c('A8-A16','A16-A24','A8-A24','B8-B16','B16-B24','B8-B24','D8-D16','D16-D24','D8-D24','F8-F16','F16-F24','F8-F24','I8-I16','I16-I24','I8-I24','J8-J16','J16-J24','J8-J24','K8-K16','K16-K24','K8-K24','P8-P16','P16-P24','P8-P24','avg8-avg16','avg16-avg24','avg8-avg24')

# reorder columns to order for ggplot
RQ1e2_DirGenes = RQ1e2_DirGenes[,(c('avg8-avg24','avg8-avg16','avg16-avg24',
                                    'P8-P24','P8-P16','P16-P24',
                                    'K8-K24','K8-K16','K16-K24',
                                    'J8-J24','J8-J16','J16-J24',
                                    'I8-I24','I8-I16','I16-I24',
                                    'F8-F24','F8-F16','F16-F24',
                                    'D8-D24','D8-D16','D16-D24',
                                    'B8-B24','B8-B16','B16-B24',
                                    'A8-A24','A8-A16','A16-A24'))]
# stack data
RQ1e2_DirGenes2 = stack(RQ1e2_DirGenes)

# add a category column to the data frame for coloring in ggplot
category = rev(rep(c('16vs24','16vs24','8vs16','8vs16','8vs24','8vs24'),9))
RQ1e2_DirGenes3 = cbind(RQ1e2_DirGenes2,category)

# plot barplot
g = ggplot(RQ1e2_DirGenes3, aes(x = ind, y = values, fill = category)) +
  geom_bar(stat = "identity", position = "identity",
           color = "white") + coord_flip() +
  scale_fill_manual("legend", values = c('8vs16' = "#3690C0", '16vs24' = "#A6BDDB", '8vs24' = "#023858")) +
  scale_x_discrete(limits = rev(levels(RQ1e2_DirGenes3$ind))) + # reverse the contrast order; the original referred to an undefined object x
  theme_test()
g
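
The masking step above relies on the fact that 0^TRUE equals 0 and 0^FALSE equals 1 in R, so 0^(indicator == 0) turns a 0/1 significance table into a multiplicative mask. The toy example below, with made-up values, shows the idea. Note that the element-wise product assumes both tables list the genes in the same row order and the contrasts in the same column order.

```r
genes <- c("g1", "g2", "g3")

# Toy significance indicators (1 = significant) and logFC values for two contrasts
sig <- data.frame(c1 = c(1, 0, 1), c2 = c(0, 0, 1), row.names = genes)
lfc <- data.frame(c1 = c(2.1, -0.4, -1.3), c2 = c(0.2, 1.5, 0.8), row.names = genes)

# the product below is only meaningful if rows are aligned
stopifnot(identical(rownames(sig), rownames(lfc)))

# mask: 1 where the contrast is significant, 0 elsewhere
mask <- 0^(sig == 0)

# logFC values of non-significant gene/contrast pairs are set to 0
masked <- mask * lfc

# count up- and downregulated genes per contrast
up <- colSums(masked > 0)
down <- colSums(masked < 0)
rbind(upregulated = up, downregulated = -down)
```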

3.3.4.3 Barplot with up- and downregulated DE genes [top set genes]

For this plot, we selected different sets of top genes using different methodologies.

First, we calculated the top 100 genes in the full dataset, using stageR’s FDR-adjusted P-value of the global null hypothesis (Padjscreen):

# select top 100 genes in the full dataset based on Padjscreen
Padjscreen_sorted = all_results_RQ1e2 [with(all_results_RQ1e2 , order(all_results_RQ1e2$padjScreen)),]
Padjscreen_sorted_top100 = Padjscreen_sorted[1:100,] 
Padjscreen_sorted_top100_genes = rownames(Padjscreen_sorted_top100)

Then, we determined in which individual contrasts the top 100 genes selected by Padjscreen were significant, and we counted how often this occurred, separately for up- and downregulated genes:

# calculate number of up- and downregulated top 100 genes (selected by Padjscreen) for each individual contrast
all_results_RQ1e2_OnlySig_logFC_Padjscreen = subset(all_results_RQ1e2_OnlySig_logFC_OnlySign,rownames(all_results_RQ1e2_OnlySig_logFC_OnlySign)%in%Padjscreen_sorted_top100_genes)

calcStats = function(x) {
  pos = sum(all_results_RQ1e2_OnlySig_logFC_Padjscreen[, x] > 0)
  neg = sum(all_results_RQ1e2_OnlySig_logFC_Padjscreen[, x] < 0)
  c("upregulated" = pos, "downregulated" = -neg)
}

RQ1e2_DirGenes_Padjscreen = as.data.frame(Map(calcStats, colnames(all_results_RQ1e2_OnlySig_logFC_OnlySign)))

colnames(RQ1e2_DirGenes_Padjscreen)=c('A8-A16','A16-A24','A8-A24','B8-B16','B16-B24','B8-B24','D8-D16','D16-D24','D8-D24','F8-F16','F16-F24','F8-F24','I8-I16','I16-I24','I8-I24','J8-J16','J16-J24','J8-J24','K8-K16','K16-K24','K8-K24','P8-P16','P16-P24','P8-P24','avg8-avg16','avg16-avg24','avg8-avg24')

Second, we determined the top 100 genes in the contrast with most DE genes (avg8-avg24) using the P-values of this specific contrast:

# rank genes based on P-value of the avg8-avg24 contrast
Padj_avg8vs24_sorted = all_results_RQ1e2 [with(all_results_RQ1e2 , order(all_results_RQ1e2$avg8vs24_Padj)),] 
Padj_avg8vs24_sorted_top100 = Padj_avg8vs24_sorted[1:100,] 

# calculate number of up- and downregulated genes
all_results_RQ1e2_OnlySig_logFC_Padj_24vs8 = subset(all_results_RQ1e2_OnlySig_logFC_OnlySign,
                                                    rownames(all_results_RQ1e2_OnlySig_logFC_OnlySign)%in%rownames(Padj_avg8vs24_sorted_top100))[,27] # column 27 = avg8vs24_logFC
pos1 = sum(all_results_RQ1e2_OnlySig_logFC_Padj_24vs8 > 0)
neg1 = -sum(all_results_RQ1e2_OnlySig_logFC_Padj_24vs8 < 0)

Third, we used the Topconfects method to rank genes. This method ranks genes by their confident effect size ("confect"), a confidence bound on the log fold change (LFC), rather than by P-value.

# run Topconfects for each contrast separately
topconfects_results = list()
for (contrast in c(1:27)){
  topconfects_results[[contrast]] = edger_confects(fit_group_model, contrast = C_RQ1e2[,contrast], 
                 fdr = 0.05, step = 0.01)
}

We selected the top 100 genes ranked by topconfects in the avg8-avg24 contrast, and calculated the number of up- and downregulated genes:

# select top 100 genes by topconfects in avg8-avg24 contrast
tc_avg8vs24 = subset(topconfects_results[[27]]$table, confect != 0)
top100_genes_tc8vs24avg = rownames(as.data.frame(subset(all_results_RQ1e2,rownames(all_results_RQ1e2)%in%tc_avg8vs24[1:100,]$name)))

# calculate number of up- and downregulated genes
all_results_RQ1e2_OnlySig_logFC_tc_24vs8 = subset(all_results_RQ1e2_OnlySig_logFC_OnlySign,rownames(all_results_RQ1e2_OnlySig_logFC_OnlySign)%in%top100_genes_tc8vs24avg)[,27]
pos2 = sum(all_results_RQ1e2_OnlySig_logFC_tc_24vs8 > 0)
neg2 = -sum(all_results_RQ1e2_OnlySig_logFC_tc_24vs8 < 0)

Next, we combined the information of all the sets of top genes, and plotted a barplot:

# combine data on topconfects and top 100 8vs24 contrast
upregulated = c(pos1,pos2)
downregulated = c(neg1,neg2)
data = as.data.frame(rbind(upregulated,downregulated))
colnames(data) = c('Padj_avg8-avg24','topconfects_avg8-avg24')
RQ1e2_DirGenes_Padjscreen_update = cbind(RQ1e2_DirGenes_Padjscreen,data)

# reorder columns to order for ggplot
RQ1e2_DirGenes_Padjscreen_update = RQ1e2_DirGenes_Padjscreen_update[,c( 'Padj_avg8-avg24',
                                                                        'topconfects_avg8-avg24',
                                                                        'avg8-avg24','avg8-avg16','avg16-avg24',
                                                                        'P8-P24','P8-P16','P16-P24',
                                                                        'K8-K24','K8-K16','K16-K24',
                                                                        'J8-J24','J8-J16','J16-J24',
                                                                        'I8-I24','I8-I16','I16-I24',
                                                                        'F8-F24','F8-F16','F16-F24',
                                                                        'D8-D24','D8-D16','D16-D24',
                                                                        'B8-B24','B8-B16','B16-B24',
                                                                        'A8-A24','A8-A16','A16-A24')]

# stack data
RQ1e2_DirGenes_Padjscreen2 = stack(RQ1e2_DirGenes_Padjscreen_update)

# add a category column to the data frame for coloring in ggplot
category_bis = rev(c(rep(c('16vs24','16vs24','8vs16','8vs16','8vs24','8vs24'),9),
                     '8vs24','8vs24','8vs24','8vs24'))
RQ1e2_DirGenes_Padjscreen3 = cbind(RQ1e2_DirGenes_Padjscreen2,category_bis)

# plot barplot
g = ggplot(RQ1e2_DirGenes_Padjscreen3, aes(x = ind, y = values, fill = category_bis)) +
  geom_bar(stat = "identity", position = position_stack(),
           color = "white") + coord_flip() +
  scale_fill_manual("legend", values = c('8vs16' = "#3690C0", '16vs24' = "#A6BDDB" , '8vs24' = "#023858")) +
  theme_test()
g

3.3.4.4 Upset plots showing shared and unique DE across the three salinity contrasts

We selected the genes significant in each genotype and the average response. Here, we made no distinction between contrasts:

# select DE genes per genotype and average response
allDE_genoA = rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                OnlySignGenes_RQ1e2_ConStage$`A8-A16`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`A8-A24`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`A16-A24`== 1))
allDE_genoB = rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                OnlySignGenes_RQ1e2_ConStage$`B8-B16`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`B8-B24`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`B16-B24`== 1))
allDE_genoD = rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                OnlySignGenes_RQ1e2_ConStage$`D8-D16`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`D8-D24`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`D16-D24`== 1))
allDE_genoF = rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                OnlySignGenes_RQ1e2_ConStage$`F8-F16`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`F8-F24`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`F16-F24`== 1))
allDE_genoI = rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                OnlySignGenes_RQ1e2_ConStage$`I8-I16`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`I8-I24`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`I16-I24`== 1))
allDE_genoJ = rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                OnlySignGenes_RQ1e2_ConStage$`J8-J16`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`J8-J24`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`J16-J24`== 1))
allDE_genoK = rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                OnlySignGenes_RQ1e2_ConStage$`K8-K16`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`K8-K24`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`K16-K24`== 1))
allDE_genoP = rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                OnlySignGenes_RQ1e2_ConStage$`P8-P16`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`P8-P24`== 1 | 
                                OnlySignGenes_RQ1e2_ConStage$`P16-P24`== 1))
allDE_avg = rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                              OnlySignGenes_RQ1e2_ConStage$`avg8-16`== 1 | 
                              OnlySignGenes_RQ1e2_ConStage$`avg8-24`== 1 | 
                              OnlySignGenes_RQ1e2_ConStage$`avg16-24`== 1))

Next, we plotted an upset plot of the full dataset, showing unique and shared DE between genotypes and the average response:

# prepare data
listInput_allgenes = list(allDE_genoA,allDE_genoB,allDE_genoD,allDE_genoF,allDE_genoI,
                          allDE_genoJ,allDE_genoK,allDE_genoP,allDE_avg)
names(listInput_allgenes) = c('genotype A', 'genotype B', 'genotype D', 'genotype F' , 
                              'genotype I', 'genotype J', 'genotype K', 'genotype P','average')

# plot upset plot
upset(fromList(listInput_allgenes), 
      nintersects = 60, 
      mainbar.y.label = "Intersection DE genes", sets.x.label = "Number of DE genes per set", 
      nsets = 9, 
      order.by = "freq", 
      decreasing = T, 
      mb.ratio = c(0.6, 0.4),
      number.angles = 0, 
      text.scale = 1.1, 
      point.size = 2.8, 
      line.size = 1,
      sets = c('average', 'genotype P', 'genotype K', 'genotype J', 'genotype I', 'genotype F', 
               'genotype D', 'genotype B', 'genotype A'),
      keep.order = TRUE)

We also created a data object with all genes that are DE in only one genotype:

# extract unique sets
listInput_allgenes = list(allDE_genoA,allDE_genoB,allDE_genoD,allDE_genoF,allDE_genoI,
                          allDE_genoJ,allDE_genoK,allDE_genoP,allDE_avg)
names(listInput_allgenes) = c('A', 'B', 'D', 'F' , 'I', 'J', 'K', 'P','avg')

Upset_sets = make_comb_mat(listInput_allgenes,  mode = 'distinct')

uniqueA = extract_comb(Upset_sets, "100000000")
uniqueB = extract_comb(Upset_sets, "010000000")
uniqueD = extract_comb(Upset_sets, "001000000")
uniqueF = extract_comb(Upset_sets, "000100000")
uniqueI = extract_comb(Upset_sets, "000010000")
uniqueJ = extract_comb(Upset_sets, "000001000")
uniqueK = extract_comb(Upset_sets, "000000100")
uniqueP = extract_comb(Upset_sets, "000000010")
uniqueAvg = extract_comb(Upset_sets, "000000001")

# genes uniquely DE in one genotype
RQ1e2_uniqueDE = c(uniqueA, uniqueB, uniqueD, uniqueF, uniqueI, uniqueJ, uniqueK, uniqueP)
length(RQ1e2_uniqueDE)

## [1] 1643

# genes uniquely DE in the average response
length(uniqueAvg)

## [1] 826
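
The combination codes passed to extract_comb can be read as one digit per set, in the order of the input list: "100000000" means present in the first set (genotype A) and absent from all others. The base R sketch below reproduces this encoding on toy sets without ComplexHeatmap, purely to illustrate what the 'distinct' mode returns. Set contents are made up.

```r
# Toy gene sets (hypothetical); order of the list defines the digit positions
sets <- list(A = c("g1", "g2", "g3"),
             B = c("g2", "g4"),
             avg = c("g3", "g4", "g5"))

# membership matrix: one row per gene, one column per set
all_genes <- sort(unique(unlist(sets)))
membership <- sapply(sets, function(s) all_genes %in% s)
rownames(membership) <- all_genes

# one code per gene, e.g. "100" = only in A, "101" = in A and avg but not in B
codes <- apply(membership, 1, function(r) paste(as.integer(r), collapse = ""))

# 'distinct' mode: every gene belongs to exactly one combination
unique_A <- names(codes)[codes == "100"]
unique_avg <- names(codes)[codes == "001"]
split(names(codes), codes)
```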

Then, we did the same as above, but now excluded the average response data:

# extract unique sets
listInput_allgenes = list(allDE_genoA,allDE_genoB,allDE_genoD,allDE_genoF,allDE_genoI,
                          allDE_genoJ,allDE_genoK,allDE_genoP)
names(listInput_allgenes) = c('A', 'B', 'D', 'F' , 'I', 'J', 'K', 'P')
Upset_sets = make_comb_mat(listInput_allgenes,  mode = 'distinct')

uniqueA = extract_comb(Upset_sets, '10000000')
uniqueB = extract_comb(Upset_sets, '01000000')
uniqueD = extract_comb(Upset_sets, '00100000')
uniqueF = extract_comb(Upset_sets, '00010000')
uniqueI = extract_comb(Upset_sets, '00001000')
uniqueJ = extract_comb(Upset_sets, '00000100')
uniqueK = extract_comb(Upset_sets, '00000010')
uniqueP = extract_comb(Upset_sets, '00000001')

# genes uniquely DE in one genotype (not taking into account the average response)
RQ1e2_uniqueDE2 = c(uniqueA, uniqueB, uniqueD, uniqueF, uniqueI, uniqueJ, uniqueK, uniqueP)
length(RQ1e2_uniqueDE2)

## [1] 3189

# number of genes uniquely DE in one genotype that are also DE in the average response
length(RQ1e2_uniqueDE2) - length(RQ1e2_uniqueDE)

## [1] 1546

DE genes shared among seven out of eight genotypes:

# DE in all but one genotype
notA = extract_comb(Upset_sets, '01111111')
notB = extract_comb(Upset_sets, '10111111')
notD = extract_comb(Upset_sets, '11011111')
notF = extract_comb(Upset_sets, '11101111')
notI = extract_comb(Upset_sets, '11110111')
notJ = extract_comb(Upset_sets, '11111011')
notK = extract_comb(Upset_sets, '11111101')
notP = extract_comb(Upset_sets, '11111110')

# number of genes DE in all but one genotype
genes = c(notA, notB, notD, notF, notI, notJ, notK, notP)
genes_unique = unique(genes)
length(genes_unique)

## [1] 93
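
As a cross-check that does not depend on typing the combination codes correctly, the number of genes DE in exactly k sets can be counted directly. This is a minimal base R sketch on toy sets; the set contents are made up and the object names are introduced here for illustration.

```r
# Toy gene sets for four hypothetical genotypes
sets <- list(A = c("g1", "g2", "g3"),
             B = c("g1", "g2"),
             D = c("g1", "g3", "g4"),
             F = c("g1", "g2", "g3"))

# in how many sets does each gene occur?
n_sets_per_gene <- table(unlist(lapply(sets, unique)))

# genes DE in all but one set, and genes DE in exactly one set
in_all_but_one <- names(n_sets_per_gene)[n_sets_per_gene == length(sets) - 1]
in_exactly_one <- names(n_sets_per_gene)[n_sets_per_gene == 1]

in_all_but_one
in_exactly_one
```

Applied to the eight genotype lists, the length of in_all_but_one should agree with the count obtained from the eight extract_comb calls above.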

3.3.4.5 Upset plots showing shared and unique DE for the salinity contrasts separately

The 8-16 contrast:

# select DE genes per genotype and average response
genoA_8vs16 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16`== 1))
genoB_8vs16 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`B8-B16`== 1))
genoD_8vs16 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`D8-D16`== 1))
genoF_8vs16 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`F8-F16`== 1))
genoI_8vs16 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`I8-I16`== 1))
genoJ_8vs16 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`J8-J16`== 1))
genoK_8vs16 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`K8-K16`== 1))
genoP_8vs16 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`P8-P16`== 1))
avg_8vs16 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16`== 1))

listInput_allgenes = list(genoA_8vs16,genoB_8vs16,genoD_8vs16,genoF_8vs16,genoI_8vs16,
                          genoJ_8vs16,genoK_8vs16,genoP_8vs16,avg_8vs16)
names(listInput_allgenes) = c('genotype A', 'genotype B', 'genotype D', 'genotype F' , 
                              'genotype I', 'genotype J', 'genotype K', 'genotype P','average')

# plot Upset plot
upset(fromList(listInput_allgenes), 
      nintersects = 60, 
      mainbar.y.label = 'Intersection DE genes', sets.x.label = 'Number of DE genes per set', 
      nsets = 9, 
      order.by = 'freq', 
      decreasing = T, 
      mb.ratio = c(0.6, 0.4),
      number.angles = 0, 
      text.scale = 1.1, 
      point.size = 2.8, 
      line.size = 1,
      sets = c('average', 'genotype P', 'genotype K', 'genotype J', 'genotype I', 'genotype F', 
               'genotype D', 'genotype B', 'genotype A'),
      keep.order = TRUE)

The 16-24 contrast:

# select DE genes per genotype and average response
genoA_16vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A16-A24`== 1))
genoB_16vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`B16-B24`== 1))
genoD_16vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`D16-D24`== 1))
genoF_16vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`F16-F24`== 1))
genoI_16vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`I16-I24`== 1))
genoJ_16vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`J16-J24`== 1))
genoK_16vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`K16-K24`== 1))
genoP_16vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`P16-P24`== 1))
avg_16vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg16-24`== 1))

listInput_allgenes = list(genoA_16vs24,genoB_16vs24,genoD_16vs24,genoF_16vs24,genoI_16vs24,
                          genoJ_16vs24,genoK_16vs24,genoP_16vs24,avg_16vs24)
names(listInput_allgenes) = c('genotype A', 'genotype B', 'genotype D', 'genotype F' , 
                              'genotype I', 'genotype J', 'genotype K', 'genotype P','average')

# plot Upset plot
upset(fromList(listInput_allgenes), 
      nintersects = 60, 
      mainbar.y.label = 'Intersection DE genes', sets.x.label = 'Number of DE genes per set', 
      nsets = 9, 
      order.by = 'freq', 
      decreasing = T, 
      mb.ratio = c(0.6, 0.4),
      number.angles = 0, 
      text.scale = 1.1, 
      point.size = 2.8, 
      line.size = 1,
      sets = c('average', 'genotype P', 'genotype K', 'genotype J', 'genotype I', 'genotype F', 
               'genotype D', 'genotype B', 'genotype A'),
      keep.order = TRUE)

The 8-24 contrast:

# select DE genes per genotype and average response
genoA_8vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A24`== 1))
genoB_8vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`B8-B24`== 1))
genoD_8vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`D8-D24`== 1))
genoF_8vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`F8-F24`== 1))
genoI_8vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`I8-I24`== 1))
genoJ_8vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`J8-J24`== 1))
genoK_8vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`K8-K24`== 1))
genoP_8vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`P8-P24`== 1))
avg_8vs24 = rownames (subset (OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-24`== 1))

listInput_allgenes = list(genoA_8vs24,genoB_8vs24,genoD_8vs24,genoF_8vs24,genoI_8vs24,
                          genoJ_8vs24,genoK_8vs24,genoP_8vs24,avg_8vs24)
names(listInput_allgenes) = c('genotype A', 'genotype B', 'genotype D', 'genotype F' , 
                              'genotype I', 'genotype J', 'genotype K', 'genotype P','average')

# plot Upset plot
upset(fromList(listInput_allgenes), 
      nintersects = 60, 
      mainbar.y.label = 'Intersection DE genes', sets.x.label = 'Number of DE genes per set', 
      nsets = 9, 
      order.by = 'freq', 
      decreasing = TRUE, 
      mb.ratio = c(0.6, 0.4),
      number.angles = 0, 
      text.scale = 1.1, 
      point.size = 2.8, 
      line.size = 1,
      sets = c('average', 'genotype P', 'genotype K', 'genotype J', 'genotype I', 'genotype F', 
               'genotype D', 'genotype B', 'genotype A'),
      keep.order = TRUE)
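
As a rough illustration of what the UpSet plot tallies, the sketch below builds a 0/1 membership matrix from a list of gene sets and counts how many genes share each exact membership pattern, sorted by frequency. It uses base R only. The gene IDs and the list `toy_sets` are made up for this example, and this is not meant to mirror how UpSetR is implemented internally.

```r
# Toy gene sets (hypothetical IDs), standing in for listInput_allgenes
toy_sets <- list(
  'genotype A' = c('g1', 'g2', 'g3'),
  'genotype B' = c('g2', 'g3', 'g4'),
  'average'    = c('g3', 'g5')
)

# 0/1 membership matrix: one row per gene, one column per set
all_genes <- sort(unique(unlist(toy_sets)))
membership <- sapply(toy_sets, function(s) as.integer(all_genes %in% s))
rownames(membership) <- all_genes

# Membership pattern per gene as a string,
# e.g. '110' = in genotype A and B, not in the average response
pattern <- apply(membership, 1, paste, collapse = '')

# Number of genes per exact pattern, largest first (cf. order.by = 'freq')
intersection_sizes <- sort(table(pattern), decreasing = TRUE)

print(membership)
print(intersection_sizes)
```

Each bar in the plot corresponds to one such pattern, and each gene is counted in exactly one bar.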

How many genes are uniquely DE in the 8-24 contrast?

Upset_sets = make_comb_mat(listInput_allgenes,  mode = 'distinct')

# how many genes are DE in exactly one genotype but not in the average response?
# (one digit per set, in the order of listInput_allgenes: A, B, D, F, I, J, K, P, average)
uniqueA_8vs24 = extract_comb(Upset_sets, '100000000')
uniqueB_8vs24 = extract_comb(Upset_sets, '010000000')
uniqueD_8vs24 = extract_comb(Upset_sets, '001000000')
uniqueF_8vs24 = extract_comb(Upset_sets, '000100000')
uniqueI_8vs24 = extract_comb(Upset_sets, '000010000')
uniqueJ_8vs24 = extract_comb(Upset_sets, '000001000')
uniqueK_8vs24 = extract_comb(Upset_sets, '000000100')
uniqueP_8vs24 = extract_comb(Upset_sets, '000000010')

genes = c(uniqueA_8vs24,uniqueB_8vs24,uniqueD_8vs24,uniqueF_8vs24,uniqueI_8vs24,
          uniqueJ_8vs24,uniqueK_8vs24,uniqueP_8vs24)
length(unique(genes))

## [1] 1607

# how many genes are DE in exactly one genotype and also in the average response?
uniqueA_8vs24_2 = extract_comb(Upset_sets, '100000001')
uniqueB_8vs24_2 = extract_comb(Upset_sets, '010000001')
uniqueD_8vs24_2 = extract_comb(Upset_sets, '001000001')
uniqueF_8vs24_2 = extract_comb(Upset_sets, '000100001')
uniqueI_8vs24_2 = extract_comb(Upset_sets, '000010001')
uniqueJ_8vs24_2 = extract_comb(Upset_sets, '000001001')
uniqueK_8vs24_2 = extract_comb(Upset_sets, '000000101')
uniqueP_8vs24_2 = extract_comb(Upset_sets, '000000011')

genes = c(uniqueA_8vs24_2,uniqueB_8vs24_2,uniqueD_8vs24_2,uniqueF_8vs24_2,uniqueI_8vs24_2,
          uniqueJ_8vs24_2,uniqueK_8vs24_2,uniqueP_8vs24_2)
length(unique(genes))

## [1] 1371
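
To make the combination codes more concrete, here is a minimal base R sketch of what selecting a code in 'distinct' mode amounts to: a gene is kept only if it is present in every set marked `1` and absent from every set marked `0`. The helper `genes_with_code()` and the toy gene IDs are hypothetical, and the sketch assumes that the digits follow the order of the input list.

```r
# Toy gene sets (hypothetical IDs); the last set plays the role of 'average'
toy_sets <- list(
  'genotype A' = c('g1', 'g2', 'g3', 'g6'),
  'genotype B' = c('g2', 'g4', 'g7'),
  'average'    = c('g2', 'g3', 'g7', 'g8')
)

# Hypothetical helper: genes whose membership pattern equals `code` exactly
genes_with_code <- function(sets, code) {
  in_set <- strsplit(code, '')[[1]] == '1'
  stopifnot(length(in_set) == length(sets))
  all_genes <- unique(unlist(sets))
  keep <- vapply(all_genes, function(g) {
    all(vapply(sets, function(s) g %in% s, logical(1)) == in_set)
  }, logical(1))
  all_genes[keep]
}

# DE in exactly one genotype and not in the average response
only_A <- genes_with_code(toy_sets, '100')
only_B <- genes_with_code(toy_sets, '010')
length(unique(c(only_A, only_B)))

# DE in exactly one genotype and also in the average response
A_and_avg <- genes_with_code(toy_sets, '101')
B_and_avg <- genes_with_code(toy_sets, '011')
length(unique(c(A_and_avg, B_and_avg)))
```

Because the patterns are mutually exclusive, the per-genotype vectors never overlap, so `unique()` acts only as a safeguard here.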

3.3.4.6 Venn diagrams shared and unique DE in individual contrasts

In this section, I created a Venn diagram for each genotype and for the average response, showing the DE genes that are unique to, or shared among, the three salinity contrasts.

First for the average response:

# define color scheme
myCol = c("dodgerblue3", "gray60", "firebrick")

# select relevant data
avg_8vs16 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16`== 1)))
avg_8vs24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 1)))
avg_16vs24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 1)))

# plot Venn diagram
venn_avg = venn.diagram(x = list(avg_8vs16,avg_8vs24,avg_16vs24), NULL,
                         main = "average response", main.fontface = "plain", main.fontfamily = "sans", 
                         main.col = "black", main.cex = 1.5, 
                         category.names = c("8-16", "8-24","16-24"),
                         lwd = 2, lty = 1, fill = myCol, 
                         cex = 1, fontface = "bold", fontfamily = "sans", 
                         cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                         cat.pos = c(-27, 27, 135), cat.dist = c(0.055, 0.055, 0.085), 
                         cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_avg)
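
The numbers shown in the seven regions of a three-set Venn diagram can be cross-checked with plain set operations. The sketch below does this in base R for three toy gene vectors. The gene IDs and variable names are invented for illustration and only stand in for `avg_8vs16`, `avg_8vs24` and `avg_16vs24`.

```r
# Toy gene sets (hypothetical IDs), one per salinity contrast
s_8vs16  <- c('g1', 'g2', 'g3', 'g4')
s_8vs24  <- c('g2', 'g3', 'g5', 'g6')
s_16vs24 <- c('g3', 'g4', 'g6', 'g7')

# Genes shared by all three contrasts (centre of the diagram)
shared_all <- Reduce(intersect, list(s_8vs16, s_8vs24, s_16vs24))

# Genes unique to a single contrast (outer regions)
only_8vs16  <- setdiff(s_8vs16,  union(s_8vs24, s_16vs24))
only_8vs24  <- setdiff(s_8vs24,  union(s_8vs16, s_16vs24))
only_16vs24 <- setdiff(s_16vs24, union(s_8vs16, s_8vs24))

# Genes shared by exactly two contrasts
pair_8vs16_8vs24  <- setdiff(intersect(s_8vs16, s_8vs24),  s_16vs24)
pair_8vs16_16vs24 <- setdiff(intersect(s_8vs16, s_16vs24), s_8vs24)
pair_8vs24_16vs24 <- setdiff(intersect(s_8vs24, s_16vs24), s_8vs16)

region_counts <- c(
  only_8vs16 = length(only_8vs16),
  only_8vs24 = length(only_8vs24),
  only_16vs24 = length(only_16vs24),
  pair_8vs16_8vs24 = length(pair_8vs16_8vs24),
  pair_8vs16_16vs24 = length(pair_8vs16_16vs24),
  pair_8vs24_16vs24 = length(pair_8vs24_16vs24),
  shared_all = length(shared_all)
)
print(region_counts)
```

The seven counts add up to the number of distinct genes across the three sets, which gives a quick sanity check on a plotted diagram.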

Genotype A:

# select relevant data
A8vsA16 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16`== 1)))
A8vsA24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 1)))
A16vsA24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 1)))

# plot Venn diagram
venn_A = venn.diagram(x = list(A8vsA16,A8vsA24,A16vsA24), NULL, 
                       main = "genotype A", main.fontface = "plain", main.fontfamily = "sans", 
                       main.col = "black", main.cex = 1.5, 
                       category.names = c("8-16", "8-24","16-24"),
                       lwd = 2, lty = 1, fill = myCol, 
                       cex = 1, fontface = "bold", fontfamily = "sans", 
                       cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                       cat.pos = c(-27, 27, 135), cat.dist = c(0.055, 0.055, 0.085), 
                       cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_A)

Genotype B:

# select relevant data
B8vsB16 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`B8-B16`== 1)))
B8vsB24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`B8-B24` == 1)))
B16vsB24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`B16-B24` == 1)))

# plot Venn diagram
venn_B = venn.diagram(x = list(B8vsB16,B8vsB24,B16vsB24), NULL, 
                       main = "genotype B", main.fontface = "plain", main.fontfamily = "sans", 
                       main.col = "black", main.cex = 1.5, 
                       category.names = c("8-16", "8-24","16-24"),
                       lwd = 2, lty = 1, fill = myCol, 
                       cex = 1, fontface = "bold", fontfamily = "sans", 
                       cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                       cat.pos = c(-27, 27, 135), cat.dist = c(0.055, 0.055, 0.085), 
                       cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_B)

Genotype D:

# select relevant data
D8vsD16 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`D8-D16`== 1)))
D8vsD24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`D8-D24` == 1)))
D16vsD24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`D16-D24` == 1)))

# plot Venn diagram
venn_D = venn.diagram(x = list(D8vsD16,D8vsD24,D16vsD24), NULL, 
                       main = "genotype D", main.fontface = "plain", main.fontfamily = "sans", 
                       main.col = "black", main.cex = 1.5, 
                       category.names = c("8-16", "8-24","16-24"),
                       lwd = 2, lty = 1, fill = myCol, 
                       cex = 1, fontface = "bold", fontfamily = "sans", 
                       cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                       cat.pos = c(-27, 27, 135), cat.dist = c(0.055, 0.055, 0.085), 
                       cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_D)

Genotype F:

# select relevant data
F8vsF16 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`F8-F16`== 1)))
F8vsF24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`F8-F24` == 1)))
F16vsF24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`F16-F24` == 1)))

# plot Venn diagram
venn_F = venn.diagram(x = list(F8vsF16,F8vsF24,F16vsF24), NULL, 
                       main = "genotype F", main.fontface = "plain", main.fontfamily = "sans", 
                       main.col = "black", main.cex = 1.5, 
                       category.names = c("8-16", "8-24","16-24"),
                       lwd = 2, lty = 1, fill = myCol, 
                       cex = 1, fontface = "bold", fontfamily = "sans", 
                       cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                       cat.pos = c(-27, 27, 135), cat.dist = c(0.055, 0.055, 0.085), 
                       cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_F)

Genotype I:

# select relevant data
I8vsI16 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`I8-I16`== 1)))
I8vsI24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`I8-I24` == 1)))
I16vsI24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`I16-I24` == 1)))

# plot Venn diagram
venn_I = venn.diagram(x = list(I8vsI16,I8vsI24,I16vsI24), NULL, 
                       main = "genotype I", main.fontface = "plain", main.fontfamily = "sans", 
                       main.col = "black", main.cex = 1.5, 
                       category.names = c("8-16", "8-24","16-24"),
                       lwd = 2, lty = 1, fill = myCol, 
                       cex = 1, fontface = "bold", fontfamily = "sans", 
                       cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                       cat.pos = c(-27, 27, 135), cat.dist = c(0.055, 0.055, 0.085), 
                       cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_I)

Genotype J:

# select relevant data
J8vsJ16 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`J8-J16`== 1)))
J8vsJ24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`J8-J24` == 1)))
J16vsJ24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`J16-J24` == 1)))

# plot Venn diagram
venn_J = venn.diagram(x = list(J8vsJ16,J8vsJ24,J16vsJ24), NULL, 
                       main = "genotype J", main.fontface = "plain", main.fontfamily = "sans", 
                       main.col = "black", main.cex = 1.5, 
                       category.names = c("8-16", "8-24","16-24"),
                       lwd = 2, lty = 1, fill = myCol, 
                       cex = 1, fontface = "bold", fontfamily = "sans", 
                       cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                       cat.pos = c(-27, 27, 135), cat.dist = c(0.055, 0.055, 0.085), 
                       cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_J)

Genotype K:

# select relevant data
K8vsK16 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`K8-K16`== 1)))
K8vsK24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`K8-K24` == 1)))
K16vsK24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`K16-K24` == 1)))

# plot Venn diagram
venn_K = venn.diagram(x = list(K8vsK16,K8vsK24,K16vsK24), NULL, 
                       main = "genotype K", main.fontface = "plain", main.fontfamily = "sans", 
                       main.col = "black", main.cex = 1.5, 
                       category.names = c("8-16", "8-24","16-24"),
                       lwd = 2, lty = 1, fill = myCol, 
                       cex = 1, fontface = "bold", fontfamily = "sans", 
                       cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                       cat.pos = c(-27, 27, 135), cat.dist = c(0.055, 0.055, 0.085), 
                       cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_K)

Genotype P:

# select relevant data
P8vsP16 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`P8-P16`== 1)))
P8vsP24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`P8-P24` == 1)))
P16vsP24 = c(rownames(subset(OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`P16-P24` == 1)))

# plot Venn diagram
venn_P = venn.diagram(x = list(P8vsP16,P8vsP24,P16vsP24), NULL, 
                       main = "genotype P", main.fontface = "plain", main.fontfamily = "sans", 
                       main.col = "black", main.cex = 1.5, 
                       category.names = c("8-16", "8-24","16-24"),
                       lwd = 2, lty = 1, fill = myCol, 
                       cex = 1, fontface = "bold", fontfamily = "sans", 
                       cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                       cat.pos = c(-27, 27, 135), cat.dist = c(0.055, 0.055, 0.085), 
                       cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_P)
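
The per-genotype blocks above differ only in the genotype letter, so the gene-set selection could be wrapped in a small helper. The following is a sketch under assumptions: `toy_sig` is an invented 0/1 table that only mimics the column naming of `OnlySignGenes_RQ1e2_ConStage`, and `contrast_sets()` and `genotype_cols()` are hypothetical helper names, not part of the original analysis.

```r
# Toy 0/1 significance table (hypothetical values), one column per contrast
toy_sig <- data.frame(
  `A8-A16`  = c(1, 0, 1, 0),
  `A8-A24`  = c(1, 1, 0, 0),
  `A16-A24` = c(0, 1, 1, 0),
  row.names = c('g1', 'g2', 'g3', 'g4'),
  check.names = FALSE
)

# Hypothetical helper: DE gene IDs for each of the given contrast columns
contrast_sets <- function(sig, cols) {
  sets <- lapply(cols, function(col) rownames(sig)[sig[[col]] == 1])
  names(sets) <- cols
  sets
}

# Hypothetical helper: the three contrast column names for one genotype letter
genotype_cols <- function(g) {
  paste0(g, c('8', '8', '16'), '-', g, c('16', '24', '24'))
}

sets_A <- contrast_sets(toy_sig, genotype_cols('A'))
str(sets_A)
```

The resulting list could then be passed as `x` to `venn.diagram()` with the same plotting arguments as above. The average response uses a different column pattern (`avg8-16`, `avg8-24`, `avg16-24`), so its column names would have to be supplied explicitly.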

3.3.4.7 Venn diagrams shared and unique DE in individual contrasts, split by up- and downregulated genes

In this section, I again created Venn diagrams for each genotype and for the average response, showing the DE genes that are unique to, or shared among, the three salinity contrasts, but now separately for up- and downregulated genes.

First, I prepared the data:

#separate data of genotypes and average response
res_genA = all_results_RQ1e2_OnlySig_logFC_OnlySign[1:3]
res_genB = all_results_RQ1e2_OnlySig_logFC_OnlySign[4:6]
res_genD = all_results_RQ1e2_OnlySig_logFC_OnlySign[7:9]
res_genF = all_results_RQ1e2_OnlySig_logFC_OnlySign[10:12]
res_genI = all_results_RQ1e2_OnlySig_logFC_OnlySign[13:15]
res_genJ = all_results_RQ1e2_OnlySig_logFC_OnlySign[16:18]
res_genK = all_results_RQ1e2_OnlySig_logFC_OnlySign[19:21]
res_genP = all_results_RQ1e2_OnlySig_logFC_OnlySign[22:24]
res_avg = all_results_RQ1e2_OnlySig_logFC_OnlySign[25:27]

#check for genes that are upregulated in one contrast and downregulated in another
posneg_genA = subset(res_genA,(rowSums(sign(res_genA)<0)>0) & (rowSums(sign(res_genA)>0)>0))
posneg_genB = subset(res_genB,(rowSums(sign(res_genB)<0)>0) & (rowSums(sign(res_genB)>0)>0))
posneg_genD = subset(res_genD,(rowSums(sign(res_genD)<0)>0) & (rowSums(sign(res_genD)>0)>0))
posneg_genF = subset(res_genF,(rowSums(sign(res_genF)<0)>0) & (rowSums(sign(res_genF)>0)>0))
posneg_genI = subset(res_genI,(rowSums(sign(res_genI)<0)>0) & (rowSums(sign(res_genI)>0)>0))
posneg_genJ = subset(res_genJ,(rowSums(sign(res_genJ)<0)>0) & (rowSums(sign(res_genJ)>0)>0))
posneg_genK = subset(res_genK,(rowSums(sign(res_genK)<0)>0) & (rowSums(sign(res_genK)>0)>0))
posneg_genP = subset(res_genP,(rowSums(sign(res_genP)<0)>0) & (rowSums(sign(res_genP)>0)>0))
posneg_avg = subset(res_avg,(rowSums(sign(res_avg)<0)>0) & (rowSums(sign(res_avg)>0)>0))

#check how many genes are both up- and downregulated across the three contrasts
nrow(posneg_genA)

## [1] 32

nrow(posneg_genB)

## [1] 2

nrow(posneg_genD)

## [1] 105

nrow(posneg_genF)

## [1] 297

nrow(posneg_genI)

## [1] 24

nrow(posneg_genJ)

## [1] 28

nrow(posneg_genK)

## [1] 12

nrow(posneg_genP)

## [1] 125

nrow(posneg_avg)

## [1] 178
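
The mixed-direction check used above can be illustrated on a small table. This is a minimal sketch with invented logFC values. It assumes, as the later `!= 0` filters suggest, that non-significant entries are stored as 0. The table `toy_logFC` and its column names are hypothetical.

```r
# Toy logFC table (hypothetical values); 0 = not significant in that contrast
toy_logFC <- data.frame(
  c8vs16  = c( 1.2, -0.8,  0.0,  2.0),
  c8vs24  = c( 0.0, -1.5,  0.9, -1.1),
  c16vs24 = c( 0.7,  0.0,  0.0,  0.0),
  row.names = c('g1', 'g2', 'g3', 'g4')
)

# Downregulated in at least one contrast
has_neg <- rowSums(sign(toy_logFC) < 0) > 0
# Upregulated in at least one contrast
has_pos <- rowSums(sign(toy_logFC) > 0) > 0

# Genes with opposite directions in different contrasts
posneg <- subset(toy_logFC, has_neg & has_pos)
rownames(posneg)
```

In this toy table only `g4` changes direction between contrasts, so it is the only gene that would be set aside before splitting the data into up- and downregulated genes.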

#select up and downregulated genes
sel_genA = subset(res_genA, !(rownames(res_genA)%in%rownames(posneg_genA)))
down_genA = subset(sel_genA,(rowSums(sign(sel_genA)<0)>0))
up_genA = subset(sel_genA,(rowSums(sign(sel_genA)>0)>0))

sel_genB = subset(res_genB, !(rownames(res_genB)%in%rownames(posneg_genB)))
down_genB = subset(sel_genB,(rowSums(sign(sel_genB)<0)>0))
up_genB = subset(sel_genB,(rowSums(sign(sel_genB)>0)>0))

sel_genD = subset(res_genD, !(rownames(res_genD)%in%rownames(posneg_genD)))
down_genD = subset(sel_genD,(rowSums(sign(sel_genD)<0)>0))
up_genD = subset(sel_genD,(rowSums(sign(sel_genD)>0)>0))

sel_genF = subset(res_genF, !(rownames(res_genF)%in%rownames(posneg_genF)))
down_genF = subset(sel_genF,(rowSums(sign(sel_genF)<0)>0))
up_genF = subset(sel_genF,(rowSums(sign(sel_genF)>0)>0))

sel_genI = subset(res_genI, !(rownames(res_genI)%in%rownames(posneg_genI)))
down_genI = subset(sel_genI,(rowSums(sign(sel_genI)<0)>0))
up_genI = subset(sel_genI,(rowSums(sign(sel_genI)>0)>0))

sel_genJ = subset(res_genJ, !(rownames(res_genJ)%in%rownames(posneg_genJ)))
down_genJ = subset(sel_genJ,(rowSums(sign(sel_genJ)<0)>0))
up_genJ = subset(sel_genJ,(rowSums(sign(sel_genJ)>0)>0))

sel_genK = subset(res_genK, !(rownames(res_genK)%in%rownames(posneg_genK)))
down_genK = subset(sel_genK,(rowSums(sign(sel_genK)<0)>0))
up_genK = subset(sel_genK,(rowSums(sign(sel_genK)>0)>0))

sel_genP = subset(res_genP, !(rownames(res_genP)%in%rownames(posneg_genP)))
down_genP = subset(sel_genP,(rowSums(sign(sel_genP)<0)>0))
up_genP = subset(sel_genP,(rowSums(sign(sel_genP)>0)>0))

sel_avg = subset(res_avg, !(rownames(res_avg)%in%rownames(posneg_avg)))
down_avg = subset(sel_avg,(rowSums(sign(sel_avg)<0)>0))
up_avg = subset(sel_avg,(rowSums(sign(sel_avg)>0)>0))

#define color scheme
myCol = c("dodgerblue3", "gray60", "firebrick")
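
The sketch below walks through the same split on a toy table, to show why a simple `!= 0` test is enough to pick the genes of one contrast afterwards: once mixed-direction genes are removed, every non-zero entry in the upregulated table is positive. The table `toy_logFC` and its values are invented, and 0 is assumed to mean "not significant".

```r
# Toy logFC table (hypothetical values); 0 = not significant in that contrast
toy_logFC <- data.frame(
  c8vs16  = c( 1.2, -0.8,  0.0,  2.0),
  c8vs24  = c( 0.0, -1.5,  0.9, -1.1),
  c16vs24 = c( 0.7,  0.0,  0.0,  0.0),
  row.names = c('g1', 'g2', 'g3', 'g4')
)

# Drop genes that change direction between contrasts
mixed <- (rowSums(toy_logFC < 0) > 0) & (rowSums(toy_logFC > 0) > 0)
sel <- toy_logFC[!mixed, ]

# Split the remaining genes by direction
up   <- sel[rowSums(sel > 0) > 0, ]
down <- sel[rowSums(sel < 0) > 0, ]

# Per-contrast upregulated gene sets: non-zero entries in 'up' are all positive
up_8vs16  <- rownames(up)[up$c8vs16  != 0]
up_8vs24  <- rownames(up)[up$c8vs24  != 0]
up_16vs24 <- rownames(up)[up$c16vs24 != 0]

list(up_8vs16 = up_8vs16, up_8vs24 = up_8vs24, up_16vs24 = up_16vs24)
```

The three resulting vectors play the same role as the per-contrast `*_up` gene lists that feed the Venn diagrams.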

Average response upregulated:

avg_8vs16_up = c(rownames(subset(up_avg, up_avg$`avg8vs16_logFC`!= 0)))
avg_8vs24_up = c(rownames(subset(up_avg, up_avg$`avg8vs24_logFC`!= 0)))
avg_16vs24_up = c(rownames(subset(up_avg, up_avg$`avg16vs24_logFC`!= 0)))

venn_avg = venn.diagram(x = list(avg_8vs16_up,avg_8vs24_up,avg_16vs24_up), NULL,
                         main = "average response - upregulated", main.fontface = "plain", 
                         main.fontfamily = "sans", 
                         main.col = "black", main.cex = 1.5, 
                         category.names = c("8-16","8-24","16-24"),
                         lwd = 2, lty = 1, fill = myCol,
                         cex = 1, fontface = "bold", fontfamily = "sans",
                         cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                         cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                         cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_avg)

Average response downregulated:

avg_8vs16_down = c(rownames(subset(down_avg, down_avg$`avg8vs16_logFC`!= 0)))
avg_8vs24_down = c(rownames(subset(down_avg, down_avg$`avg8vs24_logFC`!= 0)))
avg_16vs24_down = c(rownames(subset(down_avg, down_avg$`avg16vs24_logFC`!= 0)))

venn_avg = venn.diagram(x = list(avg_8vs16_down,avg_8vs24_down,avg_16vs24_down), NULL,
                         main = "average response - downregulated", main.fontface = "plain", 
                         main.fontfamily = "sans", 
                         main.col = "black", main.cex = 1.5, 
                         category.names = c("8-16","8-24","16-24"),
                         lwd = 2, lty = 1, fill = myCol,
                         cex = 1, fontface = "bold", fontfamily = "sans",
                         cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                         cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                         cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_avg)

Genotype A upregulated:

genA_8vs16_up = c(rownames(subset(up_genA, up_genA$`A8vsA16_logFC`!= 0)))
genA_8vs24_up = c(rownames(subset(up_genA, up_genA$`A8vsA24_logFC`!= 0)))
genA_16vs24_up = c(rownames(subset(up_genA, up_genA$`A16vsA24_logFC`!= 0)))

venn_genA = venn.diagram(x = list(genA_8vs16_up,genA_8vs24_up,genA_16vs24_up), NULL,
                          main = "genotype A - upregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans",
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genA)

Genotype A downregulated:

#genotype A downregulated
genA_8vs16_down = c(rownames(subset(down_genA, down_genA$`A8vsA16_logFC`!= 0)))
genA_8vs24_down = c(rownames(subset(down_genA, down_genA$`A8vsA24_logFC`!= 0)))
genA_16vs24_down = c(rownames(subset(down_genA, down_genA$`A16vsA24_logFC`!= 0)))

venn_genA = venn.diagram(x = list(genA_8vs16_down,genA_8vs24_down,genA_16vs24_down), NULL,
                          main = "genotype A - downregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genA)

Genotype B upregulated:

genB_8vs16_up = c(rownames(subset(up_genB, up_genB$`B8vsB16_logFC`!= 0)))
genB_8vs24_up = c(rownames(subset(up_genB, up_genB$`B8vsB24_logFC`!= 0)))
genB_16vs24_up = c(rownames(subset(up_genB, up_genB$`B16vsB24_logFC`!= 0)))

venn_genB = venn.diagram(x = list(genB_8vs16_up,genB_8vs24_up,genB_16vs24_up), NULL,
                          main = "genotype B - upregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genB)

Genotype B downregulated:

genB_8vs16_down = c(rownames(subset(down_genB, down_genB$`B8vsB16_logFC`!= 0)))
genB_8vs24_down = c(rownames(subset(down_genB, down_genB$`B8vsB24_logFC`!= 0)))
genB_16vs24_down = c(rownames(subset(down_genB, down_genB$`B16vsB24_logFC`!= 0)))

venn_genB = venn.diagram(x = list(genB_8vs16_down,genB_8vs24_down,genB_16vs24_down), NULL,
                          main = "genotype B - downregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genB)

Genotype D upregulated:

genD_8vs16_up = c(rownames(subset(up_genD, up_genD$`D8vsD16_logFC`!= 0)))
genD_8vs24_up = c(rownames(subset(up_genD, up_genD$`D8vsD24_logFC`!= 0)))
genD_16vs24_up = c(rownames(subset(up_genD, up_genD$`D16vsD24_logFC`!= 0)))

venn_genD = venn.diagram(x = list(genD_8vs16_up,genD_8vs24_up,genD_16vs24_up), NULL,
                          main = "genotype D - upregulated", main.fontface = "plain",
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5,
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genD)

Genotype D downregulated:

genD_8vs16_down = c(rownames(subset(down_genD, down_genD$`D8vsD16_logFC`!= 0)))
genD_8vs24_down = c(rownames(subset(down_genD, down_genD$`D8vsD24_logFC`!= 0)))
genD_16vs24_down = c(rownames(subset(down_genD, down_genD$`D16vsD24_logFC`!= 0)))

venn_genD = venn.diagram(x = list(genD_8vs16_down,genD_8vs24_down,genD_16vs24_down), NULL,
                          main = "genotype D - downregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genD)

Genotype F upregulated:

genF_8vs16_up = c(rownames(subset(up_genF, up_genF$`F8vsF16_logFC`!= 0)))
genF_8vs24_up = c(rownames(subset(up_genF, up_genF$`F8vsF24_logFC`!= 0)))
genF_16vs24_up = c(rownames(subset(up_genF, up_genF$`F16vsF24_logFC`!= 0)))

venn_genF = venn.diagram(x = list(genF_8vs16_up,genF_8vs24_up,genF_16vs24_up), NULL,
                          main = "genotype F - upregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol,
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genF)

Genotype F downregulated:

genF_8vs16_down = c(rownames(subset(down_genF, down_genF$`F8vsF16_logFC`!= 0)))
genF_8vs24_down = c(rownames(subset(down_genF, down_genF$`F8vsF24_logFC`!= 0)))
genF_16vs24_down = c(rownames(subset(down_genF, down_genF$`F16vsF24_logFC`!= 0)))

venn_genF = venn.diagram(x = list(genF_8vs16_down,genF_8vs24_down,genF_16vs24_down), NULL,
                          main = "genotype F - downregulated", main.fontface = "plain",
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genF)

Genotype I upregulated:

genI_8vs16_up = c(rownames(subset(up_genI, up_genI$`I8vsI16_logFC`!= 0)))
genI_8vs24_up = c(rownames(subset(up_genI, up_genI$`I8vsI24_logFC`!= 0)))
genI_16vs24_up = c(rownames(subset(up_genI, up_genI$`I16vsI24_logFC`!= 0)))

venn_genI = venn.diagram(x = list(genI_8vs16_up,genI_8vs24_up,genI_16vs24_up), NULL,
                          main = "genotype I - upregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genI)

Genotype I downregulated:

genI_8vs16_down = c(rownames(subset(down_genI, down_genI$`I8vsI16_logFC`!= 0)))
genI_8vs24_down = c(rownames(subset(down_genI, down_genI$`I8vsI24_logFC`!= 0)))
genI_16vs24_down = c(rownames(subset(down_genI, down_genI$`I16vsI24_logFC`!= 0)))

venn_genI = venn.diagram(x = list(genI_8vs16_down,genI_8vs24_down,genI_16vs24_down), NULL,
                          main = "genotype I - downregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genI)

Genotype J upregulated:

genJ_8vs16_up = c(rownames(subset(up_genJ, up_genJ$`J8vsJ16_logFC`!= 0)))
genJ_8vs24_up = c(rownames(subset(up_genJ, up_genJ$`J8vsJ24_logFC`!= 0)))
genJ_16vs24_up = c(rownames(subset(up_genJ, up_genJ$`J16vsJ24_logFC`!= 0)))

venn_genJ = venn.diagram(x = list(genJ_8vs16_up,genJ_8vs24_up,genJ_16vs24_up), NULL,
                          main = "genotype J - upregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genJ)

Genotype J downregulated:

genJ_8vs16_down = c(rownames(subset(down_genJ, down_genJ$`J8vsJ16_logFC`!= 0)))
genJ_8vs24_down = c(rownames(subset(down_genJ, down_genJ$`J8vsJ24_logFC`!= 0)))
genJ_16vs24_down = c(rownames(subset(down_genJ, down_genJ$`J16vsJ24_logFC`!= 0)))

venn_genJ = venn.diagram(x = list(genJ_8vs16_down,genJ_8vs24_down,genJ_16vs24_down), NULL,
                          main = "genotype J - downregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genJ)

Genotype K upregulated:

genK_8vs16_up = c(rownames(subset(up_genK, up_genK$`K8vsK16_logFC`!= 0)))
genK_8vs24_up = c(rownames(subset(up_genK, up_genK$`K8vsK24_logFC`!= 0)))
genK_16vs24_up = c(rownames(subset(up_genK, up_genK$`K16vsK24_logFC`!= 0)))

venn_genK = venn.diagram(x = list(genK_8vs16_up,genK_8vs24_up,genK_16vs24_up), NULL,
                          main = "genotype K - upregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer",
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genK)

Genotype K downregulated:

genK_8vs16_down = c(rownames(subset(down_genK, down_genK$`K8vsK16_logFC`!= 0)))
genK_8vs24_down = c(rownames(subset(down_genK, down_genK$`K8vsK24_logFC`!= 0)))
genK_16vs24_down = c(rownames(subset(down_genK, down_genK$`K16vsK24_logFC`!= 0)))

venn_genK = venn.diagram(x = list(genK_8vs16_down,genK_8vs24_down,genK_16vs24_down), NULL,
                          main = "genotype K - downregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genK)

Genotype P upregulated:

genP_8vs16_up = c(rownames(subset(up_genP, up_genP$`P8vsP16_logFC`!= 0)))
genP_8vs24_up = c(rownames(subset(up_genP, up_genP$`P8vsP24_logFC`!= 0)))
genP_16vs24_up = c(rownames(subset(up_genP, up_genP$`P16vsP24_logFC`!= 0)))

venn_genP = venn.diagram(x = list(genP_8vs16_up,genP_8vs24_up,genP_16vs24_up), NULL,
                          main = "genotype P - upregulated", main.fontface = "plain", 
                          main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans", 
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", 
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genP)

Genotype P downregulated:

genP_8vs16_down = c(rownames(subset(down_genP, down_genP$`P8vsP16_logFC`!= 0)))
genP_8vs24_down = c(rownames(subset(down_genP, down_genP$`P8vsP24_logFC`!= 0)))
genP_16vs24_down = c(rownames(subset(down_genP, down_genP$`P16vsP24_logFC`!= 0)))

venn_genP = venn.diagram(x = list(genP_8vs16_down,genP_8vs24_down,genP_16vs24_down), NULL,
                          main = "genotype P - downregulated", main.fontface = "plain", main.fontfamily = "sans", 
                          main.col = "black", main.cex = 1.5, 
                          category.names = c("8-16","8-24","16-24"),
                          lwd = 2, lty = 1, fill = myCol, 
                          cex = 1, fontface = "bold", fontfamily = "sans",
                          cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer",
                          cat.pos = c(-35, 27, 0), cat.dist = c(0.055, 0.055, 0.055), 
                          cat.fontfamily = "sans", rotation = 1)
grid.draw(venn_genP)
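
The per-genotype gene lists above all follow the same pattern, so they could also be built with a small helper. The sketch below is only an illustration: `contrast_gene_sets` and the toy table are hypothetical names that are not part of the original analysis, and it assumes (as in the code above) that non-significant contrasts are coded as a logFC of 0 and that columns are named like `K8vsK16_logFC`.

```r
# Hypothetical helper (not part of the original analysis): collect the DE genes
# per contrast for one genotype from a logFC table in which non-significant
# contrasts are coded as 0.
contrast_gene_sets = function(logFC_table, genotype,
                              contrasts = c("8vs16", "8vs24", "16vs24")) {
  sets = lapply(contrasts, function(ct) {
    parts = strsplit(ct, "vs", fixed = TRUE)[[1]]
    # build the column name, e.g. "K8vsK16_logFC"
    column = paste0(genotype, parts[1], "vs", genotype, parts[2], "_logFC")
    # which() drops NA comparisons instead of returning NA row names
    rownames(logFC_table)[which(logFC_table[[column]] != 0)]
  })
  names(sets) = sub("vs", "-", contrasts, fixed = TRUE)
  sets
}

# toy data standing in for a table such as up_genK
toy_up = data.frame(K8vsK16_logFC  = c(1.2, 0, 0.8),
                    K8vsK24_logFC  = c(2.1, 1.5, 0),
                    K16vsK24_logFC = c(0, 0.9, 0),
                    row.names = c("gene1", "gene2", "gene3"))
toy_sets = contrast_gene_sets(toy_up, "K")
```

The resulting named list can be passed as `x` to `venn.diagram()`, with `names(toy_sets)` supplying the category names.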

3.3.5 Core response genes

Core response genes were defined as genes that are DE in every genotype in at least one of the three salinity contrasts (8-16, 16-24, or 8-24), regardless of which contrast. We selected these genes as follows:

# list of significant genes for each genotype
SignGenes_genA = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                     OnlySignGenes_RQ1e2_ConStage$`A8-A24`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`A8-A16`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`A16-A24`== 1)))
SignGenes_genB = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                     OnlySignGenes_RQ1e2_ConStage$`B8-B24`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`B8-B16`== 1 |
                                     OnlySignGenes_RQ1e2_ConStage$`B16-B24`== 1)))
SignGenes_genD = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                     OnlySignGenes_RQ1e2_ConStage$`D8-D24`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`D8-D16`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`D16-D24`== 1)))
SignGenes_genF = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                     OnlySignGenes_RQ1e2_ConStage$`F8-F24`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`F8-F16`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`F16-F24`== 1)))
SignGenes_genI = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                     OnlySignGenes_RQ1e2_ConStage$`I8-I24`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`I8-I16`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`I16-I24`== 1)))
SignGenes_genJ = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                     OnlySignGenes_RQ1e2_ConStage$`J8-J24`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`J8-J16`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`J16-J24`== 1)))
SignGenes_genK = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                     OnlySignGenes_RQ1e2_ConStage$`K8-K24`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`K8-K16`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`K16-K24`== 1)))
SignGenes_genP = c(rownames (subset (OnlySignGenes_RQ1e2_ConStage, 
                                     OnlySignGenes_RQ1e2_ConStage$`P8-P24`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`P8-P16`== 1 | 
                                     OnlySignGenes_RQ1e2_ConStage$`P16-P24`== 1)))

# take intersect of all genotypes
RQ1e2_CoreResponse = Reduce(intersect, list(SignGenes_genA,SignGenes_genB,SignGenes_genD,SignGenes_genF,SignGenes_genI,SignGenes_genJ,SignGenes_genK,SignGenes_genP))
RQ1e2_CoreResponse

##  [1] "Sm_t00000820-RA" "Sm_t00001191-RA" "Sm_t00002242-RA" "Sm_t00002835-RA"
##  [5] "Sm_t00003616-RA" "Sm_t00003882-RA" "Sm_t00005258-RA" "Sm_t00005259-RA"
##  [9] "Sm_t00005877-RA" "Sm_t00007121-RA" "Sm_t00007360-RA" "Sm_t00007543-RA"
## [13] "Sm_t00008098-RA" "Sm_t00008123-RA" "Sm_t00008820-RA" "Sm_t00009398-RA"
## [17] "Sm_t00009402-RA" "Sm_t00009981-RA" "Sm_t00010077-RA" "Sm_t00010552-RA"
## [21] "Sm_t00010556-RA" "Sm_t00011041-RA" "Sm_t00011042-RA" "Sm_t00012577-RA"
## [25] "Sm_t00013291-RA" "Sm_t00013313-RA" "Sm_t00014816-RA" "Sm_t00015478-RA"
## [29] "Sm_t00016600-RA" "Sm_t00017272-RA" "Sm_t00018475-RA" "Sm_t00018687-RA"
## [33] "Sm_t00018847-RA"
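
The same selection can be written more compactly by looping over the genotypes. This is a minimal sketch on toy data, not the code used for the results above: `genes_sig_in_genotype` is a hypothetical helper, and it assumes that the significance table uses column names such as `A8-A16` and flags significant contrasts with 1, as in `OnlySignGenes_RQ1e2_ConStage`.

```r
# Hypothetical helper: genes flagged (== 1) in at least one contrast of a genotype
genes_sig_in_genotype = function(sig_table, genotype) {
  # pick the columns of this genotype, e.g. "A8-A16", "A16-A24", "A8-A24"
  cols = grep(paste0("^", genotype, "[0-9]+-", genotype, "[0-9]+$"),
              colnames(sig_table), value = TRUE)
  flagged = rowSums(sig_table[, cols, drop = FALSE] == 1, na.rm = TRUE) > 0
  rownames(sig_table)[flagged]
}

# toy significance table with two genotypes and two contrasts each
toy_sig = data.frame("A8-A16" = c(1, 0, 1),
                     "A8-A24" = c(0, 0, 1),
                     "B8-B16" = c(1, 0, 0),
                     "B8-B24" = c(0, 1, 0),
                     check.names = FALSE,
                     row.names = c("g1", "g2", "g3"))

# genes that are DE in every genotype, in any contrast
toy_genotypes = c("A", "B")
toy_core = Reduce(intersect,
                  lapply(toy_genotypes, genes_sig_in_genotype, sig_table = toy_sig))
```

In the toy table only `g1` is flagged in both genotypes, so it is the only gene left after the intersection.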

How do these core response genes relate to the top-ranked genes, ordered by stageR’s FDR-adjusted P-value of the global null hypothesis (Padjscreen)?

# top 25 genes
Padjscreen_sorted_top25 = Padjscreen_sorted[1:25,] 
Padjscreen_sorted_top25_genes = rownames(Padjscreen_sorted_top25) 
venn_top25 = venn.diagram(x = list(RQ1e2_CoreResponse, Padjscreen_sorted_top25_genes), NULL,
                           main = "top 25 genes & core response", main.fontface = "plain", 
                           main.fontfamily = "sans", main.col = "black", main.cex = 1.5, #lay-out title
                           category.names = c("core response", "top 25"), alpha=c(0.5,0.5), 
                           lwd = 2, lty = 'blank', fill = c("dodgerblue3", "gray60"), #lay-out circles
                           cex = 1, fontface = "bold", fontfamily = "sans", #lay-out numbers
                           cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", #lay-out names
                           cat.pos = c(-155, 155), cat.dist = c(0.055, 0.055), cat.fontfamily = "sans")
grid.draw(venn_top25)

# top 100 genes
Padjscreen_sorted_top100 = Padjscreen_sorted[1:100,] 
Padjscreen_sorted_top100_genes = rownames(Padjscreen_sorted_top100) 
venn_top100 = venn.diagram(x = list(RQ1e2_CoreResponse, Padjscreen_sorted_top100_genes), NULL,
                            main = "top 100 genes & core response", main.fontface = "plain", 
                            main.fontfamily = "sans", main.col = "black", main.cex = 1.5, #lay-out title
                            category.names = c("core response", "top 100"), alpha=c(0.5,0.5), 
                            lwd = 2, lty = 'blank', fill = c("dodgerblue3", "gray60"), #lay-out circles
                            cex = 1, fontface = "bold", fontfamily = "sans", #lay-out numbers
                            cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", #lay-out names
                            cat.pos = c(-165, 145), cat.dist = c(0.055, 0.055), cat.fontfamily = "sans")
grid.draw(venn_top100)

# top 225 genes
Padjscreen_sorted_top225 = Padjscreen_sorted[1:225,] 
Padjscreen_sorted_top225_genes = rownames(Padjscreen_sorted_top225) 
venn_top225 = venn.diagram(x = list(RQ1e2_CoreResponse, Padjscreen_sorted_top225_genes), NULL,
                            main = "top 225 genes & core response", main.fontface = "plain", 
                            main.fontfamily = "sans", main.col = "black", main.cex = 1.5, #lay-out title
                            category.names = c("core response", "top 225"), alpha=c(0.5,0.5), 
                            lwd = 2, lty = 'blank', fill = c("dodgerblue3", "gray60"), #lay-out circles
                            cex = 1, fontface = "bold", fontfamily = "sans", #lay-out numbers
                            cat.cex = 1.5, cat.fontface = "bold", cat.default.pos = "outer", #lay-out names
                            cat.pos = c(-165, 45), cat.dist = c(0.055, 0.055), cat.fontfamily = "sans")
grid.draw(venn_top225)
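
The overlaps shown in these Venn diagrams can also be tabulated as plain counts. The sketch below uses toy data and a hypothetical helper, `overlap_with_top`; it assumes that the ranked gene vector is sorted from the smallest to the largest Padjscreen, as the row names of `Padjscreen_sorted` are assumed to be.

```r
# Hypothetical helper: number of core response genes among the top n ranked genes
overlap_with_top = function(core_genes, ranked_genes, n_top = c(25, 100, 225)) {
  counts = sapply(n_top, function(n) {
    length(intersect(core_genes, head(ranked_genes, n)))
  })
  setNames(counts, paste0("top", n_top))
}

# toy data: ten ranked genes, three of which are "core" genes
toy_ranked = paste0("gene", 1:10)
toy_core_genes = c("gene2", "gene5", "gene9")
toy_overlap = overlap_with_top(toy_core_genes, toy_ranked, n_top = c(3, 6, 10))
```

With the real objects, the call would take `RQ1e2_CoreResponse` and `rownames(Padjscreen_sorted)` as its first two arguments.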

Next, we plotted a heatmap of the core response genes, showing logFC values of significant and non-significant contrasts:

# select logFC values
all_results_RQ1e2_logFC = all_results_RQ1e2[,grepl("logFC", colnames(all_results_RQ1e2))]

# change column names of logFC object
colnames(all_results_RQ1e2_logFC) = c('A8-A16','A16-A24','A8-A24','B8-B16','B16-B24','B8-B24','D8-D16','D16-D24','D8-D24','F8-F16','F16-F24','F8-F24','I8-I16','I16-I24','I8-I24','J8-J16','J16-J24','J8-J24','K8-K16','K16-K24','K8-K24','P8-P16','P16-P24','P8-P24','avg8-avg16','avg16-avg24','avg8-avg24')

# select core response genes
all_results_RQ1e2_logFC_core = subset(all_results_RQ1e2_logFC, rownames(all_results_RQ1e2_logFC)%in%RQ1e2_CoreResponse)

# reorder gene names (rows)
target = c("Sm_t00000820-RA", "Sm_t00001191-RA", "Sm_t00002242-RA", "Sm_t00002835-RA",
           "Sm_t00003616-RA", "Sm_t00003882-RA", "Sm_t00005258-RA", "Sm_t00005259-RA",
           "Sm_t00005877-RA", "Sm_t00007121-RA", "Sm_t00007360-RA", "Sm_t00007543-RA",
           "Sm_t00008098-RA", "Sm_t00008123-RA", "Sm_t00008820-RA", "Sm_t00009398-RA",
           "Sm_t00009402-RA", "Sm_t00009981-RA", "Sm_t00010077-RA", "Sm_t00010552-RA",
           "Sm_t00010556-RA", "Sm_t00011041-RA", "Sm_t00011042-RA", "Sm_t00012577-RA",
           "Sm_t00013291-RA", "Sm_t00013313-RA", "Sm_t00014816-RA", "Sm_t00015478-RA",
           "Sm_t00016600-RA", "Sm_t00017272-RA", "Sm_t00018475-RA", "Sm_t00018687-RA",
           "Sm_t00018847-RA") 

all_results_RQ1e2_logFC_core_reordered = all_results_RQ1e2_logFC_core [match(target, rownames(all_results_RQ1e2_logFC_core)),]

# change row names (= gene names) to include more information on gene identity
#rownames(all_results_RQ1e2_logFC_core_reordered) = 
  #c("Sm_g00002242 slc38a11 - amino acid transporter", 
    #"Sm_g00003882 KEA3 - potassium transporter",
    #"Sm_g00005258 SLC35F5 - solute transporter", 
    #"Sm_g00021791 SLC35F5 - solute transporter",
    #"Sm_g00007543 MJ0079 - ATPase activity", 
    #"Sm_g00007121 ATP13A3 - cation transporting ATPase", 
    #"Sm_g00008820 HMA9 - cation transporting ATPase", 
    #"Sm_g00000820 Evolv2 - fatty acid/lipid metabolism", 
    #"Sm_g00005259 - fatty acid/lipid metabolism",  
    #"Sm_g00021792 - fatty acid/lipid metabolism",
    #"Sm_g00013313 odc-1 - polyamine biosynthesis",  
    #"Sm_g00020016 aphA - polyamine biosynthesis",  
    #"Sm_g00015478 CALS1 - 1,3-beta-D-glucan biosynthesis", 
    #"Sm_g00012577 VDE1 - violaxanthin-de-epoxidase", 
    #"Sm_g00008098 - transcription factor", 
    #"Sm_g00009981 fusA - translation elongation",
    #"Sm_g00009402 - protein binding activity", 
    #"Sm_g00013291 - unknown",
    #"Sm_g00017716 - unknown", 
    #"Sm_g00018687 - unknown", 
    #"Sm_g00019737 - unknown", 
    #"Sm_g00014816 - unknown", 
    #"Sm_g00007360 Usp5 - deubiquitination", 
    #"Sm_g00015422 PSMD12 - proteasome subunit",
    #"Sm_g00008123 - iron ion binding", 
    #"Sm_g00011041 AKHSDH1 - glycine/serine/threonine metabolism [ectoine?]", 
    #"Sm_g00011042 asd - glycine/serine/threonine metabolism [ectoine?]")

# reformat data for plotting
all_results_RQ1e2_logFC_core_reordered_reshaped = gather(all_results_RQ1e2_logFC_core_reordered, 
                                                         "condition", "logFC", 1:27)
temp_rownames = rep(rownames(all_results_RQ1e2_logFC_core_reordered), 27)
all_results_RQ1e2_logFC_core_reordered_reshaped$gene = temp_rownames

# create data frame with TRUE/FALSE information on significance
all_results_RQ1e2_OnlySig_logFC_OnlySign_TF = all_results_RQ1e2_OnlySig_logFC_OnlySign
all_results_RQ1e2_OnlySig_logFC_OnlySign_TF[] = lapply(all_results_RQ1e2_OnlySig_logFC_OnlySign_TF, as.logical)
all_results_RQ1e2_OnlySig_logFC_OnlySign_TF[all_results_RQ1e2_OnlySig_logFC_OnlySign_TF == FALSE] = NA 

# include information on significance
TF_logFC_core_response = subset(all_results_RQ1e2_OnlySig_logFC_OnlySign_TF,
                                rownames(all_results_RQ1e2_OnlySig_logFC_OnlySign_TF)%in%RQ1e2_CoreResponse)
TF_logFC_core_response_reordered = TF_logFC_core_response[match(target, rownames(TF_logFC_core_response)),]

TF_logFC_core_response_reordered_reshaped = gather(TF_logFC_core_response_reordered, 
                                                   "condition2", "significance", 1:27)
temp_rownames = rep(rownames(TF_logFC_core_response_reordered), 27)
TF_logFC_core_response_reordered_reshaped$gene2 = temp_rownames

logFC_core_response_all = cbind(all_results_RQ1e2_logFC_core_reordered_reshaped,
                                TF_logFC_core_response_reordered_reshaped)

# create order for plotting
logFC_core_response_all$condition = factor(logFC_core_response_all$condition, levels = c(
  'avg16-avg24','A16-A24','B16-B24','D16-D24','F16-F24','I16-I24','J16-J24','K16-K24','P16-P24',
  'avg8-avg16','A8-A16','B8-B16','D8-D16','F8-F16','I8-I16','J8-J16','K8-K16','P8-P16',
  'avg8-avg24','A8-A24','B8-B24','D8-D24','F8-F24','I8-I24','J8-J24','K8-K24','P8-P24'))

#logFC_core_response_all$gene = factor(logFC_core_response_all$gene, 
 #     levels = rev(c("Sm_g00002242 slc38a11 - amino acid transporter", 
                     #"Sm_g00003882 KEA3 - potassium transporter",
                     #"Sm_g00005258 SLC35F5 - solute transporter", 
                     #"Sm_g00021791 SLC35F5 - solute transporter",
                     #"Sm_g00007543 MJ0079 - ATPase activity", 
                     #"Sm_g00007121 ATP13A3 - cation transporting ATPase", 
                     #"Sm_g00008820 HMA9 - cation transporting ATPase", 
                     #"Sm_g00000820 Evolv2 - fatty acid/lipid metabolism", 
                     #"Sm_g00005259 - fatty acid/lipid metabolism",  
                     #"Sm_g00021792 - fatty acid/lipid metabolism",
                     #"Sm_g00013313 odc-1 - polyamine biosynthesis",  
                     #"Sm_g00020016 aphA - polyamine biosynthesis",  
                     #"Sm_g00015478 CALS1 - 1,3-beta-D-glucan biosynthesis", 
                     #"Sm_g00012577 VDE1 - violaxanthin-de-epoxidase", 
                     #"Sm_g00008098 - transcription factor", 
                     #"Sm_g00009981 fusA - translation elongation",
                     #"Sm_g00009402 - protein binding activity", 
                     #"Sm_g00013291 - unknown",
                     #"Sm_g00017716 - unknown", 
                     #"Sm_g00018687 - unknown", 
                     #"Sm_g00019737 - unknown", 
                     #"Sm_g00014816 - unknown", 
                     #"Sm_g00007360 Usp5 - deubiquitination", 
                     #"Sm_g00015422 PSMD12 - proteasome subunit",
                     #"Sm_g00008123 - iron ion binding", 
                     #"Sm_g00011041 AKHSDH1 - glycine/serine/threonine metabolism [ectoine?]", 
                     #"Sm_g00011042 asd - glycine/serine/threonine metabolism [ectoine?]")))

# plot heatmap
heatmap_core_all = ggplot(logFC_core_response_all, aes(x = condition, y = gene, fill = logFC)) + geom_tile() +
  geom_tile(data = logFC_core_response_all[!is.na(logFC_core_response_all$significance), ], 
            aes(color = significance), size = 0.5) +
  theme(panel.background = element_blank()) +
  scale_fill_gradient2(low="#313695", mid = 'white', high="#A50026", midpoint=0, name = 'logFC') +
  scale_color_manual(guide = "none", values = c(`TRUE` = "black")) +
  xlab("Contrast") +
  ylab("Gene") + 
  theme(axis.text.x = element_text(angle = 45, vjust = 0.8, hjust = 0.8, size = 9), 
        axis.text.y = element_text(size = 9), strip.text = element_text(size = 9, family = "sans"), 
        axis.title = element_text(size = 12),  strip.text.y = element_text(angle = 0, hjust = 0), 
        strip.background = element_blank())
heatmap_core_all
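
The reshaping for the heatmap relies on the logFC table and the significance table having identical row and column order. As a hedged alternative, the sketch below builds the long-format table in one step in base R and checks that assumption explicitly. `to_long` and the toy tables are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: stack a logFC table and a matching TRUE/NA significance
# table into one long-format data frame (one row per gene x contrast)
to_long = function(logFC_table, sig_table) {
  # both tables must share the same genes and contrasts, in the same order
  stopifnot(identical(dimnames(logFC_table), dimnames(sig_table)))
  data.frame(
    gene = rep(rownames(logFC_table), times = ncol(logFC_table)),
    condition = rep(colnames(logFC_table), each = nrow(logFC_table)),
    logFC = as.vector(as.matrix(logFC_table)),
    significance = as.vector(as.matrix(sig_table)),
    stringsAsFactors = FALSE
  )
}

# toy data: two genes, two contrasts; only gene1 is significant
toy_logFC = data.frame("A8-A16" = c(1.5, -0.2),
                       "A8-A24" = c(2.0, 0.1),
                       check.names = FALSE,
                       row.names = c("gene1", "gene2"))
toy_TF = data.frame("A8-A16" = c(TRUE, NA),
                    "A8-A24" = c(TRUE, NA),
                    check.names = FALSE,
                    row.names = c("gene1", "gene2"))
toy_long = to_long(toy_logFC, toy_TF)
```

The result has the same `gene`, `condition`, `logFC` and `significance` columns that the ggplot call above uses, without the duplicated `condition2` and `gene2` columns produced by `cbind`.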

3.3.6 Volcano plots

In this section, we drew a volcano plot for each contrast, covering both the average response and the genotype-specific responses.

First, the 8-24 contrasts:

par(mfrow=c(3,3))

# find the non-adjusted P-value at which genes stop being significant after FDR correction
subset = subset(all_results_RQ1e2, avg8vs24_Padj>0.05 | is.na(avg8vs24_Padj)) # select all genes that are not significant after FDR correction
sel = subset[with(subset, order(subset$avg8vs24_Padj)),] # sort Padj from small to large (NA values come last)
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) # select the first row = gene with the smallest non-significant adjusted P-value
threshold = sel2$avg8vs24_nonadj_PValue # take the non-adjusted P-value of that gene
# plot the volcano plot
plot(1, type="n", xlab = NA, ylab="-log10 nonadj P", main="average effect", 
     xlim=c(-10,10),
     ylim=c(0, 45))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, avg8vs24_Padj > 0.05 | is.na(avg8vs24_Padj)), 
     points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(avg8vs24_logFC) >= 1 & avg8vs24_Padj <= 0.05), 
     points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(avg8vs24_logFC) < 1 & avg8vs24_Padj <= 0.05), 
     points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
# core genes
core_genes = as.data.frame(subset(all_results_RQ1e2, rownames(all_results_RQ1e2)%in%RQ1e2_CoreResponse))
with(subset(core_genes, avg8vs24_Padj <= 0.05), points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, avg8vs24_Padj > 0.05), points(avg8vs24_logFC, -log10(avg8vs24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))

## genotype A [8-24]
subset = subset(all_results_RQ1e2, A8vsA24_Padj>0.05 | is.na(A8vsA24_Padj))
sel = subset[with(subset, order(subset$A8vsA24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$A8vsA24_nonadj_PValue
plot(1, type="n", xlab = NA, ylab = NA, main="genotype A", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, A8vsA24_Padj > 0.05 | is.na(A8vsA24_Padj)), 
     points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(A8vsA24_logFC) >= 1 & A8vsA24_Padj <= 0.05), 
     points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(A8vsA24_logFC) < 1 & A8vsA24_Padj <= 0.05), 
     points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, A8vsA24_Padj <= 0.05), points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, A8vsA24_Padj > 0.05), points(A8vsA24_logFC, -log10(A8vsA24_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype B [8-24]
subset = subset(all_results_RQ1e2, B8vsB24_Padj>0.05 | is.na(B8vsB24_Padj))
sel = subset[with(subset, order(subset$B8vsB24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$B8vsB24_nonadj_PValue 
plot(1, type="n",xlab = NA, ylab = NA, main="genotype B", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, B8vsB24_Padj > 0.05 | is.na(B8vsB24_Padj)), 
     points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(B8vsB24_logFC) >= 1 & B8vsB24_Padj <= 0.05), 
     points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(B8vsB24_logFC) < 1 & B8vsB24_Padj <= 0.05), 
     points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, B8vsB24_Padj <= 0.05), points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, B8vsB24_Padj > 0.05), points(B8vsB24_logFC, -log10(B8vsB24_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype D [8-24]
subset = subset(all_results_RQ1e2, D8vsD24_Padj>0.05 | is.na(D8vsD24_Padj))
sel = subset[with(subset, order(subset$D8vsD24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$D8vsD24_nonadj_PValue 
plot(1, type="n", xlab = NA, ylab="-log10 nonadj P", main="genotype D", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, D8vsD24_Padj > 0.05 | is.na(D8vsD24_Padj)), 
     points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(D8vsD24_logFC) >= 1 & D8vsD24_Padj <= 0.05), 
     points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(D8vsD24_logFC) < 1 & D8vsD24_Padj <= 0.05), 
     points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, D8vsD24_Padj <= 0.05), points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, D8vsD24_Padj > 0.05), points(D8vsD24_logFC, -log10(D8vsD24_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype F [8-24]
subset = subset(all_results_RQ1e2, F8vsF24_Padj>0.05 | is.na(F8vsF24_Padj))
sel = subset[with(subset, order(subset$F8vsF24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$F8vsF24_nonadj_PValue 
plot(1, type="n", xlab = NA, ylab = NA, main="genotype F", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, F8vsF24_Padj > 0.05 | is.na(F8vsF24_Padj)), 
     points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(F8vsF24_logFC) >= 1 & F8vsF24_Padj <= 0.05), 
     points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(F8vsF24_logFC) < 1 & F8vsF24_Padj <= 0.05), 
     points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, F8vsF24_Padj <= 0.05), points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, F8vsF24_Padj > 0.05), points(F8vsF24_logFC, -log10(F8vsF24_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype I [8-24]
subset = subset(all_results_RQ1e2, I8vsI24_Padj>0.05 | is.na(I8vsI24_Padj))
sel = subset[with(subset, order(subset$I8vsI24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$I8vsI24_nonadj_PValue 
plot(1, type="n", xlab = NA, ylab = NA, main="genotype I", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, I8vsI24_Padj > 0.05 | is.na(I8vsI24_Padj)), 
     points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(I8vsI24_logFC) >= 1 & I8vsI24_Padj <= 0.05), 
     points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(I8vsI24_logFC) < 1 & I8vsI24_Padj <= 0.05), 
     points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, I8vsI24_Padj <= 0.05), points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, I8vsI24_Padj > 0.05), points(I8vsI24_logFC, -log10(I8vsI24_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype J [8-24]
subset = subset(all_results_RQ1e2, J8vsJ24_Padj>0.05 | is.na(J8vsJ24_Padj))
sel = subset[with(subset, order(subset$J8vsJ24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$J8vsJ24_nonadj_PValue 
plot(1, type="n", xlab="log2 fold change", ylab="-log10 nonadj P", main="genotype J", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, J8vsJ24_Padj > 0.05 | is.na(J8vsJ24_Padj)), 
     points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(J8vsJ24_logFC) >= 1 & J8vsJ24_Padj <= 0.05), 
     points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(J8vsJ24_logFC) < 1 & J8vsJ24_Padj <= 0.05), 
     points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, J8vsJ24_Padj <= 0.05), points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, J8vsJ24_Padj > 0.05), points(J8vsJ24_logFC, -log10(J8vsJ24_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype K [8-24]
subset = subset(all_results_RQ1e2, K8vsK24_Padj>0.05 | is.na(K8vsK24_Padj))
sel = subset[with(subset, order(subset$K8vsK24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$K8vsK24_nonadj_PValue 
plot(1, type="n", xlab="log2 fold change", ylab = NA, main="genotype K", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, K8vsK24_Padj > 0.05 | is.na(K8vsK24_Padj)), 
     points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(K8vsK24_logFC) >= 1 & K8vsK24_Padj <= 0.05), 
     points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(K8vsK24_logFC) < 1 & K8vsK24_Padj <= 0.05), 
     points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, K8vsK24_Padj <= 0.05), points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, K8vsK24_Padj > 0.05), points(K8vsK24_logFC, -log10(K8vsK24_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype P [8-24]
subset = subset(all_results_RQ1e2, P8vsP24_Padj>0.05 | is.na(P8vsP24_Padj))
sel = subset[with(subset, order(subset$P8vsP24_Padj)),]
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$P8vsP24_nonadj_PValue 
plot(1, type="n", xlab="log2 fold change", ylab = NA,  main="genotype P", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, P8vsP24_Padj > 0.05 | is.na(P8vsP24_Padj)), 
     points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(P8vsP24_logFC) >= 1 & P8vsP24_Padj <= 0.05), 
     points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(P8vsP24_logFC) < 1 & P8vsP24_Padj <= 0.05), 
     points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, P8vsP24_Padj <= 0.05), points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, P8vsP24_Padj > 0.05), points(P8vsP24_logFC, -log10(P8vsP24_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))
# plot title 
mtext("Volcano plots contrasts 8vs24", side = 3, line = -1.25, outer = TRUE,font = 2)
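
Each panel above repeats the same two steps: find the non-adjusted P-value at which genes stop being significant after FDR correction, and assign each gene to a colour class. The sketch below wraps both steps in functions that take the contrast name as an argument. `fdr_threshold`, `volcano_class` and the toy table are hypothetical and not part of the original analysis; the sketch assumes the column naming used above (`<contrast>_logFC`, `<contrast>_nonadj_PValue`, `<contrast>_Padj`) and treats genes with a missing adjusted P-value as not significant.

```r
# Hypothetical helper: non-adjusted P-value of the gene with the smallest
# non-significant adjusted P-value (used for the horizontal dashed line)
fdr_threshold = function(results, contrast, alpha = 0.05) {
  padj = results[[paste0(contrast, "_Padj")]]
  pval = results[[paste0(contrast, "_nonadj_PValue")]]
  not_sig = which(!is.na(padj) & padj > alpha)
  if (length(not_sig) == 0) return(NA_real_)
  pval[not_sig[which.min(padj[not_sig])]]
}

# Hypothetical helper: colour class of each gene in the volcano plot
volcano_class = function(results, contrast, lfc_cutoff = 1, alpha = 0.05) {
  padj = results[[paste0(contrast, "_Padj")]]
  lfc = results[[paste0(contrast, "_logFC")]]
  ifelse(is.na(padj) | padj > alpha, "ns",
         ifelse(abs(lfc) >= lfc_cutoff, "sig_large", "sig_small"))
}

# toy data for one contrast
toy_results = data.frame(avg8vs24_logFC = c(2.5, 0.4, -1.8, 0.1),
                         avg8vs24_nonadj_PValue = c(1e-6, 0.001, 0.02, 0.5),
                         avg8vs24_Padj = c(1e-4, 0.01, 0.08, NA),
                         row.names = paste0("gene", 1:4))
toy_threshold = fdr_threshold(toy_results, "avg8vs24")
toy_class = volcano_class(toy_results, "avg8vs24")
```

Calling these helpers in a loop over the contrast names (for example `"A8vsA24"`, `"B8vsB24"`) would replace the copied blocks and keep the thresholds and colours consistent across panels.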

The 16-24 contrasts:

par(mfrow=c(3,3))

# find the non-adjusted P-value at which genes stop being significant after FDR correction
subset = subset(all_results_RQ1e2, avg16vs24_Padj>0.05 | is.na(avg16vs24_Padj))
sel = subset[with(subset, order(subset$avg16vs24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$avg16vs24_nonadj_PValue 

# plot the volcano plot
plot(1, type="n", xlab=NA, ylab="-log10 nonadj P", main="average effect", 
     xlim=c(-10,10),
     ylim=c(0, 45))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, avg16vs24_Padj > 0.05 | is.na(avg16vs24_Padj)), 
     points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(avg16vs24_logFC) >= 1 & avg16vs24_Padj <= 0.05), 
     points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(avg16vs24_logFC) < 1 & avg16vs24_Padj <= 0.05), 
     points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
# core genes
with(subset(core_genes, avg16vs24_Padj <= 0.05), points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue), 
                                                        pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, avg16vs24_Padj > 0.05), points(avg16vs24_logFC, -log10(avg16vs24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="gray30"))

## genotype A [16-24]
subset = subset(all_results_RQ1e2, A16vsA24_Padj>0.05 | is.na(A16vsA24_Padj))
sel = subset[with(subset, order(subset$A16vsA24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$A16vsA24_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype A", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, A16vsA24_Padj > 0.05 | is.na(A16vsA24_Padj)), 
     points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(A16vsA24_logFC) >= 1 & A16vsA24_Padj <= 0.05), 
     points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(A16vsA24_logFC) < 1 & A16vsA24_Padj <= 0.05), 
     points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, A16vsA24_Padj <= 0.05), points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, A16vsA24_Padj > 0.05), points(A16vsA24_logFC, -log10(A16vsA24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))

## genotype B [16-24]
subset = subset(all_results_RQ1e2, B16vsB24_Padj>0.05 | is.na(B16vsB24_Padj))
sel = subset[with(subset, order(subset$B16vsB24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$B16vsB24_nonadj_PValue 
plot(1, type="n", xlab=NA, ylab=NA, main="genotype B", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, B16vsB24_Padj > 0.05 | is.na(B16vsB24_Padj)), 
     points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(B16vsB24_logFC) >= 1 & B16vsB24_Padj <= 0.05), 
     points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(B16vsB24_logFC) < 1 & B16vsB24_Padj <= 0.05), 
     points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, B16vsB24_Padj <= 0.05), points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, B16vsB24_Padj > 0.05), points(B16vsB24_logFC, -log10(B16vsB24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))

## genotype D [16-24]
subset = subset(all_results_RQ1e2, D16vsD24_Padj>0.05 | is.na(D16vsD24_Padj))
sel = subset[with(subset, order(subset$D16vsD24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$D16vsD24_nonadj_PValue 
plot(1, type="n", xlab=NA, ylab="-log10 nonadj P", main="genotype D", 
     xlim=c(min(all_results_RQ1e2$D16vsD24_logFC)-1, max(all_results_RQ1e2$D16vsD24_logFC)+1), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, D16vsD24_Padj > 0.05 | is.na(D16vsD24_Padj)), 
     points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(D16vsD24_logFC) >= 1 & D16vsD24_Padj <= 0.05), 
     points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(D16vsD24_logFC) < 1 & D16vsD24_Padj <= 0.05), 
     points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, D16vsD24_Padj <= 0.05), points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, D16vsD24_Padj > 0.05), points(D16vsD24_logFC, -log10(D16vsD24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))

## genotype F [16-24]
subset = subset(all_results_RQ1e2, F16vsF24_Padj>0.05 | is.na(F16vsF24_Padj))
sel = subset[with(subset, order(subset$F16vsF24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$F16vsF24_nonadj_PValue 
plot(1, type="n", xlab=NA, ylab=NA, main="genotype F", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, F16vsF24_Padj > 0.05 | is.na(F16vsF24_Padj)), 
     points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(F16vsF24_logFC) >= 1 & F16vsF24_Padj <= 0.05), 
     points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(F16vsF24_logFC) < 1 & F16vsF24_Padj <= 0.05), 
     points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, F16vsF24_Padj <= 0.05), points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, F16vsF24_Padj > 0.05), points(F16vsF24_logFC, -log10(F16vsF24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))

## genotype I [16-24]
subset = subset(all_results_RQ1e2, I16vsI24_Padj>0.05 | is.na(I16vsI24_Padj))
sel = subset[with(subset, order(subset$I16vsI24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$I16vsI24_nonadj_PValue 
plot(1, type="n", xlab=NA, ylab=NA, main="genotype I", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, I16vsI24_Padj > 0.05 | is.na(I16vsI24_Padj)), 
     points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(I16vsI24_logFC) >= 1 & I16vsI24_Padj <= 0.05), 
     points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(I16vsI24_logFC) < 1 & I16vsI24_Padj <= 0.05), 
     points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, I16vsI24_Padj <= 0.05), points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, I16vsI24_Padj > 0.05), points(I16vsI24_logFC, -log10(I16vsI24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))

## genotype J [16-24]
subset = subset(all_results_RQ1e2, J16vsJ24_Padj>0.05 | is.na(J16vsJ24_Padj))
sel = subset[with(subset, order(subset$J16vsJ24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$J16vsJ24_nonadj_PValue 
plot(1, type="n", xlab="log2 fold change", ylab="-log10 nonadj P", main="genotype J", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, J16vsJ24_Padj > 0.05 | is.na(J16vsJ24_Padj)), 
     points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(J16vsJ24_logFC) >= 1 & J16vsJ24_Padj <= 0.05), 
     points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(J16vsJ24_logFC) < 1 & J16vsJ24_Padj <= 0.05), 
     points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, J16vsJ24_Padj <= 0.05), points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, J16vsJ24_Padj > 0.05), points(J16vsJ24_logFC, -log10(J16vsJ24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))

## genotype K [16-24]
subset = subset(all_results_RQ1e2, K16vsK24_Padj>0.05 | is.na(K16vsK24_Padj))
sel = subset[with(subset, order(subset$K16vsK24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$K16vsK24_nonadj_PValue 
plot(1, type="n", xlab="log2 fold change", ylab=NA, main="genotype K", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, K16vsK24_Padj > 0.05 | is.na(K16vsK24_Padj)), 
     points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(K16vsK24_logFC) >= 1 & K16vsK24_Padj <= 0.05), 
     points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(K16vsK24_logFC) < 1 & K16vsK24_Padj <= 0.05), 
     points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, K16vsK24_Padj <= 0.05), points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, K16vsK24_Padj > 0.05), points(K16vsK24_logFC, -log10(K16vsK24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))

## genotype P [16-24]
subset = subset(all_results_RQ1e2, P16vsP24_Padj>0.05 | is.na(P16vsP24_Padj))
sel = subset[with(subset, order(subset$P16vsP24_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$P16vsP24_nonadj_PValue 
plot(1, type="n", xlab="log2 fold change", ylab=NA, main="genotype P", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, P16vsP24_Padj > 0.05 | is.na(P16vsP24_Padj)), 
     points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(P16vsP24_logFC) >= 1 & P16vsP24_Padj <= 0.05), 
     points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(P16vsP24_logFC) < 1 & P16vsP24_Padj <= 0.05), 
     points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, P16vsP24_Padj <= 0.05), points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, P16vsP24_Padj > 0.05), points(P16vsP24_logFC, -log10(P16vsP24_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))
# plot title 
mtext("Volcano plots contrasts 16vs24", side = 3, line = -1.25, outer = TRUE,font = 2)
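
The threshold calculation used in each panel can be illustrated on a toy table. The sketch below is not part of the original analysis: the table `toy`, the contrast name `avgXvsY` and the helper `fdr_threshold` are hypothetical and only mimic the column naming used above. It also shows why missing adjusted P-values are best tested with `is.na(x)`: a comparison such as `x == is.na(NA)` compares `x` with `TRUE` and does not detect missing values.

```r
# Toy results table (invented values); column names mimic the
# <contrast>_nonadj_PValue / <contrast>_Padj convention used above
toy <- data.frame(
  avgXvsY_nonadj_PValue = c(0.001, 0.010, 0.030, 0.200, NA),
  avgXvsY_Padj          = c(0.005, 0.040, 0.060, 0.300, NA),
  row.names = paste0("gene", 1:5)
)

# is.na(NA) is TRUE, so this compares Padj with TRUE (= 1) and never flags the NA
wrong_test <- toy$avgXvsY_Padj == is.na(NA)
# explicit test for missing values
right_test <- is.na(toy$avgXvsY_Padj)

# Hypothetical helper: non-adjusted P-value of the non-significant gene
# with the smallest adjusted P-value (genes with NA are ignored)
fdr_threshold <- function(res, contrast, alpha = 0.05) {
  padj <- res[[paste0(contrast, "_Padj")]]
  raw  <- res[[paste0(contrast, "_nonadj_PValue")]]
  keep <- !is.na(padj) & padj > alpha
  if (!any(keep)) return(NA_real_)
  raw[keep][which.min(padj[keep])]
}

threshold_toy <- fdr_threshold(toy, "avgXvsY")
```

In this toy table, gene3 is the first gene that is no longer significant after FDR correction, so its non-adjusted P-value is the value that would be drawn as the horizontal dashed line.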

And the 8-16 contrasts:

par(mfrow=c(3,3))

# calculate the non-adjusted P-value threshold above which genes are no longer significant after FDR correction
subset = subset(all_results_RQ1e2, avg8vs16_Padj>0.05 | is.na(avg8vs16_Padj)) # select all genes that are not significant after FDR correction
sel = subset[with(subset, order(subset$avg8vs16_Padj)),] # sort Padj from small to large (NA values last)
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) # select the first row = non-significant gene with the smallest adjusted P-value
threshold = sel2$avg8vs16_nonadj_PValue # select its non-adjusted P-value
# plot the volcano plot
plot(1, type="n", xlab=NA, ylab="-log10 nonadj P", main="average effect", 
     xlim=c(-10,10),
     ylim=c(0, 45))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, avg8vs16_Padj > 0.05 | is.na(avg8vs16_Padj)), 
     points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(avg8vs16_logFC) >= 1 & avg8vs16_Padj <= 0.05), 
     points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(avg8vs16_logFC) < 1 & avg8vs16_Padj <= 0.05), 
     points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
# core genes
with(subset(core_genes, avg8vs16_Padj <= 0.05), points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue), 
                                                       pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, avg8vs16_Padj > 0.05), points(avg8vs16_logFC, -log10(avg8vs16_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="gray30"))

## genotype A [8-16]
subset = subset(all_results_RQ1e2, A8vsA16_Padj>0.05 | is.na(A8vsA16_Padj))
sel = subset[with(subset, order(subset$A8vsA16_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$A8vsA16_nonadj_PValue 
plot(1, type="n", xlab=NA, ylab=NA, main="genotype A", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, A8vsA16_Padj > 0.05 | is.na(A8vsA16_Padj)), 
     points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(A8vsA16_logFC) >= 1 & A8vsA16_Padj <= 0.05), 
     points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(A8vsA16_logFC) < 1 & A8vsA16_Padj <= 0.05), 
     points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, A8vsA16_Padj <= 0.05), points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, A8vsA16_Padj > 0.05), points(A8vsA16_logFC, -log10(A8vsA16_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype B [8-16]
subset = subset(all_results_RQ1e2, B8vsB16_Padj>0.05 | is.na(B8vsB16_Padj))
sel = subset[with(subset, order(subset$B8vsB16_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$B8vsB16_nonadj_PValue 
plot(1, type="n", xlab=NA, ylab=NA, main="genotype B", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, B8vsB16_Padj > 0.05 | is.na(B8vsB16_Padj)), 
     points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(B8vsB16_logFC) >= 1 & B8vsB16_Padj <= 0.05), 
     points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(B8vsB16_logFC) < 1 & B8vsB16_Padj <= 0.05), 
     points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, B8vsB16_Padj <= 0.05), points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, B8vsB16_Padj > 0.05), points(B8vsB16_logFC, -log10(B8vsB16_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype D [8-16]
subset = subset(all_results_RQ1e2, D8vsD16_Padj>0.05 | is.na(D8vsD16_Padj))
sel = subset[with(subset, order(subset$D8vsD16_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$D8vsD16_nonadj_PValue 
plot(1, type="n", xlab=NA, ylab="-log10 nonadj P", main="genotype D", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, D8vsD16_Padj > 0.05 | is.na(D8vsD16_Padj)), 
     points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(D8vsD16_logFC) >= 1 & D8vsD16_Padj <= 0.05), 
     points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(D8vsD16_logFC) < 1 & D8vsD16_Padj <= 0.05), 
     points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, D8vsD16_Padj <= 0.05), points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, D8vsD16_Padj > 0.05), points(D8vsD16_logFC, -log10(D8vsD16_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype F [8-16]
subset = subset(all_results_RQ1e2, F8vsF16_Padj>0.05 | is.na(F8vsF16_Padj))
sel = subset[with(subset, order(subset$F8vsF16_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$F8vsF16_nonadj_PValue 
plot(1, type="n", xlab=NA, ylab=NA, main="genotype F", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, F8vsF16_Padj > 0.05 | is.na(F8vsF16_Padj)), 
     points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(F8vsF16_logFC) >= 1 & F8vsF16_Padj <= 0.05), 
     points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(F8vsF16_logFC) < 1 & F8vsF16_Padj <= 0.05), 
     points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, F8vsF16_Padj <= 0.05), points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, F8vsF16_Padj > 0.05), points(F8vsF16_logFC, -log10(F8vsF16_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype I [8-16]
subset = subset(all_results_RQ1e2, I8vsI16_Padj>0.05 | is.na(I8vsI16_Padj))
sel = subset[with(subset, order(subset$I8vsI16_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$I8vsI16_nonadj_PValue
plot(1, type="n", xlab=NA, ylab=NA, main="genotype I", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, I8vsI16_Padj > 0.05 | is.na(I8vsI16_Padj)), 
     points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(I8vsI16_logFC) >= 1 & I8vsI16_Padj <= 0.05), 
     points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(I8vsI16_logFC) < 1 & I8vsI16_Padj <= 0.05), 
     points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, I8vsI16_Padj <= 0.05), points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, I8vsI16_Padj > 0.05), points(I8vsI16_logFC, -log10(I8vsI16_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype J [8-16]
subset = subset(all_results_RQ1e2, J8vsJ16_Padj>0.05 | is.na(J8vsJ16_Padj))
sel = subset[with(subset, order(subset$J8vsJ16_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$J8vsJ16_nonadj_PValue 
plot(1, type="n", xlab="log2 fold change", ylab="-log10 nonadj P", main="genotype J", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, J8vsJ16_Padj > 0.05 | is.na(J8vsJ16_Padj)), 
     points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(J8vsJ16_logFC) >= 1 & J8vsJ16_Padj <= 0.05), 
     points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(J8vsJ16_logFC) < 1 & J8vsJ16_Padj <= 0.05), 
     points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, J8vsJ16_Padj <= 0.05), points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, J8vsJ16_Padj > 0.05), points(J8vsJ16_logFC, -log10(J8vsJ16_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype K [8-16]
subset = subset(all_results_RQ1e2, K8vsK16_Padj>0.05 | is.na(K8vsK16_Padj))
sel = subset[with(subset, order(subset$K8vsK16_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$K8vsK16_nonadj_PValue 
plot(1, type="n", xlab="log2 fold change", ylab=NA, main="genotype K", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, K8vsK16_Padj > 0.05 | is.na(K8vsK16_Padj)), 
     points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(K8vsK16_logFC) >= 1 & K8vsK16_Padj <= 0.05), 
     points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(K8vsK16_logFC) < 1 & K8vsK16_Padj <= 0.05), 
     points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, K8vsK16_Padj <= 0.05), points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, K8vsK16_Padj > 0.05), points(K8vsK16_logFC, -log10(K8vsK16_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))

## genotype P [8-16]
subset = subset(all_results_RQ1e2, P8vsP16_Padj>0.05 | is.na(P8vsP16_Padj))
sel = subset[with(subset, order(subset$P8vsP16_Padj)),] 
sel2 = as.data.frame(all_results_RQ1e2[rownames(head(sel,1)),]) 
threshold = sel2$P8vsP16_nonadj_PValue
plot(1, type="n", xlab="log2 fold change", ylab=NA, main="genotype P", 
     xlim=c(-10,10), 
     ylim=c(0, 25))
abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
abline(v = c(-1,1), col = "steelblue2", lty = 2, lwd = 1)
with(subset(all_results_RQ1e2, P8vsP16_Padj > 0.05 | is.na(P8vsP16_Padj)), 
     points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue), pch=20, cex = 0.75, col="gray80"))
with(subset(all_results_RQ1e2, abs(P8vsP16_logFC) >= 1 & P8vsP16_Padj <= 0.05), 
     points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue2"))
with(subset(all_results_RQ1e2, abs(P8vsP16_logFC) < 1 & P8vsP16_Padj <= 0.05), 
     points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue), pch=20, cex = 0.75, col="steelblue4"))
with(subset(core_genes, P8vsP16_Padj <= 0.05), points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue), 
                                                      pch=15, cex = 0.75, col="tomato3"))
with(subset(core_genes, P8vsP16_Padj > 0.05), points(P8vsP16_logFC, -log10(P8vsP16_nonadj_PValue), 
                                                     pch=15, cex = 0.75, col="gray30"))
# plot title 
mtext("Volcano plots contrasts 8vs16", side = 3, line = -1.25, outer = TRUE,font = 2)
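
Because every panel repeats the same steps with a different column prefix, the panels could also be drawn with a small helper function. The sketch below is only an illustration under assumptions: `volcano_panel`, `toy_res` and `toy_core` are hypothetical names with invented values, and the helper treats missing adjusted P-values as not significant. It is not the code that was used to produce the figures.

```r
# Hypothetical helper (not part of the original analysis): draws one volcano panel
# for the columns <contrast>_logFC, <contrast>_nonadj_PValue and <contrast>_Padj
volcano_panel <- function(res, core, contrast, main, xlab = NA, ylab = NA,
                          xlim = c(-10, 10), ylim = c(0, 25), alpha = 0.05) {
  get_col <- function(df, suffix) df[[paste0(contrast, "_", suffix)]]
  fc   <- get_col(res, "logFC")
  raw  <- get_col(res, "nonadj_PValue")
  padj <- get_col(res, "Padj")

  # non-adjusted P-value of the non-significant gene with the smallest adjusted P-value
  ns <- !is.na(padj) & padj > alpha
  threshold <- if (any(ns)) raw[ns][which.min(padj[ns])] else NA_real_

  plot(1, type = "n", xlab = xlab, ylab = ylab, main = main, xlim = xlim, ylim = ylim)
  if (!is.na(threshold)) abline(h = -log10(threshold), col = "steelblue2", lty = 2, lwd = 1)
  abline(v = c(-1, 1), col = "steelblue2", lty = 2, lwd = 1)

  # all genes: not significant, significant with |logFC| >= 1, significant with |logFC| < 1
  sig <- !is.na(padj) & padj <= alpha
  big <- !is.na(fc) & abs(fc) >= 1
  points(fc[!sig], -log10(raw[!sig]), pch = 20, cex = 0.75, col = "gray80")
  points(fc[sig & big], -log10(raw[sig & big]), pch = 20, cex = 0.75, col = "steelblue2")
  points(fc[sig & !big], -log10(raw[sig & !big]), pch = 20, cex = 0.75, col = "steelblue4")

  # core genes on top
  cfc   <- get_col(core, "logFC")
  craw  <- get_col(core, "nonadj_PValue")
  cpadj <- get_col(core, "Padj")
  csig  <- !is.na(cpadj) & cpadj <= alpha
  points(cfc[csig], -log10(craw[csig]), pch = 15, cex = 0.75, col = "tomato3")
  points(cfc[!csig], -log10(craw[!csig]), pch = 15, cex = 0.75, col = "gray30")

  invisible(threshold)
}

# Toy data with invented values for a made-up contrast "gA"
toy_res <- data.frame(
  gA_logFC         = c(2.5, -0.4, 0.8, -3.1, 0.1),
  gA_nonadj_PValue = c(1e-6, 1e-4, 0.02, 1e-8, 0.5),
  gA_Padj          = c(0.001, 0.01, 0.08, 0.0001, 0.6),
  row.names = paste0("gene", 1:5)
)
toy_core <- toy_res[c("gene1", "gene3"), ]

pdf(NULL)  # null device, so the example runs without creating a file
thr <- volcano_panel(toy_res, toy_core, "gA", main = "toy genotype",
                     xlab = "log2 fold change", ylab = "-log10 nonadj P")
invisible(dev.off())

# With the real tables, a loop over the genotypes might look like this:
# for (g in c("A", "B", "D", "F", "I", "J", "K", "P"))
#   volcano_panel(all_results_RQ1e2, core_genes, paste0(g, "8vs", g, "16"),
#                 main = paste("genotype", g))
```

Writing the panels out one by one, as done above, has the advantage that axis labels and limits can be tuned per panel. The helper keeps those as arguments for the same reason.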

3.3.7 Average expression in function of salinity

Next, we subdivided the DE genes according to their expression patterns as a function of salinity.
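
The idea behind this subdivision can be sketched on a toy 0/1 significance table. The table `toy_sig` and its values are invented for illustration. Only the column names follow the `avg8-16`, `avg8-24` and `avg16-24` convention used in this document. Each gene gets a key that records in which contrasts it is significant, and genes are then grouped by that key.

```r
# Toy 0/1 significance table (invented values); 1 = significant in that contrast
toy_sig <- data.frame(
  `avg8-16`  = c(1, 0, 1, 0, 1),
  `avg8-24`  = c(1, 1, 0, 0, 1),
  `avg16-24` = c(1, 1, 0, 1, 1),
  row.names = paste0("gene", 1:5),
  check.names = FALSE
)

# one key per gene, in the order 8-16 / 8-24 / 16-24,
# e.g. "1-1-1" = significant in all three contrasts
pattern <- paste(toy_sig$`avg8-16`, toy_sig$`avg8-24`, toy_sig$`avg16-24`, sep = "-")

# group the gene names by their significance pattern
genes_by_pattern <- split(rownames(toy_sig), pattern)

all_three  <- genes_by_pattern[["1-1-1"]]  # significant in all contrasts
only_16_24 <- genes_by_pattern[["0-0-1"]]  # significant in 16-24 only
```

Within each group, the direction of change is then determined by comparing the mean expression at 8, 16 and 24 ppt.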

We started with the average response:

# extract the fitted expression values from the model (subsets of significant genes are selected below)
expression_RQ1e2 = fit_group_model$fitted.values 

# calculate the average expression per gene for each salinity treatment over all genotypes
mean_16ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(1:3,10:12,19:21,28:30,37:39,46:48,55:57,64:66)], 1, mean))
mean_24ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(4:6,13:15,22:24,31:33,40:42,49:51,58:60,67:69)], 1, mean))
mean_8ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(7:9,16:18,25:27,34:36,43:45,52:54,61:63,70:72)], 1, mean))

# add column names to the matrices with the average values
colnames(mean_16ppt_RQ1e2) = c("16ppt")
colnames(mean_24ppt_RQ1e2) = c("24ppt")
colnames(mean_8ppt_RQ1e2) = c("8ppt")

# combine the matrices & take the natural log of the expression
mean_expression_RQ1e2_log = as.data.frame(log(cbind(mean_24ppt_RQ1e2, mean_16ppt_RQ1e2, mean_8ppt_RQ1e2)))

# extract expression levels of subsets of genes
## take only genes significant in all contrasts
OnlySignGenes_RQ1e2_ConStage_average_names_allcontrasts = c(rownames(subset(
 OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 1 & 
 OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 1 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 1))) 

mean_expression_RQ1e2_allcontrasts_log = subset(mean_expression_RQ1e2_log,
 rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_allcontrasts) 

## take only genes significant in 16-24 and 8-24
OnlySignGenes_RQ1e2_ConStage_average_names_16vs24_8vs24 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 0 & 
  OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 1 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 1))) 

mean_expression_RQ1e2_16vs24_8vs24_log = subset(mean_expression_RQ1e2_log,
  rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_16vs24_8vs24) 

## take only genes significant in 16-24 and 8-16
OnlySignGenes_RQ1e2_ConStage_average_names_16vs24_8vs16 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 1 & 
  OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 0 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 1))) 

mean_expression_RQ1e2_16vs24_8vs16_log = subset(mean_expression_RQ1e2_log, 
  rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_16vs24_8vs16) 

## take only genes significant in 8-24 and 8-16
OnlySignGenes_RQ1e2_ConStage_average_names_8vs24_8vs16 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 1 & 
  OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 1 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 0))) 

mean_expression_RQ1e2_8vs24_8vs16_log = subset(mean_expression_RQ1e2_log, 
  rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_8vs24_8vs16) 

## take only genes significant in 16-24
OnlySignGenes_RQ1e2_ConStage_average_names_16vs24 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 0 & 
  OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 0 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 1))) 

mean_expression_RQ1e2_16vs24_log = subset(mean_expression_RQ1e2_log, 
  rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_16vs24) 

## take only genes significant in 8-24
OnlySignGenes_RQ1e2_ConStage_average_names_8vs24 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 0 & 
  OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 1 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 0))) 

mean_expression_RQ1e2_8vs24_log = subset(mean_expression_RQ1e2_log, 
  rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_8vs24) 

## take only genes significant in 8-16
OnlySignGenes_RQ1e2_ConStage_average_names_8vs16 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`avg8-16` == 1 & 
  OnlySignGenes_RQ1e2_ConStage$`avg8-24` == 0 & OnlySignGenes_RQ1e2_ConStage$`avg16-24` == 0)))

mean_expression_RQ1e2_8vs16_log = subset(mean_expression_RQ1e2_log,
  rownames(mean_expression_RQ1e2_log)%in%OnlySignGenes_RQ1e2_ConStage_average_names_8vs16)     

# select genes with specific patterns of up- and downregulation
## significant only in one contrast
RQ1e2_16g8 = rownames(subset(mean_expression_RQ1e2_8vs16_log, 
  mean_expression_RQ1e2_8vs16_log$`16ppt` > mean_expression_RQ1e2_8vs16_log$`8ppt`))
RQ1e2_16s8 = rownames(subset(mean_expression_RQ1e2_8vs16_log, 
  mean_expression_RQ1e2_8vs16_log$`16ppt` < mean_expression_RQ1e2_8vs16_log$`8ppt`))
RQ1e2_24g8 = rownames(subset(mean_expression_RQ1e2_8vs24_log,
  mean_expression_RQ1e2_8vs24_log$`24ppt` > mean_expression_RQ1e2_8vs24_log$`8ppt`))
RQ1e2_24s8 = rownames(subset(mean_expression_RQ1e2_8vs24_log,
  mean_expression_RQ1e2_8vs24_log$`24ppt` < mean_expression_RQ1e2_8vs24_log$`8ppt`))
RQ1e2_24g16 = rownames(subset(mean_expression_RQ1e2_16vs24_log,
  mean_expression_RQ1e2_16vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_log$`16ppt`))
RQ1e2_24s16 = rownames(subset(mean_expression_RQ1e2_16vs24_log,
  mean_expression_RQ1e2_16vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_log$`16ppt`))

## significant in exactly two contrasts
RQ1e2_24g16_24g8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs24_log,
  mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
  mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_24s16_24s8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs24_log, 
  mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
  mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_24g16_24s8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs24_log, 
  mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
  mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_24s16_24g8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs24_log, 
  mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
  mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_24g8_16g8 = rownames(subset(mean_expression_RQ1e2_8vs24_8vs16_log, 
  mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` > mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
  mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` > mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_24s8_16s8 = rownames(subset(mean_expression_RQ1e2_8vs24_8vs16_log, 
  mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` < mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
  mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` < mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_24g8_16s8 = rownames(subset(mean_expression_RQ1e2_8vs24_8vs16_log, 
  mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` > mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
  mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` < mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_24s8_16g8 = rownames(subset(mean_expression_RQ1e2_8vs24_8vs16_log,
  mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` < mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
  mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` > mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_24g16_16g8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs16_log, 
  mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
  mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` > mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_24s16_16s8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs16_log,  
  mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
  mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` < mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_24g16_16s8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs16_log, 
  mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` > mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
  mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` < mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_24s16_16g8 = rownames(subset(mean_expression_RQ1e2_16vs24_8vs16_log, 
  mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` < mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
  mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` > mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))

## significant in all three contrasts
RQ1e2_24g8_24g16_16g8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`16ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24s8_24s16_16s8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`16ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24g8_24s16_16g8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`16ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24g8_24s16_16s8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`16ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24g8_24g16_16s8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`16ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24s8_24g16_16s8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`16ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24s8_24s16_16g8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`16ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_24s8_24g16_16g8 = rownames(subset(mean_expression_RQ1e2_allcontrasts_log,
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` < mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`24ppt` > mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  mean_expression_RQ1e2_allcontrasts_log$`16ppt` > mean_expression_RQ1e2_allcontrasts_log$`8ppt`))

# take subsets of relative expression values
RQ1e2_16g8_exp = subset(mean_expression_RQ1e2_8vs16_log, 
                        rownames(mean_expression_RQ1e2_8vs16_log)%in%RQ1e2_16g8)    
RQ1e2_16s8_exp = subset(mean_expression_RQ1e2_8vs16_log, 
                        rownames(mean_expression_RQ1e2_8vs16_log)%in%RQ1e2_16s8)   
RQ1e2_24g8_exp = subset(mean_expression_RQ1e2_8vs24_log, 
                        rownames(mean_expression_RQ1e2_8vs24_log)%in%RQ1e2_24g8)    
RQ1e2_24s8_exp = subset(mean_expression_RQ1e2_8vs24_log, 
                        rownames(mean_expression_RQ1e2_8vs24_log)%in%RQ1e2_24s8)   
RQ1e2_24g16_exp = subset(mean_expression_RQ1e2_16vs24_log, 
                         rownames(mean_expression_RQ1e2_16vs24_log)%in%RQ1e2_24g16)    
RQ1e2_24s16_exp = subset(mean_expression_RQ1e2_16vs24_log, 
                         rownames(mean_expression_RQ1e2_16vs24_log)%in%RQ1e2_24s16)
RQ1e2_24g16_24g8_exp = subset(mean_expression_RQ1e2_16vs24_8vs24_log, 
                              rownames(mean_expression_RQ1e2_16vs24_8vs24_log)%in%RQ1e2_24g16_24g8)
RQ1e2_24g16_24s8_exp = subset(mean_expression_RQ1e2_16vs24_8vs24_log, 
                              rownames(mean_expression_RQ1e2_16vs24_8vs24_log)%in%RQ1e2_24g16_24s8) 
RQ1e2_24s16_24g8_exp = subset(mean_expression_RQ1e2_16vs24_8vs24_log, 
                              rownames(mean_expression_RQ1e2_16vs24_8vs24_log)%in%RQ1e2_24s16_24g8)
RQ1e2_24s16_24s8_exp = subset(mean_expression_RQ1e2_16vs24_8vs24_log, 
                              rownames(mean_expression_RQ1e2_16vs24_8vs24_log)%in%RQ1e2_24s16_24s8)
RQ1e2_24g8_16g8_exp = subset(mean_expression_RQ1e2_8vs24_8vs16_log, 
                             rownames(mean_expression_RQ1e2_8vs24_8vs16_log)%in%RQ1e2_24g8_16g8)  
RQ1e2_24s8_16s8_exp = subset(mean_expression_RQ1e2_8vs24_8vs16_log, 
                             rownames(mean_expression_RQ1e2_8vs24_8vs16_log)%in%RQ1e2_24s8_16s8)  
RQ1e2_24g8_16s8_exp = subset(mean_expression_RQ1e2_8vs24_8vs16_log, 
                             rownames(mean_expression_RQ1e2_8vs24_8vs16_log)%in%RQ1e2_24g8_16s8)  
RQ1e2_24s8_16g8_exp = subset(mean_expression_RQ1e2_8vs24_8vs16_log, 
                             rownames(mean_expression_RQ1e2_8vs24_8vs16_log)%in%RQ1e2_24s8_16g8)  
RQ1e2_24g16_16g8_exp = subset(mean_expression_RQ1e2_16vs24_8vs16_log, 
                              rownames(mean_expression_RQ1e2_16vs24_8vs16_log)%in%RQ1e2_24g16_16g8)
RQ1e2_24s16_16s8_exp = subset(mean_expression_RQ1e2_16vs24_8vs16_log, 
                              rownames(mean_expression_RQ1e2_16vs24_8vs16_log)%in%RQ1e2_24s16_16s8)
RQ1e2_24g16_16s8_exp = subset(mean_expression_RQ1e2_16vs24_8vs16_log, 
                              rownames(mean_expression_RQ1e2_16vs24_8vs16_log)%in%RQ1e2_24g16_16s8)
RQ1e2_24s16_16g8_exp = subset(mean_expression_RQ1e2_16vs24_8vs16_log, 
                              rownames(mean_expression_RQ1e2_16vs24_8vs16_log)%in%RQ1e2_24s16_16g8)
RQ1e2_24g8_24g16_16g8_exp = subset(mean_expression_RQ1e2_allcontrasts_log, 
                                   rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24g8_24g16_16g8) 
RQ1e2_24s8_24s16_16s8_exp = subset(mean_expression_RQ1e2_allcontrasts_log, 
                                   rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24s8_24s16_16s8)  
RQ1e2_24g8_24s16_16g8_exp = subset(mean_expression_RQ1e2_allcontrasts_log, 
                                   rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24g8_24s16_16g8)   
RQ1e2_24g8_24s16_16s8_exp = subset(mean_expression_RQ1e2_allcontrasts_log, 
                                   rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24g8_24s16_16s8)  
RQ1e2_24g8_24g16_16s8_exp = subset(mean_expression_RQ1e2_allcontrasts_log, 
                                   rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24g8_24g16_16s8)  
RQ1e2_24s8_24g16_16s8_exp = subset(mean_expression_RQ1e2_allcontrasts_log, 
                                   rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24s8_24g16_16s8)  
RQ1e2_24s8_24s16_16g8_exp = subset(mean_expression_RQ1e2_allcontrasts_log, 
                                   rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24s8_24s16_16g8)  
RQ1e2_24s8_24g16_16g8_exp = subset(mean_expression_RQ1e2_allcontrasts_log, 
                                   rownames(mean_expression_RQ1e2_allcontrasts_log)%in%RQ1e2_24s8_24g16_16g8)  

# core response genes
## 24>8, 24>16 & 16>8 
RQ1e2_allcontrasts_core1 = subset(RQ1e2_24g8_24g16_16g8_exp, rownames(RQ1e2_24g8_24g16_16g8_exp)%in%RQ1e2_CoreResponse)      
## 24<8, 24<16 & 16<8 
RQ1e2_allcontrasts_core2 = subset(RQ1e2_24s8_24s16_16s8_exp, rownames(RQ1e2_24s8_24s16_16s8_exp)%in%RQ1e2_CoreResponse)          
## 24>8, 24<16 & 16>8 
RQ1e2_allcontrasts_core3 = subset(RQ1e2_24g8_24s16_16g8_exp, rownames(RQ1e2_24g8_24s16_16g8_exp)%in%RQ1e2_CoreResponse)          

# combine all the clusters into a list
RQ1e2_cluster_list = list(RQ1e2_16g8,RQ1e2_16s8,RQ1e2_24g8,RQ1e2_24s8,
                          RQ1e2_24g16,RQ1e2_24s16,RQ1e2_24g16_24g8,
                          RQ1e2_24s16_24s8,RQ1e2_24g16_24s8,RQ1e2_24s16_24g8,
                          RQ1e2_24g8_16g8,RQ1e2_24s8_16s8,RQ1e2_24g8_16s8,
                          RQ1e2_24s8_16g8,RQ1e2_24g16_16g8,RQ1e2_24s16_16s8,
                          RQ1e2_24g16_16s8,RQ1e2_24s16_16g8,RQ1e2_24g8_24g16_16g8,
                          RQ1e2_24s8_24s16_16s8,RQ1e2_24g8_24s16_16g8,
                          RQ1e2_24g8_24s16_16s8,RQ1e2_24g8_24g16_16s8,
                          RQ1e2_24s8_24g16_16s8,RQ1e2_24s8_24s16_16g8,
                          RQ1e2_24s8_24g16_16g8)

names(RQ1e2_cluster_list) = c("RQ1e2 avg: 16>8","RQ1e2 avg: 16<8","RQ1e2 avg: 24>8",
                              "RQ1e2 avg: 24<8","RQ1e2 avg: 24>16","RQ1e2 avg: 24<16",
                              "RQ1e2 avg: 24>16 and 24>8","RQ1e2 avg: 24<16 and 24<8",
                              "RQ1e2 avg: 24>16 and 24<8","RQ1e2 avg: 24<16 and 24>8",
                              "RQ1e2 avg: 24>8 and 16>8","RQ1e2 avg: 24<8 and 16<8",
                              "RQ1e2 avg: 24>8 and 16<8","RQ1e2 avg: 24<8 and 16>8", 
                              "RQ1e2 avg: 24>16 and 16>8","RQ1e2 avg: 24<16 and 16<8",
                              "RQ1e2 avg: 24>16 and 16<8","RQ1e2 avg: 24<16 and 16>8",
                              "RQ1e2 avg: 24>8 and 24>16 and 16>8","RQ1e2 avg: 24<8 and 24<16 and 16<8",
                              "RQ1e2 avg: 24>8 and 24<16 and 16>8","RQ1e2 avg: 24>8 and 24<16 and 16<8",
                              "RQ1e2 avg: 24>8 and 24>16 and 16<8","RQ1e2 avg: 24<8 and 24>16 and 16<8",
                              "RQ1e2 avg: 24<8 and 24<16 and 16>8","RQ1e2 avg: 24<8 and 24>16 and 16>8")

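# create name list (necessary for downstream code)
# note: this vector is called `names`, like the base R function; calls such as names(x) still work
# because R skips non-function objects when it looks up a function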
names = c(RQ1e2_16g8,RQ1e2_16s8,RQ1e2_24g8,RQ1e2_24s8,RQ1e2_24g16,RQ1e2_24s16,
          RQ1e2_24g16_24g8,RQ1e2_24s16_24s8,RQ1e2_24g16_24s8,RQ1e2_24s16_24g8,
          RQ1e2_24g8_16g8,RQ1e2_24s8_16s8,RQ1e2_24g8_16s8,RQ1e2_24s8_16g8,
          RQ1e2_24g16_16g8,RQ1e2_24s16_16s8,RQ1e2_24g16_16s8,RQ1e2_24s16_16g8,
          RQ1e2_24g8_24g16_16g8,RQ1e2_24s8_24s16_16s8,RQ1e2_24g8_24s16_16g8,RQ1e2_24g8_24s16_16s8,
          RQ1e2_24g8_24g16_16s8,RQ1e2_24s8_24g16_16s8,RQ1e2_24s8_24s16_16g8,RQ1e2_24s8_24g16_16g8)

# plot the results
## set figure dimensions
par(mfrow = c(3,7))
## plots (only including sets for which at least one gene was significant)
### 16>8
matplot(t(RQ1e2_16g8_exp),type="l",lty=1,col=1,
        ylab="log average estimated expression",main="16>8",xaxt="n")
### 24>16
matplot(t(RQ1e2_24g16_exp),type="l",lty=1,col=1,
        ylab=NA,main="24>16",xaxt="n")
### 16<8
matplot(t(RQ1e2_16s8_exp),type="l",lty=1,col=1,
        ylab=NA,main="16<8",xaxt="n")
### 24<16
matplot(t(RQ1e2_24s16_exp),type="l",lty=1,col=1,
        ylab=NA,main="24<16",xaxt="n")
### 24<16, 16>8
matplot(t(RQ1e2_24s16_16g8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24<16, 16>8",xaxt="n")
### 24>8, 24<16
matplot(t(RQ1e2_24s16_24g8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24>8, 24<16",xaxt="n")
### 24<8, 16>8
matplot(t(RQ1e2_24s8_16g8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24<8, 16>8",xaxt="n")

### 24>8
matplot(t(RQ1e2_24g8_exp),type="l",lty=1,col=1,
        ylab="log average estimated expression",main="24>8",xaxt="n")
### 24>8, 24>16
matplot(t(RQ1e2_24g16_24g8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24>8, 24>16",xaxt="n")
### 24<8
matplot(t(RQ1e2_24s8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24<8",xaxt="n")
### 24<8, 24<16
matplot(t(RQ1e2_24s16_24s8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24<8, 24<16",xaxt="n")
### 24>8, 24<16 & 16>8
matplot(t(RQ1e2_24g8_24s16_16g8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24>8, 24<16, 16>8",xaxt="n")
matlines(t(RQ1e2_allcontrasts_core3), type = "l", lty = 1, lwd = 1,col="red")
### 24<8, 24<16 & 16>8
matplot(t(RQ1e2_24s8_24s16_16g8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24<8, 24<16, 16>8",xaxt="n")
### 24>16, 16<8
matplot(t(RQ1e2_24g16_16s8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24>16, 16<8",xaxt="n")

### 24>8, 16>8
matplot(t(RQ1e2_24g8_16g8_exp),type="l",lty=1,col=1,
        ylab="log average estimated expression",main="24>8, 16>8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24>8, 24>16 & 16>8
matplot(t(RQ1e2_24g8_24g16_16g8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24>8, 24>16, 16>8",xaxt="n")
matlines(t(RQ1e2_allcontrasts_core1), type = "l", lty = 1, lwd = 1,col="red")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24<8, 16<8
matplot(t(RQ1e2_24s8_16s8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24<8, 16<8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24<8, 24<16 & 16<8
matplot(t(RQ1e2_24s8_24s16_16s8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24<8, 24<16, 16<8",xaxt="n")
matlines(t(RQ1e2_allcontrasts_core2), type = "l", lty = 1, lwd = 1,col="red")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24>8, 16<8
matplot(t(RQ1e2_24g8_16s8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24>8, 16<8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24>8, 24>16 & 16<8
matplot(t(RQ1e2_24g8_24g16_16s8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24>8, 24>16, 16<8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)
### 24<8, 24>16 & 16<8
matplot(t(RQ1e2_24s8_24g16_16s8_exp),type="l",lty=1,col=1,
        ylab=NA,main="24<8, 24>16, 16<8",xaxt="n")
axis(1,at=c(1,2,3),labels=c("24ppt","16ppt","8ppt"),cex.axis=1)

In the figure above, core response genes are indicated in red.
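
Each panel in the figure corresponds to one direction pattern among the pairwise comparisons of the log mean expression values. As a minimal sketch of how such a pattern label can be derived, the toy example below labels each gene for all three comparisons. The data frame `toy_log`, the helper `compare_label` and the gene IDs are hypothetical and not part of the original analysis. In the real workflow, only the contrasts that are significant for a gene are compared.

```r
# Toy log-mean expression table; gene IDs and values are made up for illustration.
toy_log = data.frame(`24ppt` = c(3.0, 1.0, 2.0),
                     `16ppt` = c(2.0, 2.5, 2.0),
                     `8ppt`  = c(1.0, 4.0, 0.5),
                     row.names = c("geneA", "geneB", "geneC"),
                     check.names = FALSE)

# Hypothetical helper: label the direction of one pairwise comparison.
# Ties return NA, mirroring the strict inequalities used in the cluster definitions.
compare_label = function(hi, lo, hi_name, lo_name) {
  ifelse(hi > lo, paste0(hi_name, ">", lo_name),
         ifelse(hi < lo, paste0(hi_name, "<", lo_name), NA))
}

# One column per pairwise comparison, one row per gene.
toy_pattern = data.frame(
  c24_8  = compare_label(toy_log$`24ppt`, toy_log$`8ppt`,  "24", "8"),
  c24_16 = compare_label(toy_log$`24ppt`, toy_log$`16ppt`, "24", "16"),
  c16_8  = compare_label(toy_log$`16ppt`, toy_log$`8ppt`,  "16", "8"),
  row.names = rownames(toy_log),
  stringsAsFactors = FALSE)

print(toy_pattern)
```

A gene with identical mean expression in two salinities (such as `geneC` for 24 vs 16 ppt here) gets no label for that comparison, just as it would fall outside the clusters defined with `>` and `<`.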

Next, we repeated the same steps for each genotype separately, but without plotting the data.

The code for genotype A is shown below. The same code was run for the other genotypes (not shown). To rerun the entire analysis in this document, apply the code below to the other genotypes as well, because the final output of all genotypes is combined into a data object that is used downstream.

# calculate the average expression per gene for each salinity treatment 
genoA_mean_16ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(1:3)], 1, mean))
genoA_mean_24ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(4:6)], 1, mean))
genoA_mean_8ppt_RQ1e2 = as.matrix(apply(expression_RQ1e2[,c(7:9)], 1, mean))

# add column names to the matrices with the average values
colnames(genoA_mean_16ppt_RQ1e2) = c("16ppt")
colnames(genoA_mean_24ppt_RQ1e2) = c("24ppt")
colnames(genoA_mean_8ppt_RQ1e2) = c("8ppt")

# merge the data frames & take the log value of the expression
genoA_mean_expression_RQ1e2_log = as.data.frame(log(cbind(genoA_mean_24ppt_RQ1e2, 
                                                          genoA_mean_16ppt_RQ1e2, 
                                                          genoA_mean_8ppt_RQ1e2)+1))

# extract expression levels of subsets of genes
## take only genes significant in all contrasts
genoA_OnlySignificantGenes_RQ1e2_allcontrasts = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 1 & 
  OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 1 & 
  OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 1))) 

genoA_mean_expression_RQ1e2_allcontrasts_log = subset(genoA_mean_expression_RQ1e2_log,
  rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_allcontrasts) 

## take only genes significant in 24-16 and 24-8
genoA_OnlySignificantGenes_RQ1e2_16vs24_8vs24 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 0 & 
  OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 1 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 1))) 

genoA_mean_expression_RQ1e2_16vs24_8vs24_log = subset(genoA_mean_expression_RQ1e2_log,
  rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_16vs24_8vs24) 

## take only genes significant in 24-16 and 16-8
genoA_OnlySignificantGenes_RQ1e2_16vs24_8vs16 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 1 & 
  OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 0 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 1))) 

genoA_mean_expression_RQ1e2_16vs24_8vs16_log = subset(genoA_mean_expression_RQ1e2_log,
  rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_16vs24_8vs16) 

## take only genes significant in 24-8 and 16-8
genoA_OnlySignificantGenes_RQ1e2_8vs24_8vs16 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 1 & 
  OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 1 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 0))) 

genoA_mean_expression_RQ1e2_8vs24_8vs16_log = subset(genoA_mean_expression_RQ1e2_log,
  rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_8vs24_8vs16) 

## take only genes significant in 24-16
genoA_OnlySignificantGenes_RQ1e2_16vs24 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 0 & 
  OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 0 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 1))) 

genoA_mean_expression_RQ1e2_16vs24_log = subset(genoA_mean_expression_RQ1e2_log, 
  rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_16vs24) 

## take only genes significant in 24-8
genoA_OnlySignificantGenes_RQ1e2_8vs24 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 0 & 
  OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 1 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 0))) 

genoA_mean_expression_RQ1e2_8vs24_log = subset(genoA_mean_expression_RQ1e2_log,
  rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_8vs24) 

## take only genes significant in 16-8
genoA_OnlySignificantGenes_RQ1e2_8vs16 = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 1 & 
  OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 0 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 0))) 

genoA_mean_expression_RQ1e2_8vs16_log = subset(genoA_mean_expression_RQ1e2_log,
  rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignificantGenes_RQ1e2_8vs16) 

## take only genes that are not significant in the posthoc tests OR that are not selected in the average response
genoA_OnlySignGenes_RQ1e2_ConStage_average_names_posthoc = c(rownames(subset(
  OnlySignGenes_RQ1e2_ConStage, OnlySignGenes_RQ1e2_ConStage$`A8-A16` == 0 & 
  OnlySignGenes_RQ1e2_ConStage$`A8-A24` == 0 & OnlySignGenes_RQ1e2_ConStage$`A16-A24` == 0))) 

genoA_mean_expression_RQ1e2_posthoc_log = subset(genoA_mean_expression_RQ1e2_log,
  rownames(genoA_mean_expression_RQ1e2_log)%in%genoA_OnlySignGenes_RQ1e2_ConStage_average_names_posthoc)

# define the clusters
## significant in one contrast
RQ1e2_genoA_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs16_log, 
  genoA_mean_expression_RQ1e2_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_8vs16_log$`8ppt`))
RQ1e2_genoA_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs16_log, 
  genoA_mean_expression_RQ1e2_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_8vs16_log$`8ppt`))
RQ1e2_genoA_24g8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_log, 
  genoA_mean_expression_RQ1e2_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_8vs24_log$`8ppt`))
RQ1e2_genoA_24s8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_log, 
  genoA_mean_expression_RQ1e2_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_8vs24_log$`8ppt`))
RQ1e2_genoA_24g16 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_log, 
  genoA_mean_expression_RQ1e2_16vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_log$`16ppt`))
RQ1e2_genoA_24s16 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_log, 
  genoA_mean_expression_RQ1e2_16vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_log$`16ppt`))

## significant in two contrasts
RQ1e2_genoA_24g16_24g8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs24_log,
  genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
  genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_genoA_24s16_24s8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs24_log,
  genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
  genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_genoA_24g16_24s8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs24_log,
  genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
  genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_genoA_24s16_24g8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs24_log,
  genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`8ppt` &
  genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs24_log$`16ppt`))
RQ1e2_genoA_24g8_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_8vs16_log, 
  genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` > genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
  genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24s8_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_8vs16_log, 
  genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` < genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
  genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24g8_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_8vs16_log, 
  genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` > genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
  genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24s8_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_8vs24_8vs16_log, 
  genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`24ppt` < genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt` &
  genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_8vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24g16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs16_log,
  genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
  genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24s16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs16_log,
  genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
  genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24g16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs16_log,
  genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
  genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))
RQ1e2_genoA_24s16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_16vs24_8vs16_log,
  genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`24ppt` < genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` &
  genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`16ppt` > genoA_mean_expression_RQ1e2_16vs24_8vs16_log$`8ppt`))

## significant in three contrasts
RQ1e2_genoA_24g8_24g16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24s8_24s16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24g8_24s16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24g8_24s16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24g8_24g16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24s8_24g16_16s8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24s8_24s16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))
RQ1e2_genoA_24s8_24g16_16g8 = rownames(subset(genoA_mean_expression_RQ1e2_allcontrasts_log,
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` < genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`24ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` &
  genoA_mean_expression_RQ1e2_allcontrasts_log$`16ppt` > genoA_mean_expression_RQ1e2_allcontrasts_log$`8ppt`))

# combine all the clusters into a list
RQ1e2_genoA_cluster_list = list(RQ1e2_genoA_16g8, RQ1e2_genoA_16s8, RQ1e2_genoA_24g8, RQ1e2_genoA_24s8, 
                                RQ1e2_genoA_24g16, RQ1e2_genoA_24s16, RQ1e2_genoA_24g16_24g8, RQ1e2_genoA_24s16_24s8,   
                                RQ1e2_genoA_24g16_24s8, RQ1e2_genoA_24s16_24g8, RQ1e2_genoA_24g8_16g8,
                                RQ1e2_genoA_24s8_16s8, RQ1e2_genoA_24g8_16s8, RQ1e2_genoA_24s8_16g8, 
                                RQ1e2_genoA_24g16_16g8, RQ1e2_genoA_24s16_16s8, RQ1e2_genoA_24g16_16s8,
                                RQ1e2_genoA_24s16_16g8, RQ1e2_genoA_24g8_24g16_16g8, RQ1e2_genoA_24s8_24s16_16s8,
                                RQ1e2_genoA_24g8_24s16_16g8, RQ1e2_genoA_24g8_24s16_16s8, RQ1e2_genoA_24g8_24g16_16s8,
                                RQ1e2_genoA_24s8_24g16_16s8, RQ1e2_genoA_24s8_24s16_16g8, RQ1e2_genoA_24s8_24g16_16g8)

names(RQ1e2_genoA_cluster_list) = c("RQ1e2 genA: 16>8","RQ1e2 genA: 16<8","RQ1e2 genA: 24>8","RQ1e2 genA: 24<8",
                                    "RQ1e2 genA: 24>16","RQ1e2 genA: 24<16","RQ1e2 genA: 24>16 and 24>8",
                                    "RQ1e2 genA: 24<16 and 24<8","RQ1e2 genA: 24>16 and 24<8",
                                    "RQ1e2 genA: 24<16 and 24>8","RQ1e2 genA: 24>8 and 16>8",
                                    "RQ1e2 genA: 24<8 and 16<8","RQ1e2 genA: 24>8 and 16<8",
                                    "RQ1e2 genA: 24<8 and 16>8","RQ1e2 genA: 24>16 and 16>8",
                                    "RQ1e2 genA: 24<16 and 16<8","RQ1e2 genA: 24>16 and 16<8",
                                    "RQ1e2 genA: 24<16 and 16>8","RQ1e2 genA: 24>8 and 24>16 and 16>8",
                                    "RQ1e2 genA: 24<8 and 24<16 and 16<8","RQ1e2 genA: 24>8 and 24<16 and 16>8",
                                    "RQ1e2 genA: 24>8 and 24<16 and 16<8","RQ1e2 genA: 24>8 and 24>16 and 16<8",
                                    "RQ1e2 genA: 24<8 and 24>16 and 16<8","RQ1e2 genA: 24<8 and 24<16 and 16>8",
                                    "RQ1e2 genA: 24<8 and 24>16 and 16>8")

# create name list (necessary for downstream code)
namesA = c(RQ1e2_genoA_16g8,RQ1e2_genoA_16s8,RQ1e2_genoA_24g8,RQ1e2_genoA_24s8,RQ1e2_genoA_24g16,
           RQ1e2_genoA_24s16,RQ1e2_genoA_24g16_24g8,RQ1e2_genoA_24s16_24s8,RQ1e2_genoA_24g16_24s8,
           RQ1e2_genoA_24s16_24g8,RQ1e2_genoA_24g8_16g8,RQ1e2_genoA_24s8_16s8,RQ1e2_genoA_24g8_16s8,
           RQ1e2_genoA_24s8_16g8,RQ1e2_genoA_24g16_16g8,RQ1e2_genoA_24s16_16s8,RQ1e2_genoA_24g16_16s8,
           RQ1e2_genoA_24s16_16g8,RQ1e2_genoA_24g8_24g16_16g8,RQ1e2_genoA_24s8_24s16_16s8,
           RQ1e2_genoA_24g8_24s16_16g8,RQ1e2_genoA_24g8_24s16_16s8,RQ1e2_genoA_24g8_24g16_16s8,
           RQ1e2_genoA_24s8_24g16_16s8,RQ1e2_genoA_24s8_24s16_16g8,RQ1e2_genoA_24s8_24g16_16g8)
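
Because the genotype A block has to be repeated for every genotype, the first steps (averaging, log transformation and selecting genes by significance pattern) could be wrapped in small functions. The sketch below is only an illustration on toy data. The helpers `geno_mean_log` and `genes_with_pattern` are hypothetical and were not used in the original analysis. The column indices of the other genotypes in `expression_RQ1e2` are not shown in this document, and the assumption that their contrast columns follow the same naming as `A8-A16`, `A8-A24` and `A16-A24` should be checked before use.

```r
# Hypothetical helper: log(mean + 1) per salinity for one genotype.
# `expr` is a gene x sample matrix; cols_* are the column indices of the
# replicates per salinity (for genotype A in this document: 1:3, 4:6, 7:9).
geno_mean_log = function(expr, cols_16, cols_24, cols_8) {
  means = cbind(`24ppt` = rowMeans(expr[, cols_24, drop = FALSE]),
                `16ppt` = rowMeans(expr[, cols_16, drop = FALSE]),
                `8ppt`  = rowMeans(expr[, cols_8,  drop = FALSE]))
  as.data.frame(log(means + 1))
}

# Hypothetical helper: gene IDs that match a 0/1 significance pattern.
# Assumes contrast columns named "<geno>8-<geno>16", "<geno>8-<geno>24"
# and "<geno>16-<geno>24" (an assumption for genotypes other than A).
genes_with_pattern = function(sig, geno, s8_16, s8_24, s16_24) {
  cn = paste0(geno, c("8", "8", "16"), "-", geno, c("16", "24", "24"))
  keep = sig[[cn[1]]] == s8_16 & sig[[cn[2]]] == s8_24 & sig[[cn[3]]] == s16_24
  rownames(sig)[which(keep)]
}

# Toy input: two genes, three replicates per salinity (values are made up).
toy_expr = matrix(c(10, 12, 11,  20, 22, 21,  5, 6, 4,
                     3,  3,  3,   1,  1,  1,  9, 9, 9),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("gene1", "gene2"),
                                  c(paste0("A16_", 1:3), paste0("A24_", 1:3),
                                    paste0("A8_", 1:3))))
toy_sig = data.frame(`A8-A16` = c(1, 0), `A8-A24` = c(1, 1), `A16-A24` = c(1, 1),
                     row.names = c("gene1", "gene2"), check.names = FALSE)

toy_geno_log = geno_mean_log(toy_expr, cols_16 = 1:3, cols_24 = 4:6, cols_8 = 7:9)

# Genes significant in all three contrasts, with their log mean expression.
toy_allcontrasts = toy_geno_log[genes_with_pattern(toy_sig, "A", 1, 1, 1), , drop = FALSE]
print(toy_allcontrasts)
```

Calling `genes_with_pattern(toy_sig, "A", 0, 1, 1)` would instead return the genes that are significant in 24-16 and 24-8 only, which corresponds to the `16vs24_8vs24` subset above.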

After running the above code for all genotypes, we combine the cluster information of the average response and of each genotype-specific response into one data object:

# create list of lists that gives cluster information for each gene for each genotype and the average response
RQ1e2_cluster_df = setNames(lapply(names, 
                                   function(x) names(which(sapply(RQ1e2_cluster_list, 
                                   function(y) x %in% y)))), names)
RQ1e2_genoA_cluster_df = setNames(lapply(namesA, 
                                         function(x) names(which(sapply(RQ1e2_genoA_cluster_list, 
                                         function(y) x %in% y)))), namesA)
RQ1e2_genoB_cluster_df = setNames(lapply(namesB, 
                                         function(x) names(which(sapply(RQ1e2_genoB_cluster_list, 
                                         function(y) x %in% y)))), namesB)
RQ1e2_genoD_cluster_df = setNames(lapply(namesD, 
                                         function(x) names(which(sapply(RQ1e2_genoD_cluster_list, 
                                         function(y) x %in% y)))), namesD)
RQ1e2_genoF_cluster_df = setNames(lapply(namesF, 
                                         function(x) names(which(sapply(RQ1e2_genoF_cluster_list, 
                                         function(y) x %in% y)))), namesF)
RQ1e2_genoI_cluster_df = setNames(lapply(namesI, 
                                         function(x) names(which(sapply(RQ1e2_genoI_cluster_list, 
                                         function(y) x %in% y)))), namesI)
RQ1e2_genoJ_cluster_df = setNames(lapply(namesJ, 
                                         function(x) names(which(sapply(RQ1e2_genoJ_cluster_list, 
                                         function(y) x %in% y)))), namesJ)
RQ1e2_genoK_cluster_df = setNames(lapply(namesK, 
                                         function(x) names(which(sapply(RQ1e2_genoK_cluster_list, 
                                         function(y) x %in% y)))), namesK)
RQ1e2_genoP_cluster_df = setNames(lapply(namesP, 
                                         function(x) names(which(sapply(RQ1e2_genoP_cluster_list, 
                                         function(y) x %in% y)))), namesP)

# combine the different lists
keys = unique(c(names(RQ1e2_cluster_df), names(RQ1e2_genoA_cluster_df), names(RQ1e2_genoB_cluster_df), 
                names(RQ1e2_genoD_cluster_df), names(RQ1e2_genoF_cluster_df),names(RQ1e2_genoI_cluster_df),
                names(RQ1e2_genoJ_cluster_df), names(RQ1e2_genoK_cluster_df), names(RQ1e2_genoP_cluster_df)))

RQ1e2_ALL_cluster_df = setNames(mapply(c, RQ1e2_cluster_df[keys],  RQ1e2_genoA_cluster_df[keys], 
                                       RQ1e2_genoB_cluster_df[keys], RQ1e2_genoD_cluster_df[keys],
                                       RQ1e2_genoF_cluster_df[keys],RQ1e2_genoI_cluster_df[keys], 
                                       RQ1e2_genoJ_cluster_df[keys], RQ1e2_genoK_cluster_df[keys], 
                                       RQ1e2_genoP_cluster_df[keys]), keys)

# example output for one gene
RQ1e2_ALL_cluster_df$Sm_g00016729

## NULL
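
The inversion and merge steps above can be sketched on toy data. This is a minimal illustration only: the gene IDs, cluster labels, and the helper `invert_clusters` are hypothetical and not part of the original analysis. It also shows why a gene that belongs to no cluster returns NULL, as in the example output above.

```r
# Toy cluster lists: cluster label -> member genes (hypothetical values)
avg_clusters   <- list("avg: 16>8" = c("g1", "g2"), "avg: 24<8" = c("g3"))
genoA_clusters <- list("genA: 16>8" = c("g1"), "genA: 24<8" = c("g2", "g4"))

# Invert a cluster -> genes list into a gene -> cluster labels list
invert_clusters <- function(cluster_list) {
  genes <- unique(unlist(cluster_list, use.names = FALSE))
  setNames(lapply(genes, function(g) {
    names(which(vapply(cluster_list, function(members) g %in% members, logical(1))))
  }), genes)
}

avg_by_gene   <- invert_clusters(avg_clusters)
genoA_by_gene <- invert_clusters(genoA_clusters)

# Merge per gene; genes missing from one list contribute NULL to c()
keys <- unique(c(names(avg_by_gene), names(genoA_by_gene)))
all_by_gene <- setNames(
  mapply(c, avg_by_gene[keys], genoA_by_gene[keys], SIMPLIFY = FALSE),
  keys
)

all_by_gene$g2  # labels from both the average and the genotype A response
all_by_gene$g9  # NULL: this gene is in none of the clusters
```

In this sketch `SIMPLIFY = FALSE` is set explicitly so that `mapply` always returns a list, even in the edge case where every gene carries the same number of labels.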

3.3.8 GO enrichment: ORA analysis with topGO

In this section, we performed GO enrichment using Fisher’s exact test in topGO. The enrichment used the cluster information for the average response and for each individual genotype, both obtained in section 3.3.7.
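
As a worked example of what a Fisher's exact test for over-representation computes for a single GO term, the sketch below uses base R only. All counts are hypothetical and chosen for illustration. It shows the plain one-sided test; the "elim" algorithm used in this section additionally walks the GO graph from specific to general terms and removes genes annotated to significant child terms before testing their ancestors, which is not reproduced here.

```r
# Hypothetical counts for one GO term (not taken from the analysis)
universe_size    <- 1000  # genes with at least one GO annotation
annotated        <- 50    # genes in the universe annotated with the term
set_size         <- 100   # genes in the tested set
annotated_in_set <- 12    # genes in the set annotated with the term

# 2x2 contingency table: rows = set membership, columns = annotation status
contingency <- matrix(
  c(annotated_in_set, annotated - annotated_in_set,
    set_size - annotated_in_set,
    universe_size - annotated - set_size + annotated_in_set),
  nrow = 2,
  dimnames = list(c("in set", "not in set"), c("annotated", "not annotated"))
)

# One-sided test for enrichment of the term in the set
p_fisher <- fisher.test(contingency, alternative = "greater")$p.value

# The same p-value from the hypergeometric distribution: P(X >= annotated_in_set)
p_hyper <- phyper(annotated_in_set - 1, annotated, universe_size - annotated,
                  set_size, lower.tail = FALSE)

# Number of annotated genes expected in the set by chance
expected <- set_size * annotated / universe_size
```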

We started by importing the GO annotations of the S. marinoi genome. The file Smarinoi_Ref1.1.2_GOterms.txt contains the GO terms that were obtained in the InterProScan analysis.

# import GO information
geneID2GO = readMappings(file = "01.Skeletonema_marinoi_genome_v1.1.2/Smarinoi_Ref1.1.2_GOterms.txt")
geneUniverse = names(geneID2GO)
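
For orientation, readMappings by default expects one gene per line, with the gene ID and a comma-separated list of GO terms separated by a tab. The sketch below mimics that structure in base R on a toy file; it is a stand-in to show the shape of the resulting object, not topGO's implementation, and the gene IDs and file contents are hypothetical.

```r
# Write a toy annotation file: gene ID <TAB> comma-separated GO terms
toy_file <- tempfile(fileext = ".txt")
writeLines(c("Sm_toy_gene1\tGO:0006810, GO:0016020",
             "Sm_toy_gene2\tGO:0005524"), toy_file)

# Parse into a named list: one character vector of GO IDs per gene
raw <- read.delim(toy_file, header = FALSE, stringsAsFactors = FALSE,
                  col.names = c("gene", "go"))
toy_gene2GO <- setNames(strsplit(raw$go, ",\\s*"), raw$gene)

# The gene universe is the set of genes that have at least one GO term
toy_universe <- names(toy_gene2GO)
str(toy_gene2GO)
```

Because the universe is derived from the annotation file, genes without any GO term are not part of the background in the tests that follow.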

Next, we defined four clusters of genes with similar expression patterns, based on the cluster information in section 3.3.7 on the average response:

- genes that were upregulated in low salinities
- genes that were downregulated in low salinities
- genes that were upregulated in intermediate salinities
- genes that were downregulated in intermediate salinities

To create the above categories, several clusters were combined:

# define four sets of genes for GO enrichment

## genes that are upregulated in low salinities
RQ1e2_UpInLowSal = c(RQ1e2_16s8,RQ1e2_24s8,RQ1e2_24s16,RQ1e2_24s16_24s8,RQ1e2_24s8_16s8,RQ1e2_24s8_24s16_16s8)
length(RQ1e2_UpInLowSal)

## [1] 2637

## genes that are downregulated in low salinities
RQ1e2_UpInHighSal = c(RQ1e2_16g8,RQ1e2_24g8,RQ1e2_24g16,RQ1e2_24g16_24g8,RQ1e2_24g8_16g8,RQ1e2_24g8_24g16_16g8)
length(RQ1e2_UpInHighSal)

## [1] 2461

## genes that are upregulated in intermediate salinities
RQ1e2_24s16_16g8_c1 = c(RQ1e2_24s16_24g8,RQ1e2_24s8_16g8,RQ1e2_24s16_16g8,RQ1e2_24g8_24s16_16g8,RQ1e2_24s8_24s16_16g8)
length(RQ1e2_24s16_16g8_c1)

## [1] 100

## genes that are downregulated in intermediate salinities
RQ1e2_24g16_16s8_c2 = c(RQ1e2_24g8_16s8,RQ1e2_24g16_16s8,RQ1e2_24g8_24g16_16s8,RQ1e2_24s8_24g16_16s8)
length(RQ1e2_24g16_16s8_c2)

## [1] 87
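
Since each set is built by concatenating several clusters, it can be useful to confirm that no gene ends up in a set twice or in more than one set. The following is a minimal sketch of such a check on toy data; the set contents are hypothetical, and whether the real clusters can overlap at all is not stated here.

```r
# Toy versions of the four combined gene sets (hypothetical gene IDs)
sets <- list(
  UpInLowSal   = c("g1", "g2", "g3"),
  UpInHighSal  = c("g4", "g5"),
  UpInMedSal   = c("g6"),
  DownInMedSal = c("g7", "g2")
)

# Position of the first duplicated gene within each set (0 = no duplicates)
dup_within <- vapply(sets, anyDuplicated, integer(1))

# Genes shared between every pair of sets; keep only non-empty overlaps
pairs <- combn(names(sets), 2, simplify = FALSE)
overlaps <- setNames(
  lapply(pairs, function(p) intersect(sets[[p[1]]], sets[[p[2]]])),
  vapply(pairs, paste, character(1), collapse = " & ")
)
overlaps <- Filter(length, overlaps)
overlaps
```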

Next, we performed GO enrichment on these four sets of genes:

# topGO: downregulated in low salinities

## create gene list for input in topGO
geneList_cluster_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_UpInHighSal))
names(geneList_cluster_UpInHighSal) = geneUniverse
str(geneList_cluster_UpInHighSal)

## create a topGO object (for biological process GOs)
GOdata_BP_cluster_UpInHighSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_UpInHighSal, 
                                    annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_UpInHighSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_UpInHighSal, 
                                    annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_UpInHighSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_UpInHighSal, 
                                    annot = annFUN.gene2GO, gene2GO = geneID2GO)

## run Fisher's exact test
resultFisher_BP_cluster_UpInHighSal_elim = runTest(GOdata_BP_cluster_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_UpInHighSal_elim = runTest(GOdata_MF_cluster_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_UpInHighSal_elim = runTest(GOdata_CC_cluster_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")

## extract the significant GO terms
allRes_BP_cluster_UpInHighSal_elim = GenTable(GOdata_BP_cluster_UpInHighSal, 
                                               classic = resultFisher_BP_cluster_UpInHighSal_elim, 
                                               orderBy = "elim", ranksOf = "elim", 
                                               topNodes = 45, numChar=1000)
allRes_MF_cluster_UpInHighSal_elim = GenTable(GOdata_MF_cluster_UpInHighSal, 
                                               classic = resultFisher_MF_cluster_UpInHighSal_elim, 
                                               orderBy = "elim", ranksOf = "elim", 
                                               topNodes = 50, numChar=1000)
allRes_CC_cluster_UpInHighSal_elim = GenTable(GOdata_CC_cluster_UpInHighSal, 
                                               classic = resultFisher_CC_cluster_UpInHighSal_elim, 
                                               orderBy = "elim", ranksOf = "elim", 
                                               topNodes = 20, numChar=1000)

# topGO: upregulated in low salinities

## create gene list for input in topGO
geneList_cluster_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_UpInLowSal))
names(geneList_cluster_UpInLowSal) = geneUniverse
str(geneList_cluster_UpInLowSal)

## create a topGO object (for biological process GOs)
GOdata_BP_cluster_UpInLowSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_UpInLowSal, 
                                   annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_UpInLowSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_UpInLowSal, 
                                   annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_UpInLowSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_UpInLowSal, 
                                   annot = annFUN.gene2GO, gene2GO = geneID2GO)

## run Fisher's exact test
resultFisher_BP_cluster_UpInLowSal_elim = runTest(GOdata_BP_cluster_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_UpInLowSal_elim = runTest(GOdata_MF_cluster_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_UpInLowSal_elim = runTest(GOdata_CC_cluster_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")

## extract the significant GO terms
allRes_BP_cluster_UpInLowSal_elim = GenTable(GOdata_BP_cluster_UpInLowSal, 
                                              classic = resultFisher_BP_cluster_UpInLowSal_elim, 
                                              orderBy = "elim", ranksOf = "elim", 
                                              topNodes = 110, numChar=1000)
allRes_MF_cluster_UpInLowSal_elim = GenTable(GOdata_MF_cluster_UpInLowSal, 
                                              classic = resultFisher_MF_cluster_UpInLowSal_elim, 
                                              orderBy = "elim", ranksOf = "elim", 
                                              topNodes = 75, numChar=1000)
allRes_CC_cluster_UpInLowSal_elim = GenTable(GOdata_CC_cluster_UpInLowSal, 
                                              classic = resultFisher_CC_cluster_UpInLowSal_elim, 
                                              orderBy = "elim", ranksOf = "elim", 
                                              topNodes = 20, numChar=1000)

# topGO: upregulated in intermediate salinities

## create gene list for input in topGO
geneList_cluster_UpInMedSal = factor(as.integer(geneUniverse %in% RQ1e2_24s16_16g8_c1))
names(geneList_cluster_UpInMedSal) = geneUniverse
str(geneList_cluster_UpInMedSal)

## create a topGO object (for biological process GOs)
GOdata_BP_cluster_UpInMedSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_UpInMedSal, 
                                   annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_UpInMedSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_UpInMedSal, 
                                   annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_UpInMedSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_UpInMedSal, 
                                   annot = annFUN.gene2GO, gene2GO = geneID2GO)

## run Fisher's exact test
resultFisher_BP_cluster_UpInMedSal_elim = runTest(GOdata_BP_cluster_UpInMedSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_UpInMedSal_elim = runTest(GOdata_MF_cluster_UpInMedSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_UpInMedSal_elim = runTest(GOdata_CC_cluster_UpInMedSal, 
                                                   algorithm = "elim", statistic = "fisher")

## extract the significant GO terms
allRes_BP_cluster_UpInMedSal_elim = GenTable(GOdata_BP_cluster_UpInMedSal, 
                                              classic = resultFisher_BP_cluster_UpInMedSal_elim, 
                                              orderBy = "elim", ranksOf = "elim", 
                                              topNodes = 10, numChar=1000)
allRes_MF_cluster_UpInMedSal_elim = GenTable(GOdata_MF_cluster_UpInMedSal, 
                                              classic = resultFisher_MF_cluster_UpInMedSal_elim, 
                                              orderBy = "elim", ranksOf = "elim", 
                                              topNodes = 10, numChar=1000)
allRes_CC_cluster_UpInMedSal_elim = GenTable(GOdata_CC_cluster_UpInMedSal, 
                                              classic = resultFisher_CC_cluster_UpInMedSal_elim, 
                                              orderBy = "elim", ranksOf = "elim", 
                                              topNodes = 10, numChar=1000)

# topGO: downregulated in intermediate salinities

## create gene list for input in topGO
geneList_cluster_DownInMedSal = factor(as.integer(geneUniverse %in% RQ1e2_24g16_16s8_c2 ))
names(geneList_cluster_DownInMedSal) = geneUniverse
str(geneList_cluster_DownInMedSal)

## create a topGO object (for biological process GOs)
GOdata_BP_cluster_DownInMedSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_DownInMedSal, 
                                     annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_DownInMedSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_DownInMedSal, 
                                     annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_DownInMedSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_DownInMedSal, 
                                     annot = annFUN.gene2GO, gene2GO = geneID2GO)

## run Fisher's exact test
resultFisher_BP_cluster_DownInMedSal_elim = runTest(GOdata_BP_cluster_DownInMedSal, 
                                                     algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_DownInMedSal_elim = runTest(GOdata_MF_cluster_DownInMedSal, 
                                                     algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_DownInMedSal_elim = runTest(GOdata_CC_cluster_DownInMedSal, 
                                                     algorithm = "elim", statistic = "fisher")

## extract the significant GO terms
allRes_BP_cluster_DownInMedSal_elim = GenTable(GOdata_BP_cluster_DownInMedSal, 
                                                classic = resultFisher_BP_cluster_DownInMedSal_elim, 
                                                orderBy = "elim", ranksOf = "elim", 
                                                topNodes = 20,numChar=1000)
allRes_MF_cluster_DownInMedSal_elim = GenTable(GOdata_MF_cluster_DownInMedSal, 
                                                classic = resultFisher_MF_cluster_DownInMedSal_elim, 
                                                orderBy = "elim", ranksOf = "elim", 
                                                topNodes = 25, numChar=1000)
allRes_CC_cluster_DownInMedSal_elim = GenTable(GOdata_CC_cluster_DownInMedSal, 
                                                classic = resultFisher_CC_cluster_DownInMedSal_elim, 
                                                orderBy = "elim", ranksOf = "elim", 
                                                topNodes = 10, numChar=1000)
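
The gene lists passed to topGO above are two-level factors over the gene universe. A small base-R sketch on toy data shows how that encoding behaves; the gene IDs are hypothetical. It also illustrates that a gene in the tested set that is absent from the universe (for example because it has no GO annotation) does not enter the test.

```r
# Toy universe and tested set (hypothetical gene IDs)
geneUniverse_toy <- c("g1", "g2", "g3", "g4", "g5")
interesting_toy  <- c("g2", "g5", "g9")  # g9 is not in the universe

# 1 = gene belongs to the tested set, 0 = background gene
geneList_toy <- factor(as.integer(geneUniverse_toy %in% interesting_toy))
names(geneList_toy) <- geneUniverse_toy
table(geneList_toy)

# Genes of the tested set that are dropped because they are outside the universe
dropped <- setdiff(interesting_toy, geneUniverse_toy)
dropped
```

If none of the tested genes were in the universe, the factor would have a single level, so it is worth checking `nlevels()` before building the topGO object.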

Next, we reduced the list of significant GO terms using the online application REVIGO. As input for REVIGO, we used the output of the Fisher’s exact test (only GO terms with a P-value <= 0.05, together with their P-values) and applied a 0.5 similarity threshold with the SimRel algorithm.

REVIGO for the above analyses was accessed on July 6th 2021, and used the Gene Ontology database of May 1st 2021 and the UniProt-to-GO mapping database from April 9th 2021.
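
Preparing the REVIGO input can be sketched as follows. This is an assumption-laden illustration rather than the code that was actually used: the data frame below is a hypothetical stand-in for a GenTable result, with P-values stored as text in the "classic" column and very small values written as "< 1e-30".

```r
# Hypothetical stand-in for a GenTable result (values invented for illustration)
toy_res <- data.frame(
  GO.ID       = c("GO:0006412", "GO:0015979", "GO:0006810"),
  Term        = c("translation", "photosynthesis", "transport"),
  Annotated   = c(120L, 45L, 300L),
  Significant = c(40L, 12L, 30L),
  Expected    = c(15.2, 5.7, 38.0),
  classic     = c("< 1e-30", "0.0031", "0.21"),
  stringsAsFactors = FALSE
)

# Convert text P-values to numbers; "< 1e-30" is treated as 1e-30
p_num <- as.numeric(sub("^<\\s*", "", toy_res$classic))

# Keep GO terms with P <= 0.05 and write a two-column GO ID / P-value table
revigo_input <- data.frame(GO.ID = toy_res$GO.ID, P = p_num)[p_num <= 0.05, ]
out_file <- tempfile(fileext = ".txt")
write.table(revigo_input, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(out_file)
```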

We repeated the above analyses for each genotype separately. The code is annotated in detail for genotype A; the blocks for the other genotypes follow the same pattern. To rerun the entire analysis in this document, the code below must be applied to every genotype, because the final output is used to create a data object that is needed downstream.

# define clusters to test
RQ1e2_genoA_UpInLowSal = c(RQ1e2_genoA_16s8,RQ1e2_genoA_24s8,RQ1e2_genoA_24s16,RQ1e2_genoA_24s16_24s8,
                           RQ1e2_genoA_24s8_16s8,RQ1e2_genoA_24s8_24s16_16s8)
RQ1e2_genoA_UpInHighSal = c(RQ1e2_genoA_16g8,RQ1e2_genoA_24g8,RQ1e2_genoA_24g16,RQ1e2_genoA_24g16_24g8,
                            RQ1e2_genoA_24g8_16g8,RQ1e2_genoA_24g8_24g16_16g8)

# genotype A: upregulated in high salinities
## create gene list for input in topGO
geneList_cluster_genoA_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoA_UpInHighSal))
names(geneList_cluster_genoA_UpInHighSal) = geneUniverse
str(geneList_cluster_genoA_UpInHighSal)

## create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoA_UpInHighSal = new("topGOdata", ontology="BP", 
                                          allGenes=geneList_cluster_genoA_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoA_UpInHighSal = new("topGOdata", ontology="MF", 
                                          allGenes=geneList_cluster_genoA_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoA_UpInHighSal = new("topGOdata", ontology="CC",
                                          allGenes=geneList_cluster_genoA_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)

## run Fisher's exact test
resultFisher_BP_cluster_genoA_UpInHighSal = runTest(GOdata_BP_cluster_genoA_UpInHighSal, 
                                                     algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoA_UpInHighSal = runTest(GOdata_MF_cluster_genoA_UpInHighSal, 
                                                     algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoA_UpInHighSal = runTest(GOdata_CC_cluster_genoA_UpInHighSal, 
                                                     algorithm = "elim", statistic = "fisher")

## extract the significant GO terms
allRes_BP_cluster_genoA_UpInHighSal = GenTable(GOdata_BP_cluster_genoA_UpInHighSal, 
                                                classic = resultFisher_BP_cluster_genoA_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoA_UpInHighSal = GenTable(GOdata_MF_cluster_genoA_UpInHighSal, 
                                                classic = resultFisher_MF_cluster_genoA_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoA_UpInHighSal = GenTable(GOdata_CC_cluster_genoA_UpInHighSal, 
                                                classic =resultFisher_CC_cluster_genoA_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)

# genotype A: upregulated in low salinities
## create gene list for input in topGO
geneList_cluster_genoA_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoA_UpInLowSal))
names(geneList_cluster_genoA_UpInLowSal) = geneUniverse
str(geneList_cluster_genoA_UpInLowSal)

## create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoA_UpInLowSal = new("topGOdata", ontology="BP", allGenes=geneList_cluster_genoA_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoA_UpInLowSal = new("topGOdata", ontology="MF", allGenes=geneList_cluster_genoA_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
## create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoA_UpInLowSal = new("topGOdata", ontology="CC", allGenes=geneList_cluster_genoA_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)

## run Fisher's exact test
resultFisher_BP_cluster_genoA_UpInLowSal = runTest(GOdata_BP_cluster_genoA_UpInLowSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoA_UpInLowSal = runTest(GOdata_MF_cluster_genoA_UpInLowSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoA_UpInLowSal = runTest(GOdata_CC_cluster_genoA_UpInLowSal, 
                                                    algorithm = "elim", statistic = "fisher")

## extract the significant GO terms
allRes_BP_cluster_genoA_UpInLowSal = GenTable(GOdata_BP_cluster_genoA_UpInLowSal, 
                                               classic = resultFisher_BP_cluster_genoA_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoA_UpInLowSal = GenTable(GOdata_MF_cluster_genoA_UpInLowSal, 
                                               classic = resultFisher_MF_cluster_genoA_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoA_UpInLowSal = GenTable(GOdata_CC_cluster_genoA_UpInLowSal, 
                                               classic = resultFisher_CC_cluster_genoA_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)

###genotype B
RQ1e2_genoB_UpInLowSal = c(RQ1e2_genoB_16s8,RQ1e2_genoB_24s8,RQ1e2_genoB_24s16,RQ1e2_genoB_24s16_24s8,RQ1e2_genoB_24s8_16s8,RQ1e2_genoB_24s8_24s16_16s8)
RQ1e2_genoB_UpInHighSal = c(RQ1e2_genoB_16g8,RQ1e2_genoB_24g8,RQ1e2_genoB_24g16,RQ1e2_genoB_24g16_24g8,RQ1e2_genoB_24g8_16g8,RQ1e2_genoB_24g8_24g16_16g8)

###genotype D
RQ1e2_genoD_UpInLowSal = c(RQ1e2_genoD_16s8,RQ1e2_genoD_24s8,RQ1e2_genoD_24s16,RQ1e2_genoD_24s16_24s8,RQ1e2_genoD_24s8_16s8,RQ1e2_genoD_24s8_24s16_16s8)
RQ1e2_genoD_UpInHighSal = c(RQ1e2_genoD_16g8,RQ1e2_genoD_24g8,RQ1e2_genoD_24g16,RQ1e2_genoD_24g16_24g8,RQ1e2_genoD_24g8_16g8,RQ1e2_genoD_24g8_24g16_16g8)

###genotype F
RQ1e2_genoF_UpInLowSal = c(RQ1e2_genoF_16s8,RQ1e2_genoF_24s8,RQ1e2_genoF_24s16,RQ1e2_genoF_24s16_24s8,RQ1e2_genoF_24s8_16s8,RQ1e2_genoF_24s8_24s16_16s8)
RQ1e2_genoF_UpInHighSal = c(RQ1e2_genoF_16g8,RQ1e2_genoF_24g8,RQ1e2_genoF_24g16,RQ1e2_genoF_24g16_24g8,RQ1e2_genoF_24g8_16g8,RQ1e2_genoF_24g8_24g16_16g8)

###genotype I
RQ1e2_genoI_UpInLowSal = c(RQ1e2_genoI_16s8,RQ1e2_genoI_24s8,RQ1e2_genoI_24s16,RQ1e2_genoI_24s16_24s8,RQ1e2_genoI_24s8_16s8,RQ1e2_genoI_24s8_24s16_16s8)
RQ1e2_genoI_UpInHighSal = c(RQ1e2_genoI_16g8,RQ1e2_genoI_24g8,RQ1e2_genoI_24g16,RQ1e2_genoI_24g16_24g8,RQ1e2_genoI_24g8_16g8,RQ1e2_genoI_24g8_24g16_16g8)

###genotype J
RQ1e2_genoJ_UpInLowSal = c(RQ1e2_genoJ_16s8,RQ1e2_genoJ_24s8,RQ1e2_genoJ_24s16,RQ1e2_genoJ_24s16_24s8,RQ1e2_genoJ_24s8_16s8,RQ1e2_genoJ_24s8_24s16_16s8)
RQ1e2_genoJ_UpInHighSal = c(RQ1e2_genoJ_16g8,RQ1e2_genoJ_24g8,RQ1e2_genoJ_24g16,RQ1e2_genoJ_24g16_24g8,RQ1e2_genoJ_24g8_16g8,RQ1e2_genoJ_24g8_24g16_16g8)
length(RQ1e2_genoJ_UpInLowSal)
length(RQ1e2_genoJ_UpInHighSal)

###genotype K
RQ1e2_genoK_UpInLowSal = c(RQ1e2_genoK_16s8,RQ1e2_genoK_24s8,RQ1e2_genoK_24s16,RQ1e2_genoK_24s16_24s8,RQ1e2_genoK_24s8_16s8,RQ1e2_genoK_24s8_24s16_16s8)
RQ1e2_genoK_UpInHighSal = c(RQ1e2_genoK_16g8,RQ1e2_genoK_24g8,RQ1e2_genoK_24g16,RQ1e2_genoK_24g16_24g8,RQ1e2_genoK_24g8_16g8,RQ1e2_genoK_24g8_24g16_16g8)

###genotype P
RQ1e2_genoP_UpInLowSal = c(RQ1e2_genoP_16s8,RQ1e2_genoP_24s8,RQ1e2_genoP_24s16,RQ1e2_genoP_24s16_24s8,RQ1e2_genoP_24s8_16s8,RQ1e2_genoP_24s8_24s16_16s8)
RQ1e2_genoP_UpInHighSal = c(RQ1e2_genoP_16g8,RQ1e2_genoP_24g8,RQ1e2_genoP_24g16,RQ1e2_genoP_24g16_24g8,RQ1e2_genoP_24g8_16g8,RQ1e2_genoP_24g8_24g16_16g8)
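
# The per-genotype definitions above all follow one naming pattern, so they could
# also be generated programmatically. The sketch below is a hypothetical alternative,
# shown on toy objects in a separate environment; the helper `combine_clusters` and
# the toy gene IDs are assumptions and not part of the original analysis.

```r
# Toy cluster objects for two genotypes, stored in their own environment
toy_env <- new.env()
for (geno in c("A", "B")) {
  assign(paste0("RQ1e2_geno", geno, "_16s8"), paste0(geno, "_gene", 1:2), envir = toy_env)
  assign(paste0("RQ1e2_geno", geno, "_24s8"), paste0(geno, "_gene", 3), envir = toy_env)
}

# Cluster suffixes to combine (shortened here; the real sets use six suffixes each)
low_suffixes <- c("16s8", "24s8")

# Look up the cluster objects by name and concatenate their genes
combine_clusters <- function(geno, suffixes, env) {
  object_names <- paste0("RQ1e2_geno", geno, "_", suffixes)
  unlist(mget(object_names, envir = env), use.names = FALSE)
}

genotypes <- c("A", "B")
up_in_low <- lapply(setNames(genotypes, genotypes), combine_clusters,
                    suffixes = low_suffixes, env = toy_env)
lengths(up_in_low)
```

# mget() stops with an error when a cluster object is missing, which makes
# a typo in a genotype or suffix name visible immediately.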

###genotype B Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoB_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoB_UpInHighSal))
names(geneList_cluster_genoB_UpInHighSal) = geneUniverse
str(geneList_cluster_genoB_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoB_UpInHighSal = new("topGOdata", ontology="BP", 
                                          allGenes=geneList_cluster_genoB_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoB_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoB_UpInHighSal = new("topGOdata", ontology="MF", 
                                          allGenes=geneList_cluster_genoB_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoB_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoB_UpInHighSal = new("topGOdata", ontology="CC", 
                                          allGenes=geneList_cluster_genoB_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoB_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoB_UpInHighSal = runTest(GOdata_BP_cluster_genoB_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoB_UpInHighSal
resultFisher_MF_cluster_genoB_UpInHighSal = runTest(GOdata_MF_cluster_genoB_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoB_UpInHighSal
resultFisher_CC_cluster_genoB_UpInHighSal = runTest(GOdata_CC_cluster_genoB_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoB_UpInHighSal
####extract the significant GO terms
allRes_BP_cluster_genoB_UpInHighSal = GenTable(GOdata_BP_cluster_genoB_UpInHighSal, 
                                               classic = resultFisher_BP_cluster_genoB_UpInHighSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoB_UpInHighSal
allRes_MF_cluster_genoB_UpInHighSal = GenTable(GOdata_MF_cluster_genoB_UpInHighSal, 
                                               classic = resultFisher_MF_cluster_genoB_UpInHighSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoB_UpInHighSal
allRes_CC_cluster_genoB_UpInHighSal = GenTable(GOdata_CC_cluster_genoB_UpInHighSal, 
                                               classic = resultFisher_CC_cluster_genoB_UpInHighSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoB_UpInHighSal

###genotype B Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoB_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoB_UpInLowSal))
names(geneList_cluster_genoB_UpInLowSal) = geneUniverse
str(geneList_cluster_genoB_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoB_UpInLowSal = new("topGOdata", ontology="BP", 
                                         allGenes=geneList_cluster_genoB_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoB_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoB_UpInLowSal = new("topGOdata", ontology="MF", 
                                         allGenes=geneList_cluster_genoB_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoB_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoB_UpInLowSal = new("topGOdata", ontology="CC", 
                                         allGenes=geneList_cluster_genoB_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoB_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoB_UpInLowSal = runTest(GOdata_BP_cluster_genoB_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoB_UpInLowSal
resultFisher_MF_cluster_genoB_UpInLowSal = runTest(GOdata_MF_cluster_genoB_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoB_UpInLowSal
resultFisher_CC_cluster_genoB_UpInLowSal = runTest(GOdata_CC_cluster_genoB_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoB_UpInLowSal
####extract the significant GO terms
allRes_BP_cluster_genoB_UpInLowSal = GenTable(GOdata_BP_cluster_genoB_UpInLowSal, 
                                              classic = resultFisher_BP_cluster_genoB_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoB_UpInLowSal
allRes_MF_cluster_genoB_UpInLowSal = GenTable(GOdata_MF_cluster_genoB_UpInLowSal, 
                                              classic = resultFisher_MF_cluster_genoB_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoB_UpInLowSal
allRes_CC_cluster_genoB_UpInLowSal = GenTable(GOdata_CC_cluster_genoB_UpInLowSal, 
                                              classic = resultFisher_CC_cluster_genoB_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoB_UpInLowSal

###genotype D Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoD_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoD_UpInHighSal))
names(geneList_cluster_genoD_UpInHighSal) = geneUniverse
str(geneList_cluster_genoD_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoD_UpInHighSal = new("topGOdata", ontology="BP", 
                                          allGenes=geneList_cluster_genoD_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoD_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoD_UpInHighSal = new("topGOdata", ontology="MF", 
                                          allGenes=geneList_cluster_genoD_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoD_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoD_UpInHighSal = new("topGOdata", ontology="CC", 
                                          allGenes=geneList_cluster_genoD_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoD_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoD_UpInHighSal = runTest(GOdata_BP_cluster_genoD_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoD_UpInHighSal
resultFisher_MF_cluster_genoD_UpInHighSal = runTest(GOdata_MF_cluster_genoD_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoD_UpInHighSal
resultFisher_CC_cluster_genoD_UpInHighSal = runTest(GOdata_CC_cluster_genoD_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoD_UpInHighSal
####extract the significant GO terms
allRes_BP_cluster_genoD_UpInHighSal = GenTable(GOdata_BP_cluster_genoD_UpInHighSal, 
                                               classic = resultFisher_BP_cluster_genoD_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoD_UpInHighSal
allRes_MF_cluster_genoD_UpInHighSal = GenTable(GOdata_MF_cluster_genoD_UpInHighSal, 
                                               classic = resultFisher_MF_cluster_genoD_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoD_UpInHighSal
allRes_CC_cluster_genoD_UpInHighSal = GenTable(GOdata_CC_cluster_genoD_UpInHighSal, 
                                               classic = resultFisher_CC_cluster_genoD_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoD_UpInHighSal

###genotype D Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoD_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoD_UpInLowSal))
names(geneList_cluster_genoD_UpInLowSal) = geneUniverse
str(geneList_cluster_genoD_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoD_UpInLowSal = new("topGOdata", ontology="BP", 
                                         allGenes=geneList_cluster_genoD_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoD_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoD_UpInLowSal = new("topGOdata", ontology="MF", 
                                         allGenes=geneList_cluster_genoD_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoD_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoD_UpInLowSal = new("topGOdata", ontology="CC", 
                                         allGenes=geneList_cluster_genoD_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoD_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoD_UpInLowSal = runTest(GOdata_BP_cluster_genoD_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoD_UpInLowSal
resultFisher_MF_cluster_genoD_UpInLowSal = runTest(GOdata_MF_cluster_genoD_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoD_UpInLowSal
resultFisher_CC_cluster_genoD_UpInLowSal = runTest(GOdata_CC_cluster_genoD_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoD_UpInLowSal
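####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values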
allRes_BP_cluster_genoD_UpInLowSal = GenTable(GOdata_BP_cluster_genoD_UpInLowSal, 
                                              classic = resultFisher_BP_cluster_genoD_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoD_UpInLowSal
allRes_MF_cluster_genoD_UpInLowSal = GenTable(GOdata_MF_cluster_genoD_UpInLowSal, 
                                              classic = resultFisher_MF_cluster_genoD_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoD_UpInLowSal
allRes_CC_cluster_genoD_UpInLowSal = GenTable(GOdata_CC_cluster_genoD_UpInLowSal, 
                                              classic = resultFisher_CC_cluster_genoD_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoD_UpInLowSal
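
Each block starts by coding the gene universe as a 0/1 factor in which 1 marks the genes of interest. The base R sketch below shows what that factor looks like on invented gene IDs and adds a sanity check that both levels are present, because topGO expects a two-level factor when allGenes is a factor. The vectors toy_universe and toy_interest and the helper make_gene_list are hypothetical.

```r
# Invented gene IDs, for illustration only.
toy_universe <- c("gene1", "gene2", "gene3", "gene4", "gene5")
toy_interest <- c("gene2", "gene5", "gene9")  # gene9 is not in the universe

# Hypothetical helper mirroring the gene list construction used above.
make_gene_list <- function(universe, interest) {
  gene_list <- factor(as.integer(universe %in% interest))
  names(gene_list) <- universe
  # Stop early if no gene (or every gene) in the universe is flagged.
  if (nlevels(gene_list) != 2) {
    stop("gene list needs both 0 and 1 entries")
  }
  gene_list
}

toy_gene_list <- make_gene_list(toy_universe, toy_interest)
table(toy_gene_list)
```

Genes of interest that are missing from the universe (gene9 here) are silently dropped, which is why the check is done on the resulting factor rather than on the input vector.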

###genotype F Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoF_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoF_UpInHighSal))
names(geneList_cluster_genoF_UpInHighSal) = geneUniverse
str(geneList_cluster_genoF_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoF_UpInHighSal = new("topGOdata", ontology="BP", 
                                          allGenes=geneList_cluster_genoF_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoF_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoF_UpInHighSal = new("topGOdata", ontology="MF", 
                                          allGenes=geneList_cluster_genoF_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoF_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoF_UpInHighSal = new("topGOdata", ontology="CC", 
                                          allGenes=geneList_cluster_genoF_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoF_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoF_UpInHighSal = runTest(GOdata_BP_cluster_genoF_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoF_UpInHighSal
resultFisher_MF_cluster_genoF_UpInHighSal = runTest(GOdata_MF_cluster_genoF_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoF_UpInHighSal
resultFisher_CC_cluster_genoF_UpInHighSal = runTest(GOdata_CC_cluster_genoF_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoF_UpInHighSal
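####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values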
allRes_BP_cluster_genoF_UpInHighSal = GenTable(GOdata_BP_cluster_genoF_UpInHighSal, 
                                               classic = resultFisher_BP_cluster_genoF_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoF_UpInHighSal
allRes_MF_cluster_genoF_UpInHighSal = GenTable(GOdata_MF_cluster_genoF_UpInHighSal, 
                                               classic = resultFisher_MF_cluster_genoF_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoF_UpInHighSal
allRes_CC_cluster_genoF_UpInHighSal = GenTable(GOdata_CC_cluster_genoF_UpInHighSal, 
                                               classic = resultFisher_CC_cluster_genoF_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoF_UpInHighSal

###genotype F Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoF_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoF_UpInLowSal))
names(geneList_cluster_genoF_UpInLowSal) = geneUniverse
str(geneList_cluster_genoF_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoF_UpInLowSal = new("topGOdata", ontology="BP", 
                                         allGenes=geneList_cluster_genoF_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoF_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoF_UpInLowSal = new("topGOdata", ontology="MF", 
                                         allGenes=geneList_cluster_genoF_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoF_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoF_UpInLowSal = new("topGOdata", ontology="CC", 
                                         allGenes=geneList_cluster_genoF_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoF_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoF_UpInLowSal = runTest(GOdata_BP_cluster_genoF_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoF_UpInLowSal
resultFisher_MF_cluster_genoF_UpInLowSal = runTest(GOdata_MF_cluster_genoF_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoF_UpInLowSal
resultFisher_CC_cluster_genoF_UpInLowSal = runTest(GOdata_CC_cluster_genoF_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoF_UpInLowSal
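####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values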
allRes_BP_cluster_genoF_UpInLowSal = GenTable(GOdata_BP_cluster_genoF_UpInLowSal, 
                                              classic = resultFisher_BP_cluster_genoF_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoF_UpInLowSal
allRes_MF_cluster_genoF_UpInLowSal = GenTable(GOdata_MF_cluster_genoF_UpInLowSal, 
                                              classic = resultFisher_MF_cluster_genoF_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoF_UpInLowSal
allRes_CC_cluster_genoF_UpInLowSal = GenTable(GOdata_CC_cluster_genoF_UpInLowSal, 
                                              classic = resultFisher_CC_cluster_genoF_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoF_UpInLowSal

###genotype I Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoI_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoI_UpInHighSal))
names(geneList_cluster_genoI_UpInHighSal) = geneUniverse
str(geneList_cluster_genoI_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoI_UpInHighSal = new("topGOdata", ontology="BP", 
                                          allGenes=geneList_cluster_genoI_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoI_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoI_UpInHighSal = new("topGOdata", ontology="MF", 
                                          allGenes=geneList_cluster_genoI_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoI_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoI_UpInHighSal = new("topGOdata", ontology="CC", 
                                          allGenes=geneList_cluster_genoI_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoI_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoI_UpInHighSal = runTest(GOdata_BP_cluster_genoI_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoI_UpInHighSal
resultFisher_MF_cluster_genoI_UpInHighSal = runTest(GOdata_MF_cluster_genoI_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoI_UpInHighSal
resultFisher_CC_cluster_genoI_UpInHighSal = runTest(GOdata_CC_cluster_genoI_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoI_UpInHighSal
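####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values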
allRes_BP_cluster_genoI_UpInHighSal = GenTable(GOdata_BP_cluster_genoI_UpInHighSal, 
                                               classic = resultFisher_BP_cluster_genoI_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoI_UpInHighSal
allRes_MF_cluster_genoI_UpInHighSal = GenTable(GOdata_MF_cluster_genoI_UpInHighSal, 
                                               classic = resultFisher_MF_cluster_genoI_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoI_UpInHighSal
allRes_CC_cluster_genoI_UpInHighSal = GenTable(GOdata_CC_cluster_genoI_UpInHighSal, 
                                               classic = resultFisher_CC_cluster_genoI_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoI_UpInHighSal

###genotype I Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoI_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoI_UpInLowSal))
names(geneList_cluster_genoI_UpInLowSal) = geneUniverse
str(geneList_cluster_genoI_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoI_UpInLowSal = new("topGOdata", ontology="BP", 
                                         allGenes=geneList_cluster_genoI_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoI_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoI_UpInLowSal = new("topGOdata", ontology="MF", 
                                         allGenes=geneList_cluster_genoI_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoI_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoI_UpInLowSal = new("topGOdata", ontology="CC", 
                                         allGenes=geneList_cluster_genoI_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoI_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoI_UpInLowSal = runTest(GOdata_BP_cluster_genoI_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoI_UpInLowSal
resultFisher_MF_cluster_genoI_UpInLowSal = runTest(GOdata_MF_cluster_genoI_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoI_UpInLowSal
resultFisher_CC_cluster_genoI_UpInLowSal = runTest(GOdata_CC_cluster_genoI_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoI_UpInLowSal
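####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values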
allRes_BP_cluster_genoI_UpInLowSal = GenTable(GOdata_BP_cluster_genoI_UpInLowSal, 
                                              classic = resultFisher_BP_cluster_genoI_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoI_UpInLowSal
allRes_MF_cluster_genoI_UpInLowSal = GenTable(GOdata_MF_cluster_genoI_UpInLowSal, 
                                              classic = resultFisher_MF_cluster_genoI_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoI_UpInLowSal
allRes_CC_cluster_genoI_UpInLowSal = GenTable(GOdata_CC_cluster_genoI_UpInLowSal, 
                                              classic = resultFisher_CC_cluster_genoI_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoI_UpInLowSal

###genotype J Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoJ_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoJ_UpInHighSal))
names(geneList_cluster_genoJ_UpInHighSal) = geneUniverse
str(geneList_cluster_genoJ_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoJ_UpInHighSal = new("topGOdata", ontology="BP", 
                                          allGenes=geneList_cluster_genoJ_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoJ_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoJ_UpInHighSal = new("topGOdata", ontology="MF", 
                                          allGenes=geneList_cluster_genoJ_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoJ_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoJ_UpInHighSal = new("topGOdata", ontology="CC", 
                                          allGenes=geneList_cluster_genoJ_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoJ_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoJ_UpInHighSal = runTest(GOdata_BP_cluster_genoJ_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoJ_UpInHighSal
resultFisher_MF_cluster_genoJ_UpInHighSal = runTest(GOdata_MF_cluster_genoJ_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoJ_UpInHighSal
resultFisher_CC_cluster_genoJ_UpInHighSal = runTest(GOdata_CC_cluster_genoJ_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoJ_UpInHighSal
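####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values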
allRes_BP_cluster_genoJ_UpInHighSal = GenTable(GOdata_BP_cluster_genoJ_UpInHighSal, 
                                               classic = resultFisher_BP_cluster_genoJ_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoJ_UpInHighSal
allRes_MF_cluster_genoJ_UpInHighSal = GenTable(GOdata_MF_cluster_genoJ_UpInHighSal, 
                                               classic = resultFisher_MF_cluster_genoJ_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoJ_UpInHighSal
allRes_CC_cluster_genoJ_UpInHighSal = GenTable(GOdata_CC_cluster_genoJ_UpInHighSal, 
                                               classic = resultFisher_CC_cluster_genoJ_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoJ_UpInHighSal

###genotype J Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoJ_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoJ_UpInLowSal))
names(geneList_cluster_genoJ_UpInLowSal) = geneUniverse
str(geneList_cluster_genoJ_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoJ_UpInLowSal = new("topGOdata", ontology="BP", 
                                         allGenes=geneList_cluster_genoJ_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoJ_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoJ_UpInLowSal = new("topGOdata", ontology="MF", 
                                         allGenes=geneList_cluster_genoJ_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoJ_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoJ_UpInLowSal = new("topGOdata", ontology="CC", 
                                         allGenes=geneList_cluster_genoJ_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoJ_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoJ_UpInLowSal = runTest(GOdata_BP_cluster_genoJ_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoJ_UpInLowSal
resultFisher_MF_cluster_genoJ_UpInLowSal = runTest(GOdata_MF_cluster_genoJ_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoJ_UpInLowSal
resultFisher_CC_cluster_genoJ_UpInLowSal = runTest(GOdata_CC_cluster_genoJ_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoJ_UpInLowSal
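####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values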
allRes_BP_cluster_genoJ_UpInLowSal = GenTable(GOdata_BP_cluster_genoJ_UpInLowSal, 
                                              classic = resultFisher_BP_cluster_genoJ_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoJ_UpInLowSal
allRes_MF_cluster_genoJ_UpInLowSal = GenTable(GOdata_MF_cluster_genoJ_UpInLowSal, 
                                              classic = resultFisher_MF_cluster_genoJ_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoJ_UpInLowSal
allRes_CC_cluster_genoJ_UpInLowSal = GenTable(GOdata_CC_cluster_genoJ_UpInLowSal, 
                                              classic = resultFisher_CC_cluster_genoJ_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoJ_UpInLowSal

###genotype K Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoK_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoK_UpInHighSal))
names(geneList_cluster_genoK_UpInHighSal) = geneUniverse
str(geneList_cluster_genoK_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoK_UpInHighSal = new("topGOdata", ontology="BP", 
                                          allGenes=geneList_cluster_genoK_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoK_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoK_UpInHighSal = new("topGOdata", ontology="MF", 
                                          allGenes=geneList_cluster_genoK_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoK_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoK_UpInHighSal = new("topGOdata", ontology="CC", 
                                          allGenes=geneList_cluster_genoK_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoK_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoK_UpInHighSal = runTest(GOdata_BP_cluster_genoK_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoK_UpInHighSal
resultFisher_MF_cluster_genoK_UpInHighSal = runTest(GOdata_MF_cluster_genoK_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoK_UpInHighSal
resultFisher_CC_cluster_genoK_UpInHighSal = runTest(GOdata_CC_cluster_genoK_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoK_UpInHighSal
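####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values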
allRes_BP_cluster_genoK_UpInHighSal = GenTable(GOdata_BP_cluster_genoK_UpInHighSal, 
                                               classic = resultFisher_BP_cluster_genoK_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoK_UpInHighSal
allRes_MF_cluster_genoK_UpInHighSal = GenTable(GOdata_MF_cluster_genoK_UpInHighSal, 
                                               classic = resultFisher_MF_cluster_genoK_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoK_UpInHighSal
allRes_CC_cluster_genoK_UpInHighSal = GenTable(GOdata_CC_cluster_genoK_UpInHighSal, 
                                               classic = resultFisher_CC_cluster_genoK_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoK_UpInHighSal

###genotype K Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoK_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoK_UpInLowSal))
names(geneList_cluster_genoK_UpInLowSal) = geneUniverse
str(geneList_cluster_genoK_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoK_UpInLowSal = new("topGOdata", ontology="BP", 
                                         allGenes=geneList_cluster_genoK_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoK_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoK_UpInLowSal = new("topGOdata", ontology="MF", 
                                         allGenes=geneList_cluster_genoK_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoK_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoK_UpInLowSal = new("topGOdata", ontology="CC", 
                                         allGenes=geneList_cluster_genoK_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoK_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoK_UpInLowSal = runTest(GOdata_BP_cluster_genoK_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoK_UpInLowSal
resultFisher_MF_cluster_genoK_UpInLowSal = runTest(GOdata_MF_cluster_genoK_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoK_UpInLowSal
resultFisher_CC_cluster_genoK_UpInLowSal = runTest(GOdata_CC_cluster_genoK_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoK_UpInLowSal
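####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values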
allRes_BP_cluster_genoK_UpInLowSal = GenTable(GOdata_BP_cluster_genoK_UpInLowSal, 
                                              classic = resultFisher_BP_cluster_genoK_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoK_UpInLowSal
allRes_MF_cluster_genoK_UpInLowSal = GenTable(GOdata_MF_cluster_genoK_UpInLowSal, 
                                              classic = resultFisher_MF_cluster_genoK_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoK_UpInLowSal
allRes_CC_cluster_genoK_UpInLowSal = GenTable(GOdata_CC_cluster_genoK_UpInLowSal, 
                                              classic = resultFisher_CC_cluster_genoK_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoK_UpInLowSal

###genotype P Up in High Salinities
####create gene list for input in topGO
geneList_cluster_genoP_UpInHighSal = factor(as.integer(geneUniverse %in% RQ1e2_genoP_UpInHighSal))
names(geneList_cluster_genoP_UpInHighSal) = geneUniverse
str(geneList_cluster_genoP_UpInHighSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoP_UpInHighSal = new("topGOdata", ontology="BP", 
                                          allGenes=geneList_cluster_genoP_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoP_UpInHighSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoP_UpInHighSal = new("topGOdata", ontology="MF", 
                                          allGenes=geneList_cluster_genoP_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoP_UpInHighSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoP_UpInHighSal = new("topGOdata", ontology="CC", 
                                          allGenes=geneList_cluster_genoP_UpInHighSal, 
                                          annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoP_UpInHighSal
####run Fisher's exact test
resultFisher_BP_cluster_genoP_UpInHighSal = runTest(GOdata_BP_cluster_genoP_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoP_UpInHighSal
resultFisher_MF_cluster_genoP_UpInHighSal = runTest(GOdata_MF_cluster_genoP_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoP_UpInHighSal
resultFisher_CC_cluster_genoP_UpInHighSal = runTest(GOdata_CC_cluster_genoP_UpInHighSal, 
                                                    algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoP_UpInHighSal
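####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values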
allRes_BP_cluster_genoP_UpInHighSal = GenTable(GOdata_BP_cluster_genoP_UpInHighSal, 
                                               classic = resultFisher_BP_cluster_genoP_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoP_UpInHighSal
allRes_MF_cluster_genoP_UpInHighSal = GenTable(GOdata_MF_cluster_genoP_UpInHighSal, 
                                               classic = resultFisher_MF_cluster_genoP_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoP_UpInHighSal
allRes_CC_cluster_genoP_UpInHighSal = GenTable(GOdata_CC_cluster_genoP_UpInHighSal, 
                                               classic = resultFisher_CC_cluster_genoP_UpInHighSal, 
                                                orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoP_UpInHighSal

###genotype P Up in Low Salinities
####create gene list for input in topGO
geneList_cluster_genoP_UpInLowSal = factor(as.integer(geneUniverse %in% RQ1e2_genoP_UpInLowSal))
names(geneList_cluster_genoP_UpInLowSal) = geneUniverse
str(geneList_cluster_genoP_UpInLowSal)
####create a topGO object (for biological process GOs)
GOdata_BP_cluster_genoP_UpInLowSal = new("topGOdata", ontology="BP", 
                                         allGenes=geneList_cluster_genoP_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_BP_cluster_genoP_UpInLowSal
####create a topGO object (for molecular function GOs)
GOdata_MF_cluster_genoP_UpInLowSal = new("topGOdata", ontology="MF", 
                                         allGenes=geneList_cluster_genoP_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_cluster_genoP_UpInLowSal
####create a topGO object (for cellular component GOs)
GOdata_CC_cluster_genoP_UpInLowSal = new("topGOdata", ontology="CC", 
                                         allGenes=geneList_cluster_genoP_UpInLowSal, 
                                         annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_cluster_genoP_UpInLowSal
####run Fisher's exact test
resultFisher_BP_cluster_genoP_UpInLowSal = runTest(GOdata_BP_cluster_genoP_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_BP_cluster_genoP_UpInLowSal
resultFisher_MF_cluster_genoP_UpInLowSal = runTest(GOdata_MF_cluster_genoP_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_MF_cluster_genoP_UpInLowSal
resultFisher_CC_cluster_genoP_UpInLowSal = runTest(GOdata_CC_cluster_genoP_UpInLowSal, 
                                                   algorithm = "elim", statistic = "fisher")
resultFisher_CC_cluster_genoP_UpInLowSal
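####tabulate the top 100 GO terms (not only the significant ones); the p-value column is named "classic" but holds the elim Fisher p-values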
allRes_BP_cluster_genoP_UpInLowSal = GenTable(GOdata_BP_cluster_genoP_UpInLowSal, 
                                              classic = resultFisher_BP_cluster_genoP_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_BP_cluster_genoP_UpInLowSal
allRes_MF_cluster_genoP_UpInLowSal = GenTable(GOdata_MF_cluster_genoP_UpInLowSal, 
                                              classic = resultFisher_MF_cluster_genoP_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_MF_cluster_genoP_UpInLowSal
allRes_CC_cluster_genoP_UpInLowSal = GenTable(GOdata_CC_cluster_genoP_UpInLowSal, 
                                              classic = resultFisher_CC_cluster_genoP_UpInLowSal, 
                                               orderBy = "elim", ranksOf = "elim", topNodes = 100)
allRes_CC_cluster_genoP_UpInLowSal

3.3.9 GO enrichment: GSEA analysis with CAMERA

Next, we performed GSEA GO enrichment using CAMERA.

We first split the GO terms from InterProScan into the three GO categories: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC), so that GSEA could be run on each category separately. We used the topGO package for this step. topGO adds additional GO terms to those provided by InterProScan. However, we only wanted to include the GO terms that were assigned by InterProScan. Therefore, we first used topGO to obtain the full list of GO terms associated with the DE genes in our dataset, then separated these GO terms into three lists (BP, MF and CC), and finally retained only the GO terms detected by InterProScan:

# turn geneID2GO into GO2geneID
GO_terms_geneID2GO = unique(unlist(geneID2GO, use.names = FALSE)) 
GO2geneID_InterPro = unstack(subset(stack(geneID2GO), values%in%GO_terms_geneID2GO), ind~values)
## the GO2geneID_InterPro object contains, for each GO term, the genes that InterProScan annotated with that term

# create gene list
genesOfInterest_allSign  = rownames(subset(OnlySignGenes_RQ1e2_ConStage, 
                                           rownames(OnlySignGenes_RQ1e2_ConStage)%in%geneUniverse))
geneList_allDE = factor(as.integer(geneUniverse %in% genesOfInterest_allSign))
names(geneList_allDE) = geneUniverse
str(geneList_allDE)

# subset GOs into BP, MF and CC processes
GOdata_BP_allDE = new("topGOdata", ontology="BP", allGenes=geneList_allDE, 
                      annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_MF_allDE = new("topGOdata", ontology="MF", allGenes=geneList_allDE, 
                      annot = annFUN.gene2GO, gene2GO = geneID2GO)
GOdata_CC_allDE = new("topGOdata", ontology="CC", allGenes=geneList_allDE, 
                      annot = annFUN.gene2GO, gene2GO = geneID2GO)

# extract all the GO terms for a given set
ug_BP = usedGO(GOdata_BP_allDE) 
ug_MF = usedGO(GOdata_MF_allDE) 
ug_CC = usedGO(GOdata_CC_allDE) 

# create a list of lists which contains the genes that are associated with a given GO term
## BP GO terms
GO2geneID_InterPro_BP = list()
for (GO in ug_BP){
  selection = unlist(GO2geneID_InterPro[names(GO2geneID_InterPro) %in% GO == TRUE], use.names = FALSE) 
  GO2geneID_InterPro_BP[[GO]] = selection
}

## MF GO terms
GO2geneID_InterPro_MF = list()
for (GO in ug_MF){
  selection = unlist(GO2geneID_InterPro[names(GO2geneID_InterPro) %in% GO == TRUE], use.names = FALSE) 
  GO2geneID_InterPro_MF[[GO]] = selection
}

## CC GO terms
GO2geneID_InterPro_CC= list()
for (GO in ug_CC){
  selection = unlist(GO2geneID_InterPro[names(GO2geneID_InterPro) %in% GO == TRUE], use.names = FALSE) 
  GO2geneID_InterPro_CC[[GO]] = selection
}
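
The inversion and per-category filtering above can be illustrated on a toy mapping. The sketch below is not the code used for the analysis: the gene IDs, the GO terms and all object names ending in _toy are made up, and the three loops are replaced by a single subsetting step, assuming every GO term maps to a plain character vector of gene IDs.

```r
# Toy gene-to-GO mapping (hypothetical gene IDs and GO terms)
geneID2GO_toy = list(
  gene1 = c("GO:0000001", "GO:0000002"),
  gene2 = c("GO:0000002"),
  gene3 = c("GO:0000003")
)

# Invert the mapping: one character vector of gene IDs per GO term
long_toy = stack(geneID2GO_toy)   # columns: values (GO term), ind (gene ID)
GO2geneID_toy = split(as.character(long_toy$ind), long_toy$values)

# GO terms reported for one ontology (stand-in for the usedGO() output);
# "GO:0000099" mimics a term added by topGO that InterProScan did not assign
ug_toy = c("GO:0000001", "GO:0000002", "GO:0000099")

# Keep only the terms that occur in the InterProScan-derived mapping
GO2geneID_toy_sub = GO2geneID_toy[intersect(ug_toy, names(GO2geneID_toy))]
str(GO2geneID_toy_sub)
```

Terms that are absent from the InterProScan mapping drop out through intersect(), which mirrors what the loops do when the selection for a term is empty.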

Next, we ran CAMERA for the average contrasts:

#CAMERA on biological process GOs for each average contrast
CAMERA_InterPro_BP_avg8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP, design=design, 
                                                    contrast=C_RQ1e2[,25]) 
CAMERA_InterPro_BP_avg16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP, design=design, 
                                                     contrast=C_RQ1e2[,26])
CAMERA_InterPro_BP_avg8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP, design=design, 
                                                    contrast=C_RQ1e2[,27]) 

#CAMERA on molecular function GOs for each average contrast
CAMERA_InterPro_MF_avg8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF, 
                                                    design=design, contrast=C_RQ1e2[,25]) 
CAMERA_InterPro_MF_avg16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF, 
                                                     design=design, contrast=C_RQ1e2[,26])
CAMERA_InterPro_MF_avg8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF, 
                                                    design=design, contrast=C_RQ1e2[,27])

#CAMERA on cellular component GOs for each average contrast
CAMERA_InterPro_CC_avg8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC, 
                                                    design=design, contrast=C_RQ1e2[,25]) 
CAMERA_InterPro_CC_avg16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC, 
                                                     design=design, contrast=C_RQ1e2[,26]) 
CAMERA_InterPro_CC_avg8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC, 
                                                    design=design, contrast=C_RQ1e2[,27])

We also ran CAMERA for each genotype separately (code shown for genotype A only). To rerun the entire analysis in this document, apply the code below to the other genotypes as well, because the output for all genotypes is used further downstream.

# genotype A
## CAMERA on biological process GOs for each genotype A contrast
CAMERA_InterPro_BP_genoA_8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP, 
                                                       design=design, contrast=C_RQ1e2[,1]) 
CAMERA_InterPro_BP_genoA_16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP, 
                                                        design=design, contrast=C_RQ1e2[,2]) 
CAMERA_InterPro_BP_genoA_8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_BP, 
                                                       design=design, contrast=C_RQ1e2[,3]) 
## CAMERA on molecular function GOs for each genotype A contrast
CAMERA_InterPro_MF_genoA_8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF, 
                                                       design=design, contrast=C_RQ1e2[,1]) 
CAMERA_InterPro_MF_genoA_16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF, 
                                                        design=design, contrast=C_RQ1e2[,2]) 
CAMERA_InterPro_MF_genoA_8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_MF, 
                                                       design=design, contrast=C_RQ1e2[,3]) 
## CAMERA on cellular component GOs for each genotype A contrast
CAMERA_InterPro_CC_genoA_8vs16 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC, 
                                                       design=design, contrast=C_RQ1e2[,1]) 
CAMERA_InterPro_CC_genoA_16vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC, 
                                                        design=design, contrast=C_RQ1e2[,2]) 
CAMERA_InterPro_CC_genoA_8vs24 = edgeR::camera.DGEList(y=y, index=GO2geneID_InterPro_CC, 
                                                       design=design, contrast=C_RQ1e2[,3])
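
Because the same call is repeated for every GO category and contrast, the runs can also be generated in a loop. The sketch below shows one possible way on simulated data and is not the code used for the analysis. It assumes the limma package is installed; the expression matrix, design, gene sets, contrast names and the sign convention of the contrasts are all hypothetical. For a DGEList, camera() dispatches to the edgeR method when edgeR is loaded.

```r
library(limma)

set.seed(1)
# Simulated log-expression matrix: 200 genes x 6 samples (toy data)
y_toy = matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

# Three hypothetical salinity groups with two replicates each
group = factor(rep(c("s8", "s16", "s24"), each = 2), levels = c("s8", "s16", "s24"))
design_toy = model.matrix(~ 0 + group)
colnames(design_toy) = levels(group)

# Hypothetical contrast matrix: one named column per contrast
contrasts_toy = cbind("8vs16"  = c(1, -1, 0),
                      "16vs24" = c(0, 1, -1),
                      "8vs24"  = c(1, 0, -1))
rownames(contrasts_toy) = colnames(design_toy)

# Hypothetical gene sets per GO category, converted to row indices of y_toy
sets_toy = list(
  BP = list(setA = paste0("gene", 1:20),  setB = paste0("gene", 21:50)),
  MF = list(setC = paste0("gene", 51:80), setD = paste0("gene", 81:100))
)
index_toy = lapply(sets_toy, ids2indices, identifiers = rownames(y_toy))

# Run camera() for every GO category x contrast combination
camera_results = list()
for (ont in names(index_toy)) {
  for (con in colnames(contrasts_toy)) {
    camera_results[[paste(ont, con, sep = "_")]] =
      camera(y_toy, index = index_toy[[ont]], design = design_toy,
             contrast = contrasts_toy[, con])
  }
}
names(camera_results)
```

The result is a named list of CAMERA tables, which has the same shape as the list of data frames assembled by hand in section 3.3.10.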

REVIGO for the above analyses was accessed on July 6th 2021, and used the Gene Ontology database of May 1st 2021 and the UniProt-to-GO mapping database from April 9th 2021.

3.3.10 GO enrichment: number of enriched GO terms

Next, we created an overview of the GO enrichment results by plotting barplots.

First, topGO results:

# calculate number of significantly enriched GO terms ORA (topGO)
ORA_BP_UpInHighSal_sign = length(which(lapply(allRes_BP_cluster_UpInHighSal_elim$classic,as.numeric) < 0.05))
ORA_BP_UpInLowSal_sign = length(which(lapply(allRes_BP_cluster_UpInLowSal_elim$classic,as.numeric) < 0.05))
ORA_MF_UpInHighSal_sign = length(which(lapply(allRes_MF_cluster_UpInHighSal_elim$classic,as.numeric) < 0.05))
ORA_MF_UpInLowSal_sign = length(which(lapply(allRes_MF_cluster_UpInLowSal_elim$classic,as.numeric) < 0.05))
ORA_CC_UpInHighSal_sign = length(which(lapply(allRes_CC_cluster_UpInHighSal_elim$classic,as.numeric) < 0.05))
ORA_CC_UpInLowSal_sign = length(which(lapply(allRes_CC_cluster_UpInLowSal_elim$classic,as.numeric) < 0.05))

ORA_BP_UpInHighSal_genoA_sign = length(which(lapply(allRes_BP_cluster_genoA_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoA_sign = length(which(lapply(allRes_BP_cluster_genoA_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoA_sign = length(which(lapply(allRes_MF_cluster_genoA_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoA_sign = length(which(lapply(allRes_MF_cluster_genoA_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoA_sign = length(which(lapply(allRes_CC_cluster_genoA_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoA_sign = length(which(lapply(allRes_CC_cluster_genoA_UpInLowSal$classic,
                                                   as.numeric) < 0.05))

ORA_BP_UpInHighSal_genoB_sign = length(which(lapply(allRes_BP_cluster_genoB_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoB_sign = length(which(lapply(allRes_BP_cluster_genoB_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoB_sign = length(which(lapply(allRes_MF_cluster_genoB_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoB_sign = length(which(lapply(allRes_MF_cluster_genoB_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoB_sign = length(which(lapply(allRes_CC_cluster_genoB_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoB_sign = length(which(lapply(allRes_CC_cluster_genoB_UpInLowSal$classic,
                                                   as.numeric) < 0.05))

ORA_BP_UpInHighSal_genoD_sign = length(which(lapply(allRes_BP_cluster_genoD_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoD_sign = length(which(lapply(allRes_BP_cluster_genoD_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoD_sign = length(which(lapply(allRes_MF_cluster_genoD_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoD_sign = length(which(lapply(allRes_MF_cluster_genoD_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoD_sign = length(which(lapply(allRes_CC_cluster_genoD_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoD_sign = length(which(lapply(allRes_CC_cluster_genoD_UpInLowSal$classic,
                                                   as.numeric) < 0.05))

ORA_BP_UpInHighSal_genoF_sign = length(which(lapply(allRes_BP_cluster_genoF_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoF_sign = length(which(lapply(allRes_BP_cluster_genoF_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoF_sign = length(which(lapply(allRes_MF_cluster_genoF_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoF_sign = length(which(lapply(allRes_MF_cluster_genoF_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoF_sign = length(which(lapply(allRes_CC_cluster_genoF_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoF_sign = length(which(lapply(allRes_CC_cluster_genoF_UpInLowSal$classic,
                                                   as.numeric) < 0.05))

ORA_BP_UpInHighSal_genoI_sign = length(which(lapply(allRes_BP_cluster_genoI_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoI_sign = length(which(lapply(allRes_BP_cluster_genoI_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoI_sign = length(which(lapply(allRes_MF_cluster_genoI_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoI_sign = length(which(lapply(allRes_MF_cluster_genoI_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoI_sign = length(which(lapply(allRes_CC_cluster_genoI_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoI_sign = length(which(lapply(allRes_CC_cluster_genoI_UpInLowSal$classic,
                                                   as.numeric) < 0.05))

ORA_BP_UpInHighSal_genoJ_sign = length(which(lapply(allRes_BP_cluster_genoJ_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoJ_sign = length(which(lapply(allRes_BP_cluster_genoJ_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoJ_sign = length(which(lapply(allRes_MF_cluster_genoJ_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoJ_sign = length(which(lapply(allRes_MF_cluster_genoJ_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoJ_sign = length(which(lapply(allRes_CC_cluster_genoJ_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoJ_sign = length(which(lapply(allRes_CC_cluster_genoJ_UpInLowSal$classic,
                                                   as.numeric) < 0.05))

ORA_BP_UpInHighSal_genoK_sign = length(which(lapply(allRes_BP_cluster_genoK_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoK_sign = length(which(lapply(allRes_BP_cluster_genoK_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoK_sign = length(which(lapply(allRes_MF_cluster_genoK_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoK_sign = length(which(lapply(allRes_MF_cluster_genoK_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoK_sign = length(which(lapply(allRes_CC_cluster_genoK_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoK_sign = length(which(lapply(allRes_CC_cluster_genoK_UpInLowSal$classic,
                                                   as.numeric) < 0.05))

ORA_BP_UpInHighSal_genoP_sign = length(which(lapply(allRes_BP_cluster_genoP_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_BP_UpInLowSal_genoP_sign = length(which(lapply(allRes_BP_cluster_genoP_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_MF_UpInHighSal_genoP_sign = length(which(lapply(allRes_MF_cluster_genoP_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_MF_UpInLowSal_genoP_sign = length(which(lapply(allRes_MF_cluster_genoP_UpInLowSal$classic,
                                                   as.numeric) < 0.05))
ORA_CC_UpInHighSal_genoP_sign = length(which(lapply(allRes_CC_cluster_genoP_UpInHighSal$classic,
                                                    as.numeric) < 0.05))
ORA_CC_UpInLowSal_genoP_sign = length(which(lapply(allRes_CC_cluster_genoP_UpInLowSal$classic,
                                                   as.numeric) < 0.05))

# combine data
ORA_UpInHighSal_SignNum = -sum(ORA_BP_UpInHighSal_sign,ORA_MF_UpInHighSal_sign,ORA_CC_UpInHighSal_sign)
ORA_UpInLowSal_SignNum = sum(ORA_BP_UpInLowSal_sign,ORA_MF_UpInLowSal_sign,ORA_CC_UpInLowSal_sign)
ORA_UpInHighSal_genoA_SignNum = -sum(ORA_BP_UpInHighSal_genoA_sign,ORA_MF_UpInHighSal_genoA_sign,
                                    ORA_CC_UpInHighSal_genoA_sign)
ORA_UpInLowSal_genoA_SignNum = sum(ORA_BP_UpInLowSal_genoA_sign,ORA_MF_UpInLowSal_genoA_sign,
                                   ORA_CC_UpInLowSal_genoA_sign)
ORA_UpInHighSal_genoB_SignNum = -sum(ORA_BP_UpInHighSal_genoB_sign,ORA_MF_UpInHighSal_genoB_sign,
                                     ORA_CC_UpInHighSal_genoB_sign)
ORA_UpInLowSal_genoB_SignNum = sum(ORA_BP_UpInLowSal_genoB_sign,ORA_MF_UpInLowSal_genoB_sign,
                                   ORA_CC_UpInLowSal_genoB_sign)
ORA_UpInHighSal_genoD_SignNum = -sum(ORA_BP_UpInHighSal_genoD_sign,ORA_MF_UpInHighSal_genoD_sign,
                                     ORA_CC_UpInHighSal_genoD_sign)
ORA_UpInLowSal_genoD_SignNum = sum(ORA_BP_UpInLowSal_genoD_sign,ORA_MF_UpInLowSal_genoD_sign,
                                   ORA_CC_UpInLowSal_genoD_sign)
ORA_UpInHighSal_genoF_SignNum = -sum(ORA_BP_UpInHighSal_genoF_sign,ORA_MF_UpInHighSal_genoF_sign,
                                     ORA_CC_UpInHighSal_genoF_sign)
ORA_UpInLowSal_genoF_SignNum = sum(ORA_BP_UpInLowSal_genoF_sign,ORA_MF_UpInLowSal_genoF_sign,
                                   ORA_CC_UpInLowSal_genoF_sign)
ORA_UpInHighSal_genoI_SignNum = -sum(ORA_BP_UpInHighSal_genoI_sign,ORA_MF_UpInHighSal_genoI_sign,
                                     ORA_CC_UpInHighSal_genoI_sign)
ORA_UpInLowSal_genoI_SignNum = sum(ORA_BP_UpInLowSal_genoI_sign,ORA_MF_UpInLowSal_genoI_sign,
                                   ORA_CC_UpInLowSal_genoI_sign)
ORA_UpInHighSal_genoJ_SignNum = -sum(ORA_BP_UpInHighSal_genoJ_sign,ORA_MF_UpInHighSal_genoJ_sign,
                                     ORA_CC_UpInHighSal_genoJ_sign)
ORA_UpInLowSal_genoJ_SignNum = sum(ORA_BP_UpInLowSal_genoJ_sign,ORA_MF_UpInLowSal_genoJ_sign,
                                   ORA_CC_UpInLowSal_genoJ_sign)
ORA_UpInHighSal_genoK_SignNum = -sum(ORA_BP_UpInHighSal_genoK_sign,ORA_MF_UpInHighSal_genoK_sign,
                                     ORA_CC_UpInHighSal_genoK_sign)
ORA_UpInLowSal_genoK_SignNum = sum(ORA_BP_UpInLowSal_genoK_sign,ORA_MF_UpInLowSal_genoK_sign,
                                   ORA_CC_UpInLowSal_genoK_sign)
ORA_UpInHighSal_genoP_SignNum = -sum(ORA_BP_UpInHighSal_genoP_sign,ORA_MF_UpInHighSal_genoP_sign,
                                     ORA_CC_UpInHighSal_genoP_sign)
ORA_UpInLowSal_genoP_SignNum = sum(ORA_BP_UpInLowSal_genoP_sign,ORA_MF_UpInLowSal_genoP_sign,
                                   ORA_CC_UpInLowSal_genoP_sign)

# create data frame for ggplot
values = c(ORA_UpInHighSal_SignNum,ORA_UpInHighSal_genoA_SignNum,
           ORA_UpInHighSal_genoB_SignNum,ORA_UpInHighSal_genoD_SignNum,
           ORA_UpInHighSal_genoF_SignNum,ORA_UpInHighSal_genoI_SignNum,
           ORA_UpInHighSal_genoJ_SignNum,ORA_UpInHighSal_genoK_SignNum,
           ORA_UpInHighSal_genoP_SignNum,
           ORA_UpInLowSal_SignNum,ORA_UpInLowSal_genoA_SignNum,
           ORA_UpInLowSal_genoB_SignNum,ORA_UpInLowSal_genoD_SignNum,
           ORA_UpInLowSal_genoF_SignNum,ORA_UpInLowSal_genoI_SignNum,
           ORA_UpInLowSal_genoJ_SignNum,ORA_UpInLowSal_genoK_SignNum,
           ORA_UpInLowSal_genoP_SignNum)
meta = c('average','genotype A', 'genotype B', 'genotype D', 'genotype F', 
         'genotype I', 'genotype J', 'genotype K', 'genotype P')
df_ORA = as.data.frame(cbind(values,meta))

# reorder data for plotting
df_ORA$meta = factor(df_ORA$meta, levels = c('average','genotype P','genotype K','genotype J',
                                             'genotype I','genotype F','genotype D','genotype B','genotype A'))

# plot barplot
g = ggplot(df_ORA, aes(x = meta, y = as.numeric(values))) +
  geom_bar(stat = "identity", position = "identity", fill = 'black',
           color = "white") + coord_flip() +
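  # note: `x` is not defined in this section and must exist in the workspace for this line to run;
  # to reverse the factor order set above explicitly, use rev(levels(df_ORA$meta))
  scale_x_discrete(limits = rev(levels(x))) +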
  ylab("Number of GO terms") + 
  theme_test()
g
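
The counting step at the start of the block above repeats one expression per result table. A compact alternative is sketched below on toy tables; it is not the code used for the analysis. The helper count_significant and the toy values are hypothetical. The sketch assumes that GenTable stores p-values as text and may write very small values with a leading "<"; such entries become NA under a plain as.numeric() and would then not be counted.

```r
# Count the rows of a GenTable-style table with a p-value below alpha.
# A leading "<" (assumed format for very small p-values) is stripped
# before the text is converted to numbers.
count_significant = function(tab, column = "classic", alpha = 0.05) {
  p = as.numeric(sub("^<\\s*", "", tab[[column]]))
  sum(p < alpha, na.rm = TRUE)
}

# Toy tables standing in for the allRes_* objects (hypothetical values)
tables_toy = list(
  BP_UpInHighSal = data.frame(GO.ID = c("GO:1", "GO:2", "GO:3"),
                              classic = c("< 1e-30", "0.012", "0.30")),
  BP_UpInLowSal  = data.frame(GO.ID = c("GO:4", "GO:5"),
                              classic = c("0.049", "0.051"))
)

# One count per table, returned as a named integer vector
n_sig = vapply(tables_toy, count_significant, integer(1))
n_sig
```

Keeping the tables in a named list means the per-genotype counts can be summed or negated by name afterwards, without creating one variable per table.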

Next, CAMERA results:

# calculate the number of significantly enriched GO terms from the CAMERA analysis (InterProScan GO terms),
# distinguishing between up- and downregulated GO terms

## combine relevant data frames in data frame list
df_summary =  list(CAMERA_InterPro_BP_avg8vs16, CAMERA_InterPro_BP_avg16vs24, CAMERA_InterPro_BP_avg8vs24,
                   CAMERA_InterPro_MF_avg8vs16, CAMERA_InterPro_MF_avg16vs24, CAMERA_InterPro_MF_avg8vs24,
                   CAMERA_InterPro_CC_avg8vs16, CAMERA_InterPro_CC_avg16vs24, CAMERA_InterPro_CC_avg8vs24,
                   CAMERA_InterPro_BP_genoA_8vs16, CAMERA_InterPro_BP_genoA_16vs24,
                   CAMERA_InterPro_BP_genoA_8vs24, CAMERA_InterPro_MF_genoA_8vs16,
                   CAMERA_InterPro_MF_genoA_16vs24, CAMERA_InterPro_MF_genoA_8vs24,
                   CAMERA_InterPro_CC_genoA_8vs16, CAMERA_InterPro_CC_genoA_16vs24,
                   CAMERA_InterPro_CC_genoA_8vs24, CAMERA_InterPro_BP_genoB_8vs16,
                   CAMERA_InterPro_BP_genoB_16vs24, CAMERA_InterPro_BP_genoB_8vs24,
                   CAMERA_InterPro_MF_genoB_8vs16, CAMERA_InterPro_MF_genoB_16vs24,
                   CAMERA_InterPro_MF_genoB_8vs24, CAMERA_InterPro_CC_genoB_8vs16,
                   CAMERA_InterPro_CC_genoB_16vs24, CAMERA_InterPro_CC_genoB_8vs24,
                   CAMERA_InterPro_BP_genoD_8vs16, CAMERA_InterPro_BP_genoD_16vs24,
                   CAMERA_InterPro_BP_genoD_8vs24, CAMERA_InterPro_MF_genoD_8vs16,
                   CAMERA_InterPro_MF_genoD_16vs24, CAMERA_InterPro_MF_genoD_8vs24,
                   CAMERA_InterPro_CC_genoD_8vs16, CAMERA_InterPro_CC_genoD_16vs24,
                   CAMERA_InterPro_CC_genoD_8vs24, CAMERA_InterPro_BP_genoF_8vs16,
                   CAMERA_InterPro_BP_genoF_16vs24, CAMERA_InterPro_BP_genoF_8vs24,
                   CAMERA_InterPro_MF_genoF_8vs16, CAMERA_InterPro_MF_genoF_16vs24,
                   CAMERA_InterPro_MF_genoF_8vs24, CAMERA_InterPro_CC_genoF_8vs16,
                   CAMERA_InterPro_CC_genoF_16vs24, CAMERA_InterPro_CC_genoF_8vs24,
                   CAMERA_InterPro_BP_genoI_8vs16, CAMERA_InterPro_BP_genoI_16vs24,
                   CAMERA_InterPro_BP_genoI_8vs24, CAMERA_InterPro_MF_genoI_8vs16,
                   CAMERA_InterPro_MF_genoI_16vs24, CAMERA_InterPro_MF_genoI_8vs24,
                   CAMERA_InterPro_CC_genoI_8vs16, CAMERA_InterPro_CC_genoI_16vs24,
                   CAMERA_InterPro_CC_genoI_8vs24, CAMERA_InterPro_BP_genoJ_8vs16,
                   CAMERA_InterPro_BP_genoJ_16vs24, CAMERA_InterPro_BP_genoJ_8vs24,
                   CAMERA_InterPro_MF_genoJ_8vs16, CAMERA_InterPro_MF_genoJ_16vs24,
                   CAMERA_InterPro_MF_genoJ_8vs24, CAMERA_InterPro_CC_genoJ_8vs16,
                   CAMERA_InterPro_CC_genoJ_16vs24, CAMERA_InterPro_CC_genoJ_8vs24,
                   CAMERA_InterPro_BP_genoK_8vs16, CAMERA_InterPro_BP_genoK_16vs24,
                   CAMERA_InterPro_BP_genoK_8vs24, CAMERA_InterPro_MF_genoK_8vs16,
                   CAMERA_InterPro_MF_genoK_16vs24, CAMERA_InterPro_MF_genoK_8vs24,
                   CAMERA_InterPro_CC_genoK_8vs16, CAMERA_InterPro_CC_genoK_16vs24,
                   CAMERA_InterPro_CC_genoK_8vs24, CAMERA_InterPro_BP_genoP_8vs16,
                   CAMERA_InterPro_BP_genoP_16vs24, CAMERA_InterPro_BP_genoP_8vs24,
                   CAMERA_InterPro_MF_genoP_8vs16, CAMERA_InterPro_MF_genoP_16vs24,
                   CAMERA_InterPro_MF_genoP_8vs24, CAMERA_InterPro_CC_genoP_8vs16,
                   CAMERA_InterPro_CC_genoP_16vs24, CAMERA_InterPro_CC_genoP_8vs24)

# give names to the data frames in the list
names(df_summary) = c('BP_avg8vs16', 'BP_avg16vs24', 'BP_avg8vs24',
                      'MF_avg8vs16', 'MF_avg16vs24', 'MF_avg8vs24',
                      'CC_avg8vs16', 'CC_avg16vs24', 'CC_avg8vs24',
                      'BP_genoA_8vs16', 'BP_genoA_16vs24', 'BP_genoA_8vs24',
                      'MF_genoA_8vs16', 'MF_genoA_16vs24', 'MF_genoA_8vs24',
                      'CC_genoA_8vs16', 'CC_genoA_16vs24', 'CC_genoA_8vs24',
                      'BP_genoB_8vs16', 'BP_genoB_16vs24', 'BP_genoB_8vs24',
                      'MF_genoB_8vs16', 'MF_genoB_16vs24', 'MF_genoB_8vs24',
                      'CC_genoB_8vs16', 'CC_genoB_16vs24', 'CC_genoB_8vs24',
                      'BP_genoD_8vs16', 'BP_genoD_16vs24', 'BP_genoD_8vs24',
                      'MF_genoD_8vs16', 'MF_genoD_16vs24', 'MF_genoD_8vs24',
                      'CC_genoD_8vs16', 'CC_genoD_16vs24', 'CC_genoD_8vs24',
                      'BP_genoF_8vs16', 'BP_genoF_16vs24', 'BP_genoF_8vs24',
                      'MF_genoF_8vs16', 'MF_genoF_16vs24', 'MF_genoF_8vs24',
                      'CC_genoF_8vs16', 'CC_genoF_16vs24', 'CC_genoF_8vs24',
                      'BP_genoI_8vs16', 'BP_genoI_16vs24', 'BP_genoI_8vs24',
                      'MF_genoI_8vs16', 'MF_genoI_16vs24', 'MF_genoI_8vs24',
                      'CC_genoI_8vs16', 'CC_genoI_16vs24', 'CC_genoI_8vs24',
                      'BP_genoJ_8vs16', 'BP_genoJ_16vs24', 'BP_genoJ_8vs24',
                      'MF_genoJ_8vs16', 'MF_genoJ_16vs24', 'MF_genoJ_8vs24',
                      'CC_genoJ_8vs16', 'CC_genoJ_16vs24', 'CC_genoJ_8vs24',
                      'BP_genoK_8vs16', 'BP_genoK_16vs24', 'BP_genoK_8vs24',
                      'MF_genoK_8vs16', 'MF_genoK_16vs24', 'MF_genoK_8vs24',
                      'CC_genoK_8vs16', 'CC_genoK_16vs24', 'CC_genoK_8vs24',
                      'BP_genoP_8vs16', 'BP_genoP_16vs24', 'BP_genoP_8vs24',
                      'MF_genoP_8vs16', 'MF_genoP_16vs24', 'MF_genoP_8vs24',
                      'CC_genoP_8vs16', 'CC_genoP_16vs24', 'CC_genoP_8vs24')

# loop over the data frame list to select number of up- and downregulated GO terms in different contrasts
df3 = data.frame()
for (df in df_summary){
  df_down = df[df$Direction == 'Down',]
  df_up = df[df$Direction == 'Up',]
  df_down_sign = length(which(df_down$FDR < 0.05))
  df_up_sign = length(which(df_up$FDR < 0.05))
  df3 = rbind.data.frame(df3, c(df_up_sign,df_down_sign))
}

# add column with metadata grouping sets of similar GO terms together
meta = sub("^(BP|MF|CC)_", "", names(df_summary))

df3bis = cbind(df3,meta)     

# add column names and row names  
rownames(df3bis) = names(df_summary)
colnames(df3bis) = c('upregulated', 'downregulated', 'metadata')

# sum over the different GO classes
df3_sum = ddply(df3bis, "metadata", numcolwise(sum))

# turn downregulated values negative for plotting in ggplot
df3_sum[,3] = -df3_sum[,3]

# stack data
df3_stack = stack(df3_sum[,2:3])

# add metadata
meta = c('avg16-avg24', 'avg8-avg16', 'avg8-avg24',
         'A16-A24', 'A8-A16', 'A8-A24',
         'B16-B24', 'B8-B16', 'B8-B24',
         'D16-D24', 'D8-D16', 'D8-D24',
         'F16-F24', 'F8-F16', 'F8-F24',
         'I16-I24', 'I8-I16', 'I8-I24',
         'J16-J24', 'J8-J16', 'J8-J24',
         'K16-K24', 'K8-K16', 'K8-K24',
         'P16-P24', 'P8-P16', 'P8-P24')
category = c('16vs24', '8vs16', '8vs24')
df3_stack2 = cbind(df3_stack,meta,category)

# reorder data for plotting
df3_stack2$meta = factor(df3_stack2$meta, levels = c('avg8-avg24', 'avg8-avg16', 'avg16-avg24',
                                                     'P8-P24', 'P8-P16', 'P16-P24',
                                                     'K8-K24', 'K8-K16', 'K16-K24',
                                                     'J8-J24', 'J8-J16', 'J16-J24',
                                                     'I8-I24', 'I8-I16', 'I16-I24',
                                                     'F8-F24', 'F8-F16', 'F16-F24',
                                                     'D8-D24', 'D8-D16', 'D16-D24',
                                                     'B8-B24', 'B8-B16', 'B16-B24',
                                                     'A8-A24', 'A8-A16', 'A16-A24'))

# plot barplot
g = ggplot(df3_stack2, aes(x = meta, y = values, fill = category)) +
  geom_bar(stat = "identity", position = "identity",
           color = "white") + coord_flip() +
  scale_fill_manual("legend", values = c('8vs16' = "#3690C0", 
                                         '16vs24' = "#A6BDDB",  
                                         '8vs24' = "#023858")) +
  scale_x_discrete(limits = rev(levels(df3_stack2$meta))) +
  theme_test()
g
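
The sign flip and stacking steps used to prepare the diverging barplot can be illustrated on a toy data frame. This is a minimal sketch in base R: the data frame `toy` and its counts are made up for illustration and are not results of this analysis.

```r
# toy counts of enriched GO terms per contrast (hypothetical values)
toy = data.frame(metadata = c("A8-A16", "A8-A24"),
                 upregulated = c(5, 12),
                 downregulated = c(3, 7))

# turn downregulated values negative so these bars point to the other side of zero
toy$downregulated = -toy$downregulated

# stack both count columns into long format (columns 'values' and 'ind')
toy_stack = stack(toy[, c("upregulated", "downregulated")])

# repeat the contrast labels once per stacked column, in the same row order
toy_stack$meta = rep(toy$metadata, times = 2)
toy_stack
```

Because the labels are repeated in the original row order, the label vector must be sorted in the same way as the summed data frame, as is the case for `meta` above.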

Finally, we combined topGO and CAMERA results:

# combine CAMERA and ORA data in one ggplot
## drop ind column which is not present in the ORA data frame
df3_stack2 = df3_stack2[-c(2)]

## add category column to the ORA data frame for coloring
category = rep(c('ORA'),18)
df_ORA2 = cbind(df_ORA,category)

## combine data frames
GO_all = rbind(df3_stack2,df_ORA2)

# plot barplot 
g = ggplot(GO_all, aes(x = meta, y = as.numeric(values), fill = category)) +
  geom_bar(stat = "identity", position = "identity",
           color = "white") + coord_flip() +
  scale_fill_manual("legend", values = c('8vs16' = "#3690C0", 
                                         '16vs24' = "#A6BDDB", 
                                         '8vs24' = "#023858", 
                                         'ORA' = 'gray20')) +
  scale_x_discrete(limits = rev(levels(GO_all$meta))) +
  theme_test()
g

3.4 Testing for differential expression using stage-wise analysis [omnibus test]: interaction-effects (RQ3)

In this omnibus test we tested for interaction-effects between genotypes for a given salinity contrast. Interaction-effects give information on differences between genotypes in their response to changing salinity.

3.4.1 Defining contrasts to test

In a first step, we defined all the contrasts that need to be tested: 84 in total (the 28 pairwise genotype comparisons for each of the three salinity contrasts):

# define all contrasts to test
C_RQ3=matrix(0,nrow=ncol(fit_group_model$coefficients),ncol=84)
rownames(C_RQ3)=colnames(fit_group_model$coefficients)
colnames(C_RQ3)=c("8vs16_A-8vs16_B","8vs16_A-8vs16_D","8vs16_A-8vs16_F","8vs16_A-8vs16_I",
                  "8vs16_A-8vs16_J","8vs16_A-8vs16_K","8vs16_A-8vs16_P","8vs16_B-8vs16_D",
                  "8vs16_B-8vs16_F","8vs16_B-8vs16_I","8vs16_B-8vs16_J","8vs16_B-8vs16_K",
                  "8vs16_B-8vs16_P","8vs16_D-8vs16_F","8vs16_D-8vs16_I","8vs16_D-8vs16_J",
                  "8vs16_D-8vs16_K","8vs16_D-8vs16_P","8vs16_F-8vs16_I","8vs16_F-8vs16_J",
                  "8vs16_F-8vs16_K","8vs16_F-8vs16_P","8vs16_I-8vs16_J","8vs16_I-8vs16_K",
                  "8vs16_I-8vs16_P","8vs16_J-8vs16_K","8vs16_J-8vs16_P","8vs16_K-8vs16_P",
                  "8vs24_A-8vs24_B","8vs24_A-8vs24_D","8vs24_A-8vs24_F","8vs24_A-8vs24_I",
                  "8vs24_A-8vs24_J","8vs24_A-8vs24_K","8vs24_A-8vs24_P","8vs24_B-8vs24_D",
                  "8vs24_B-8vs24_F","8vs24_B-8vs24_I","8vs24_B-8vs24_J","8vs24_B-8vs24_K",
                  "8vs24_B-8vs24_P","8vs24_D-8vs24_F","8vs24_D-8vs24_I","8vs24_D-8vs24_J",
                  "8vs24_D-8vs24_K","8vs24_D-8vs24_P","8vs24_F-8vs24_I","8vs24_F-8vs24_J",
                  "8vs24_F-8vs24_K","8vs24_F-8vs24_P","8vs24_I-8vs24_J","8vs24_I-8vs24_K",
                  "8vs24_I-8vs24_P","8vs24_J-8vs24_K","8vs24_J-8vs24_P","8vs24_K-8vs24_P",
                  "16vs24_A-16vs24_B","16vs24_A-16vs24_D","16vs24_A-16vs24_F","16vs24_A-16vs24_I",
                  "16vs24_A-16vs24_J","16vs24_A-16vs24_K","16vs24_A-16vs24_P","16vs24_B-16vs24_D",
                  "16vs24_B-16vs24_F","16vs24_B-16vs24_I","16vs24_B-16vs24_J","16vs24_B-16vs24_K",
                  "16vs24_B-16vs24_P","16vs24_D-16vs24_F","16vs24_D-16vs24_I","16vs24_D-16vs24_J",
                  "16vs24_D-16vs24_K","16vs24_D-16vs24_P","16vs24_F-16vs24_I","16vs24_F-16vs24_J",
                  "16vs24_F-16vs24_K","16vs24_F-16vs24_P","16vs24_I-16vs24_J","16vs24_I-16vs24_K",
                  "16vs24_I-16vs24_P","16vs24_J-16vs24_K","16vs24_J-16vs24_P","16vs24_K-16vs24_P")

# 8vs16 contrast
C_RQ3[c("A.16ppt","A.8ppt","B.16ppt","B.8ppt"),"8vs16_A-8vs16_B"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","D.16ppt","D.8ppt"),"8vs16_A-8vs16_D"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","F.16ppt","F.8ppt"),"8vs16_A-8vs16_F"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","I.16ppt","I.8ppt"),"8vs16_A-8vs16_I"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","J.16ppt","J.8ppt"),"8vs16_A-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","K.16ppt","K.8ppt"),"8vs16_A-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("A.16ppt","A.8ppt","P.16ppt","P.8ppt"),"8vs16_A-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","D.16ppt","D.8ppt"),"8vs16_B-8vs16_D"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","F.16ppt","F.8ppt"),"8vs16_B-8vs16_F"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","I.16ppt","I.8ppt"),"8vs16_B-8vs16_I"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","J.16ppt","J.8ppt"),"8vs16_B-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","K.16ppt","K.8ppt"),"8vs16_B-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("B.16ppt","B.8ppt","P.16ppt","P.8ppt"),"8vs16_B-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","F.16ppt","F.8ppt"),"8vs16_D-8vs16_F"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","I.16ppt","I.8ppt"),"8vs16_D-8vs16_I"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","J.16ppt","J.8ppt"),"8vs16_D-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","K.16ppt","K.8ppt"),"8vs16_D-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("D.16ppt","D.8ppt","P.16ppt","P.8ppt"),"8vs16_D-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("F.16ppt","F.8ppt","I.16ppt","I.8ppt"),"8vs16_F-8vs16_I"]=c(-1,1,1,-1)
C_RQ3[c("F.16ppt","F.8ppt","J.16ppt","J.8ppt"),"8vs16_F-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("F.16ppt","F.8ppt","K.16ppt","K.8ppt"),"8vs16_F-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("F.16ppt","F.8ppt","P.16ppt","P.8ppt"),"8vs16_F-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("I.16ppt","I.8ppt","J.16ppt","J.8ppt"),"8vs16_I-8vs16_J"]=c(-1,1,1,-1)
C_RQ3[c("I.16ppt","I.8ppt","K.16ppt","K.8ppt"),"8vs16_I-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("I.16ppt","I.8ppt","P.16ppt","P.8ppt"),"8vs16_I-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("J.16ppt","J.8ppt","K.16ppt","K.8ppt"),"8vs16_J-8vs16_K"]=c(-1,1,1,-1)
C_RQ3[c("J.16ppt","J.8ppt","P.16ppt","P.8ppt"),"8vs16_J-8vs16_P"]=c(-1,1,1,-1)
C_RQ3[c("K.16ppt","K.8ppt","P.16ppt","P.8ppt"),"8vs16_K-8vs16_P"]=c(-1,1,1,-1)

# 16vs24 contrast
C_RQ3[c("A.24ppt","A.16ppt","B.24ppt","B.16ppt"),"16vs24_A-16vs24_B"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","D.24ppt","D.16ppt"),"16vs24_A-16vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","F.24ppt","F.16ppt"),"16vs24_A-16vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","I.24ppt","I.16ppt"),"16vs24_A-16vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","J.24ppt","J.16ppt"),"16vs24_A-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","K.24ppt","K.16ppt"),"16vs24_A-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.16ppt","P.24ppt","P.16ppt"),"16vs24_A-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","D.24ppt","D.16ppt"),"16vs24_B-16vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","F.24ppt","F.16ppt"),"16vs24_B-16vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","I.24ppt","I.16ppt"),"16vs24_B-16vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","J.24ppt","J.16ppt"),"16vs24_B-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","K.24ppt","K.16ppt"),"16vs24_B-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.16ppt","P.24ppt","P.16ppt"),"16vs24_B-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","F.24ppt","F.16ppt"),"16vs24_D-16vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","I.24ppt","I.16ppt"),"16vs24_D-16vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","J.24ppt","J.16ppt"),"16vs24_D-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","K.24ppt","K.16ppt"),"16vs24_D-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.16ppt","P.24ppt","P.16ppt"),"16vs24_D-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.16ppt","I.24ppt","I.16ppt"),"16vs24_F-16vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.16ppt","J.24ppt","J.16ppt"),"16vs24_F-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.16ppt","K.24ppt","K.16ppt"),"16vs24_F-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.16ppt","P.24ppt","P.16ppt"),"16vs24_F-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.16ppt","J.24ppt","J.16ppt"),"16vs24_I-16vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.16ppt","K.24ppt","K.16ppt"),"16vs24_I-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.16ppt","P.24ppt","P.16ppt"),"16vs24_I-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.16ppt","K.24ppt","K.16ppt"),"16vs24_J-16vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.16ppt","P.24ppt","P.16ppt"),"16vs24_J-16vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("K.24ppt","K.16ppt","P.24ppt","P.16ppt"),"16vs24_K-16vs24_P"]=c(-1,1,1,-1)

# 8vs24 contrast
C_RQ3[c("A.24ppt","A.8ppt","B.24ppt","B.8ppt"),"8vs24_A-8vs24_B"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","D.24ppt","D.8ppt"),"8vs24_A-8vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","F.24ppt","F.8ppt"),"8vs24_A-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","I.24ppt","I.8ppt"),"8vs24_A-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","J.24ppt","J.8ppt"),"8vs24_A-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","K.24ppt","K.8ppt"),"8vs24_A-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","P.24ppt","P.8ppt"),"8vs24_A-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","D.24ppt","D.8ppt"),"8vs24_B-8vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","F.24ppt","F.8ppt"),"8vs24_B-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","I.24ppt","I.8ppt"),"8vs24_B-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","J.24ppt","J.8ppt"),"8vs24_B-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","K.24ppt","K.8ppt"),"8vs24_B-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","P.24ppt","P.8ppt"),"8vs24_B-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","F.24ppt","F.8ppt"),"8vs24_D-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","I.24ppt","I.8ppt"),"8vs24_D-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","J.24ppt","J.8ppt"),"8vs24_D-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","K.24ppt","K.8ppt"),"8vs24_D-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","P.24ppt","P.8ppt"),"8vs24_D-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","I.24ppt","I.8ppt"),"8vs24_F-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","J.24ppt","J.8ppt"),"8vs24_F-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","K.24ppt","K.8ppt"),"8vs24_F-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","P.24ppt","P.8ppt"),"8vs24_F-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","J.24ppt","J.8ppt"),"8vs24_I-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","K.24ppt","K.8ppt"),"8vs24_I-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","P.24ppt","P.8ppt"),"8vs24_I-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.8ppt","K.24ppt","K.8ppt"),"8vs24_J-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.8ppt","P.24ppt","P.8ppt"),"8vs24_J-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("K.24ppt","K.8ppt","P.24ppt","P.8ppt"),"8vs24_K-8vs24_P"]=c(-1,1,1,-1)
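
The same 84 contrasts can also be generated in a loop, which avoids typing errors. The block below is a minimal sketch, not the code used for the analysis. It assumes that the model coefficients are named `<genotype>.<salinity>` (as in the rows used above) and builds a stand-alone matrix `C_toy`; the objects `genotypes`, `salinities`, `coef_names`, `geno_pairs`, and `sal_contrasts` are hypothetical helpers introduced here.

```r
# hypothetical stand-in for colnames(fit_group_model$coefficients)
genotypes = c("A", "B", "D", "F", "I", "J", "K", "P")
salinities = c("8ppt", "16ppt", "24ppt")
coef_names = as.vector(outer(genotypes, salinities, paste, sep = "."))

# all 28 unordered genotype pairs and the 3 salinity contrasts (low, high)
geno_pairs = combn(genotypes, 2)
sal_contrasts = list("8vs16" = c("8ppt", "16ppt"),
                     "8vs24" = c("8ppt", "24ppt"),
                     "16vs24" = c("16ppt", "24ppt"))

# contrast names in the same order as colnames(C_RQ3)
contrast_names = unlist(lapply(names(sal_contrasts), function(s)
  paste0(s, "_", geno_pairs[1, ], "-", s, "_", geno_pairs[2, ])))

C_toy = matrix(0, nrow = length(coef_names), ncol = length(contrast_names),
               dimnames = list(coef_names, contrast_names))

for (s in names(sal_contrasts)) {
  low = sal_contrasts[[s]][1]
  high = sal_contrasts[[s]][2]
  for (p in seq_len(ncol(geno_pairs))) {
    g1 = geno_pairs[1, p]
    g2 = geno_pairs[2, p]
    cn = paste0(s, "_", g1, "-", s, "_", g2)
    # (g1.low - g1.high) - (g2.low - g2.high)
    rows = paste(c(g1, g1, g2, g2), c(high, low, high, low), sep = ".")
    C_toy[rows, cn] = c(-1, 1, 1, -1)
  }
}
dim(C_toy)
```

Each column then holds exactly four non-zero weights that sum to zero, which is what an interaction contrast between two genotypes requires.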

3.4.2 Stage-wise testing

We followed the same procedure as in section 3.3.2.

# screening stage
alpha=0.05
screenTest_RQ3 = glmQLFTest(fit_group_model, contrast=C_RQ3)
pScreen_RQ3 = screenTest_RQ3$table$PValue
names(pScreen_RQ3) = rownames(screenTest_RQ3$table)

# confirmation stage
confirmationResults_RQ3 = sapply(1:ncol(C_RQ3),function(i) glmQLFTest(fit_group_model, 
                                 contrast = C_RQ3[,i]), simplify=FALSE) 
confirmationPList_RQ3 = lapply(confirmationResults_RQ3, function(x) x$table$PValue)
confirmationP_RQ3 = as.matrix(Reduce(f=cbind,confirmationPList_RQ3)) 
rownames(confirmationP_RQ3) = rownames(confirmationResults_RQ3[[1]]$table)
colnames(confirmationP_RQ3) = colnames(C_RQ3)
stageRObj_RQ3 = stageR(pScreen=pScreen_RQ3, pConfirmation=confirmationP_RQ3) 
stageRAdj_RQ3 = stageWiseAdjustment(object=stageRObj_RQ3, method="holm", alpha=0.05)
resRQ3 = getResults(stageRAdj_RQ3)

## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.

# get the significant genes for each contrast
SignifGenesRQ3 = colSums(resRQ3) 
SignifGenesRQ3

##        padjScreen   8vs16_A-8vs16_B   8vs16_A-8vs16_D   8vs16_A-8vs16_F 
##              5099               104               440               268 
##   8vs16_A-8vs16_I   8vs16_A-8vs16_J   8vs16_A-8vs16_K   8vs16_A-8vs16_P 
##                46               150               141               248 
##   8vs16_B-8vs16_D   8vs16_B-8vs16_F   8vs16_B-8vs16_I   8vs16_B-8vs16_J 
##               399               308                54                80 
##   8vs16_B-8vs16_K   8vs16_B-8vs16_P   8vs16_D-8vs16_F   8vs16_D-8vs16_I 
##                68               193               729               305 
##   8vs16_D-8vs16_J   8vs16_D-8vs16_K   8vs16_D-8vs16_P   8vs16_F-8vs16_I 
##               444               209               966               158 
##   8vs16_F-8vs16_J   8vs16_F-8vs16_K   8vs16_F-8vs16_P   8vs16_I-8vs16_J 
##               142               118               251                29 
##   8vs16_I-8vs16_K   8vs16_I-8vs16_P   8vs16_J-8vs16_K   8vs16_J-8vs16_P 
##                39               175                58               102 
##   8vs16_K-8vs16_P   8vs24_A-8vs24_B   8vs24_A-8vs24_D   8vs24_A-8vs24_F 
##               238               506               441               272 
##   8vs24_A-8vs24_I   8vs24_A-8vs24_J   8vs24_A-8vs24_K   8vs24_A-8vs24_P 
##               181               188               350               488 
##   8vs24_B-8vs24_D   8vs24_B-8vs24_F   8vs24_B-8vs24_I   8vs24_B-8vs24_J 
##               595               325               379               218 
##   8vs24_B-8vs24_K   8vs24_B-8vs24_P   8vs24_D-8vs24_F   8vs24_D-8vs24_I 
##               232               260               303               114 
##   8vs24_D-8vs24_J   8vs24_D-8vs24_K   8vs24_D-8vs24_P   8vs24_F-8vs24_I 
##               125               411               565               172 
##   8vs24_F-8vs24_J   8vs24_F-8vs24_K   8vs24_F-8vs24_P   8vs24_I-8vs24_J 
##                98               167               197                52 
##   8vs24_I-8vs24_K   8vs24_I-8vs24_P   8vs24_J-8vs24_K   8vs24_J-8vs24_P 
##               175               494               141               105 
##   8vs24_K-8vs24_P 16vs24_A-16vs24_B 16vs24_A-16vs24_D 16vs24_A-16vs24_F 
##               118               113               114                88 
## 16vs24_A-16vs24_I 16vs24_A-16vs24_J 16vs24_A-16vs24_K 16vs24_A-16vs24_P 
##                25                29               115                37 
## 16vs24_B-16vs24_D 16vs24_B-16vs24_F 16vs24_B-16vs24_I 16vs24_B-16vs24_J 
##                41               390               180                26 
## 16vs24_B-16vs24_K 16vs24_B-16vs24_P 16vs24_D-16vs24_F 16vs24_D-16vs24_I 
##                42               132               159               112 
## 16vs24_D-16vs24_J 16vs24_D-16vs24_K 16vs24_D-16vs24_P 16vs24_F-16vs24_I 
##                65                48               195               119 
## 16vs24_F-16vs24_J 16vs24_F-16vs24_K 16vs24_F-16vs24_P 16vs24_I-16vs24_J 
##               107               271               200                39 
## 16vs24_I-16vs24_K 16vs24_I-16vs24_P 16vs24_J-16vs24_K 16vs24_J-16vs24_P 
##               163                49               123                45 
## 16vs24_K-16vs24_P 
##               134

adjusted_p_RQ3 = getAdjustedPValues(stageRAdj_RQ3, onlySignificantGenes = FALSE, order = FALSE)

## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.

# visualize number of significant genes
resRQ3_df = as.data.frame(resRQ3) 
resRQ3_df2 = resRQ3_df
resRQ3_df2$gene = rownames(resRQ3_df2)
OnlySignGenes_RQ3 = resRQ3_df[resRQ3_df$padjScreen == 1,] 
dim(OnlySignGenes_RQ3)

## [1] 5099   85

# select genes that were significant after the screening stage, but not the confirmation stage
genesSI_RQ3 = rownames(adjusted_p_RQ3)[adjusted_p_RQ3[,"padjScreen"]<=0.05]
genesNotFoundStageII_RQ3 = genesSI_RQ3[genesSI_RQ3 %in% rownames(resRQ3)[rowSums(resRQ3==0)==84]]
length(genesNotFoundStageII_RQ3)

## [1] 1141

# select genes that were significant after the confirmation stage
OnlySignGenes_RQ3_ConStage = OnlySignGenes_RQ3 [!rownames(OnlySignGenes_RQ3 ) %in% genesNotFoundStageII_RQ3, ]
nrow(OnlySignGenes_RQ3_ConStage)

## [1] 3958

3958 genes are significant after the confirmation stage.
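
The count follows from the two outputs above: 5099 genes passed the screening stage and 1141 of those were not confirmed for any contrast, so 5099 - 1141 = 3958. The bookkeeping can be illustrated on a toy matrix. This is a minimal sketch: `res_toy` is a made-up stand-in for the 0/1 matrix returned by getResults(), with hypothetical genes and contrasts.

```r
# toy 0/1 result matrix: first column is the screening stage, the rest are contrasts
res_toy = matrix(c(1, 1, 0,   # gene1: passes screening, confirmed for contrast c1
                   1, 0, 0,   # gene2: passes screening, no contrast confirmed
                   0, 0, 0),  # gene3: fails screening
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("gene1", "gene2", "gene3"),
                                 c("padjScreen", "c1", "c2")))

n_contrasts = ncol(res_toy) - 1

# genes that passed the screening stage
passed_screen = rownames(res_toy)[res_toy[, "padjScreen"] == 1]

# of those, genes with zeros in all contrast columns were not confirmed
n_zero = rowSums(res_toy[passed_screen, -1, drop = FALSE] == 0)
not_confirmed = passed_screen[n_zero == n_contrasts]

# genes significant after the confirmation stage
confirmed = setdiff(passed_screen, not_confirmed)
length(passed_screen) - length(not_confirmed) == length(confirmed)
```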

3.4.3 Summarize the results for downstream analyses

# select the adjusted P-values for each contrast
adjusted_p_RQ3 = getAdjustedPValues(stageRAdj_RQ3, onlySignificantGenes = FALSE, order = FALSE)

## The returned adjusted p-values are based on a stage-wise testing approach and are only valid for the provided target OFDR level of 5%. If a different target OFDR level is of interest,the entire adjustment should be re-run.

# rename column headers in adjusted_p_RQ3
colnames(adjusted_p_RQ3) = c("padjScreen","8vs16_A-8vs16_B_Padj","8vs16_A-8vs16_D_Padj","8vs16_A-8vs16_F_Padj",
                             "8vs16_A-8vs16_I_Padj","8vs16_A-8vs16_J_Padj","8vs16_A-8vs16_K_Padj",
                             "8vs16_A-8vs16_P_Padj","8vs16_B-8vs16_D_Padj","8vs16_B-8vs16_F_Padj",
                             "8vs16_B-8vs16_I_Padj","8vs16_B-8vs16_J_Padj","8vs16_B-8vs16_K_Padj",
                             "8vs16_B-8vs16_P_Padj","8vs16_D-8vs16_F_Padj","8vs16_D-8vs16_I_Padj",
                             "8vs16_D-8vs16_J_Padj","8vs16_D-8vs16_K_Padj","8vs16_D-8vs16_P_Padj",
                             "8vs16_F-8vs16_I_Padj","8vs16_F-8vs16_J_Padj","8vs16_F-8vs16_K_Padj",
                             "8vs16_F-8vs16_P_Padj","8vs16_I-8vs16_J_Padj","8vs16_I-8vs16_K_Padj",
                             "8vs16_I-8vs16_P_Padj","8vs16_J-8vs16_K_Padj","8vs16_J-8vs16_P_Padj",
                             "8vs16_K-8vs16_P_Padj","8vs24_A-8vs24_B_Padj","8vs24_A-8vs24_D_Padj",
                             "8vs24_A-8vs24_F_Padj","8vs24_A-8vs24_I_Padj","8vs24_A-8vs24_J_Padj",
                             "8vs24_A-8vs24_K_Padj","8vs24_A-8vs24_P_Padj","8vs24_B-8vs24_D_Padj",
                             "8vs24_B-8vs24_F_Padj","8vs24_B-8vs24_I_Padj","8vs24_B-8vs24_J_Padj",
                             "8vs24_B-8vs24_K_Padj","8vs24_B-8vs24_P_Padj","8vs24_D-8vs24_F_Padj",
                             "8vs24_D-8vs24_I_Padj","8vs24_D-8vs24_J_Padj","8vs24_D-8vs24_K_Padj",
                             "8vs24_D-8vs24_P_Padj","8vs24_F-8vs24_I_Padj","8vs24_F-8vs24_J_Padj",
                             "8vs24_F-8vs24_K_Padj","8vs24_F-8vs24_P_Padj","8vs24_I-8vs24_J_Padj",
                             "8vs24_I-8vs24_K_Padj","8vs24_I-8vs24_P_Padj","8vs24_J-8vs24_K_Padj",
                             "8vs24_J-8vs24_P_Padj","8vs24_K-8vs24_P_Padj","16vs24_A-16vs24_B_Padj",
                             "16vs24_A-16vs24_D_Padj","16vs24_A-16vs24_F_Padj","16vs24_A-16vs24_I_Padj",
                             "16vs24_A-16vs24_J_Padj","16vs24_A-16vs24_K_Padj","16vs24_A-16vs24_P_Padj",
                             "16vs24_B-16vs24_D_Padj","16vs24_B-16vs24_F_Padj","16vs24_B-16vs24_I_Padj",
                             "16vs24_B-16vs24_J_Padj","16vs24_B-16vs24_K_Padj","16vs24_B-16vs24_P_Padj",
                             "16vs24_D-16vs24_F_Padj","16vs24_D-16vs24_I_Padj","16vs24_D-16vs24_J_Padj",
                             "16vs24_D-16vs24_K_Padj","16vs24_D-16vs24_P_Padj","16vs24_F-16vs24_I_Padj",
                             "16vs24_F-16vs24_J_Padj","16vs24_F-16vs24_K_Padj","16vs24_F-16vs24_P_Padj",
                             "16vs24_I-16vs24_J_Padj","16vs24_I-16vs24_K_Padj","16vs24_I-16vs24_P_Padj",
                             "16vs24_J-16vs24_K_Padj","16vs24_J-16vs24_P_Padj","16vs24_K-16vs24_P_Padj")

# create empty list to hold the data values
datalist_RQ3 = list()

# loop over the confirmationResults_RQ3 object to obtain the relevant information (table)
for (contrast in c(1:84)){
  table = confirmationResults_RQ3[[contrast]]$table
  datalist_RQ3[[contrast]] = table
}

# turn list into data frame
confirmationResults_RQ3_total_dataset = data.frame(datalist_RQ3)

# rename column names for tractability
# build the names from the contrast names instead of typing all 336 names by hand;
# each confirmation table holds the columns logFC, logCPM, F and PValue, in that order
colnames(confirmationResults_RQ3_total_dataset) = paste0(rep(colnames(C_RQ3), each = 4), "_",
                                                         c("logFC", "logCPM", "F", "nonadjPValue"))

# merge the data frames
table = merge(confirmationResults_RQ3_total_dataset,adjusted_p_RQ3, by = 0, all = TRUE)

# use the first column (gene names) for the row names
all_results_RQ3 = table[,-1]
rownames(all_results_RQ3) = table[,1]
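
Merging with `by = 0` matches the two tables on their row names and returns those names as a first column, which is why that column is moved back to the row names afterwards. A minimal sketch on toy data (the objects `stats_toy` and `padj_toy` and their values are hypothetical):

```r
# two toy tables with the same genes in a different row order
stats_toy = data.frame(logFC = c(0.5, -1.2), row.names = c("geneB", "geneA"))
padj_toy = data.frame(padjScreen = c(0.01, 0.20), row.names = c("geneA", "geneB"))

# by = 0 merges on row names; the result gets a leading 'Row.names' column
merged = merge(stats_toy, padj_toy, by = 0, all = TRUE)

# move the gene names from the first column back to the row names
res_toy = merged[, -1]
rownames(res_toy) = merged[, 1]
res_toy
```

The rows are matched by gene name rather than by position, so the two inputs do not need to be sorted in the same way.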

3.4.4 Top 100 set of interaction-effect genes

In this section, we visualized the top 100 interaction-effect genes. These 100 genes were selected based on stageR’s FDR-adjusted P-value of the global null hypothesis (padjScreen).

We started with selecting the top 100 set of genes with interaction-effects:

# rank genes based on P-value
Padjscreen_sorted_RQ3 = all_results_RQ3[with(all_results_RQ3, order(all_results_RQ3$padjScreen)),]

# select top 100 genes based on Padjscreen
Padjscreen_sorted_RQ3_top100 = rownames(Padjscreen_sorted_RQ3[1:100 ,]) 
OnlySignGenes_RQ3_ConStage_top100 = subset(OnlySignGenes_RQ3_ConStage, rownames(OnlySignGenes_RQ3_ConStage)%in%Padjscreen_sorted_RQ3_top100)

Next, we selected the same genes in the summary data frame of RQ1e2 (all_results_RQ1e2 from section 3.3):

# select top 100 interaction-effect genes in the response for each genotype (RQ2 - section 3.3)
all_results_RQ1e2_top100_RQ3 = subset(all_results_RQ1e2,
                                  rownames(all_results_RQ1e2)%in%rownames(OnlySignGenes_RQ3_ConStage_top100))

# select columns with logFC values
all_results_RQ1e2_top100_RQ3_logFC = all_results_RQ1e2_top100_RQ3[,grepl("logFC",
                                                      colnames(all_results_RQ1e2_top100_RQ3))]

# remove columns of the average effect
all_results_RQ1e2_top100_RQ3_logFC = all_results_RQ1e2_top100_RQ3_logFC[-c(25,26,27)] 

# select the top 100 interaction-effect genes in the OnlySignGenes_RQ1e2_ConStage object
OnlySignGenes_RQ1e2_ConStage_RQ3_top100 = subset(OnlySignGenes_RQ1e2_ConStage, rownames(OnlySignGenes_RQ1e2_ConStage)%in%rownames(OnlySignGenes_RQ3_ConStage_top100))

# remove columns of the average effect
OnlySignGenes_RQ1e2_ConStage_RQ3_top100_sel = OnlySignGenes_RQ1e2_ConStage_RQ3_top100[-c(1,26,27,28)]

# combine both data frames to replace all non-significant logFC values by 0
all_results_RQ1e2_top100_RQ3_logFC = type.convert(all_results_RQ1e2_top100_RQ3_logFC, as.is = TRUE)
all_results_RQ1e2_top100_RQ3_logFC_OnlySign = (0^(OnlySignGenes_RQ1e2_ConStage_RQ3_top100_sel == 0)) * all_results_RQ1e2_top100_RQ3_logFC
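
The expression `0^(x == 0)` acts as a mask: `x == 0` is TRUE for non-significant entries, and since 0^TRUE = 0 and 0^FALSE = 1, multiplying by it sets non-significant logFC values to 0 and leaves significant ones unchanged. A minimal sketch on toy data (the objects `sig_toy` and `logFC_toy` and their values are hypothetical):

```r
# toy 0/1 significance indicators and logFC values for two genes and two contrasts
sig_toy = data.frame(c1 = c(1, 0), c2 = c(0, 1),
                     row.names = c("gene1", "gene2"))
logFC_toy = data.frame(c1 = c(1.5, -0.8), c2 = c(-2.0, 0.4),
                       row.names = c("gene1", "gene2"))

# 1 where the gene is significant, 0 where it is not
mask = 0^(sig_toy == 0)

# element-wise product: non-significant logFC values become 0
masked = mask * logFC_toy
masked
```

Both tables must have the same genes and contrasts in the same order, because the multiplication is done by position and not by name.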

Next, we subdivided the top 100 interaction-effect genes into two categories: genes that differ in the direction of their response between genotypes, and genes that differ in the magnitude of their response between genotypes. We did this using the significance and logFC information from RQ1e2 (section 3.3) retrieved in the code section above:

# select genes that are significantly upregulated in at least one contrast and downregulated in at least one other, regardless of the magnitude of the logFC
RQ3_top100_DiffDir = subset(all_results_RQ1e2_top100_RQ3_logFC_OnlySign,
                            (rowSums(all_results_RQ1e2_top100_RQ3_logFC_OnlySign < 0) > 0) &
                            (rowSums(all_results_RQ1e2_top100_RQ3_logFC_OnlySign > 0) > 0))
length(rownames(RQ3_top100_DiffDir))

## [1] 91

# select genes that show effects in the same direction
RQ3_top100_SameDir = setdiff(rownames(OnlySignGenes_RQ3_ConStage_top100), rownames(RQ3_top100_DiffDir))
length(RQ3_top100_SameDir)

## [1] 9
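
The two categories add up to the full set (91 + 9 = 100). The classification rule can be illustrated on a toy table of masked logFC values, in which 0 stands for "not significant". This is a minimal sketch with hypothetical genes and values:

```r
# toy masked logFC values for three genes in two genotypes
masked_toy = data.frame(A = c(1.2, 0.5, 0),
                        B = c(-0.9, 2.1, 0.7),
                        row.names = c("gene1", "gene2", "gene3"))

# a gene differs in direction if it has at least one significant negative
# and at least one significant positive logFC
has_down = rowSums(masked_toy < 0) > 0
has_up = rowSums(masked_toy > 0) > 0
diff_dir = rownames(masked_toy)[has_down & has_up]

# all remaining genes are assigned to the second category
same_dir = setdiff(rownames(masked_toy), diff_dir)
```

Here gene1 is up in A and down in B and is therefore placed in the first category, whereas gene2 and gene3 only have positive significant values and end up in the second.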

Then, we checked whether the set of genes that differ in the direction of their response contains any genes that are significant in one genotype only:

# check for genes only DE in one genotype
RQ3_top100_DiffDir_uniqueDE = subset(RQ3_top100_DiffDir, rownames(RQ3_top100_DiffDir)%in%RQ1e2_uniqueDE) 
RQ3_top100_DiffDir_uniqueDE

##  [1] A8vsA16_logFC  A16vsA24_logFC A8vsA24_logFC  B8vsB16_logFC  B16vsB24_logFC
##  [6] B8vsB24_logFC  D8vsD16_logFC  D16vsD24_logFC D8vsD24_logFC  F8vsF16_logFC 
## [11] F16vsF24_logFC F8vsF24_logFC  I8vsI16_logFC  I16vsI24_logFC I8vsI24_logFC 
## [16] J8vsJ16_logFC  J16vsJ24_logFC J8vsJ24_logFC  K8vsK16_logFC  K16vsK24_logFC
## [21] K8vsK24_logFC  P8vsP16_logFC  P16vsP24_logFC P8vsP24_logFC 
## <0 rows> (or 0-length row.names)

3.4.5 GO enrichment: ORA analysis with topGO

In this section, we performed GO enrichment using Fisher’s exact test in topGO. This GO enrichment was done on two sets of interaction-effect genes: genes that differ in the direction of their response between genotypes, and genes that differ in the magnitude of their response between genotypes.

We assigned all interaction-effect genes to these two categories using the same criteria as in section 3.4.4:

# create object with names of all genes that show interaction-effects
genes_OnlySign_Constage_RQ3 = rownames(OnlySignGenes_RQ3_ConStage)

# select columns with logFC values in the RQ1e2 summary data frame
all_results_RQ1e2_logFC = all_results_RQ1e2[,grepl("logFC", colnames(all_results_RQ1e2))]
all_results_RQ1e2_logFC  = all_results_RQ1e2_logFC[-c(25,26,27)]

# select only significant genes in RQ3
OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign = subset(OnlySignGenes_RQ1e2_ConStage,rownames(OnlySignGenes_RQ1e2_ConStage)%in%genes_OnlySign_Constage_RQ3)

OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign = OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign[-c(1,26,27,28)]

all_results_RQ1e2_all_logFC_RQ3_OnlySign = subset(all_results_RQ1e2_logFC, 
                                  rownames(all_results_RQ1e2_logFC)%in%genes_OnlySign_Constage_RQ3)

# remove genes that are not significant in RQ1e2
intersect_genes = intersect(rownames(OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign),
                            rownames(all_results_RQ1e2_all_logFC_RQ3_OnlySign))

all_results_RQ1e2_all_logFC_RQ3_OnlySign_sub = subset(all_results_RQ1e2_all_logFC_RQ3_OnlySign, 
                                                      rownames(all_results_RQ1e2_all_logFC_RQ3_OnlySign)
                                                      %in%intersect_genes)

# combine both data frames to set all non-significant logFC values to 0
all_results_RQ1e2_all_logFC_RQ3_OnlySign = type.convert(all_results_RQ1e2_all_logFC_RQ3_OnlySign_sub, 
                                                        as.is = TRUE)
# multiply the significance mask with the type-converted logFC data frame
all_results_RQ1e2_all_RQ3_logFC_OnlySign = (0^(OnlySignGenes_RQ1e2_ConStage_RQ3_OnlySign == 0)) * all_results_RQ1e2_all_logFC_RQ3_OnlySign

# select genes that are up- or downregulated in different directions regardless of logFC
RQ3_all_DiffDir = subset(all_results_RQ1e2_all_RQ3_logFC_OnlySign,
                         (rowSums(all_results_RQ1e2_all_RQ3_logFC_OnlySign < 0) > 0) & 
                           (rowSums(all_results_RQ1e2_all_RQ3_logFC_OnlySign > 0) > 0))
nrow(RQ3_all_DiffDir)

## [1] 1178

# select genes that are up- or downregulated in the same directions, but to a different extent or that are not significant in RQ1e2
RQ3_all_SameDir = setdiff(rownames(OnlySignGenes_RQ3_ConStage), rownames(RQ3_all_DiffDir))
length(RQ3_all_SameDir)

## [1] 2780

# check the DiffDir object for genes that are uniquely DE in one genotype: these do not represent
# differences in direction between genotypes, but differences in magnitude between genotypes.
# We did this using the RQ1e2_uniqueDE object created in section 3.3.4:
RQ3_all_DiffDir_uniqueDE = subset(RQ3_all_DiffDir, rownames(RQ3_all_DiffDir)%in%RQ1e2_uniqueDE)
nrow(RQ3_all_DiffDir_uniqueDE) # these special genes will be added to the SameDir object and removed from the DiffDir object

## [1] 39

# add the special genes identified above to the SameDir object
RQ3_all_SameDir2 = c(RQ3_all_SameDir,rownames(RQ3_all_DiffDir_uniqueDE))
length(RQ3_all_SameDir2)

## [1] 2819

# remove special genes from the DiffDir object
RQ3_all_DiffDir2 = RQ3_all_DiffDir[ ! rownames(RQ3_all_DiffDir) %in% rownames(RQ3_all_DiffDir_uniqueDE), ]
nrow(RQ3_all_DiffDir2)

## [1] 1139
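
The reassignment above moves 39 genes from the direction set (1178 - 39 = 1139) to the magnitude set (2780 + 39 = 2819), so both sets together still cover 3958 genes. A minimal sketch of this bookkeeping on hypothetical gene names, with a check that the two final sets do not overlap and jointly cover all genes:

```r
# hypothetical gene sets, for illustration only
all_genes_toy = paste0("gene", 1:6)
diff_dir_toy  = c("gene1", "gene2", "gene3")
unique_de_toy = c("gene3", "gene6")   # genes DE in one genotype only

# initial same-direction set is the complement of the different-direction set
same_dir_toy = setdiff(all_genes_toy, diff_dir_toy)

# genes in the different-direction set that are DE in a single genotype
moved_toy = intersect(diff_dir_toy, unique_de_toy)

# move them from the different-direction set to the same-direction set
same_dir2_toy = c(same_dir_toy, moved_toy)
diff_dir2_toy = setdiff(diff_dir_toy, moved_toy)

# the two final sets should be disjoint and together contain every gene
length(intersect(same_dir2_toy, diff_dir2_toy)) == 0
setequal(c(same_dir2_toy, diff_dir2_toy), all_genes_toy)
```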

Next, we performed GO enrichment on both sets separately.

First, the set of genes that differ in the direction of their response between genotypes:

# select the set of significant DE genes with GO terms
genesOfInterest_RQ3_all_DiffDir = rownames(subset(RQ3_all_DiffDir2, rownames(RQ3_all_DiffDir2) %in%geneUniverse))
length(genesOfInterest_RQ3_all_DiffDir)

# create gene list for input in topGO
geneList_RQ3_all_DiffDir = factor(as.integer(geneUniverse %in% genesOfInterest_RQ3_all_DiffDir))
names(geneList_RQ3_all_DiffDir) = geneUniverse

# create a topGO object (for biological process GOs)
GOdata_BP_RQ3_all_DiffDir = new("topGOdata", ontology="BP", allGenes=geneList_RQ3_all_DiffDir, 
                                annot = annFUN.gene2GO, gene2GO = geneID2GO)
# create a topGO object (for molecular function GOs)
GOdata_MF_RQ3_all_DiffDir = new("topGOdata", ontology="MF", allGenes=geneList_RQ3_all_DiffDir, 
                                annot = annFUN.gene2GO, gene2GO = geneID2GO)
# create a topGO object (for cellular component GOs)
GOdata_CC_RQ3_all_DiffDir = new("topGOdata", ontology="CC", allGenes=geneList_RQ3_all_DiffDir, 
                                annot = annFUN.gene2GO, gene2GO = geneID2GO)

# run Fisher's exact test
resultFisher_BP_RQ3_all_DiffDir = runTest(GOdata_BP_RQ3_all_DiffDir, algorithm = "elim", statistic = "fisher")
resultFisher_MF_RQ3_all_DiffDir = runTest(GOdata_MF_RQ3_all_DiffDir, algorithm = "elim", statistic = "fisher")
resultFisher_CC_RQ3_all_DiffDir = runTest(GOdata_CC_RQ3_all_DiffDir, algorithm = "elim", statistic = "fisher")

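# extract the top-ranked GO terms (number set by topNodes)
# note: the column labelled "classic" holds the p-values of the elim runs above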
allRes_BP_allDE_elim_RQ3_all_DiffDir  = GenTable(GOdata_BP_RQ3_all_DiffDir, 
                                                 classic = resultFisher_BP_RQ3_all_DiffDir , 
                                                  orderBy = "elim", ranksOf = "elim", topNodes = 45, 
                                                 numChar=1000)
allRes_MF_allDE_elim_RQ3_all_DiffDir  = GenTable(GOdata_MF_RQ3_all_DiffDir, 
                                                 classic = resultFisher_MF_RQ3_all_DiffDir , 
                                                  orderBy = "elim", ranksOf = "elim", topNodes = 20, 
                                                 numChar=1000)
allRes_CC_allDE_elim_RQ3_all_DiffDir  = GenTable(GOdata_CC_RQ3_all_DiffDir, 
                                                 classic = resultFisher_CC_RQ3_all_DiffDir , 
                                                  orderBy = "elim", ranksOf = "elim", topNodes = 10, 
                                                 numChar=1000)

Next, the set of genes that differ in the magnitude of their response between genotypes:

# select the set of significant DE genes with GO terms
genesOfInterest_RQ3_all_SameDir = intersect(RQ3_all_SameDir2, geneUniverse)
length(genesOfInterest_RQ3_all_SameDir)

# create gene list for input in topGO
geneList_RQ3_all_SameDir = factor(as.integer(geneUniverse %in% genesOfInterest_RQ3_all_SameDir))
names(geneList_RQ3_all_SameDir) = geneUniverse

# create a topGO object (for biological process GOs)
GOdata_BP_RQ3_all_SameDir = new("topGOdata", ontology="BP", allGenes=geneList_RQ3_all_SameDir, 
                                annot = annFUN.gene2GO, gene2GO = geneID2GO)
# create a topGO object (for molecular function GOs)
GOdata_MF_RQ3_all_SameDir = new("topGOdata", ontology="MF", allGenes=geneList_RQ3_all_SameDir, 
                                annot = annFUN.gene2GO, gene2GO = geneID2GO)
# create a topGO object (for cellular component GOs)
GOdata_CC_RQ3_all_SameDir = new("topGOdata", ontology="CC", allGenes=geneList_RQ3_all_SameDir, 
                                annot = annFUN.gene2GO, gene2GO = geneID2GO)

# run Fisher's exact test
resultFisher_BP_RQ3_all_SameDir = runTest(GOdata_BP_RQ3_all_SameDir, algorithm = "elim", statistic = "fisher")
resultFisher_MF_RQ3_all_SameDir = runTest(GOdata_MF_RQ3_all_SameDir, algorithm = "elim", statistic = "fisher")
resultFisher_CC_RQ3_all_SameDir = runTest(GOdata_CC_RQ3_all_SameDir, algorithm = "elim", statistic = "fisher")

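# extract the top-ranked GO terms (number set by topNodes)
# note: the column labelled "classic" holds the p-values of the elim runs above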
allRes_BP_allDE_elim_RQ3_all_SameDir  = GenTable(GOdata_BP_RQ3_all_SameDir, 
                                                 classic = resultFisher_BP_RQ3_all_SameDir , 
                                                  orderBy = "elim", ranksOf = "elim", topNodes = 75, 
                                                 numChar=1000)
allRes_MF_allDE_elim_RQ3_all_SameDir  = GenTable(GOdata_MF_RQ3_all_SameDir, 
                                                 classic = resultFisher_MF_RQ3_all_SameDir , 
                                                  orderBy = "elim", ranksOf = "elim", topNodes = 60, 
                                                 numChar=1000)
allRes_CC_allDE_elim_RQ3_all_SameDir  = GenTable(GOdata_CC_RQ3_all_SameDir, 
                                                 classic = resultFisher_CC_RQ3_all_SameDir , 
                                                  orderBy = "elim", ranksOf = "elim", topNodes = 10, 
                                                 numChar=1000)
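
To make the underlying test more concrete, the sketch below runs a one-sided Fisher's exact test for a single, hypothetical GO term in base R. It only illustrates the per-term over-representation test on a 2x2 table; it does not reproduce the elim algorithm of topGO, which additionally accounts for the GO graph structure. All gene names and set sizes are made up:

```r
# hypothetical gene universe, gene set of interest and GO term annotation
gene_universe_toy     = paste0("g", 1:20)
genes_of_interest_toy = paste0("g", 1:6)
go_term_genes_toy     = c("g1", "g2", "g3", "g4", "g10")

# 2x2 table: membership in the gene set of interest vs. annotation to the GO term
in_set  = factor(gene_universe_toy %in% genes_of_interest_toy, levels = c(TRUE, FALSE))
in_term = factor(gene_universe_toy %in% go_term_genes_toy, levels = c(TRUE, FALSE))
tab_toy = table(in_set, in_term)
tab_toy

# one-sided test: is the GO term over-represented in the gene set of interest?
res_toy = fisher.test(tab_toy, alternative = "greater")
res_toy$p.value
```

In this toy table, 4 of the 6 genes of interest carry the term against 1 of the 14 remaining genes, which is the kind of imbalance the test detects.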

REVIGO for the above analyses was accessed on August 18th 2021, and used the Gene Ontology database of July 2nd 2021 and the UniProt-to-GO mapping database from June 17th 2021.

3.4.6 Rerunning the interaction-effects for the 8-24 contrast only

As a final step, we reran the RQ3 model, but now only for the 8-24 contrast. This was done because we are interested in these logFCs for comparison with allele frequencies. The model was rerun to decrease the FDR penalty imposed by multiple testing correction for this set of genes:

# define contrasts to test
C_RQ3=matrix(0,nrow=ncol(fit_group_model$coefficients),ncol=28)
rownames(C_RQ3)=colnames(fit_group_model$coefficients)
colnames(C_RQ3)=c("8vs24_A-8vs24_B","8vs24_A-8vs24_D","8vs24_A-8vs24_F","8vs24_A-8vs24_I",
                  "8vs24_A-8vs24_J","8vs24_A-8vs24_K","8vs24_A-8vs24_P","8vs24_B-8vs24_D",
                  "8vs24_B-8vs24_F","8vs24_B-8vs24_I","8vs24_B-8vs24_J","8vs24_B-8vs24_K",
                  "8vs24_B-8vs24_P","8vs24_D-8vs24_F","8vs24_D-8vs24_I","8vs24_D-8vs24_J",
                  "8vs24_D-8vs24_K","8vs24_D-8vs24_P","8vs24_F-8vs24_I","8vs24_F-8vs24_J",
                  "8vs24_F-8vs24_K","8vs24_F-8vs24_P","8vs24_I-8vs24_J","8vs24_I-8vs24_K",
                  "8vs24_I-8vs24_P","8vs24_J-8vs24_K","8vs24_J-8vs24_P","8vs24_K-8vs24_P")

# 8vs24 contrast
C_RQ3[c("A.24ppt","A.8ppt","B.24ppt","B.8ppt"),"8vs24_A-8vs24_B"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","D.24ppt","D.8ppt"),"8vs24_A-8vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","F.24ppt","F.8ppt"),"8vs24_A-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","I.24ppt","I.8ppt"),"8vs24_A-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","J.24ppt","J.8ppt"),"8vs24_A-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","K.24ppt","K.8ppt"),"8vs24_A-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("A.24ppt","A.8ppt","P.24ppt","P.8ppt"),"8vs24_A-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","D.24ppt","D.8ppt"),"8vs24_B-8vs24_D"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","F.24ppt","F.8ppt"),"8vs24_B-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","I.24ppt","I.8ppt"),"8vs24_B-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","J.24ppt","J.8ppt"),"8vs24_B-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","K.24ppt","K.8ppt"),"8vs24_B-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("B.24ppt","B.8ppt","P.24ppt","P.8ppt"),"8vs24_B-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","F.24ppt","F.8ppt"),"8vs24_D-8vs24_F"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","I.24ppt","I.8ppt"),"8vs24_D-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","J.24ppt","J.8ppt"),"8vs24_D-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","K.24ppt","K.8ppt"),"8vs24_D-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("D.24ppt","D.8ppt","P.24ppt","P.8ppt"),"8vs24_D-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","I.24ppt","I.8ppt"),"8vs24_F-8vs24_I"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","J.24ppt","J.8ppt"),"8vs24_F-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","K.24ppt","K.8ppt"),"8vs24_F-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("F.24ppt","F.8ppt","P.24ppt","P.8ppt"),"8vs24_F-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","J.24ppt","J.8ppt"),"8vs24_I-8vs24_J"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","K.24ppt","K.8ppt"),"8vs24_I-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("I.24ppt","I.8ppt","P.24ppt","P.8ppt"),"8vs24_I-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.8ppt","K.24ppt","K.8ppt"),"8vs24_J-8vs24_K"]=c(-1,1,1,-1)
C_RQ3[c("J.24ppt","J.8ppt","P.24ppt","P.8ppt"),"8vs24_J-8vs24_P"]=c(-1,1,1,-1)
C_RQ3[c("K.24ppt","K.8ppt","P.24ppt","P.8ppt"),"8vs24_K-8vs24_P"]=c(-1,1,1,-1)
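
The 28 pairwise contrasts above follow a regular pattern, so the same matrix could also be generated in a loop. The sketch below is an alternative under stated assumptions, not the code used in the analysis: it uses a hypothetical set of coefficient names (eight genotypes crossed with 8, 16 and 24 ppt) in place of colnames(fit_group_model$coefficients), and a toy object name C_toy:

```r
# hypothetical coefficient names; in the analysis these come from the fitted model
genotypes_toy  = c("A", "B", "D", "F", "I", "J", "K", "P")
coef_names_toy = as.vector(outer(genotypes_toy, c("8ppt", "16ppt", "24ppt"),
                                 paste, sep = "."))

# all pairwise genotype combinations, in the same order as the manual definitions
pairs_toy = combn(genotypes_toy, 2)

# empty contrast matrix: one row per coefficient, one column per genotype pair
C_toy = matrix(0, nrow = length(coef_names_toy), ncol = ncol(pairs_toy),
               dimnames = list(coef_names_toy,
                               paste0("8vs24_", pairs_toy[1, ], "-8vs24_", pairs_toy[2, ])))

# interaction contrast: (g1.8ppt - g1.24ppt) - (g2.8ppt - g2.24ppt)
for (i in seq_len(ncol(pairs_toy))) {
  g1 = pairs_toy[1, i]
  g2 = pairs_toy[2, i]
  C_toy[paste0(g1, c(".24ppt", ".8ppt")), i] = c(-1, 1)
  C_toy[paste0(g2, c(".24ppt", ".8ppt")), i] = c(1, -1)
}

dim(C_toy)
```

Each column then contains the coefficients -1, 1, 1, -1 for the two genotypes involved and 0 elsewhere, matching the manual assignments.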