This analysis provides the distribution of susceptibilities, the distribution of complexities with statistics, the pathotype frequency distribution, individual isolate pathotypes, and diversity indices for pathotypes. These scripts are meant to be a substitute for the Hagis spreadsheet previously used for Phytophthora sojae pathotype analysis and provide the same necessary data as the Hagis sheet.

Packages needed for analysis

TRUE indicates that the package was installed and loaded correctly.

##      plyr   ggplot2 tidyverse   plotrix   stringr    pander     vegan 
##      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE 
##  devtools 
##      TRUE
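
A named TRUE/FALSE vector like the one above can be produced in several ways. The following is a minimal sketch of one of them, not necessarily the code used to generate this output; the helper name load_packages is hypothetical.

```r
# Packages listed in the output above.
packages <- c("plyr", "ggplot2", "tidyverse", "plotrix",
              "stringr", "pander", "vegan", "devtools")

# Hypothetical helper: try to attach each package and report success.
# require() returns TRUE if the package loaded and FALSE otherwise.
load_packages <- function(pkgs) {
  vapply(pkgs, function(pkg) {
    suppressWarnings(
      suppressPackageStartupMessages(
        require(pkg, character.only = TRUE, quietly = TRUE)
      )
    )
  }, logical(1))
}

status <- load_packages(packages)
print(status)  # FALSE marks a package that still needs install.packages()
```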

Reading in your data. Do not change “Pathotype.Data”; change only the name of the file. We will use the practice data provided as an example. The function here() finds the .csv file relative to the R project, so there is no need to set the working directory or provide the full file path.

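The input should be in .csv format, with any NA values encoded as blanks.

If NA values are encoded differently, replace the "" in the option na = "" with the string your file uses for missing values. (R matches na to the na.strings argument of read.csv by partial matching.)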

Pathotype.Data <- read.csv(here("Practice data set.csv"), na = "")
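
To illustrate the expected input, here is a minimal sketch that writes a tiny .csv file and reads it back. It assumes a long layout with one row per isolate and gene, which is what the three column headers used later suggest; the object names example_rows and Pathotype.Example are hypothetical, and na.strings is the full name of the argument abbreviated as na above.

```r
# Hypothetical example rows: one row per isolate x gene combination.
example_rows <- data.frame(
  Isolate   = c(1, 1, 2, 2),
  Rps       = c("Rps 1a", "Rps 1b", "Rps 1a", "Rps 1b"),
  perc.susc = c(100, 80, 90, NA)   # one missing rating
)

# Write the rows to a temporary file, encoding NA as a blank cell.
csv_path <- tempfile(fileext = ".csv")
write.csv(example_rows, csv_path, row.names = FALSE, na = "")

# Read the file back, telling R that blank cells mean NA.
Pathotype.Example <- read.csv(csv_path, na.strings = "")
print(Pathotype.Example)
```

If your file marks missing values with another string, for example "-", pass that string to na.strings instead.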

This reads the file called functions_themes.R and runs it. Doing so makes the analysis functions available and defines the graphic themes.

source(here("functions_themes.R"))

Section 1: Distribution of Susceptibilities

Do not change Pathotype.Data in any of the functions. You need only change the three arguments after Pathotype.Data to the matching column headings in your data, and the fourth argument, which is the susceptibility cutoff percentage.

Instructions

“Isolate” should be renamed to the header of the column that identifies the isolates tested.

“perc.susc” should be renamed to the header of the column that gives the percentage of susceptible plants for each gene.

“Gene” should be renamed to the header of the column that identifies the genes tested (“Rps” in the practice data).

These will need to be changed in all functions within this .Rmd file for the code to work.

The final value in Distribution_of_Susceptibilities(Pathotype.Data, "Isolate", "perc.susc", "Gene", 60) (in this case, 60) sets the cutoff for susceptible reactions. For example, with the current setting, every gene on which 60% or more of the plants are rated susceptible returns a “1” in the following scripts (meaning the reaction is susceptible). You can change this to whatever percentage your study requires.

The function returns a list whose first element is the graphic and whose second element is the table. Use $ to pick the element you want, for example Suceptibilities$Data for the table or Suceptibilities$Graphic for the graphic.

Suceptibilities <- Distribution_of_Susceptibilities(Pathotype.Data, "Isolate", "perc.susc", "Rps", 60)
pander::pander(Suceptibilities$Data)
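| Rps | N | percent_isolates_pathogenic |
|---|---|---|
| Rps 1a | 21 | 100 |
| Rps 1b | 15 | 71.43 |
| Rps 1c | 20 | 95.24 |
| Rps 1d | 16 | 76.19 |
| Rps 1k | 18 | 85.71 |
| Rps 2 | 14 | 66.67 |
| Rps 3a | 5 | 23.81 |
| Rps 3b | 20 | 95.24 |
| Rps 3c | 4 | 19.05 |
| Rps 4 | 5 | 23.81 |
| Rps 5 | 13 | 61.9 |
| Rps 6 | 11 | 52.38 |
| Rps 7 | 21 | 100 |
| susceptible | 21 | 100 |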
Suceptibilities$Graphic
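
To make the cutoff logic concrete, the sketch below scores a toy data set by hand. It is an illustration of the idea, not the code inside Distribution_of_Susceptibilities; the toy data and all object names are hypothetical, and it assumes that N counts the isolates at or above the cutoff and that the percentage is N divided by the number of isolates.

```r
# Hypothetical toy data: 3 isolates rated on 2 genes.
toy <- data.frame(
  Isolate   = rep(c("A", "B", "C"), each = 2),
  Rps       = rep(c("Rps 1a", "Rps 3a"), times = 3),
  perc.susc = c(100, 10,   75, 60,   59, 0)
)

cutoff <- 60

# A reaction scores 1 (susceptible) when perc.susc is at or above the cutoff.
toy$susceptible <- as.integer(toy$perc.susc >= cutoff)

# Count the isolates pathogenic on each gene and convert to a percentage.
n_isolates <- length(unique(toy$Isolate))
N <- tapply(toy$susceptible, toy$Rps, sum)
percent <- round(100 * N / n_isolates, 2)

summary_table <- data.frame(
  Rps = names(N),
  N = as.vector(N),
  percent_isolates_pathogenic = as.vector(percent)
)
print(summary_table)

# The same arithmetic gives the Rps 1b row above: 15 of 21 isolates.
round(100 * 15 / 21, 2)
```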

Section 2: Distribution of Complexities

You will need to change “Isolate”, “perc.susc”, and “Gene” in this function to the matching column headers in your dataset. As before, you can also change the susceptibility cutoff value here.

complexities <- Distribution_of_Complexities(Pathotype.Data, "Isolate", "perc.susc", "Rps", 60)

Output the frequency data

pander::pander(complexities$FrequencyData)
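| Frequency_of_Complexities | complexities |
|---|---|
| 0 | 0 |
| 0 | 1 |
| 0 | 2 |
| 0 | 3 |
| 0 | 4 |
| 4.762 | 5 |
| 9.524 | 6 |
| 9.524 | 7 |
| 33.33 | 8 |
| 0 | 9 |
| 23.81 | 10 |
| 14.29 | 11 |
| 0 | 12 |
| 4.762 | 13 |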

Output the distribution data

pander::pander(complexities$DistributionData)
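| Distribution_of_Complexities | complexities |
|---|---|
| 0 | 0 |
| 0 | 1 |
| 0 | 2 |
| 0 | 3 |
| 0 | 4 |
| 1 | 5 |
| 2 | 6 |
| 2 | 7 |
| 7 | 8 |
| 0 | 9 |
| 5 | 10 |
| 3 | 11 |
| 0 | 12 |
| 1 | 13 |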

Output the mean of the distribution

complexities$Mean
## [1] 8.714286

Output the standard deviation of the distribution

complexities$StandardDev
## [1] 2.003568

Output the standard error of the distribution

complexities$StandardErr
## [1] 0.4372144
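
The summary statistics can be cross-checked from the distribution table alone. The sketch below rebuilds the per-isolate complexities from the counts and assumes the sample standard deviation (n - 1 denominator) and a standard error of SD / sqrt(n); these assumptions reproduce the values printed above, but the object names are hypothetical and this is not the code inside Distribution_of_Complexities.

```r
# Counts of isolates at each complexity, copied from the distribution table.
complexity_levels <- 0:13
isolate_counts <- c(0, 0, 0, 0, 0, 1, 2, 2, 7, 0, 5, 3, 0, 1)

# Expand the counts into one complexity value per isolate.
per_isolate <- rep(complexity_levels, times = isolate_counts)
n <- length(per_isolate)   # 21 isolates

# Frequency is the count expressed as a percentage of all isolates.
frequency_percent <- 100 * isolate_counts / n

complexity_mean <- mean(per_isolate)             # about 8.714
complexity_sd   <- sd(per_isolate)               # about 2.004
complexity_se   <- complexity_sd / sqrt(n)       # about 0.437

print(c(mean = complexity_mean, sd = complexity_sd, se = complexity_se))
```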

Output the frequency plot

complexities$FrequencyPlot

Output the distribution plot

complexities$DistributionPlot

Section 3: Pathotype Frequency Distribution

You will need to change “Isolate”, “perc.susc”, and “Gene” in this function to the matching column headers in your dataset. As before, you can also change the susceptibility cutoff value here.

path.freq <- Pathotype.frequency.dist(Pathotype.Data, "Isolate", "perc.susc", "Rps", 60) 

You can parse the data in this chunk to show either the frequency of each unique pathotype or the pathotype of each individual isolate that you tested.

Frequency of unique pathotypes = $pathotypes_distribution

Individual pathotypes = $individual_pathotypes

| Pathotype | Isolate |
|---|---|
| 1a, 1b, 1d, 1k, 2, 3a, 3b, 5, 6, 7 | 1 |
| 1a, 1b, 1c, 1k, 2, 3b, 3c, 4, 6, 7 | 2 |
| 1a, 1b, 1c, 1d, 1k, 2, 3b, 4, 6, 7 | 3 |
| 1a, 1c, 1d, 1k, 2, 3b, 5, 7 | 4 |
| 1a, 1c, 1d, 1k, 2, 3b, 6, 7 | 5 |
| 1a, 1c, 1d, 1k, 2, 3b, 5, 7 | 6 |
| 1a, 1b, 1c, 1d, 1k, 2, 3b, 7 | 7 |
| 1a, 1b, 1c, 1d, 1k, 2, 3b, 7 | 8 |
| 1a, 1c, 1d, 3b, 5, 7 | 9 |
| 1a, 1c, 3b, 5, 7 | 10 |
| 1a, 1c, 3b, 5, 6, 7 | 11 |
| 1a, 1b, 1c, 1d, 1k, 2, 6, 7 | 12 |
| 1a, 1b, 1c, 1d, 1k, 3b, 7 | 13 |
| 1a, 1b, 1c, 1k, 3b, 5, 6, 7 | 14 |
| 1a, 1b, 1c, 1d, 1k, 2, 3b, 4, 5, 6, 7 | 15 |
| 1a, 1b, 1c, 1k, 3b, 5, 7 | 16 |
| 1a, 1b, 1c, 1d, 1k, 2, 3b, 4, 5, 7 | 17 |
| 1a, 1b, 1c, 1d, 1k, 3a, 3b, 5, 6, 7 | 18 |
| 1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 6, 7 | 19 |
| 1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 5, 7 | 20 |
| 1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 4, 5, 6, 7 | 21 |
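
A pathotype here is the list of genes on which an isolate scores as susceptible. The sketch below shows one way to build such strings and count the unique ones from toy data. It illustrates the idea only and is not the code inside Pathotype.frequency.dist; the toy data and object names are hypothetical.

```r
# Hypothetical toy data: 3 isolates rated on 3 genes.
toy <- data.frame(
  Isolate   = rep(c("A", "B", "C"), each = 3),
  Rps       = rep(c("Rps 1a", "Rps 1b", "Rps 7"), times = 3),
  perc.susc = c(100, 20, 90,   80, 10, 75,   100, 100, 0)
)

cutoff <- 60

# Keep only the susceptible reactions (at or above the cutoff).
virulent <- toy[toy$perc.susc >= cutoff, ]

# Drop the "Rps " prefix so that labels read "1a", "1b", "7".
gene_labels <- sub("^Rps ", "", virulent$Rps)

# One pathotype string per isolate. An isolate with no susceptible
# reaction would not appear in this simple version.
pathotypes <- vapply(
  split(gene_labels, virulent$Isolate),
  paste, character(1), collapse = ", "
)
print(pathotypes)

# Frequency of each unique pathotype.
pathotype_counts <- table(pathotypes)
print(pathotype_counts)
```

In this toy example isolates A and B share the pathotype "1a, 7", so there are two unique pathotypes among three isolates.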

Section 4: Diversity Index for Pathotypes

Diversity indices used to investigate pathotype diversity within and between states are shown below.

diversity <- Diversity_index(Pathotype.Data, "Isolate", "perc.susc", "Rps", 60)

Pathotype diversity indices can be parsed as shown:

Simple diversity = $Simple

Shannon diversity = $Shannon

Simpson diversity = $Simpson

Gleason diversity = $Gleason

Evenness = $Evenness

diversity$Evenness
## [1] 0.9891509
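
The evenness value can be cross-checked by hand. In the practice data, isolates 4 and 6 share a pathotype, as do isolates 7 and 8, which leaves 19 unique pathotypes among 21 isolates. The sketch below assumes the Shannon index H = -sum(p * log(p)) and evenness H / log(S), where S is the number of unique pathotypes. These are common definitions and they reproduce the printed value, but whether Diversity_index computes them exactly this way is an assumption; the object names are hypothetical.

```r
# Number of isolates per unique pathotype in the practice data:
# 17 pathotypes occur once and 2 pathotypes occur twice.
pathotype_counts <- c(rep(1, 17), 2, 2)

n_isolates   <- sum(pathotype_counts)      # 21
n_pathotypes <- length(pathotype_counts)   # 19

# Proportion of isolates belonging to each pathotype.
p <- pathotype_counts / n_isolates

# Shannon diversity (natural log) and the evenness derived from it.
shannon  <- -sum(p * log(p))
evenness <- shannon / log(n_pathotypes)

print(c(shannon = shannon, evenness = evenness))  # evenness is about 0.989
```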

Recommendations are always appreciated!!

Chilvers Lab

Michigan State University

East Lansing, MI