This analysis provides the distribution of susceptibilities, the distribution of complexities with summary statistics, the pathotype frequency distribution, and diversity indices for pathotypes. These scripts are meant to substitute for the Hagis spreadsheet previously used for Phytophthora sojae pathotype analysis, and they provide the same data as the Hagis sheet.
To start, your data should be in a format similar to that of the sample data file provided. Most importantly, columns labelled “Isolate”, “Rps”, and “perc.susc” are critical for the code to work with minimal or no edits on the user's part.
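As a sketch of what a similar format might look like, the toy data below assumes a long layout with one row per isolate and Rps gene, where perc.susc is the percentage of plants rated susceptible and “susceptible” marks the susceptible check (the check appears as a row in the susceptibility table in Section 1). The isolate names, values, and the objects toy_data, required_cols, and missing_cols are hypothetical. The check at the end stops early if any of the three critical columns is missing.

```r
# Hypothetical toy data in an assumed long layout:
# one row per isolate x Rps gene, perc.susc = % of plants rated susceptible
toy_data <- data.frame(
  Isolate   = rep(c("Iso1", "Iso2"), each = 3),
  Rps       = rep(c("Rps 1a", "Rps 1b", "susceptible"), times = 2),
  perc.susc = c(100, 20, 90, 75, 60, 100)
)

# Stop early if any of the three critical columns is missing
required_cols <- c("Isolate", "Rps", "perc.susc")
missing_cols <- setdiff(required_cols, names(toy_data))
if (length(missing_cols) > 0) {
  stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
}
```

The same check can be run on Pathotype.Data after it is read in.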

Packages needed for analysis

TRUE indicates that the package was installed and loaded correctly.

##      plyr   ggplot2 tidyverse   plotrix   stringr    pander     vegan 
##      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE 
##  devtools 
##      TRUE
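The code that produced this output is not shown. A minimal sketch that returns the same kind of named TRUE/FALSE vector is below; load_packages() is a hypothetical helper, and in the usage line here is added to the list as an assumption because here() is called later in this document.

```r
# Hypothetical helper: optionally install missing packages, then attach them.
# Returns a named logical vector (TRUE = loaded), like the output above.
load_packages <- function(pkgs, install = TRUE) {
  if (install) {
    # requireNamespace() checks availability without attaching the package
    available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    missing <- pkgs[!available]
    # install.packages() needs a CRAN mirror and internet access
    if (length(missing) > 0) install.packages(missing)
  }
  # require() attaches each package and returns TRUE on success, FALSE otherwise
  vapply(pkgs, require, logical(1), character.only = TRUE)
}
```

Usage: load_packages(c("plyr", "ggplot2", "tidyverse", "plotrix", "stringr", "pander", "vegan", "devtools", "here")).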

This reads and runs the file functions_themes.R, which defines the analysis functions and graphic themes used below.

source(here("functions_themes.R"))

Read in your data. Keep the object name Pathotype.Data unchanged and change only the file name; here, the practice data set provided is used as an example. The here() function locates the .csv file relative to the R project, so there is no need to set a working directory or give a full file path.

The input should be in .csv format, with any missing (NA) values left blank.

If missing values are encoded differently, change the na = "" option to match your encoding.

Pathotype.Data <- read_csv(here("Practice data set.csv"), na = "")

Section 1: Distribution of Susceptibilities

The argument to Distribution_of_Susceptibilities() (in this case, 60) sets the cutoff for a susceptible reaction. With the current value, an Rps gene is scored “1” (susceptible) for an isolate when 60% or more of the plants are rated susceptible.
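A minimal sketch of how such a cutoff could produce the N and percent_isolates_pathogenic columns is below. percent_pathogenic() is a hypothetical helper, not the Distribution_of_Susceptibilities() code in functions_themes.R, and it assumes one row per isolate and Rps gene.

```r
# Hypothetical helper (not the code in functions_themes.R): percentage of
# isolates scored pathogenic on each Rps gene at a given cutoff.
# Assumes one row per isolate x Rps gene.
percent_pathogenic <- function(data, cutoff = 60) {
  # 1 = susceptible reaction (isolate pathogenic on that gene), 0 = resistant
  pathogenic <- as.integer(data$perc.susc >= cutoff)
  # Number of isolates scored pathogenic on each Rps gene (missing values ignored)
  n_path <- tapply(pathogenic, data$Rps, sum, na.rm = TRUE)
  # Number of isolates with a non-missing rating on each Rps gene
  n_rated <- tapply(!is.na(data$perc.susc), data$Rps, sum)
  data.frame(
    Rps = names(n_path),
    N = as.vector(n_path),
    percent_isolates_pathogenic = round(100 * as.vector(n_path) / as.vector(n_rated), 2)
  )
}
```

Usage: percent_pathogenic(Pathotype.Data, cutoff = 60). Raising the cutoff can only lower N for each gene, because fewer ratings pass the threshold.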

The function returns a list whose first element is the graphic and whose second element is the table. Extract either element with $: Suceptibilities$Data returns the table and Suceptibilities$Graphic returns the graphic.

Suceptibilities <- Distribution_of_Susceptibilities(60)
pander::pander(Suceptibilities$Data)
Rps           N    percent_isolates_pathogenic
-----------   --   ---------------------------
Rps 1a        21   100
Rps 1b        15   71.43
Rps 1c        20   95.24
Rps 1d        16   76.19
Rps 1k        18   85.71
Rps 2         14   66.67
Rps 3a         5   23.81
Rps 3b        20   95.24
Rps 3c         4   19.05
Rps 4          5   23.81
Rps 5         13   61.9
Rps 6         11   52.38
Rps 7         21   100
susceptible   21   100
Suceptibilities$Graphic

Section 2: Distribution of Complexities

As in Section 1, the argument (here, 60) is the susceptibility cutoff and can be changed for your dataset.

complexities <- Distribution_of_Complexities(60)
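Complexity is the number of Rps genes on which an isolate is scored pathogenic. The hypothetical complexity_distribution() below sketches how the frequency and distribution tables could be derived, assuming one row per isolate and Rps gene and that the “susceptible” check rows are excluded (consistent with a maximum complexity of 13 for the 13 Rps genes tested). It is not the code in functions_themes.R.

```r
# Hypothetical helper (not the code in functions_themes.R).
# Assumes one row per isolate x Rps gene and a susceptible check row named "susceptible".
complexity_distribution <- function(data, cutoff = 60, check = "susceptible") {
  # Drop the susceptible check so that only the differential Rps genes count
  genes <- data[which(data$Rps != check), ]
  # Complexity of each isolate = number of genes with a susceptible reaction
  complexity <- tapply(genes$perc.susc >= cutoff, genes$Isolate, sum, na.rm = TRUE)
  # Tabulate every class from 0 to the number of genes so empty classes show as 0
  n_genes <- length(unique(genes$Rps))
  counts <- as.vector(table(factor(complexity, levels = 0:n_genes)))
  data.frame(
    complexities = 0:n_genes,
    Distribution_of_Complexities = counts,              # number of isolates
    Frequency_of_Complexities = 100 * counts / sum(counts)  # percent of isolates
  )
}
```

Tabulating over a fixed set of factor levels is what keeps zero-count classes (such as 0 to 4 and 9 in the tables below) in the output.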

Output the frequency data

pander::pander(complexities$FrequencyData)
Frequency_of_Complexities   complexities
-------------------------   ------------
0                           0
0                           1
0                           2
0                           3
0                           4
4.762                       5
9.524                       6
9.524                       7
33.33                       8
0                           9
23.81                       10
14.29                       11
0                           12
4.762                       13

Output the distribution data

pander::pander(complexities$DistributionData)
Distribution_of_Complexities   complexities
----------------------------   ------------
0                              0
0                              1
0                              2
0                              3
0                              4
1                              5
2                              6
2                              7
7                              8
0                              9
5                              10
3                              11
0                              12
1                              13

Output the mean of the distribution

complexities$Mean
## [1] 8.714286

Output the standard deviation of the distribution

complexities$StandardDev
## [1] 2.003568

Output the standard error of the distribution

complexities$StandardErr
## [1] 0.4372144
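The three statistics can be checked by hand from the distribution table: expand it to one complexity value per isolate, then apply mean(), sd(), and the standard error sd / sqrt(n). sd() uses the n - 1 denominator, and the values in the comments match those reported above.

```r
# Complexity classes and number of isolates in each, taken from the
# distribution table above (classes with zero isolates omitted)
complexity <- c(5, 6, 7, 8, 10, 11, 13)
n_isolates <- c(1, 2, 2, 7, 5, 3, 1)

# One complexity value per isolate (21 isolates in total)
per_isolate <- rep(complexity, times = n_isolates)

complexity_mean <- mean(per_isolate)                           # 8.714286
complexity_sd   <- sd(per_isolate)                             # 2.003568 (n - 1 denominator)
complexity_se   <- complexity_sd / sqrt(length(per_isolate))   # 0.4372144
```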

Output the frequency plot

complexities$FrequencyPlot

Output the distribution plot

complexities$DistributionPlot

Section 3: Pathotype Frequency Distribution

path.freq <- Pathotype.frequency.dist(60) 
count Pathotype
1 1a, 1b, 1d, 1k, 2, 3a, 3b, 5, 6, 7
1 1a, 1b, 1c, 1k, 2, 3b, 3c, 4, 6, 7
1 1a, 1b, 1c, 1d, 1k, 2, 3b, 4, 6, 7
2 1a, 1c, 1d, 1k, 2, 3b, 5, 7
1 1a, 1c, 1d, 1k, 2, 3b, 6, 7
2 1a, 1b, 1c, 1d, 1k, 2, 3b, 7
1 1a, 1c, 1d, 3b, 5, 7
1 1a, 1c, 3b, 5, 7
1 1a, 1c, 3b, 5, 6, 7
1 1a, 1b, 1c, 1d, 1k, 2, 6, 7
1 1a, 1b, 1c, 1d, 1k, 3b, 7
1 1a, 1b, 1c, 1k, 3b, 5, 6, 7
1 1a, 1b, 1c, 1d, 1k, 2, 3b, 4, 5, 6, 7
1 1a, 1b, 1c, 1k, 3b, 5, 7
1 1a, 1b, 1c, 1d, 1k, 2, 3b, 4, 5, 7
1 1a, 1b, 1c, 1d, 1k, 3a, 3b, 5, 6, 7
1 1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 6, 7
1 1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 5, 7
1 1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 4, 5, 6, 7
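Each pathotype is the set of Rps genes an isolate is pathogenic on, and count is the number of isolates sharing that set. The hypothetical pathotype_frequency() below sketches this, assuming one row per isolate and Rps gene and a susceptible check named “susceptible”; it is not the Pathotype.frequency.dist() code. In this sketch, isolates pathogenic on no Rps gene are dropped and gene names are sorted alphabetically, so row order will differ from the table above.

```r
# Hypothetical helper (not the code in functions_themes.R).
pathotype_frequency <- function(data, cutoff = 60, check = "susceptible") {
  # Keep only susceptible reactions on the differential Rps genes (NA rows dropped)
  keep <- which(data$Rps != check & data$perc.susc >= cutoff)
  hits <- data[keep, ]
  # Drop the "Rps " prefix so that "Rps 1a" is written as "1a"
  genes <- sub("^Rps ", "", hits$Rps)
  # One pathotype string per isolate, e.g. "1a, 1c, 3b, 7"
  pathotypes <- tapply(genes, hits$Isolate, function(g) paste(sort(g), collapse = ", "))
  # Count isolates sharing each pathotype
  counts <- table(as.vector(pathotypes))
  data.frame(count = as.vector(counts), Pathotype = names(counts))
}
```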

Section 4: Diversity Indices for Pathotypes

The diversity indices below can be used to investigate pathotype diversity within and between states. Version 1 of this document only includes code for analyzing one state at a time; scripts that analyze several states at once, each independently of the others, may be produced in the future. Until then, each state's pathotype data must be analyzed from its own .csv file.

Determine the number of isolates in the data

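# read_csv() keeps Isolate as character, so levels() would return NULL;
# count the distinct isolate names instead
Number_of_isolates <- length(unique(Pathotype.Data$Isolate))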
Number_of_isolates
## [1] 21

Determine the number of unique pathotypes

Number_of_pathotypes <- specnumber(path.freq$count)
Number_of_pathotypes
## [1] 19

Simple diversity is the ratio of unique pathotypes to total isolates. The closer the value is to 1, the greater the pathotype diversity within the population.

Simple <- Number_of_pathotypes/ Number_of_isolates
Simple
## [1] 0.9047619

The Gleason index is an alternative version of the simple diversity index that is less sensitive to sample size.

Gleason <- (Number_of_pathotypes - 1)/log(Number_of_isolates)
Gleason
## [1] 5.912257

The Shannon diversity index typically falls between 1.5 and 3.5. It increases as the richness and evenness of the population increase.

# Pass the pathotype counts directly so the result does not depend on column order
Shannon <- diversity(path.freq$count, index = "shannon")
Shannon
## [1] 2.912494
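To show what the index measures, the Shannon value above can be reproduced from the pathotype counts in Section 3 (17 pathotypes found once and 2 found twice) with H = -sum(p_i * ln(p_i)), where p_i is the proportion of isolates with pathotype i. vegan::diversity() uses the natural log by default.

```r
# Pathotype counts from the Section 3 table: 17 pathotypes seen once,
# 2 seen twice (21 isolates, 19 pathotypes)
counts <- c(rep(1, 17), rep(2, 2))
p <- counts / sum(counts)       # proportion of isolates with each pathotype
shannon <- -sum(p * log(p))     # natural log
shannon                         # 2.912494
```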

Simpson diversity index values range from 0 to 1: values near 1 indicate high diversity and 0 indicates no diversity.

# Pass the pathotype counts directly so the result does not depend on column order
Simpson <- diversity(path.freq$count, index = "simpson")
Simpson
## [1] 0.9433107
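vegan's index = "simpson" returns 1 - sum(p_i^2), the probability that two isolates drawn at random (with replacement) have different pathotypes. Using the same pathotype counts from Section 3 reproduces the value above.

```r
# Pathotype counts from the Section 3 table (17 seen once, 2 seen twice)
counts <- c(rep(1, 17), rep(2, 2))
p <- counts / sum(counts)
simpson <- 1 - sum(p^2)   # Gini-Simpson form, equal to 416/441 here
simpson                   # 0.9433107
```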

Evenness (Pielou's J, the Shannon index divided by the natural log of the number of pathotypes) ranges from 0 to 1. The closer it is to 1, the more evenly isolates are distributed among the pathotypes in the population.

Evenness <- Shannon/ log(Number_of_pathotypes)
Evenness
## [1] 0.9891509

Recommendations are always appreciated!!

Chilvers Lab

Michigan State University

East Lansing, MI