Phytophthora sojae Pathotype Data Analysis

This analysis provides the distribution of susceptibilities, the distribution of complexities with summary statistics, the pathotype frequency distribution, individual isolate pathotypes, and diversity indices for pathotypes. These scripts are meant to substitute for the Hagis spreadsheet previously used for Phytophthora sojae pathotype analysis, and they provide the same data as the Hagis sheet.

Packages needed for analysis

TRUE indicates package was installed and loaded correctly.

##      plyr   ggplot2 tidyverse   plotrix   stringr    pander     vegan 
##      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE 
##  devtools 
##      TRUE
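The chunk that produced this output is not shown here. As a rough sketch only, a named TRUE/FALSE vector of this kind can be obtained by calling require() on each package name. The vector name `packages` is hypothetical, and two packages that ship with R stand in for the list above so that the example runs on any installation.

```r
# Hypothetical package list; replace with the packages named above
# (plyr, ggplot2, tidyverse, ...) once they are installed.
packages <- c("stats", "utils")

# require() returns TRUE if the package was attached and FALSE otherwise,
# so sapply() gives one named logical value per package.
loaded <- sapply(packages, require, character.only = TRUE)
print(loaded)
```

Any FALSE in the result would point to a package that still needs to be installed, for example with install.packages().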

Reading in your data. Do not change “Pathotype.Data”; change only the name of the file. The practice data set provided is used here as an example. The function here() finds the .csv file relative to the R project, so there is no need to set a working directory or provide a full file path.

The input should be in .csv format with any NA values encoded as blanks.

If NA values are encoded differently, change the option na.strings = "" to match the encoding used in your file.

Pathotype.Data <- read.csv(here("Practice data set.csv"), na.strings = "")
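As an illustration of handling other NA encodings, the sketch below writes a small temporary file in which missing values appear as "." or "missing" and reads it back. The file contents and the name `example_data` are made up for this example; na.strings is the full name of the read.csv argument that lists the strings to be treated as NA.

```r
# Write a tiny, made-up data file to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Isolate,Rps,perc.susc",
             "1,Rps 1a,100",
             "1,Rps 1b,.",
             "2,Rps 1a,missing"),
           tmp)

# List every string that should be read as NA.
example_data <- read.csv(tmp, na.strings = c("", ".", "missing"))

# perc.susc is read as a numeric column with two NA values.
print(example_data)
```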

This reads and runs the file functions_themes.R, which defines the analysis functions and the graphic themes used below.

source(here("functions_themes.R"))

Section 1: Distribution of Susceptibilities

Do not change Pathotype.Data in any of the functions. You need only change the three arguments after Pathotype.Data to your own column headings, and the fourth argument, which is the susceptibility cutoff percentage.

Instructions

“Isolate” should be renamed to the header of the column that identifies the isolates tested.

“perc.susc” should be renamed to the header of the column that gives the percent susceptible plants for each gene.

“Gene” should be renamed to the header of the column that identifies the genes tested (“Rps” in the practice data set).

These will need to be changed in all functions within this .Rmd file for the code to work.

The last value in Distribution_of_Susceptibilities(Pathotype.Data, "Isolate", "perc.susc", "Gene", 60) (in this case, 60) sets the cutoff for susceptible reactions. For example, with the current setting, every gene on which 60% or more of the plants were rated susceptible returns a “1” in the following scripts (meaning the reaction is susceptible). You can change this to whatever percentage your study requires.
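The effect of the cutoff can be sketched in base R. This is not the code inside functions_themes.R; it is a minimal illustration on made-up data (`toy`, four hypothetical isolates and two genes) of how a 60% cutoff turns percent susceptible into 1/0 calls and then into the percentage of isolates pathogenic on each gene.

```r
# Made-up long-format data: one row per isolate x gene combination.
toy <- data.frame(
  Isolate   = rep(c("A", "B", "C", "D"), each = 2),
  Rps       = rep(c("Rps 1a", "Rps 3a"), times = 4),
  perc.susc = c(100, 10,   # isolate A
                60, 59,    # isolate B
                80, 70,    # isolate C
                90, 0)     # isolate D
)

cutoff <- 60

# 1 = susceptible reaction (at or above the cutoff), 0 = resistant.
toy$susceptible <- as.integer(toy$perc.susc >= cutoff)

# Number of isolates pathogenic on each gene, and the same as a percentage.
N <- tapply(toy$susceptible, toy$Rps, sum)
percent_isolates_pathogenic <- 100 * N / length(unique(toy$Isolate))

print(data.frame(N, percent_isolates_pathogenic))
```

Note that a value exactly at the cutoff (60 for isolate B on Rps 1a) counts as susceptible, while 59 does not.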

The function returns a list whose first element is the graphic and whose second element is the table. You can access either one by appending $ and the element name to the list, for example $Data for the table or $Graphic for the graphic.

Suceptibilities <- Distribution_of_Susceptibilities(Pathotype.Data, "Isolate", "perc.susc", "Rps", 60)

pander::pander(Suceptibilities$Data)

Rps	N	percent_isolates_pathogenic
Rps 1a	21	100
Rps 1b	15	71.43
Rps 1c	20	95.24
Rps 1d	16	76.19
Rps 1k	18	85.71
Rps 2	14	66.67
Rps 3a	5	23.81
Rps 3b	20	95.24
Rps 3c	4	19.05
Rps 4	5	23.81
Rps 5	13	61.9
Rps 6	11	52.38
Rps 7	21	100
susceptible	21	100

Suceptibilities$Graphic

Section 2: Distribution of Complexities

You will need to change “Isolate”, “perc.susc”, and “Gene” in this function to the column headers in your data set. You can also change the susceptibility cutoff value here.

complexities <- Distribution_of_Complexities(Pathotype.Data, "Isolate", "perc.susc", "Rps", 60)

Output the frequency data

pander::pander(complexities$FrequencyData)

Frequency_of_Complexities	complexities
0	0
0	1
0	2
0	3
0	4
4.762	5
9.524	6
9.524	7
33.33	8
0	9
23.81	10
14.29	11
0	12
4.762	13

Output the distribution data

pander::pander(complexities$DistributionData)

Distribution_of_Complexities	complexities
0	0
0	1
0	2
0	3
0	4
1	5
2	6
2	7
7	8
0	9
5	10
3	11
0	12
1	13

Output the mean of the distribution

complexities$Mean

## [1] 8.714286

Output the standard deviation of the distribution

complexities$StandardDev

## [1] 2.003568

Output the standard error of the distribution

complexities$StandardErr

## [1] 0.4372144
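The three statistics above can be checked by hand from the distribution table. The sketch below is an illustration in base R, not the code in functions_themes.R: it rebuilds the 21 per-isolate complexities from the counts in that table, and the variable names are chosen for this example only.

```r
# Complexity values and how many isolates had each one
# (copied from the distribution table above; zero counts omitted).
complexity <- rep(c(5, 6, 7, 8, 10, 11, 13),
                  times = c(1, 2, 2, 7, 5, 3, 1))

n <- length(complexity)                  # number of isolates
mean_complexity <- mean(complexity)
sd_complexity   <- sd(complexity)        # sample standard deviation
se_complexity   <- sd_complexity / sqrt(n)

# Percent of isolates at each complexity from 0 to 13.
frequency <- 100 * table(factor(complexity, levels = 0:13)) / n

print(c(mean = mean_complexity, sd = sd_complexity, se = se_complexity))
```

The values agree with the output above, which suggests that the standard error reported is the standard deviation divided by the square root of the number of isolates.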

Output the frequency plot

complexities$FrequencyPlot

Output the distribution plot

complexities$DistributionPlot

Section 3: Pathotype Frequency Distribution

You will need to change “Isolate”, “perc.susc”, and “Gene” in this function to the column headers in your data set. You can also change the susceptibility cutoff value here.

path.freq <- Pathotype.frequency.dist(Pathotype.Data, "Isolate", "perc.susc", "Rps", 60)

You can parse the output of this chunk to show either the frequency of each unique pathotype or the pathotype of each individual isolate tested.

Frequency of unique pathotypes = $pathotypes_distribution

Individual pathotypes = $individual_pathotypes

Pathotype	Isolate
1a, 1b, 1d, 1k, 2, 3a, 3b, 5, 6, 7	1
1a, 1b, 1c, 1k, 2, 3b, 3c, 4, 6, 7	2
1a, 1b, 1c, 1d, 1k, 2, 3b, 4, 6, 7	3
1a, 1c, 1d, 1k, 2, 3b, 5, 7	4
1a, 1c, 1d, 1k, 2, 3b, 6, 7	5
1a, 1c, 1d, 1k, 2, 3b, 5, 7	6
1a, 1b, 1c, 1d, 1k, 2, 3b, 7	7
1a, 1b, 1c, 1d, 1k, 2, 3b, 7	8
1a, 1c, 1d, 3b, 5, 7	9
1a, 1c, 3b, 5, 7	10
1a, 1c, 3b, 5, 6, 7	11
1a, 1b, 1c, 1d, 1k, 2, 6, 7	12
1a, 1b, 1c, 1d, 1k, 3b, 7	13
1a, 1b, 1c, 1k, 3b, 5, 6, 7	14
1a, 1b, 1c, 1d, 1k, 2, 3b, 4, 5, 6, 7	15
1a, 1b, 1c, 1k, 3b, 5, 7	16
1a, 1b, 1c, 1d, 1k, 2, 3b, 4, 5, 7	17
1a, 1b, 1c, 1d, 1k, 3a, 3b, 5, 6, 7	18
1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 6, 7	19
1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 5, 7	20
1a, 1b, 1c, 1d, 1k, 2, 3a, 3b, 3c, 4, 5, 6, 7	21
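For readers who want to see how a pathotype string like those above can be assembled, here is a minimal base R sketch on made-up data. It is not the code behind Pathotype.frequency.dist; the data frame `toy` and the other names are hypothetical. Each isolate's pathotype is taken to be the list of genes on which it reached the cutoff.

```r
# Made-up long-format data: three isolates tested on three genes.
toy <- data.frame(
  Isolate   = rep(c("A", "B", "C"), each = 3),
  Rps       = rep(c("Rps 1a", "Rps 1b", "Rps 7"), times = 3),
  perc.susc = c(100, 20, 90,    # isolate A
                100, 0, 80,     # isolate B
                75, 100, 60)    # isolate C
)

cutoff <- 60

# Keep only susceptible reactions and drop the "Rps " prefix.
virulent <- toy[toy$perc.susc >= cutoff, ]
virulent$gene <- sub("^Rps ", "", virulent$Rps)

# One comma-separated pathotype string per isolate.
pathotypes <- tapply(virulent$gene, virulent$Isolate, paste, collapse = ", ")

# How many isolates share each unique pathotype.
pathotype_counts <- table(pathotypes)

print(pathotypes)
print(pathotype_counts)
```

In this toy example isolates A and B share the pathotype "1a, 7", so two unique pathotypes are found among three isolates.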

Section 4: Diversity Index for Pathotypes

Diversity indices used to investigate pathotype diversity within and between states are shown below.

diversity <- Diversity_index(Pathotype.Data, "Isolate", "perc.susc", "Rps", 60)

Pathotype diversity indices can be parsed as shown:

Simple diversity = $Simple

Shannon diversity = $Shannon

Simpson diversity = $Simpson

Gleason diversity = $Gleason

Evenness = $Evenness

diversity$Evenness

## [1] 0.9891509
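The evenness value can be reproduced by hand from the individual pathotypes listed in Section 3. In that table, 19 unique pathotypes occur among the 21 isolates: isolates 4 and 6 share one pathotype, isolates 7 and 8 share another, and the remaining 17 are each found once. The sketch below assumes the usual textbook definitions of the indices (these formulas are assumptions and are not taken from functions_themes.R), with evenness computed as Shannon diversity divided by the log of the number of pathotypes.

```r
# Number of isolates carrying each unique pathotype
# (17 pathotypes found once, 2 pathotypes found twice).
counts <- c(rep(1, 17), 2, 2)

n_isolates   <- sum(counts)       # 21
n_pathotypes <- length(counts)    # 19
p <- counts / n_isolates          # relative frequency of each pathotype

# Assumed definitions of the indices:
simple   <- n_pathotypes / n_isolates
gleason  <- (n_pathotypes - 1) / log(n_isolates)
shannon  <- -sum(p * log(p))
simpson  <- 1 - sum(p^2)
evenness <- shannon / log(n_pathotypes)

print(round(c(Simple = simple, Gleason = gleason, Shannon = shannon,
              Simpson = simpson, Evenness = evenness), 4))
```

Under these assumptions the evenness agrees with the value printed above; the other four indices are shown for illustration only and have not been checked against the function's output.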

Austin McCoy, Zachary Noel

March 7th, 2019

Recommendations are always appreciated!!

Chilvers Lab

Michigan State University

East Lansing, MI