Intro

A script to calculate the mean genome formula (GF) distance for empirical data.

Load the data and necessary libraries

Load the experimental data from Wu et al. 2016 for AMV infection of Nicotiana benthamiana. These data were chosen because there is a reasonable number of replicates for one condition (6 plants). Only the tissues that were present for all samples are included, i.e., Tissue 4 (remaining tissue) has been removed. The tissue codes are 1 -> inoculated leaf, 2 -> middle leaf, and 3 -> upper leaf.

# Load libraries
library(magrittr)
library(utils)
library(plyr)
library(dplyr)

# Load data
setwd("C:/R_Data")
df_data <- read.csv("Dataset_1_AMV_Wu.csv") %>%
    as.data.frame()

Analysis

First create a data frame containing only the GF data of interest, then calculate the GF distance as the Euclidean distance between the genome formulas of each pair of replicates. Run the code for samples 1 to 3 to obtain all the results needed. In this example only the inoculated leaf (sample 1) is analyzed.
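As a sketch of what the code in this section computes, in notation that is not part of the original script: let f_ik be the value of column f_RNAk for replicate i, and n the number of replicates. The pairwise distance, the mean distance of each replicate to the others, and the overall mean can then be written as:

```latex
d_{ij} = \sqrt{\sum_{k=1}^{3} \left(f_{ik} - f_{jk}\right)^2}, \qquad
\bar{d}_i = \frac{1}{n-1} \sum_{j \neq i} d_{ij}, \qquad
\bar{d} = \frac{1}{n} \sum_{i=1}^{n} \bar{d}_i
```

The reported standard deviation is the sample standard deviation of the n per-replicate means.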

gf.data <- df_data %>%
  filter(sample == 1) %>%
  select(f_RNA1, f_RNA2, f_RNA3)

# Compute the pairwise Euclidean distance between all genome formula values.
n.row <- nrow(gf.data)
pw.comp <- array(data = NA, dim = c(n.row, n.row))

for(i in 1:n.row) {
    for(j in 1:n.row) {
        pw.comp[i,j] = sqrt(sum((gf.data[i,] - gf.data[j,])^2))
    }
}

# Now determine the mean distance from each replicate to all other replicates.
dist.results <- rep(NA, n.row)
comp <- 1:n.row

for(i in 1:n.row) {
    comp.now = comp[-i]
    dist.results[i] = mean(pw.comp[i, comp.now])
}
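The two loops above can also be expressed with base R's `dist()` and `rowSums()`. The following is a minimal sketch on made-up values: the `gf.toy` data frame is hypothetical and is not taken from the Wu et al. dataset. For the same input it should give the same numbers as the loops.

```r
# Hypothetical toy genome formula values, for illustration only.
gf.toy <- data.frame(
  f_RNA1 = c(0.30, 0.35, 0.25),
  f_RNA2 = c(0.30, 0.25, 0.35),
  f_RNA3 = c(0.40, 0.40, 0.40)
)

# Full symmetric matrix of Euclidean distances between rows (zero diagonal).
pw.toy <- as.matrix(dist(gf.toy, method = "euclidean"))

# Because the diagonal is zero, summing each row and dividing by (n - 1)
# gives the mean distance from each replicate to all the others.
n.toy <- nrow(gf.toy)
dist.toy <- rowSums(pw.toy) / (n.toy - 1)
```

The vectorized form avoids the explicit index bookkeeping (`comp[-i]`) and scales better when there are many replicates.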

# Determine the mean and standard deviation of the distances, the two goals of our analysis.
mean.gf.dist = mean(dist.results)
sd.gf.dist = sd(dist.results)
num.reps.d = nrow(gf.data)

print(mean.gf.dist)
## [1] 0.07697512
print(sd.gf.dist)
## [1] 0.01541503
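Since the analysis has to be repeated for samples 1 to 3, the steps above could be wrapped in a function. The sketch below is one possible way to do that, not part of the original script: `mean_gf_dist()` is a hypothetical helper name, and the commented usage assumes `df_data` has the `sample`, `f_RNA1`, `f_RNA2`, and `f_RNA3` columns used above.

```r
# Hypothetical helper: mean and SD of each replicate's mean Euclidean
# distance to all other replicates. `gf` holds one row per replicate.
mean_gf_dist <- function(gf) {
  n <- nrow(gf)
  stopifnot(n >= 2)             # at least two replicates are needed
  pw <- as.matrix(dist(gf))     # pairwise Euclidean distances
  d <- rowSums(pw) / (n - 1)    # mean distance to the other replicates
  c(mean = mean(d), sd = sd(d), n = n)
}

# Usage sketch, assuming df_data is loaded as above:
# results <- sapply(1:3, function(s) {
#   mean_gf_dist(df_data[df_data$sample == s, c("f_RNA1", "f_RNA2", "f_RNA3")])
# })
```

With this usage, `results` would be a matrix with one column per sample and rows `mean`, `sd`, and `n`.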