Intro

A script to calculate the mean genome formula (GF) distance for empirical data.

Load the data and necessary libraries

Load the experimental data from Boezen et al. 2023 for CMV infection of Nicotiana tabacum. These data come from the Frontiers in Virology paper on mixed infections, specifically the single-infection CMV control. Among the possible CMV datasets, these data were chosen because there is a reasonable number of replicates for one condition (9 plants).

# Load libraries
library(magrittr)
library(utils)
library(plyr)
library(dplyr)

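# Load data (adjust the working directory to wherever the CSV file is stored).
# read.csv() already returns a data frame, so no further conversion is needed.
setwd("C:/R_Data")
df_data <- read.csv("Dataset_2_CMV_Boezen.csv")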
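
Before the analysis, it can be worth checking that each row of GF values is a valid set of relative frequencies. The sketch below assumes that the three columns f_RNA1, f_RNA2 and f_RNA3 are proportions that sum to one per plant, which the dataset description here does not state explicitly. The helper name check_gf_rows, the tolerance and the toy values are hypothetical and only for illustration.

```r
# Hypothetical helper (not part of the original script): returns TRUE for
# rows whose GF components sum to 1 within a small tolerance.
check_gf_rows <- function(df, cols, tol = 1e-6) {
  abs(rowSums(df[, cols, drop = FALSE]) - 1) < tol
}

# Toy values for illustration only; these are not from the Boezen dataset.
gf.example <- data.frame(
  f_RNA1 = c(0.2, 0.3, 0.5),
  f_RNA2 = c(0.3, 0.3, 0.2),
  f_RNA3 = c(0.5, 0.4, 0.2)
)

# The third toy row sums to 0.9, so it is flagged as FALSE.
ok <- check_gf_rows(gf.example, c("f_RNA1", "f_RNA2", "f_RNA3"))
print(ok)
```

With the real data, the same call on df_data would show whether any rows need attention before distances are computed.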

Analysis of experimental data

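Calculate the GF distance, here the Euclidean distance between pairs of genome formula vectors, using some simple R code. First create a data frame containing only the GF columns of interest (f_RNA1, f_RNA2, f_RNA3).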
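
For reference, the quantities computed by the code in this section can be written out as follows. The notation is introduced here for illustration and is not taken from the paper: f_{i,k} is the relative frequency of RNA segment k in observation i, and n is the number of rows in the GF data.

```latex
% Pairwise Euclidean distance between observations i and j
d_{ij} = \sqrt{\sum_{k=1}^{3} \left( f_{i,k} - f_{j,k} \right)^{2}}

% Mean distance from observation i to all other observations
\bar{d}_{i} = \frac{1}{n-1} \sum_{j \neq i} d_{ij}

% Reported summary: mean of the per-observation mean distances
\bar{d} = \frac{1}{n} \sum_{i=1}^{n} \bar{d}_{i}
```

The reported standard deviation is the sample standard deviation of the per-observation values \bar{d}_{i}.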

gf.data <- df_data %>%
  select(f_RNA1, f_RNA2, f_RNA3)

# Compute the pairwise Euclidean distance between all genome formula values.
n.row <- nrow(gf.data)
pw.comp <- array(data = NA, dim = c(n.row, n.row))

for (i in seq_len(n.row)) {
    for (j in seq_len(n.row)) {
        pw.comp[i, j] <- sqrt(sum((gf.data[i, ] - gf.data[j, ])^2))
    }
}

# Now determine, for each observation, the mean distance to all other
# observations (the zero self-comparison on the diagonal is excluded).
dist.results <- rep(NA, n.row)
comp <- 1:n.row

for (i in seq_len(n.row)) {
    comp.now <- comp[-i]
    dist.results[i] <- mean(pw.comp[i, comp.now])
}

# Determine mean and SD of the distance values, the two goals of our analysis.
mean.gf.dist = mean(dist.results)
sd.gf.dist = sd(dist.results)
num.reps.d = nrow(gf.data)

print(mean.gf.dist)
## [1] 0.2065407
print(sd.gf.dist)
## [1] 0.06894688
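
As a cross-check on the explicit loops, the same quantities can be obtained with the base R function dist(), which computes Euclidean distances between the rows of a data frame or matrix. This is a minimal sketch on toy values (hypothetical, not from the Boezen dataset), and it is offered as an alternative route to the same numbers rather than the method used above.

```r
# Toy genome formula data (hypothetical values for illustration only).
gf.toy <- data.frame(
  f_RNA1 = c(0.2, 0.3, 0.5),
  f_RNA2 = c(0.3, 0.3, 0.2),
  f_RNA3 = c(0.5, 0.4, 0.3)
)

# Pairwise Euclidean distances, expanded to a full symmetric matrix.
pw.toy <- as.matrix(dist(gf.toy, method = "euclidean"))

# Mean distance from each row to all other rows. The diagonal is zero,
# so dividing the row sum by (n - 1) excludes the self-comparison.
n.toy <- nrow(gf.toy)
dist.toy <- rowSums(pw.toy) / (n.toy - 1)

# Summary statistics corresponding to mean.gf.dist and sd.gf.dist.
mean.toy <- mean(dist.toy)
sd.toy <- sd(dist.toy)
print(mean.toy)
print(sd.toy)
```

Replacing gf.toy with gf.data should reproduce mean.gf.dist and sd.gf.dist up to floating-point error, which gives a quick way to validate the loop-based calculation.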