This supplementary is available under https://github.com/marruett/Supplementary_Ruett_Precision_Agriculture and outlines the Decision Analysis approaches used in Assessing expected utility and profitability to support decision-making for disease control strategies in ornamental heather production
Ruett et al. (under review). Here we provide our input table, code and results to allow for transparent reproduction of the model. To create and organize this supplementary file we used the packages devtools
(Wickham et al. 2019b), knitr
(Xie 2019a), kableExtra
(Zhu 2019) and bookdown
(Xie 2019b). All other applied R
packages are cited and listed in the References.
The input table Input_CBA.csv
contains: Variable names, unit, distributions (posnorm = positive normal distribution, const = constant, tnorm_0_1= truncated normal distribution), lower bound, mean, upper bound and definition of all the input variables for the model. The abbreviation CBA
refers to Cost Benefit Assessment
.
Here we provide our mathematical model that enables simulation of decision outcomes. The calculations in this document are based on random draws from the distributions in the input table Input_CBA.csv
. Explanations for code chunks are not part of the actual simulation. The following code descriptions will guide readers through our process of code development.
Load the input table Input_CBA.csv
and make variables.
input_table <- "Input_CBA.csv"
make_variables <- function(est,n = 1)
{ x <- random(rho = est, n = n)
for(i in colnames(x)) assign(i, as.numeric(x[1,i]),envir = .GlobalEnv)}
make_variables(estimate_read_csv(input_table))
We generated a function called Simulation
function wrapping another function called Cost_Benefit
to evaluate the decision options in decisionSupport
(Luedeling et al. 2021). The Cost_Benefit
function represents the code to compute the costs and benefits for a monitoring strategy over a determined time period. We were able to calculate the cashflow and the Net Present Value (NPV) for the three monitoring strategies since we wrapped the Simulation
function around the Cost_Benefit
function. The following calculations show how the functions are coded (see the entire function below).
Simulation <- function(){
Cost_Benefit <- function(n_years, # number of years to run the simulation
highrisk_year, # Years with a high risk of fungal infections
CV, # coefficient indicating variation into a time series
area, # area of the heather production system
sample_costs, # costs per lab sample
value_of_discarded_plant, # value of a sampled and then
# discarded heather plant
plant_value_of_A1_quality, # plant value of marketable heather
# plant that would not have achieved high quality without
# the respective monitoring strategy.
Initial_investment, # Mandatory investment in the first year
# to start then respective monitoring approach
additional_investment, # Occurring investments is the following
# years
labor_costs, # Labor costs to conduct monitoring
post_processing_costs, # Data processing of acquired data
sample_number, # Number of lab samples
additional_saved_plants, # number of heather plants that would
# not have achieved high marketable quality without the
# respective monitoring strategy.
adjustment_sample_size, # Adjustment to increase sample size
# in high risk years
increased_resource_use_costs, # Costs for increased resource
# use to protect heather plants in high risk years
resource_savings # Monetary savings for reduced resource
# use in normal risk years due to the respective monitoring
# strategy
){
First we defined the labor costs per area.
yearly_labor_costs_per_ha <- vv(labor_costs, CV, n = n_years)
Then we defined the number of samples per hectare. We used the value varier function vv()
from the decisionSupport
package (Luedeling et al. 2021) to account for the yearly variation of e.g. taking samples under different risk conditions.
samples_per_ha <- round(vv(sample_number, CV, n = n_years) *
(1+highrisk_year * (vv(adjustment_sample_size, CV, n = n_years))), digits = 0)
We defined the sample costs per hectare and the value of discarded plants per hectare for each year.
sample_costs_per_ha <- samples_per_ha * sample_costs
discarded_plant_value_per_ha <- samples_per_ha *
vv(value_of_discarded_plant, CV, n = n_years)
We defined the total labor and sample costs for the whole production system.
labor_and_sample_cost <- (yearly_labor_costs_per_ha +
sample_costs_per_ha +
discarded_plant_value_per_ha) * area
Here we defined the initial investment of the first year and additional investments (maintenance costs) of the following years. Then we summed up the labor and sample costs, the data post processing costs, the initial investment and the additional investments, and defined the result as the total costs.
total_cost <- labor_and_sample_cost +
vv(post_processing_costs, CV, n = n_years) +
c(Initial_investment, rep(0,n_years-1)) +
c(0,vv(additional_investment, CV, n = n_years-1))
Benefits of monitoring strategies consist of resource savings that are achieved in normal risk years and the value of saved plants in high risk years.
In normal risk years the number of frequent fungicide applications can be reduced to some extent, because more monitoring increases knowledge about plant health status in the field. Therefore, resource savings can be achieved in these years.
Although monitoring allows for resource savings in normal risk years, no resource savings are achieved in high risk years because a higher number of pesticide applications is required. However, thanks to more precise knowledge about spatially distributed plant vitality, producers are able to protect quality of more plants, which are safely cultivated until they are sold.
We defined the resource savings in normal risk years and the value of saved high quality plants in high risk years. We summed up these monetary values and defined the result as the total benefits.
resource_savings <- vv(resource_savings, CV, n = n_years) * area * (1-highrisk_year)
value_of_high_quality_plants <- vv(additional_saved_plants, CV, n = n_years) *
plant_value_of_A1_quality * area * highrisk_year
total_benefits <- resource_savings + value_of_high_quality_plants
In the return command we defined the output of the of the Cost_Benefit
function as cashflow.
return(cashflow = total_benefits - total_cost)}
High risk years can increase sample size and resource use. Occurrence of high risk years was defined using the chance_event function from decisionSupport
(Luedeling et al. 2021). The chance of a high-risk year occurring (chance_high_risk) was defined in the input table as a probability of 40% to 60%, reflecting the uncertainty for risky weather events.
highrisk_year <- chance_event(chance_high_risk, n = n_years)
Here we simulated the Cost_Benefit
function for standard monitoring (Baseline) and defined the result as 'cashflow_G'.
cashflow_G <- Cost_Benefit(n_years = n_years,
highrisk_year = highrisk_year,
CV = var_CV,
area = production_area,
sample_costs = lab_costs_per_sample,
value_of_discarded_plant = plant_value_of_discarded_plant,
plant_value_of_A1_quality = plant_value_of_A1_quality,
Initial_investment = Initial_investment_G ,
additional_investment = additional_investment_G ,
labor_costs = labor_costs_G,
post_processing_costs = post_processing_costs_G,
sample_number = sample_number_G,
additional_saved_plants = Number_of_saved_high_quality_plants_G,
adjustment_sample_size = adjustment_sample_size_G,
resource_savings = resource_savings_G)
Here we calculated the Net Present Value for 'Baseline' monitoring.
NPV_G <- discount(cashflow_G, discount_rate, calculate_NPV = TRUE)
Here we simulated the Cost_Benefit
function for more intense monitoring (Improved) and defined the result as 'cashflow_I'.
cashflow_I <- Cost_Benefit(n_years = n_years,
highrisk_year = highrisk_year,
CV = var_CV,
area = production_area,
sample_costs = lab_costs_per_sample,
value_of_discarded_plant = plant_value_of_discarded_plant,
plant_value_of_A1_quality = plant_value_of_A1_quality,
Initial_investment = Initial_investment_I ,
additional_investment = additional_investment_I ,
labor_costs = labor_costs_I,
post_processing_costs = post_processing_costs_I,
sample_number = sample_number_I,
additional_saved_plants = Number_of_saved_high_quality_plants_I,
adjustment_sample_size = adjustment_sample_size_I,
resource_savings = resource_savings_I)
Here we calculated the Net Present Value for 'Improved' monitoring.
NPV_I <- discount(cashflow_I, discount_rate, calculate_NPV = TRUE)
Here we calculated the Net Present Value for the decision to apply 'Improved' instead of 'Baseline' monitoring.
comp_NPV_IG <- NPV_I - NPV_G
Here we simulated the Cost_Benefit
function for monitoring with sensor technology (Sensor) and defined the result as 'cashflow_S'.
cashflow_S <- Cost_Benefit(n_years = n_years,
highrisk_year = highrisk_year,
CV = var_CV,
area = production_area,
sample_costs = lab_costs_per_sample,
value_of_discarded_plant = plant_value_of_discarded_plant,
plant_value_of_A1_quality = plant_value_of_A1_quality,
Initial_investment = Initial_investment_S ,
additional_investment = additional_investment_S ,
labor_costs = labor_costs_S,
post_processing_costs = post_processing_costs_S,
sample_number = sample_number_S,
additional_saved_plants = Number_of_saved_high_quality_plants_S,
adjustment_sample_size = adjustment_sample_size_S,
resource_savings = resource_savings_S)
Here we calculated the Net Present Value for 'Sensor' monitoring.
NPV_S <- discount(cashflow_S, discount_rate, calculate_NPV = TRUE)
Here we calculated the Net Present Value for the decision to apply 'Sensor' instead of 'Baseline' monitoring.
comp_NPV_SG <- NPV_S - NPV_G
At the end of the Simulation
function we listed all calculated outcomes.
return(list(cashflow_G = cashflow_G,
NPV_G = NPV_G,
cashflow_I = cashflow_I,
NPV_I = NPV_I,
cashflow_S = cashflow_S,
NPV_S = NPV_S,
comp_NPV_IG = comp_NPV_IG,
comp_NPV_SG = comp_NPV_SG))
}
Here we defined legend file, results folder and figures folder.
legend_file <- "Legend_CBA.csv"
MC_Results_folder <- "MC_Results_CBA"
EVPI_Results_folder <- "EVPI_Results_CBA"
The decisionSupport
function facilitated the start of the Monte Carlo simulation.
decisionSupport(inputFilePath = input_table,
outputPath = MC_Results_folder,
welfareFunction = Simulation,
write_table = TRUE,
numberOfModelRuns = 10000,
functionSyntax = "plainNames")
We produced all outcome variables with the decisionSupport
function. We stored the Monte Carlo Simulation results into the MC_file and generate the EVPI files.
MC_file <- read.csv(paste(MC_Results_folder,"/mcSimulationResults.csv",sep = ""))
MC_file_without_cashflow <- select(MC_file, -c(1, starts_with("cashflow")))
multi_EVPI(mc = MC_file_without_cashflow,first_out_var = "NPV_G",
write_table = TRUE,outfolder = EVPI_Results_folder)
welfare_summary <- read.csv(paste(MC_Results_folder,"/welfareDecisionSummary.csv",sep = ""))
Here our code is shown at one piece for an overview of all calculations.
library(decisionSupport)
input_table <- "Input_CBA.csv"
legend_file <- "Legend_CBA.csv"
MC_Results_folder <- "MC_Results_CBA"
EVPI_Results_folder <- "EVPI_Results_CBA"
make_variables <- function(est,n = 1)
{ x <- random(rho = est, n = n)
for(i in colnames(x)) assign(i, as.numeric(x[1,i]),envir = .GlobalEnv)}
make_variables(estimate_read_csv(input_table))
dir.create(MC_Results_folder)
Simulation <- function(){
Cost_Benefit <- function(n_years,
highrisk_year,
CV,
area,
sample_costs,
value_of_discarded_plant,
plant_value_of_A1_quality,
Initial_investment,
additional_investment,
labor_costs,
post_processing_costs,
sample_number,
additional_saved_plants,
adjustment_sample_size,
increased_resource_use_costs,
resource_savings){
yearly_labor_costs_per_ha <- vv(labor_costs, CV, n = n_years)
samples_per_ha <- round(vv(sample_number, CV, n = n_years) *
(1+highrisk_year * (vv(adjustment_sample_size, CV, n = n_years))), digits = 0)
sample_costs_per_ha <- samples_per_ha * sample_costs
discarded_plant_value_per_ha <- samples_per_ha *
vv(value_of_discarded_plant, CV, n = n_years)
labor_and_sample_cost <- (yearly_labor_costs_per_ha +
sample_costs_per_ha +
discarded_plant_value_per_ha) * area
total_cost <- labor_and_sample_cost +
vv(post_processing_costs, CV, n = n_years) +
c(Initial_investment, rep(0,n_years-1)) +
c(0,vv(additional_investment, CV, n = n_years-1))
resource_savings <- vv(resource_savings, CV, n = n_years) * area * (1-highrisk_year)
value_of_high_quality_plants <- vv(additional_saved_plants, CV, n = n_years) *
plant_value_of_A1_quality * area * highrisk_year
total_benefits <- resource_savings + value_of_high_quality_plants
return(cashflow = total_benefits - total_cost)}
highrisk_year <- chance_event(chance_high_risk, n = n_years)
cashflow_G <- Cost_Benefit(n_years = n_years,
highrisk_year = highrisk_year,
CV = var_CV,
area = production_area,
sample_costs = lab_costs_per_sample,
value_of_discarded_plant = plant_value_of_discarded_plant,
plant_value_of_A1_quality = plant_value_of_A1_quality,
Initial_investment = Initial_investment_G ,
additional_investment = additional_investment_G ,
labor_costs = labor_costs_G,
post_processing_costs = post_processing_costs_G,
sample_number = sample_number_G,
additional_saved_plants = Number_of_saved_high_quality_plants_G,
adjustment_sample_size = adjustment_sample_size_G,
resource_savings = resource_savings_G)
NPV_G <- discount(cashflow_G, discount_rate, calculate_NPV = TRUE)
cashflow_I <- Cost_Benefit(n_years = n_years,
highrisk_year = highrisk_year,
CV = var_CV,
area = production_area,
sample_costs = lab_costs_per_sample,
value_of_discarded_plant = plant_value_of_discarded_plant,
plant_value_of_A1_quality = plant_value_of_A1_quality,
Initial_investment = Initial_investment_I ,
additional_investment = additional_investment_I ,
labor_costs = labor_costs_I,
post_processing_costs = post_processing_costs_I,
sample_number = sample_number_I,
additional_saved_plants = Number_of_saved_high_quality_plants_I,
adjustment_sample_size = adjustment_sample_size_I,
resource_savings = resource_savings_I)
NPV_I <- discount(cashflow_I, discount_rate, calculate_NPV = TRUE)
comp_NPV_IG <- NPV_I - NPV_G
cashflow_S <- Cost_Benefit(n_years = n_years,
highrisk_year = highrisk_year,
CV = var_CV,
area = production_area,
sample_costs = lab_costs_per_sample,
value_of_discarded_plant = plant_value_of_discarded_plant,
plant_value_of_A1_quality = plant_value_of_A1_quality,
Initial_investment = Initial_investment_S ,
additional_investment = additional_investment_S ,
labor_costs = labor_costs_S,
post_processing_costs = post_processing_costs_S,
sample_number = sample_number_S,
additional_saved_plants = Number_of_saved_high_quality_plants_S,
adjustment_sample_size = adjustment_sample_size_S,
resource_savings = resource_savings_S)
NPV_S <- discount(cashflow_S, discount_rate, calculate_NPV = TRUE)
comp_NPV_SG <- NPV_S - NPV_G
return(list(cashflow_G = cashflow_G,
NPV_G = NPV_G,
cashflow_I = cashflow_I,
NPV_I = NPV_I,
cashflow_S = cashflow_S,
NPV_S = NPV_S,
comp_NPV_IG = comp_NPV_IG,
comp_NPV_SG = comp_NPV_SG))
}
decisionSupport(inputFilePath = input_table,
outputPath = MC_results_folder,
welfareFunction = Simulation,
write_table = TRUE,
numberOfModelRuns = 10000,
functionSyntax = "plainNames")
MC_file <- read.csv(paste(MC_Results_folder,"/mcSimulationResults.csv",sep = ""))
MC_file_without_cashflow <- select(MC_file, -c(1, starts_with("cashflow")))
multi_EVPI(mc = MC_file_without_cashflow,first_out_var = "NPV_G",
write_table = TRUE,outfolder = EVPI_Results_folder)
welfare_summary <- read.csv(paste(MC_Results_folder,"/welfareDecisionSummary.csv",sep = ""))
Here we calculated the expected utility for different monitoring strategies. First, we loaded the simulation results into the R environment. We used the package dyplyr
(Wickham et al. 2019a) to select the simulated Net Present Values from each monitoring strategy.
data_G <- read.csv("mcSimulationResults.csv")
data_G <- dplyr::select(data_G, starts_with("NPV_G")) %>%
stack(drop=FALSE)
data_G$values <- as.numeric(data_G$values)
data_I <- read.csv("mcSimulationResults.csv")
data_I <- dplyr::select(data_I, starts_with("NPV_I")) %>%
stack(drop=FALSE)
data_I$values <- as.numeric(data_I$values)
data_S <- read.csv("mcSimulationResults.csv")
data_S <- dplyr::select(data_S, starts_with("NPV_S")) %>%
stack(drop=FALSE)
data_S$values <- as.numeric(data_S$values)
Then we started the risk premium calculation for the monitoring strategies.
risk_premium_G <-(1/2)*0.0001*var(data_G$values)
risk_premium_I <-(1/2)*0.0001*var(data_I$values)
risk_premium_S <-(1/2)*0.0001*var(data_S$values)
We calculated the semivariance for the monitoring strategies.
for(z in 1 : 10000){
data_G$NewColumn[z] <- ifelse(min((data_G$values[z]- mean(data_G$values)),0) < 0,
(data_G$values[z]- mean(data_G$values))^2, 0)
}
Semivaricance_G <- mean(data_G$NewColumn)
for(z in 1 : 10000){
data_I$NewColumn[z] <- ifelse(min((data_I$values[z]- mean(data_I$values)),0) < 0,
(data_I$values[z]- mean(data_I$values))^2, 0)
}
Semivaricance_I <- mean(data_I$NewColumn)
for(z in 1 : 10000){
data_S$NewColumn[z] <- ifelse(min((data_S$values[z]- mean(data_S$values)),0) < 0,
(data_S$values[z]- mean(data_S$values))^2, 0)
}
Semivaricance_S <- mean(data_S$NewColumn)
Here we computed the certainty equivalents and the risk aversion coefficients and stored the results in a data frame called risk_aversion_data
.
for(r in c(-1e-01, -1e-02, -1e-03, -1e-04, -1e-05, -1e-06, 0,
1e-06, 1e-05, 1e-04, 1e-03, 1e-02, 1e-01)){
Certainty_equivalent_G_ <- mean(data_G$values) - Semivaricance_G * r
assign(paste("CE_G_", r, sep = ""), Certainty_equivalent_G_)
Certainty_equivalent_I_ <- mean(data_I$values) - Semivaricance_I * r
assign(paste("CE_I_", r, sep = ""), Certainty_equivalent_I_)
Certainty_equivalent_S_ <- mean(data_S$values) - Semivaricance_S * r
assign(paste("CE_S_", r, sep = ""), Certainty_equivalent_S_)
}
Certainty_equivalent <- c(`CE_G_-0.1`, `CE_G_-0.01`, `CE_G_-0.001`, `CE_G_-1e-04`,
`CE_G_-1e-05`, `CE_G_-1e-06`,`CE_G_0`, `CE_G_1e-06`,
`CE_G_1e-05`, `CE_G_1e-04`, `CE_G_0.001`, `CE_G_0.01`,
`CE_G_0.1`, `CE_I_-0.1`, `CE_I_-0.01`, `CE_I_-0.001`,
`CE_I_-1e-04`, `CE_I_-1e-05`, `CE_I_-1e-06`, `CE_I_0`,
`CE_I_1e-06`, `CE_I_1e-05`, `CE_I_1e-04`, `CE_I_0.001`,
`CE_I_0.01`, `CE_I_0.1`,`CE_S_-0.1`, `CE_S_-0.01`, `CE_S_-0.001`,
`CE_S_-1e-04`, `CE_S_-1e-05`, `CE_S_-1e-06`,`CE_S_0`, `CE_S_1e-06`,
`CE_S_1e-05`, `CE_S_1e-04`, `CE_S_0.001`, `CE_S_0.01`, `CE_S_0.1`)
monitoring_type <- c(rep("Baseline", 13),rep("Improved", 13),rep("Sensor", 13))
risk_aversion_coefficient <- c(-1e-01, -1e-02, -1e-03, -1e-04, -1e-05, -1e-06, 0, 1e-06,
1e-05, 1e-04, 1e-03, 1e-02, 1e-01)
options(scipen = 999)
risk_aversion_data <- data.frame(Certainty_equivalent, risk_aversion_coefficient, monitoring_type)
The following plots are generated from the mcSimulationResults.csv
from the decisionSupport
function.
We plotted probability distributions and merged them to illustrate the frequency of Monte Carlo simulation outcomes. Additionally, we integrated boxplots to compare Net Present Values of Baseline, Improved, and Sensor.
For data visualization we applied the ggplot2
(Wickham 2016), ggstance
(Henry et al. 2019) and dplyr
(Wickham et al. 2019a) packages. We used the function ggarrange
from the ggpubr
package (Kassambara 2019) to merge figures.
Probability density distributions (scaled density among 10,000 runs of a Monte Carlo simulation) of the Net Present Value (Euros) for three heather monitoring strategies. Baseline: Current visual monitoring regime. Improved: More intensive visual monitoring. Sensor: Sensor-based monitoring.
Stochastic efficiency with respect to a function (SERF) results for monitoring strategies Baseline, Improved, and Sensor. In the upper figure the y-axis is shown on a normal scale, while the lower figure uses a log-scale to zoom into the positive parts of the SERF functions. In the latter case negative values are dropped from the illustration, because negative certainty equivalent values imply that decision makers would not adopt an activity at all.
We plotted probability distributions and merged them to illustrate the frequency of Monte Carlo simulation outcomes. Additionally, we integrated boxplots to compare Net Present Values of DoMoreVisual and UseSensor.
Probability density distributions (scaled density among 10,000 runs of a Monte Carlo simulation) of Net Present Values (Euros) for two heather monitoring decisions. DoMoreVisual: More intensive visual monitoring compared to the current monitoring regime. UseSensor: Sensor-based monitoring compared to the current monitoring regime.
Comparison of Information Value and Variable Importance for DoMoreVisual and UseSensor.
Variable Importance in the Projection (VIP scores) and Information Value (EVPI values) for the DoMoreVisual and UseSensor decisions. For the Variable Importance, variables that have a positive impact on the projected NPVs are represented by blue bars, while those with a negative impact are represented by orange bars.
We particularly acknowledge the heather farmers Gerd Canders, Tom Canders (Europlant Canders GmbH, Straelen, Germany), Matthias Küppers and Verena Zachau Küppers (Jungpflanzen Küppers GbR, Wachtendonk, Germany) for their participation and contributions throughout the research process. We thank Peter Tiede-Arlt, Rainer Peters and Torsten Wolf (Landwirtschaftskammer Nordrhein-Westfalen) for their commitment and advice in this study.
This research was funded by Stiftung Zukunft NRW within the research project inruga (Innovationen für NRW zur Steigerung der Ressourceneffizienz und Umweltverträglichkeit im Gartenbau, „Entscheidungshilfen im Zierpflanzenbau“). The funders were not involved in the preparation of this article.
Henry, L., Wickham, H., & Chang, W. (2019). Ggstance: Horizontal ’ggplot2’ Components. https://CRAN.R-project.org/package=ggstance
Kassambara, A. (2019). Ggpubr: ’Ggplot2’ Based Publication Ready Plots. https://CRAN.R-project.org/package=ggpubr
Luedeling, E., Goehring, L., Schiffers, K., Whitney, C., & Fernandez, E. (2021). decisionSupport: Quantitative Support of Decision Making under Uncertainty. http://www.worldagroforestry.org/
Wickham, H. (2016). Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org
Wickham, H., François, R., Henry, L., & Müller, K. (2019a). Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr
Wickham, H., Hester, J., & Chang, W. (2019b). Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools
Xie, Y. (2019a). Knitr: A General-Purpose Package for Dynamic Report Generation in R.
Xie, Y. (2019b). Bookdown: Authoring Books and Technical Documents with R Markdown.
Zhu, H. (2019). kableExtra: Construct Complex Table with ’kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra