This document contains a Bayesian analysis of the hypotheses investigated by Femmer et al.1.
Start by loading the prepared data.
source("../util/data-loading.R")
d <- load.data()
# print the data to ensure that all variables have the correct type
str(d)
## 'data.frame': 105 obs. of 21 variables:
## $ PID : chr "P1" "P1" "P1" "P1" ...
## $ RID : chr "R1" "R2" "R3" "R4" ...
## $ Age : Ord.factor w/ 3 levels "19-24"<"25-30"<..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Program : Ord.factor w/ 4 levels "Unknown"<"Bachelor"<..: 3 3 3 3 3 3 3 3 3 3 ...
## $ REQuizPerformance : int 9 9 9 9 9 9 9 7 7 7 ...
## $ ExpProgAca : Ord.factor w/ 4 levels "no experience"<..: 3 3 3 3 3 3 3 2 2 2 ...
## $ ExpProgInd : Ord.factor w/ 4 levels "no experience"<..: 4 4 4 4 4 4 4 1 1 1 ...
## $ ExpSEAca : Ord.factor w/ 4 levels "no experience"<..: 4 4 4 4 4 4 4 3 3 3 ...
## $ ExpSEInd : Ord.factor w/ 4 levels "no experience"<..: 2 2 2 2 2 2 2 1 1 1 ...
## $ ExpREAca : Ord.factor w/ 4 levels "no experience"<..: 2 2 2 2 2 2 2 2 2 2 ...
## $ ExpREInd : Ord.factor w/ 4 levels "no experience"<..: 1 1 1 1 1 1 1 1 1 1 ...
## $ passive : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ actors.expected : int 1 1 2 0 0 0 1 1 1 2 ...
## $ actors.found : int 1 1 2 0 0 0 1 1 1 2 ...
## $ actors.missing : int 0 0 0 0 0 0 0 0 0 0 ...
## $ associations.expected: int 2 2 2 1 2 3 2 2 2 2 ...
## $ associations.found : int 0 0 0 1 0 2 2 2 2 2 ...
## $ associations.missing : int 2 2 2 0 2 1 0 0 0 0 ...
## $ objects.expected : int 3 3 2 2 3 4 3 3 3 2 ...
## $ objects.found : int 1 2 1 2 2 4 2 3 3 2 ...
## $ objects.missing : int 2 1 1 0 1 0 1 0 0 0 ...
We perform the final step of the statistical causal inference framework by Siebert2, i.e., the estimation step, based on the two previous modeling and identification steps as documented in the causal assumptions notebook.
First, we define a regression model that predicts the number of missing actors based on a set of predictors identified during the causal assumptions. The most eligible probability distribution of the response variable is a Binomial distribution, since the response variable is a whole number bounded by the expected number of actors.
formula <- (actors.missing | trials(actors.expected) ~
1 + # standard success in the task of finding actors
passive + # impact of the use of passive voice
(1|RID) + # impact of the difficulty of the individual requirements
(1|PID) + # impact of the skill of individual participants
ExpREAca + # impact of the participants academic experience in RE
ExpREInd) # impact of the participants industrial experience in RE
get_prior(formula, family=binomial, data=d)
## prior class coef group resp dpar nlpar lb ub
## (flat) b
## (flat) b ExpREAca.C
## (flat) b ExpREAca.L
## (flat) b ExpREAca.Q
## (flat) b ExpREInd.C
## (flat) b ExpREInd.L
## (flat) b ExpREInd.Q
## (flat) b passiveTRUE
## student_t(3, 0, 2.5) Intercept
## student_t(3, 0, 2.5) sd 0
## student_t(3, 0, 2.5) sd PID 0
## student_t(3, 0, 2.5) sd Intercept PID 0
## student_t(3, 0, 2.5) sd RID 0
## student_t(3, 0, 2.5) sd Intercept RID 0
## source
## default
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## default
## default
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
For each of the selected predictors, we select an uninformative prior distribution that encodes our previous knowledge about the causal phenomenon.
priors <- c(prior(normal(-1, 1), class = Intercept),
prior(normal(0, 1), class = b),
prior(weibull(2, 1), class = sd, coef = Intercept, group = RID),
prior(weibull(2, 1), class = sd, coef = Intercept, group = PID),
prior(exponential(1), class = sd))
To assert that the priors are feasible, we sample from the priors without training the Bayesian model with the data.
m.prior <-
brm(data = d, family = binomial, formula, prior = priors,
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 4, sample_prior="only",
file = "fits/m.actors.prior"
)
We visualize the results to graphically check the priors eligibility.
ndraws <- 400
brms::pp_check(m.prior, type="bars", ndraws=ndraws)
The distribution of the predicted values for the response variable (\(y_{rep}\)) overlaps with the actually observed response variable distribution (\(y\)), confirming the feasibility of the priors.
Given the feasibility of the priors, we train the regression model. This process updates the prior distributions of all predictor coefficients.
m <-
brm(data = d, family = binomial, formula, prior = priors,
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 4,
file = "fits/m.actors"
)
To assert that the training process worked correctly, we plot the Markov chains.
plot(m)
All chains seem fine. Further, we sample from the posterior distributions similar to the prior predictive check.
brms::pp_check(m, type="bars", ndraws=ndraws)
The distribution of the predicted values for the response variable (\(y_{rep}\)) still overlaps with the actually observed response variable distribution (\(y\)). Additionally, the distribution shrunk around the actually observed values, which indicates that the model has improved its predictive power through training.
Finally, we evaluate the trained model. An initial glance at the predictor coefficients shows their posterior distribution.
summary(m)
## Family: binomial
## Links: mu = logit
## Formula: actors.missing | trials(actors.expected) ~ 1 + passive + (1 | RID) + (1 | PID) + ExpREAca + ExpREInd
## Data: d (Number of observations: 105)
## Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
## total post-warmup draws = 12000
##
## Group-Level Effects:
## ~PID (Number of levels: 15)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.64 0.35 0.11 1.47 1.00 10380 8442
##
## ~RID (Number of levels: 7)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.24 0.44 0.46 2.17 1.00 10328 6386
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -2.08 0.77 -3.60 -0.57 1.00 14518 9182
## passiveTRUE 0.47 0.70 -0.89 1.86 1.00 20187 9354
## ExpREAca.L -0.30 0.73 -1.73 1.11 1.00 20933 9443
## ExpREAca.Q -0.13 0.72 -1.56 1.29 1.00 17709 9765
## ExpREAca.C -0.50 0.64 -1.75 0.79 1.00 19624 9651
## ExpREInd.L 0.40 0.66 -0.91 1.70 1.00 17326 9715
## ExpREInd.Q -0.53 0.74 -1.98 0.94 1.00 18333 9331
## ExpREInd.C -0.31 0.75 -1.81 1.18 1.00 22740 9391
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
These coefficient distributions do not show the full uncertainty of the impact of each predictor, though. Therefore, we plot the marginal effect of relevant predictors. The most relevant predictor is the treatment of the experiment, the use of passive voice. The plot shows that the use of passive voice (passive=TRUE
) slightly increases the likelihood of missing an actor, but since the confidence intervals overlap, this effect is not significant.
marginal.passive <- conditional_effects(m, effects="passive")
## Setting all 'trials' variables to 1 by default if not specified otherwise.
marginal.passive
Export the results to a CSV
sheet to enable a figure of all three marginal plots in the end.
marginal.passive.values <- data.frame(marginal.passive[1]) %>%
select(passive.passive, passive.estimate__, passive.lower__, passive.upper__) %>%
rename(all_of(c(
passive = "passive.passive",
estimate = "passive.estimate__",
ci.lower = "passive.lower__",
ci.upper = "passive.upper__"))) %>%
mutate(
response = "actors"
) %>%
write_csv(file = "../../data/results/marginal-actors.csv")
Similarly, we can visualize the isolated effect of industrial experience in requirements engineering. Again, the overlapping confidence intervals do not suggest a significant effect.
conditional_effects(m, effects="ExpREInd")
## Setting all 'trials' variables to 1 by default if not specified otherwise.
Finally, we can evaluate the model numerically. This is similar to the marginal effect, but can be summarized to the amount of times (in percent) that the use of passive voice produces fewer (-1), equal (0), or more (+1) missing actors.
evaluate.model(m)
##
## -1 0 1
## 0.02508333 0.93775000 0.03716667
The evaluation also supports that the effect of passive voice on the number of missing actors is not significant.
Femmer, H., Kučera, J., & Vetrò, A. (2014, September). On the impact of passive voice requirements on domain modelling. In Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (pp. 1-4).↩︎
Siebert, J. (2023). Applications of statistical causal inference in software engineering. Information and Software Technology, 107198.↩︎