This document contains a Bayesian analysis of the hypotheses investigated by Femmer et al.1.

Data Loading

Start by loading the prepared data.

source("../util/data-loading.R")
d <- load.data()

# print the data to ensure that all variables have the correct type
str(d)
## 'data.frame':    105 obs. of  21 variables:
##  $ PID                  : chr  "P1" "P1" "P1" "P1" ...
##  $ RID                  : chr  "R1" "R2" "R3" "R4" ...
##  $ Age                  : Ord.factor w/ 3 levels "19-24"<"25-30"<..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Program              : Ord.factor w/ 4 levels "Unknown"<"Bachelor"<..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ REQuizPerformance    : int  9 9 9 9 9 9 9 7 7 7 ...
##  $ ExpProgAca           : Ord.factor w/ 4 levels "no experience"<..: 3 3 3 3 3 3 3 2 2 2 ...
##  $ ExpProgInd           : Ord.factor w/ 4 levels "no experience"<..: 4 4 4 4 4 4 4 1 1 1 ...
##  $ ExpSEAca             : Ord.factor w/ 4 levels "no experience"<..: 4 4 4 4 4 4 4 3 3 3 ...
##  $ ExpSEInd             : Ord.factor w/ 4 levels "no experience"<..: 2 2 2 2 2 2 2 1 1 1 ...
##  $ ExpREAca             : Ord.factor w/ 4 levels "no experience"<..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ ExpREInd             : Ord.factor w/ 4 levels "no experience"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ passive              : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ actors.expected      : int  1 1 2 0 0 0 1 1 1 2 ...
##  $ actors.found         : int  1 1 2 0 0 0 1 1 1 2 ...
##  $ actors.missing       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ associations.expected: int  2 2 2 1 2 3 2 2 2 2 ...
##  $ associations.found   : int  0 0 0 1 0 2 2 2 2 2 ...
##  $ associations.missing : int  2 2 2 0 2 1 0 0 0 0 ...
##  $ objects.expected     : int  3 3 2 2 3 4 3 3 3 2 ...
##  $ objects.found        : int  1 2 1 2 2 4 2 3 3 2 ...
##  $ objects.missing      : int  2 1 1 0 1 0 1 0 0 0 ...

Bayesian Analysis

We perform the final step of the statistical causal inference framework by Siebert2, i.e., the estimation step, based on the two previous modeling and identification steps as documented in (the causal assumptions notebook)[./causal-assumptions.Rmd].

Formula

First, we define a regression model that predicts the number of missing domain objects based on a set of predictors identified during the causal assumptions. The most eligible probability distribution of the response variable is a Binomial distribution, since the response variable is a whole number bounded by the expected number of domain objects.

formula <- (objects.missing | trials(objects.expected) ~ 
              1 + # standard success in the task of finding actors
              passive + # impact of the use of passive voice
              (1|RID) + # impact of the difficulty of the individual requirements
              (1|PID) + # impact of the skill of individual participants
              ExpREAca + # impact of the participants academic experience in RE
              ExpREInd) # impact of the participants industrial experience in RE

get_prior(formula, family=binomial, data=d)
##                 prior     class        coef group resp dpar nlpar lb ub
##                (flat)         b                                        
##                (flat)         b  ExpREAca.C                            
##                (flat)         b  ExpREAca.L                            
##                (flat)         b  ExpREAca.Q                            
##                (flat)         b  ExpREInd.C                            
##                (flat)         b  ExpREInd.L                            
##                (flat)         b  ExpREInd.Q                            
##                (flat)         b passiveTRUE                            
##  student_t(3, 0, 2.5) Intercept                                        
##  student_t(3, 0, 2.5)        sd                                    0   
##  student_t(3, 0, 2.5)        sd               PID                  0   
##  student_t(3, 0, 2.5)        sd   Intercept   PID                  0   
##  student_t(3, 0, 2.5)        sd               RID                  0   
##  student_t(3, 0, 2.5)        sd   Intercept   RID                  0   
##        source
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##       default
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)

Priors

For each of the selected predictors, we select an uninformative prior distribution that encodes our previous knowledge about the causal phenomenon.

priors <- c(prior(normal(-1.5, 1), class = Intercept),
            prior(normal(0, 1), class = b),
            prior(weibull(2, 1), class = sd, coef = Intercept, group = RID),
            prior(weibull(2, 1), class = sd, coef = Intercept, group = PID),
            prior(exponential(1), class = sd))

To assert that the priors are feasible, we sample from the priors without training the Bayesian model with the data.

m.prior <-
  brm(data = d, family = binomial, formula, prior = priors,
    iter = 4000, warmup = 1000, chains = 4, cores = 4,
    seed = 4, sample_prior="only",
    file = "fits/m.objects.prior"
  )

We visualize the results to graphically check the priors eligibility.

ndraws <- 400
brms::pp_check(m.prior, type="bars", ndraws=ndraws)

The distribution of the predicted values for the response variable (\(y_{rep}\)) overlaps with the actually observed response variable distribution (\(y\)), confirming the feasibility of the priors.

Training

Given the feasibility of the priors, we train the regression model. This process updates the prior distributions of all predictor coefficients.

m <-
  brm(data = d, family = binomial, formula, prior = priors,
    iter = 4000, warmup = 1000, chains = 4, cores = 4,
    seed = 4,
    file = "fits/m.objects"
  )

To assert that the training process worked correctly, we plot the Markov chains.

plot(m)

All chains seem fine. Further, we sample from the posterior distributions similar to the prior predictive check.

brms::pp_check(m, type="bars", ndraws=ndraws)

The distribution of the predicted values for the response variable (\(y_{rep}\)) still overlaps with the actually observed response variable distribution (\(y\)). Additionally, the distribution shrunk around the actually observed values, which indicates that the model has improved its predictive power through training.

Evaluation

Finally, we evaluate the trained model. An initial glance at the predictor coefficients shows their posterior distribution.

summary(m)
##  Family: binomial 
##   Links: mu = logit 
## Formula: objects.missing | trials(objects.expected) ~ 1 + passive + (1 | RID) + (1 | PID) + ExpREAca + ExpREInd 
##    Data: d (Number of observations: 105) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Group-Level Effects: 
## ~PID (Number of levels: 15) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.89      0.34     0.29     1.61 1.00     6126     5679
## 
## ~RID (Number of levels: 7) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.10      0.34     0.54     1.89 1.00     7420     7602
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept      -3.09      0.64    -4.36    -1.85 1.00     7631     7665
## passiveTRUE     0.29      0.61    -0.91     1.48 1.00     8678     8578
## ExpREAca.L     -0.04      0.69    -1.43     1.33 1.00    13019     9237
## ExpREAca.Q     -0.59      0.67    -1.89     0.75 1.00    10026     9214
## ExpREAca.C     -0.18      0.55    -1.30     0.86 1.00     9115     8927
## ExpREInd.L     -0.31      0.61    -1.53     0.87 1.00     9350     9528
## ExpREInd.Q     -0.01      0.68    -1.35     1.32 1.00     9520     9266
## ExpREInd.C     -0.34      0.69    -1.69     1.03 1.00    11287     9587
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

These coefficient distributions do not show the full uncertainty of the impact of each predictor, though. Therefore, we plot the marginal effect of relevant predictors. The most relevant predictor is the treatment of the experiment, the use of passive voice. The plot shows that the use of passive voice (passive=TRUE) slightly increases the likelihood of missing a domain object, but since the confidence intervals overlap, this effect is not significant.

marginal.passive <- conditional_effects(m, effects="passive")
## Setting all 'trials' variables to 1 by default if not specified otherwise.
marginal.passive

Export the results to a CSV sheet to enable a figure of all three marginal plots in the end.

marginal.passive.values <- data.frame(marginal.passive[1]) %>% 
  select(passive.passive, passive.estimate__, passive.lower__, passive.upper__) %>% 
  rename(all_of(c(
    passive = "passive.passive",
    estimate = "passive.estimate__", 
    ci.lower = "passive.lower__", 
    ci.upper = "passive.upper__"))) %>% 
  mutate(
    response = "objects"
  ) %>% 
  write_csv(file = "../../data/results/marginal-objects.csv")

Finally, we can evaluate the model numerically. This is similar to the marginal effect, but can be summarized to the amount of times (in percent) that the use of passive voice produces fewer (-1), equal (0), or more (+1) missing domain objects.

evaluate.model(m)
## 
##     -1      0      1 
## 0.2550 0.4015 0.3435

The evaluation also supports that the effect of passive voice on the number of missing domain objects is not significant.


  1. Femmer, H., Kučera, J., & Vetrò, A. (2014, September). On the impact of passive voice requirements on domain modelling. In Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (pp. 1-4).↩︎

  2. Siebert, J. (2023). Applications of statistical causal inference in software engineering. Information and Software Technology, 107198.↩︎