This document contains a Bayesian analysis of the hypotheses investigated by Femmer et al.1.

Data Loading

Start by loading the prepared data.

source("../util/data-loading.R")
d <- load.data()

# print the data to ensure that all variables have the correct type
str(d)
## 'data.frame':    105 obs. of  21 variables:
##  $ PID                  : chr  "P1" "P1" "P1" "P1" ...
##  $ RID                  : chr  "R1" "R2" "R3" "R4" ...
##  $ Age                  : Ord.factor w/ 3 levels "19-24"<"25-30"<..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Program              : Ord.factor w/ 4 levels "Unknown"<"Bachelor"<..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ REQuizPerformance    : int  9 9 9 9 9 9 9 7 7 7 ...
##  $ ExpProgAca           : Ord.factor w/ 4 levels "no experience"<..: 3 3 3 3 3 3 3 2 2 2 ...
##  $ ExpProgInd           : Ord.factor w/ 4 levels "no experience"<..: 4 4 4 4 4 4 4 1 1 1 ...
##  $ ExpSEAca             : Ord.factor w/ 4 levels "no experience"<..: 4 4 4 4 4 4 4 3 3 3 ...
##  $ ExpSEInd             : Ord.factor w/ 4 levels "no experience"<..: 2 2 2 2 2 2 2 1 1 1 ...
##  $ ExpREAca             : Ord.factor w/ 4 levels "no experience"<..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ ExpREInd             : Ord.factor w/ 4 levels "no experience"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ passive              : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ actors.expected      : int  1 1 2 0 0 0 1 1 1 2 ...
##  $ actors.found         : int  1 1 2 0 0 0 1 1 1 2 ...
##  $ actors.missing       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ associations.expected: int  2 2 2 1 2 3 2 2 2 2 ...
##  $ associations.found   : int  0 0 0 1 0 2 2 2 2 2 ...
##  $ associations.missing : int  2 2 2 0 2 1 0 0 0 0 ...
##  $ objects.expected     : int  3 3 2 2 3 4 3 3 3 2 ...
##  $ objects.found        : int  1 2 1 2 2 4 2 3 3 2 ...
##  $ objects.missing      : int  2 1 1 0 1 0 1 0 0 0 ...

Bayesian Analysis

We perform the final step of the statistical causal inference framework by Siebert2, the estimation step, which builds on the preceding modeling and identification steps documented in the causal assumptions notebook.

Formula

First, we define a regression model that predicts the number of missing actors from the predictors identified in the causal assumptions notebook. The most appropriate probability distribution for the response variable is the binomial distribution, since the response is a non-negative whole number bounded above by the expected number of actors.

formula <- (actors.missing | trials(actors.expected) ~ 
              1 + # standard success in the task of finding actors
              passive + # impact of the use of passive voice
              (1|RID) + # impact of the difficulty of the individual requirements
              (1|PID) + # impact of the skill of individual participants
              ExpREAca + # impact of the participants' academic experience in RE
              ExpREInd) # impact of the participants' industrial experience in RE

get_prior(formula, family=binomial, data=d)
##                 prior     class        coef group resp dpar nlpar lb ub
##                (flat)         b                                        
##                (flat)         b  ExpREAca.C                            
##                (flat)         b  ExpREAca.L                            
##                (flat)         b  ExpREAca.Q                            
##                (flat)         b  ExpREInd.C                            
##                (flat)         b  ExpREInd.L                            
##                (flat)         b  ExpREInd.Q                            
##                (flat)         b passiveTRUE                            
##  student_t(3, 0, 2.5) Intercept                                        
##  student_t(3, 0, 2.5)        sd                                    0   
##  student_t(3, 0, 2.5)        sd               PID                  0   
##  student_t(3, 0, 2.5)        sd   Intercept   PID                  0   
##  student_t(3, 0, 2.5)        sd               RID                  0   
##  student_t(3, 0, 2.5)        sd   Intercept   RID                  0   
##        source
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##       default
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)

Priors

For each of the selected predictors, we select a weakly informative prior distribution that encodes our prior knowledge about the causal phenomenon.

priors <- c(prior(normal(-1, 1), class = Intercept),
            prior(normal(0, 1), class = b),
            prior(weibull(2, 1), class = sd, coef = Intercept, group = RID),
            prior(weibull(2, 1), class = sd, coef = Intercept, group = PID),
            prior(exponential(1), class = sd))
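
To make the intercept prior more tangible: it is specified on the logit scale, so a purely illustrative transformation with the inverse-logit function shows the baseline probability of missing an actor implied by normal(-1, 1).

# Illustration only: inverse-logit of values roughly covering the
# normal(-1, 1) intercept prior (mean +/- 2 standard deviations).
plogis(c(-3, -2, -1, 0, 1)) # approx. 0.05, 0.12, 0.27, 0.50, 0.73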

To verify that the priors are feasible, we sample from the priors alone, i.e., without conditioning the Bayesian model on the data.

m.prior <-
  brm(data = d, family = binomial, formula, prior = priors,
    iter = 4000, warmup = 1000, chains = 4, cores = 4,
    seed = 4, sample_prior="only",
    file = "fits/m.actors.prior"
  )

We visualize the results to graphically check the plausibility of the priors.

ndraws <- 400
brms::pp_check(m.prior, type="bars", ndraws=ndraws)

The distribution of the predicted values for the response variable (\(y_{rep}\)) overlaps with the actually observed response variable distribution (\(y\)), confirming the feasibility of the priors.
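
As an optional numeric counterpart to the graphical check, the prior predictive draws of the response variable can also be summarized directly. This is only a sketch: since the model was fit with sample_prior="only", posterior_predict returns draws from the prior predictive distribution (the ndraws argument is available in recent brms versions).

# Draws from the prior predictive distribution; the predicted counts of
# missing actors should stay within a plausible range.
yrep.prior <- posterior_predict(m.prior, ndraws = ndraws)
summary(as.vector(yrep.prior))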

Training

Given the feasibility of the priors, we train the regression model. This process updates the prior distributions of all predictor coefficients based on the observed data.

m <-
  brm(data = d, family = binomial, formula, prior = priors,
    iter = 4000, warmup = 1000, chains = 4, cores = 4,
    seed = 4,
    file = "fits/m.actors"
  )

To verify that the training process worked correctly, we plot the Markov chains.

plot(m)
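
In addition to the visual inspection of the trace plots, convergence can be checked numerically; a small sketch using the rhat and neff_ratio diagnostics re-exported by brms:

# Rhat should be close to 1 for all parameters, and the ratio of effective
# sample size to total post-warmup draws should not be excessively small.
max(rhat(m), na.rm = TRUE)
min(neff_ratio(m), na.rm = TRUE)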

All chains appear to have converged. Next, we sample from the posterior distribution, analogous to the prior predictive check.

brms::pp_check(m, type="bars", ndraws=ndraws)

The distribution of the predicted values for the response variable (\(y_{rep}\)) still overlaps with the actually observed response variable distribution (\(y\)). Additionally, the distribution has shrunk around the observed values, which indicates that the model improved its predictive power through training.
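
Optionally, the predictive performance of the trained model could also be quantified, e.g., via approximate leave-one-out cross-validation or a Bayesian R²; a sketch using the corresponding brms functions:

# Approximate leave-one-out cross-validation and Bayesian R^2 as
# complementary, numeric measures of predictive performance.
loo(m)
bayes_R2(m)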

Evaluation

Finally, we evaluate the trained model. An initial glance at the model summary shows the posterior distributions of the predictor coefficients.

summary(m)
##  Family: binomial 
##   Links: mu = logit 
## Formula: actors.missing | trials(actors.expected) ~ 1 + passive + (1 | RID) + (1 | PID) + ExpREAca + ExpREInd 
##    Data: d (Number of observations: 105) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Group-Level Effects: 
## ~PID (Number of levels: 15) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.64      0.35     0.11     1.47 1.00    10380     8442
## 
## ~RID (Number of levels: 7) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.24      0.44     0.46     2.17 1.00    10328     6386
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept      -2.08      0.77    -3.60    -0.57 1.00    14518     9182
## passiveTRUE     0.47      0.70    -0.89     1.86 1.00    20187     9354
## ExpREAca.L     -0.30      0.73    -1.73     1.11 1.00    20933     9443
## ExpREAca.Q     -0.13      0.72    -1.56     1.29 1.00    17709     9765
## ExpREAca.C     -0.50      0.64    -1.75     0.79 1.00    19624     9651
## ExpREInd.L      0.40      0.66    -0.91     1.70 1.00    17326     9715
## ExpREInd.Q     -0.53      0.74    -1.98     0.94 1.00    18333     9331
## ExpREInd.C     -0.31      0.75    -1.81     1.18 1.00    22740     9391
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
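
As a complement to the numeric summary, the full posterior densities of the coefficients can also be visualized (a sketch using brms::mcmc_plot):

# posterior density areas of the model parameters
mcmc_plot(m, type = "areas")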

These coefficient summaries do not convey the full uncertainty of each predictor's impact on the response, though. Therefore, we plot the marginal effect of relevant predictors. The most relevant predictor is the treatment of the experiment, the use of passive voice. The plot shows that the use of passive voice (passive=TRUE) slightly increases the likelihood of missing an actor, but since the credible intervals overlap, this effect is not significant.

marginal.passive <- conditional_effects(m, effects="passive")
## Setting all 'trials' variables to 1 by default if not specified otherwise.
marginal.passive
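
In addition to the marginal plot, the direction of the treatment effect can be quantified directly; a sketch using brms::hypothesis to obtain the posterior probability that the passive-voice coefficient is positive:

# one-sided hypothesis: passive voice increases the log-odds of missing an actor
hypothesis(m, "passiveTRUE > 0")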

We export the results to a CSV file to enable a combined figure of all three marginal plots later.

marginal.passive.values <- data.frame(marginal.passive[1]) %>% 
  select(passive.passive, passive.estimate__, passive.lower__, passive.upper__) %>% 
  rename(all_of(c(
    passive = "passive.passive",
    estimate = "passive.estimate__", 
    ci.lower = "passive.lower__", 
    ci.upper = "passive.upper__"))) %>% 
  mutate(
    response = "actors"
  ) %>% 
  write_csv(file = "../../data/results/marginal-actors.csv")

Similarly, we can visualize the isolated effect of industrial experience in requirements engineering. Again, the overlapping credible intervals do not suggest a significant effect.

conditional_effects(m, effects="ExpREInd")
## Setting all 'trials' variables to 1 by default if not specified otherwise.

Finally, we can evaluate the model numerically. This is similar to the marginal effect, but can be summarized as the proportion of posterior predictions in which the use of passive voice produces fewer (-1), an equal number of (0), or more (+1) missing actors.

evaluate.model(m)
## 
##         -1          0          1 
## 0.02508333 0.93775000 0.03716667

The evaluation also supports the conclusion that the effect of passive voice on the number of missing actors is not significant.
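
The function evaluate.model is provided by the project's utility scripts; as a hedged illustration only, the comparison could be implemented along the following lines, contrasting paired posterior predictions under both treatment levels (the actual implementation may differ):

# Sketch only, not the original evaluate.model(): predict the number of
# missing actors for every observation once with and once without passive
# voice, then tabulate the sign of the paired differences.
d.passive <- d %>% mutate(passive = TRUE)
d.active <- d %>% mutate(passive = FALSE)
pred.passive <- posterior_predict(m, newdata = d.passive)
pred.active <- posterior_predict(m, newdata = d.active)
# -1 = fewer, 0 = equal, +1 = more missing actors under passive voice
prop.table(table(sign(pred.passive - pred.active)))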


  1. Femmer, H., Kučera, J., & Vetrò, A. (2014, September). On the impact of passive voice requirements on domain modelling. In Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (pp. 1-4).↩︎

  2. Siebert, J. (2023). Applications of statistical causal inference in software engineering. Information and Software Technology, 107198.↩︎