4  Gender disparity or demographic inertia?

Author

Rebecca S. Chen

As previously noted (Hinsley, Sutherland, and Johnston 2017)), the gender disparity in question-asking could be explained by age-related effects. More specifically, if senior researchers ask more questions compared to junior researchers, and because there are more senior men present than senior women, we might observe that women ask less questions than men because of these age-related effects.

4.1 How many female senior scientists?

First, we explored the potential for age-related effects to bias our interpretation of gender disparity in question asking by calculating the proportion of senior women who attended the congress, based on collected data on career stage and pronouns during registration. We defined a female senior scientist as someone who uses she/her pronouns and has a “Professor” or “Associate Professor” title, as a male senior scientist as someone who uses he/him pronouns and has a “Professor” or “Associate Professor” title.

# registration
all_reg <- fread("../data/pre_survey/Registration_clean.tsv", quote="")

# exclude gender queer and no answer because this gets too complex and 
# sample size is low
all_reg <- subset(all_reg, (Pronouns == "Female" | Pronouns == "Male"))

# look at career stage data across the entire congress
summary(as.factor(all_reg$Career))
"Senior position, lecturers, researchers" 
                                      103 
               "Students (BSc, MSc, PhD)" 
                                      337 
                 Post doctorate or almost 
                                      157 
                   Professor or associate 
                                       87 
# we assessed age categories in practise by dividing age into three classes: 
# < 35, 35-50 and >50. 
# Since we will line up the observational data with the registration data, 
# we only define "Professor or associate" as the oldest age class, as they 
# are most likely to be put in this age category. 

all_reg <- all_reg %>% mutate(age = case_when(
  Career == "Professor or associate" ~ "senior",
  TRUE ~ "junior"
))

# number of registrants per pronoun/gender
reg_pronoun <- table(all_reg$Pronouns) %>% as.data.frame()

# number of registrants per pronoun/gender
reg_age <- table(all_reg$age) %>% as.data.frame()

# number of registrants per pronoun/gender and age
reg <- table(all_reg$Pronouns, all_reg$age) %>% as.data.frame()

# combine and rename
reg <- left_join(reg, reg_pronoun, by = "Var1")
reg <- left_join(reg, reg_age, by = c("Var2" = "Var1"))
names(reg) <- c("gender", "age", "n", "n_gender", "n_age")

# calculate the proportion of attendees per gender and age (prop) and per gender only (prop_gender)
reg$prop <- reg$n / sum(reg$n) 
reg$prop_gender <- reg$n / reg$n_gender
reg$prop_age <- reg$n / reg$n_age

reg 
  gender    age   n n_gender n_age       prop prop_gender  prop_age
1 Female junior 398      451   597 0.58187135   0.8824834 0.6666667
2   Male junior 199      233   597 0.29093567   0.8540773 0.3333333
3 Female senior  53      451    87 0.07748538   0.1175166 0.6091954
4   Male senior  34      233    87 0.04970760   0.1459227 0.3908046

Here, the ‘prop’ indicates the proportion across the entire congress, whereas the ‘prop_age’ indicates the proportion within that age class.

So: the majority of senior scientists was female.

Since there were more female senior scientists than male senior scientists, we would expect more women to ask questions than men if the majority of questions are asked by senior scientists regardless of gender. Even though demographic inertia was therefore unlikely to be relevant for potential biases in questioning gender disparities caused by career stage, we investigated whether 1) senior scientists ask more questions than junior scientists and 2) whether the gender disparity in question-asking was similar when stratifying our analysis by juniors and seniors.

4.2 Do seniors ask more questions than juniors?

The model that we’re testing here looks like follows:

glmer(age_questioner_senior ~ questioner_gender + (1|session_id/talk_id), family = "binomial", offset=boot::logit(audience_senior_prop))

First, we calculate the proportion of the audience that was a senior based on the registration data.

reg_prop_junior = reg$n_age[which(reg$age=="junior")][1] / nrow(all_reg)
reg_prop_senior = 1-reg_prop_junior

## reformat the data to indicate the seniority of the questioner based on age (senior = age class 3 = age)

data_control <- data_control %>% 
  mutate(audience_junior_prop = (audience_total * reg_prop_junior) / audience_total,
         audience_senior_prop = (audience_total * reg_prop_senior) / audience_total,
         age_questioner_junior = case_when(questioner_age == 1 | questioner_age == 2 ~1, questioner_age == 3 ~ 0),
         age_questioner_senior = case_when(questioner_age == 1 | questioner_age == 2 ~0, questioner_age == 3 ~ 1))

age_questioner_senior <- glmer(age_questioner_senior ~ 1 + (1|session_id/talk_id), family = "binomial", offset=boot::logit(audience_senior_prop), data = data_control)
boundary (singular) fit: see help('isSingular')
summary(age_questioner_senior)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: age_questioner_senior ~ 1 + (1 | session_id/talk_id)
   Data: data_control
 Offset: boot::logit(audience_senior_prop)

     AIC      BIC   logLik deviance df.resid 
   156.8    167.4    -75.4    150.8      252 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.9850 -0.3076 -0.1924 -0.1707  4.3849 

Random effects:
 Groups             Name        Variance  Std.Dev. 
 talk_id:session_id (Intercept) 7.825e-10 2.797e-05
 session_id         (Intercept) 1.720e+00 1.312e+00
Number of obs: 255, groups:  talk_id:session_id, 105; session_id, 23

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  -0.8796     0.4712  -1.867   0.0619 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

So, although the intercept is marginally significant (p = 0.06), it seems like there is a trend for the lower probability of seniors asking a question compared to juniors.

4.3 Is the gender disparity in question-asking dependent on seniority?

For this question, we split up the dataset between juniors and seniors. To do so, we correct the observed number of perceived women in the audience by the number of junior and senior women taken from the registration. This is a complex conversion which is outlined in detail below with mock data.

# calculate proportions required for the correction based on registration

# get proportions of junior women, junior men, senior women, senior men based on registration
prop_junior_women_registration_women = reg$prop_gender[which(reg$gender=="Female" & reg$age == "junior")]

prop_senior_women_registration_women = reg$prop_gender[which(reg$gender=="Female" & reg$age == "senior")]

prop_junior_men_registration_men = reg$prop_gender[which(reg$gender=="Male" & reg$age == "junior")]

prop_senior_men_registration_men = reg$prop_gender[which(reg$gender=="Male" & reg$age == "senior")]

# try with hypothetical talk data to ensure the correction is done correctly
total_audience_no = 100
audience_prop_women = 0.6
audience_prop_men = 1 - audience_prop_women

# for the below: we first calculate the number of women/men in the audience and multiply that by the proportion of junior/senior women/men of the registration, and then divide that by the total audience number again

prop_junior_women_talk = ((audience_prop_women * total_audience_no) * prop_junior_women_registration_women)/total_audience_no 

prop_senior_women_talk = ((audience_prop_women * total_audience_no) * prop_senior_women_registration_women)/total_audience_no

prop_junior_men_talk = ((audience_prop_men * total_audience_no) * prop_junior_men_registration_men)/total_audience_no

prop_senior_men_talk = ((audience_prop_men * total_audience_no) * prop_senior_men_registration_men)/total_audience_no

# should add up to 1
prop_junior_women_talk + prop_senior_women_talk + prop_junior_men_talk + prop_senior_men_talk
[1] 1
# then get gender proportions by age, which is what we need to do the correction
prop_junior_women_talk_junior = prop_junior_women_talk*total_audience_no / (prop_junior_women_talk*total_audience_no + prop_junior_men_talk*total_audience_no)

prop_junior_women_talk_junior
[1] 0.6078261
prop_junior_women_talk_senior = prop_senior_women_talk*total_audience_no / (prop_senior_women_talk*total_audience_no + prop_senior_men_talk*total_audience_no)

prop_junior_women_talk_senior
[1] 0.5471018
# if we want to put that in one simplified formula:

(audience_prop_women*prop_junior_women_registration_women) / ((audience_prop_women*prop_junior_women_registration_women) + (audience_prop_men * prop_junior_men_registration_men))
[1] 0.6078261
(audience_prop_women*prop_senior_women_registration_women) / ((audience_prop_women*prop_senior_women_registration_women) + (audience_prop_men * prop_senior_men_registration_men))
[1] 0.5471018

Then we can model the junior and senior data separately:

# how many questions were asked by women (1) and men (0) per age class? 
# age class  1 = < 35 years, 2 = 35-50, 3 = > 50

table(data_control$questioner_age, data_control$gender_questioner_female) %>% kbl() %>%  kable_classic_2()
0 1
53 76
65 38
19 10
# make two dataframes: junior and senior data
junior <- subset(data_control, (questioner_age == 1 | questioner_age == 2))
senior <- subset(data_control, questioner_age == 3)

# add column with corrected gender proportion
junior$audience_women_prop_junior <- (junior$audience_women_prop*prop_junior_women_registration_women) /
  ((junior$audience_women_prop*prop_junior_women_registration_women) +
     (junior$audience_men_prop * prop_junior_men_registration_men))

senior$audience_women_prop_senior <- (senior$audience_women_prop*prop_senior_women_registration_women) /
  ((senior$audience_women_prop*prop_senior_women_registration_women) +
     (senior$audience_men_prop * prop_senior_men_registration_men))

# build model for junior scientists

m_qa_junior <- glmer(gender_questioner_female ~ 1 + (1|session_id/talk_id), 
                     family = "binomial",offset=boot::logit(audience_women_prop_junior), 
                     data = junior)

# build model for senior scientists

m_qa_senior <- glmer(gender_questioner_female ~ 1 + (1|session_id/talk_id), 
                     family = "binomial", offset=boot::logit(audience_women_prop_senior), 
                     data = senior)

# model output
summary(m_qa_junior)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: gender_questioner_female ~ 1 + (1 | session_id/talk_id)
   Data: junior
 Offset: boot::logit(audience_women_prop_junior)

     AIC      BIC   logLik deviance df.resid 
   312.3    322.6   -153.1    306.3      226 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.4497 -0.9291 -0.6876  1.0189  1.4086 

Random effects:
 Groups             Name        Variance Std.Dev.
 talk_id:session_id (Intercept) 0        0       
 session_id         (Intercept) 0        0       
Number of obs: 229, groups:  talk_id:session_id, 100; session_id, 23

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.6763     0.1344  -5.034 4.81e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
summary(m_qa_senior)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: gender_questioner_female ~ 1 + (1 | session_id/talk_id)
   Data: senior
 Offset: boot::logit(audience_women_prop_senior)

     AIC      BIC   logLik deviance df.resid 
    39.6     43.4    -16.8     33.6       23 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.8578 -0.7517 -0.6574  1.2507  1.7378 

Random effects:
 Groups             Name        Variance Std.Dev.
 talk_id:session_id (Intercept) 0        0       
 session_id         (Intercept) 0        0       
Number of obs: 26, groups:  talk_id:session_id, 21; session_id, 11

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  -0.7833     0.4143  -1.891   0.0586 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

It therefore appears that the gender disparity is apparent in both junior and senior attendees, but it is stronger in seniors although with less significance.

Hinsley, Amy, William J. Sutherland, and Alison Johnston. 2017. “Men Ask More Questions Than Women at a Scientific Conference.” Edited by Marina A. Pavlova. PLOS ONE 12 (10): e0185534. https://doi.org/10.1371/journal.pone.0185534.