As previously noted (Hinsley, Sutherland, and Johnston 2017), the gender disparity in question-asking could be explained by age-related effects. More specifically, if senior researchers ask more questions than junior researchers, and more senior men than senior women are present, we might observe that women ask fewer questions than men purely because of these age-related effects.
4.1 How many female senior scientists?
First, we explored whether age-related effects could bias our interpretation of the gender disparity in question-asking. To do this, we calculated the proportion of senior women who attended the congress, using the career-stage and pronoun data collected during registration. We defined a female senior scientist as someone who uses she/her pronouns and has a “Professor” or “Associate Professor” title, and a male senior scientist as someone who uses he/him pronouns and has a “Professor” or “Associate Professor” title.
library(data.table)  # fread()
# registration
all_reg <- fread("../data/pre_survey/Registration_clean.tsv", quote = "")
# exclude genderqueer and "no answer" registrants, because this gets too complex
# and the sample size is low
all_reg <- subset(all_reg, Pronouns == "Female" | Pronouns == "Male")
# look at career stage data across the entire congress
summary(as.factor(all_reg$Career))
"Senior position, lecturers, researchers"    103
"Students (BSc, MSc, PhD)"                   337
Post doctorate or almost                     157
Professor or associate                        87
library(dplyr)  # %>%, mutate(), case_when(), left_join()
# we assessed age categories in practice by dividing age into three classes:
# < 35, 35-50 and > 50.
# Since we will line up the observational data with the registration data,
# we only define "Professor or associate" as the oldest age class, as they
# are most likely to be put in this age category.
all_reg <- all_reg %>%
  mutate(age = case_when(Career == "Professor or associate" ~ "senior",
                         TRUE ~ "junior"))
# number of registrants per pronoun/gender
reg_pronoun <- table(all_reg$Pronouns) %>% as.data.frame()
# number of registrants per age class
reg_age <- table(all_reg$age) %>% as.data.frame()
# number of registrants per pronoun/gender and age
reg <- table(all_reg$Pronouns, all_reg$age) %>% as.data.frame()
# combine and rename
reg <- left_join(reg, reg_pronoun, by = "Var1")
reg <- left_join(reg, reg_age, by = c("Var2" = "Var1"))
names(reg) <- c("gender", "age", "n", "n_gender", "n_age")
# proportion of attendees per gender and age (prop), within each gender
# (prop_gender) and within each age class (prop_age)
reg$prop <- reg$n / sum(reg$n)
reg$prop_gender <- reg$n / reg$n_gender
reg$prop_age <- reg$n / reg$n_age
reg
Here, ‘prop’ is the proportion of all registrants (across the entire congress) in each gender and age combination, ‘prop_gender’ is the proportion within that gender, and ‘prop_age’ is the proportion within that age class.
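To illustrate how the three proportions differ, the sketch below builds a small mock registration table and computes ‘prop’, ‘prop_gender’ and ‘prop_age’ in base R, applying the same seniority rule as above (only "Professor or associate" counts as senior). The counts are invented for illustration and are not the congress data.

```r
# Mock registrants (invented counts, not the congress registration data)
mock_reg <- data.frame(
  Pronouns = c(rep("Female", 60), rep("Male", 40)),
  Career   = c(rep("Professor or associate", 12), rep("Students (BSc, MSc, PhD)", 48),
               rep("Professor or associate", 8),  rep("Students (BSc, MSc, PhD)", 32))
)

# Same rule as in the analysis: only professors/associate professors are senior
mock_reg$age <- ifelse(mock_reg$Career == "Professor or associate", "senior", "junior")

# Counts per gender x age, plus the group totals needed for the proportions
counts <- as.data.frame(table(gender = mock_reg$Pronouns, age = mock_reg$age))
counts$n_gender <- ave(counts$Freq, counts$gender, FUN = sum)
counts$n_age    <- ave(counts$Freq, counts$age, FUN = sum)

counts$prop        <- counts$Freq / sum(counts$Freq)  # share of all registrants
counts$prop_gender <- counts$Freq / counts$n_gender   # share within that gender
counts$prop_age    <- counts$Freq / counts$n_age      # share within that age class
counts
```

In this mock example, senior women are 12% of all registrants (‘prop’), 20% of all women (‘prop_gender’) and 60% of all seniors (‘prop_age’). The last of these is the quantity that determines whether seniors are mostly female.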
In the registration data, the majority of senior scientists were therefore female.
Since there were more female than male senior scientists, we would expect more women than men to ask questions if most questions were asked by senior scientists regardless of gender. Although demographic inertia was therefore unlikely to explain the gender disparity in question-asking through career stage, we investigated whether 1) senior scientists ask more questions than junior scientists and 2) the gender disparity in question-asking is similar when the analysis is stratified into juniors and seniors.
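The argument in the previous paragraph can be made concrete with a small calculation. Suppose, hypothetically, that gender plays no role and that each senior asks questions at `w` times the per-person rate of a junior. The expected share of questions from women is then a weighted average of the female shares among juniors and seniors. The registrant counts and the values of `w` below are invented for illustration.

```r
# Hypothetical registrant counts per gender and career stage (not the congress data)
n_women_junior <- 300
n_men_junior   <- 300
n_women_senior <- 60
n_men_senior   <- 30

# Expected share of questions asked by women if gender plays no role and
# each senior asks questions at w times the per-person rate of a junior
expected_female_share <- function(w) {
  women <- n_women_junior + w * n_women_senior
  total <- n_women_junior + n_men_junior + w * (n_women_senior + n_men_senior)
  women / total
}

# w = 1: seniors ask as often as juniors, so the share equals the female share of registrants
expected_female_share(1)
# w = 3: seniors ask more, which pulls the expected share towards the
# female share among seniors (60 / 90), i.e. upwards
expected_female_share(3)
```

When seniors are mostly female, a seniority effect raises the expected share of questions from women, so it cannot explain women asking fewer questions.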
4.2 Do seniors ask more questions than juniors?
The model we test here is:
glmer(age_questioner_senior ~ 1 + (1 | session_id/talk_id), family = "binomial", offset = boot::logit(audience_senior_prop), data = data_control)
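To show what the offset does, the sketch below simulates questions without any session or talk structure and fits a plain logistic regression with base R's `glm()`. This is a simplification, not the `glmer()` mixed model used here. If the offset is the logit of the audience share of seniors, the intercept estimates how far the log-odds of a senior questioner deviate from that share, and 0 means seniors ask questions in proportion to their presence. The audience share (0.13) and the true questioner probability (0.06) are made-up values. `qlogis()` is base R's logit, equivalent to `boot::logit()`.

```r
# Hypothetical values, chosen only for illustration
set.seed(42)
n_questions  <- 20000
p_audience   <- 0.13  # assumed share of seniors in the audience
p_questioner <- 0.06  # assumed true probability that a question comes from a senior

# 1 = question asked by a senior, 0 = asked by a junior
sim <- data.frame(senior_q = rbinom(n_questions, size = 1, prob = p_questioner),
                  audience_senior_prop = p_audience)

# Intercept-only logistic regression with the audience share as offset
fit <- glm(senior_q ~ 1, family = binomial,
           offset = qlogis(audience_senior_prop), data = sim)

# The intercept estimates logit(p_questioner) - logit(p_audience); a negative
# value means seniors ask fewer questions than their audience share predicts
est   <- unname(coef(fit)[1])
truth <- qlogis(p_questioner) - qlogis(p_audience)
c(estimate = est, truth = truth)
```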
First, we calculate the proportion of the audience that was senior, based on the registration data.
library(lme4)  # glmer()
reg_prop_junior <- reg$n_age[which(reg$age == "junior")][1] / nrow(all_reg)
reg_prop_senior <- 1 - reg_prop_junior
## reformat the data to indicate the seniority of the questioner based on age
## (senior = age class 3, i.e. > 50 years; junior = age classes 1 and 2).
## Note: audience_total cancels, so for any non-zero audience the audience
## proportions below equal the registration proportions for every talk
data_control <- data_control %>%
  mutate(audience_junior_prop = (audience_total * reg_prop_junior) / audience_total,
         audience_senior_prop = (audience_total * reg_prop_senior) / audience_total,
         age_questioner_junior = case_when(questioner_age == 1 | questioner_age == 2 ~ 1,
                                           questioner_age == 3 ~ 0),
         age_questioner_senior = case_when(questioner_age == 1 | questioner_age == 2 ~ 0,
                                           questioner_age == 3 ~ 1))
age_questioner_senior <- glmer(age_questioner_senior ~ 1 + (1 | session_id/talk_id),
                               family = "binomial",
                               offset = boot::logit(audience_senior_prop),
                               data = data_control)
boundary (singular) fit: see help('isSingular')
summary(age_questioner_senior)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: age_questioner_senior ~ 1 + (1 | session_id/talk_id)
   Data: data_control
 Offset: boot::logit(audience_senior_prop)
     AIC      BIC   logLik deviance df.resid
   156.8    167.4    -75.4    150.8      252
Scaled residuals:
    Min      1Q  Median      3Q     Max
-0.9850 -0.3076 -0.1924 -0.1707  4.3849
Random effects:
 Groups             Name        Variance  Std.Dev.
 talk_id:session_id (Intercept) 7.825e-10 2.797e-05
 session_id         (Intercept) 1.720e+00 1.312e+00
Number of obs: 255, groups:  talk_id:session_id, 105; session_id, 23
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8796     0.4712  -1.867   0.0619 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
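The intercept is negative (-0.88 on the logit scale) but only marginally significant (p = 0.06). There is therefore no evidence that seniors ask more questions than juniors. If anything, there is a trend for seniors to be less likely to ask a question than expected from their share of the audience.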
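To put the intercept on a more interpretable scale, the sketch below converts it to an odds ratio with a Wald 95% interval and to an implied probability that a question comes from a senior. It assumes that the audience share of seniors equals the registration share implied by the career-stage counts above (87 of the 684 registrants are "Professor or associate"). The implied probability refers to a session whose random effect is zero, not to the average session.

```r
# Fixed-effect estimate and standard error copied from summary(age_questioner_senior)
est <- -0.8796
se  <- 0.4712

# Odds ratio relative to the audience share, with a Wald 95% interval
odds_ratio <- exp(est)
wald_ci    <- exp(est + c(-1, 1) * qnorm(0.975) * se)

# Assumed audience share of seniors: registration share (87 of 684 registrants)
p_audience <- 87 / 684
# Implied probability that a question comes from a senior, for a session with
# random effect 0 (qlogis/plogis are the logit and inverse logit)
p_senior_question <- plogis(qlogis(p_audience) + est)

round(c(odds_ratio = odds_ratio, lower = wald_ci[1], upper = wald_ci[2],
        p_audience = p_audience, p_senior_question = p_senior_question), 3)
```

The Wald interval just includes 1, which agrees with the p-value of 0.06.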
4.3 Is the gender disparity in question-asking dependent on seniority?
For this question, we split the dataset into juniors and seniors. To do so, we correct the observed proportion of (perceived) women in the audience using the proportions of junior and senior women and men from the registration data. This conversion is not straightforward, so it is outlined in detail below using mock data.
# calculate proportions required for the correction based on registration:
# proportions of junior women, senior women, junior men and senior men,
# each within their own gender
prop_junior_women_registration_women <- reg$prop_gender[which(reg$gender == "Female" & reg$age == "junior")]
prop_senior_women_registration_women <- reg$prop_gender[which(reg$gender == "Female" & reg$age == "senior")]
prop_junior_men_registration_men <- reg$prop_gender[which(reg$gender == "Male" & reg$age == "junior")]
prop_senior_men_registration_men <- reg$prop_gender[which(reg$gender == "Male" & reg$age == "senior")]
# try with hypothetical talk data to ensure the correction is done correctly
total_audience_no <- 100
audience_prop_women <- 0.6
audience_prop_men <- 1 - audience_prop_women
# for the below: we first calculate the number of women/men in the audience,
# multiply that by the proportion of junior/senior women/men in the registration,
# and then divide that by the total audience number again
prop_junior_women_talk <- ((audience_prop_women * total_audience_no) * prop_junior_women_registration_women) / total_audience_no
prop_senior_women_talk <- ((audience_prop_women * total_audience_no) * prop_senior_women_registration_women) / total_audience_no
prop_junior_men_talk <- ((audience_prop_men * total_audience_no) * prop_junior_men_registration_men) / total_audience_no
prop_senior_men_talk <- ((audience_prop_men * total_audience_no) * prop_senior_men_registration_men) / total_audience_no
# should add up to 1
prop_junior_women_talk + prop_senior_women_talk + prop_junior_men_talk + prop_senior_men_talk
[1] 1
# then get the proportion of women among the juniors, which is what we need for the correction
prop_junior_women_talk_junior <- prop_junior_women_talk * total_audience_no /
  (prop_junior_women_talk * total_audience_no + prop_junior_men_talk * total_audience_no)
prop_junior_women_talk_junior
# if we want to put that in one simplified formula:
(audience_prop_women * prop_junior_women_registration_women) /
  ((audience_prop_women * prop_junior_women_registration_women) +
     (audience_prop_men * prop_junior_men_registration_men))
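The simplified formula follows from treating the registration proportions as within-gender career-stage shares. This assumes that each talk's audience has the same career-stage mix within each gender as the registration. Write $N$ for `total_audience_no`, $p_W$ and $p_M = 1 - p_W$ for `audience_prop_women` and `audience_prop_men`, $r_{J\mid W}$ and $r_{J\mid M}$ for `prop_junior_women_registration_women` and `prop_junior_men_registration_men`, and $r_{S\mid W}$, $r_{S\mid M}$ for the corresponding senior proportions. The share of women among the juniors ($p_{W\mid J}$) and among the seniors ($p_{W\mid S}$) in the audience is then:

```latex
p_{W\mid J}
  = \frac{N\,p_W\,r_{J\mid W}}{N\,p_W\,r_{J\mid W} + N\,p_M\,r_{J\mid M}}
  = \frac{p_W\,r_{J\mid W}}{p_W\,r_{J\mid W} + p_M\,r_{J\mid M}},
\qquad
p_{W\mid S}
  = \frac{p_W\,r_{S\mid W}}{p_W\,r_{S\mid W} + p_M\,r_{S\mid M}}
```

The audience size $N$ cancels, which is why `total_audience_no` drops out of the one-line version.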
Then we can model the junior and senior data separately:
library(kableExtra)  # kbl(), kable_classic_2()
# how many questions were asked by women (1) and men (0) per age class?
# age class 1 = < 35 years, 2 = 35-50, 3 = > 50
table(data_control$questioner_age, data_control$gender_questioner_female) %>%
  kbl() %>%
  kable_classic_2()
Age class      Men (0)   Women (1)
1 (< 35)            53          76
2 (35-50)           65          38
3 (> 50)            19          10
# make two dataframes: junior (age classes 1 and 2) and senior (age class 3) data
junior <- subset(data_control, questioner_age == 1 | questioner_age == 2)
senior <- subset(data_control, questioner_age == 3)
# add a column with the corrected gender proportion within each age class
junior$audience_women_prop_junior <- (junior$audience_women_prop * prop_junior_women_registration_women) /
  ((junior$audience_women_prop * prop_junior_women_registration_women) +
     (junior$audience_men_prop * prop_junior_men_registration_men))
senior$audience_women_prop_senior <- (senior$audience_women_prop * prop_senior_women_registration_women) /
  ((senior$audience_women_prop * prop_senior_women_registration_women) +
     (senior$audience_men_prop * prop_senior_men_registration_men))
# build model for junior scientists
m_qa_junior <- glmer(gender_questioner_female ~ 1 + (1 | session_id/talk_id),
                     family = "binomial",
                     offset = boot::logit(audience_women_prop_junior),
                     data = junior)
# build model for senior scientists
m_qa_senior <- glmer(gender_questioner_female ~ 1 + (1 | session_id/talk_id),
                     family = "binomial",
                     offset = boot::logit(audience_women_prop_senior),
                     data = senior)
# model output
summary(m_qa_junior)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: gender_questioner_female ~ 1 + (1 | session_id/talk_id)
   Data: junior
 Offset: boot::logit(audience_women_prop_junior)
     AIC      BIC   logLik deviance df.resid
   312.3    322.6   -153.1    306.3      226
Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.4497 -0.9291 -0.6876  1.0189  1.4086
Random effects:
 Groups             Name        Variance Std.Dev.
 talk_id:session_id (Intercept) 0        0
 session_id         (Intercept) 0        0
Number of obs: 229, groups:  talk_id:session_id, 100; session_id, 23
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.6763     0.1344  -5.034 4.81e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
summary(m_qa_senior)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: gender_questioner_female ~ 1 + (1 | session_id/talk_id)
   Data: senior
 Offset: boot::logit(audience_women_prop_senior)
     AIC      BIC   logLik deviance df.resid
    39.6     43.4    -16.8     33.6       23
Scaled residuals:
    Min      1Q  Median      3Q     Max
-0.8578 -0.7517 -0.6574  1.2507  1.7378
Random effects:
 Groups             Name        Variance Std.Dev.
 talk_id:session_id (Intercept) 0        0
 session_id         (Intercept) 0        0
Number of obs: 26, groups:  talk_id:session_id, 21; session_id, 11
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.7833     0.4143  -1.891   0.0586 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
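The gender disparity therefore appears among both junior and senior attendees: in both groups, women asked fewer questions than expected from their corrected share of the audience. The point estimate is slightly more negative for seniors (-0.78 vs. -0.68 on the logit scale). However, the senior model rests on only 26 questions and is not significant at the 5% level (p = 0.06, vs. p < 0.001 for juniors), so the data do not show that the disparity is stronger among seniors.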
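The two groups can be compared more directly by converting both intercepts to odds ratios with Wald 95% intervals, using the estimates and standard errors printed above. An odds ratio below 1 means women asked fewer questions than expected from their corrected share of the audience. This is a rough sketch: Wald intervals ignore the singular random-effect fits and are only approximate for the small senior sample.

```r
# Intercepts and standard errors copied from summary(m_qa_junior) and summary(m_qa_senior)
fits <- data.frame(group = c("junior", "senior"),
                   est   = c(-0.6763, -0.7833),
                   se    = c( 0.1344,  0.4143))

z <- qnorm(0.975)
fits$odds_ratio <- exp(fits$est)
fits$lower      <- exp(fits$est - z * fits$se)
fits$upper      <- exp(fits$est + z * fits$se)

# The senior interval is wide, includes 1, and contains the junior point estimate
print(fits, digits = 3)
```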
Hinsley, Amy, William J. Sutherland, and Alison Johnston. 2017. “Men Ask More Questions Than Women at a Scientific Conference.” Edited by Marina A. Pavlova. PLOS ONE 12 (10): e0185534. https://doi.org/10.1371/journal.pone.0185534.