Using the Implicit Relational Assessment Procedure (IRAP) to Examine Implicit Gender Stereotypes in Science, Technology, Engineering and Maths (STEM)

Women are often subject to gender stereotyping in the fields of science, technology, engineering, and mathematics (STEM). The Implicit Relational Assessment Procedure (IRAP) was used to determine the directionality of any implicit gender-STEM bias detected. In addition, the IRAP was used to explore the possibility of implicit ageism bias, because there is anecdotal evidence of high levels of ageism in the STEM areas. Thus two IRAPs (one with adult pictorial stimuli and one with child pictorial stimuli) were employed to assess implicit gender bias toward STEM with a sample of undergraduates (N = 33). Results indicated a gender-STEM bias in both IRAPs, and the directionality in both IRAPs was pro-male and not anti-female. Participant gender was not shown to impact results in either IRAP. Gender bias effects were more pronounced in the Adult-IRAP results. Comparison of bias toward older versus young pictorial stimuli was exploratory; thus, findings are preliminary, but they may suggest that ageism and potential negative interaction effects between age and gender warrant further research.

International research has shown that fewer females excel, pursue degrees, and hold jobs in the fields of science, technology, engineering and mathematics (STEM) when compared to males (see World Economic Forum, 2018; Bench, Lench, Liew, Miner, & Flores, 2015; Moakler & Kin, 2014). Despite efforts in some countries to recruit and retain women in these fields, in general the number of males in STEM remains much higher in comparison to females (Ceci & Williams, 2011; Handelsman et al., 2005). Gender bias in STEM has been linked also to higher female drop-out rates in natural and physical science courses at university level (e.g., Grunspan, Eddy, Brownell, Wiggins, Crowe, & Goodreau, 2016), and to the overestimation of math performance in males of all ages (e.g., Bench et al., 2015). Furthermore, studies have shown lower starting salaries for females when they do enter STEM (e.g., Moss-Racusin, Dovidio, Brescoll, Graham, & Handelsman, 2012).
The research demonstrates that many influencers in a child's life (e.g., parents and teachers) are more likely to see males rather than females as better suited to STEM subjects and careers (Gunderson, Ramirez, Levine, & Beilock, 2012), albeit that there is some evidence linking personal characteristics (e.g., self-concepts) rather than gender to success in STEM (Helwig, Anderson, & Tindal, 2001; Robnett & Leaper, 2012). Recent research continues to demonstrate a pro-male bias in STEM subjects and careers. For example, a study of university STEM students found that males overnominated fellow male students as the most knowledgeable of course content, whereas females were nominated based on academic performance, regardless of student gender (Grunspan et al., 2016). Female interns were viewed as having lower field aptitudes in STEM compared to their male counterparts (see Reilly, Rackley, & Awad, 2017), and similar attitudes favoring the competence of males versus females have been demonstrated in other studies (Leslie, Cimpian, Mayer, & Freeland, 2015; Reuben, Sapienza, & Zingales, 2014). These findings may be due to firmly held beliefs (or biases) that there are real, factual differences between men and women's capacity in STEM; however, evidence suggests that there is typically no difference between STEM ability for boys and girls at school age (e.g., Blažev, Karabegović, Burušić, & Selimbegović, 2017; Gibb, Fergusson, & Horwood, 2008). Indeed, negative gender bias against women in STEM fields may affect female self-esteem and work performance in turn affecting their perseverance in STEM fields, such that even after women have engaged in STEM careers, there may be little incentive to stay long term (O'Brien, Garcia, Adams, Villalobos, Hammer, & Gilbert, 2015). And women exposed to such negative biases in STEM careers may anticipate more discrimination and feel a lower sense of belonging at work (Moss-Racusin, Sanzari, Caluori, & Rabasco, 2018).
Much of the extant data regarding gender bias in STEM come from explicit self-report measures such as questionnaires or rating scales. These are an efficient means of data collection and facilitate high participant numbers; however, the limitations of research based on introspection have long been documented in psychology. The accuracy of self-report measures assumes that the participant is aware of their prejudicial attitudes, and that they can and are willing to report honestly and accurately (Sudman & Bradburn, 1982). Implicit measures of bias (or stereotyping) aim to avoid these problems of introspection and social desirability, and may therefore complement self-report data. Results from Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998) experiments investigating gender and science indicate that 70% of more than half a million completed IATs showed faster response latencies when pairing "male" with science subjects and "female" with arts subjects (Nosek et al., 2009). In general, implicit measures require participants to respond rapidly under time pressure, and the basic premise is that participants will more rapidly affirm relations that are learned preexperimentally compared to relations that are the converse.
The IAT can indicate participant bias by showing that responding was faster toward one set of pairings compared to another (e.g., shorter time latencies for responding toward pairings of "men-science" vs. "women-science"), but these data do not indicate if the bias is anti-women, pro-male, or a combination of both. The Implicit Relational Assessment Procedure (IRAP; Barnes-Holmes et al., 2006) is a behavioral implicit measure that can provide data indicating the directionality of bias. The IRAP has been successfully used to highlight biases in a range of sensitive areas, for example, implicit bias toward child stimuli for sexual offenders (Dawson, Barnes-Holmes, Gresswell, Hart & Gore, 2009), implicit body-weight bias (Nolan, Murphy & Barnes-Holmes, 2013; Ritzert, Anderson, Reilly, Gorrell, Forsyth & Anderson, 2016), and negative bias toward older individuals (Cullen, Barnes-Holmes, Barnes-Holmes & Stewart, 2009).
Studies have emerged using the IRAP to investigate implicit gender-based attitudes regarding STEM. In particular, Farrell, Cochrane, and McHugh (2015) used the IRAP and the IAT to explore differences in gender and science attitudes in a group of adults. The IAT found gender bias among all participants where men were more likely to be associated with STEM subjects compared to females, whereas the authors reported neutral responding for male participants on the IRAP but a pro-male STEM and pro-male arts effect for the female participants (Farrell et al., 2015). A follow-up study aimed to investigate the gender bias for STEM as before, but included STEM and non-STEM university students as participants (Farrell & McHugh, 2017). This study found a pro-male STEM bias across all participants on the IAT (similar to that reported by Farrell et al., 2015). On the IRAP, all participant groups also revealed a pro-male STEM bias; however, the female STEM students additionally exhibited a significant pro-female STEM bias. Overall, these studies suggest that more research is needed to disentangle gender biases in STEM and to determine if there are differences for males and females.
Previous research manipulated the gender of fictional profiles of girls and boys and demonstrated a stereotypical gender-science bias among 81 adult university students (Newall et al., 2018). The current study aimed to extend the research literature by conducting an IRAP examination of university students' implicit gender-STEM bias using face image stimuli in the implicit measures. Pictorial stimuli rather than verbal stimuli were used because previous research showed more substantial and consistent effects of a double standard of gendered ageism with photographic stimuli rather than verbal descriptions (Kogan & Mills, 1992). Furthermore, an exploratory aspect of the study also sought to compare STEM biases toward children versus adults across two IRAPs (i.e., the Child-IRAP and the Adult-IRAP). The addition of the Child-IRAP aimed to detect (1) whether an implicit gender-STEM bias would be shown toward individuals even at an early developmental stage, and (2) to explore potential differences in the level of gender-STEM bias shown toward children versus adults. In particular, if the Adult-IRAP showed a greater implicit gender-STEM bias, further investigation of age effects in gender-STEM bias may be warranted.

Method
Participants Thirty-seven students, 21 females and 16 males, aged between 18 and 22 (M = 20.7, SD = 1.3) were recruited from the first author's university. Participants were recruited by convenience sampling through the use of social media and a notice board at the university. All participants were university students, were fluent in English, and had normal or corrected-to-normal vision. The participant's course type (STEM or non-STEM) was not recorded because this was not the focus of the current study. The experiment was counterbalanced so that the first participant received the Adult-IRAP first, whereas the second participant received the Child-IRAP first (followed by the third participant receiving the Adult-IRAP first, etc.). This resulted in 17 participants completing the Adult-IRAP first and 16 participants completing the Child-IRAP first. For inclusion in the current study, participants were required to reach the performance criteria on the IRAP (accuracy greater than 80% and a response latency equal to or less than 2,100 ms on two practice blocks). It should be noted that the latency criterion is more commonly set at 2,000 ms; the additional 100 ms was a minor error that went unnoticed initially, but is unlikely to have had any significant impact on the results. The data of four participants were excluded from analysis because they failed to reach the required performance criteria for the IRAP. The final sample therefore comprised 33 participants (19 females and 14 males). Informed consent was obtained from all participants. The study was approved by the Ethics Committee at the National University of Ireland Maynooth.

Instruments
Two types of measures were implemented: an explicit measure (Career Suitability Rating Scale) and an implicit measure (two IRAPs: one Adult-IRAP and one Child-IRAP). The IRAPs used pictorial stimuli (e.g., image of an adult/child) that were rated, reviewed, and made available online by previous researchers. For the Adult-IRAP, eight pictorial stimuli (four male and four female) of older adults with neutral expressions were randomly selected from the Face Database, a collection of various facial images (see Minear & Park, 2004). Eight pictorial stimuli (four male and four female) were selected for the Child-IRAP (see Nosek et al., 2007). The IRAP software used was the 2016 version.
The Career Suitability Rating Scale This was adapted with permission from authors of the previous study by Farrell and McHugh (2017). It included an 11-point rating system designed to explicitly measure suitability ratings for males/females among a range of 12 university subjects. These subjects were removed and replaced with 12 careers. The adapted scale will be referred to as the "Career Suitability Rating Scale." The six STEM careers were selected based on the National Science Foundation's definition of STEM (Breiner, Harkness, Johnson & Koehler, 2012; see Table 1). The six arts careers were selected by the research supervisor and members of the first author's psychology tutorial group based on common assumptions of non-STEM/Arts fields with reference to the Museum, Arts, and Humanities Division (Bierbaum, 1988).
The 12 combined STEM and arts careers were listed in random order on the Career Suitability Rating Scale. Participants were required to rate the suitability of each profession to gender. "Males more suitable" and "Females more suitable" were used as gender indicators on the left and right extremities of the 11-point scale, with the middle point indicating a "neutral" opinion (i.e., both males and females are equally suitable to the career). By placing an "X" at point 1 on the scale, the participant would have a score of 1, indicating that they believed that males were strongly suited to that career. By placing an "X" on the opposite end of the scale, at point 11, the participant indicated that they believed females were strongly suited to that career. Two scoring procedures were used for each participant to get: (1) Total STEM score (minimum 6, maximum 66) and (2) Total Arts score (minimum 6, maximum 66). For each participant, a low score (e.g., Total STEM score: 6-25) indicated a preference for male suitability to STEM careers whereas a high score (e.g., Total STEM score: 40-66) indicated a preference for female suitability. A neutral Total STEM or Total Arts score (26-39) indicated a neutral view that males and females are equally suited to STEM or Arts careers.
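The scoring bands described above can be summarized in a short sketch. This is an illustration only, assuming each total is the simple sum of the six item ratings for that career category; the function names are hypothetical and are not taken from the original scoring materials.

```python
def total_score(ratings):
    """Sum six item ratings (each 1-11) into a Total STEM or Total Arts score."""
    if len(ratings) != 6 or any(not 1 <= r <= 11 for r in ratings):
        raise ValueError("expected six ratings, each between 1 and 11")
    return sum(ratings)


def classify(total):
    """Map a total score (6-66) onto the bands reported in the text."""
    if 6 <= total <= 25:
        return "males more suitable"
    if 26 <= total <= 39:
        return "neutral"
    if 40 <= total <= 66:
        return "females more suitable"
    raise ValueError("total must lie between 6 and 66")


# Hypothetical participant who marks the midpoint (6) on every STEM item.
print(classify(total_score([6, 6, 6, 6, 6, 6])))
```

A participant marking the midpoint on all six items receives a total of 36, which falls in the neutral band (26-39).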
The Implicit Relational Assessment Procedure (IRAP) Two IRAP computer programs were used in this study (one Adult-IRAP and one Child-IRAP). For each trial of the IRAPs, participants were presented with one of eight pictorial stimuli (e.g., image of an adult/child), one of 12 target stimuli (e.g., science/arts) and two response options (e.g., similar/opposite; see Fig. 1). The pictorial stimuli used across both IRAPs were rated, reviewed, and made available online by previous researchers. For the Adult-IRAP, eight images of older adults with neutral expressions were randomly selected from the database (four male and four female; see Minear & Park, 2004). In a similar manner, eight pictorial stimuli were selected for use in the Child-IRAP (four male and four female; see Nosek et al., 2007). The target stimuli were either STEM or Arts careers taken from the Career Suitability Rating Scale (see Table 1). Participants were required to alternately affirm or deny relations between the pictorial stimuli and target stimuli using the terms "Similar" ("d" key) and "Opposite" ("k" key). These response options were preferred over responses such as "true" and "false" because preliminary research has shown that relational terms (e.g., similar/opposite; same/different) may produce different or stronger IRAP effects compared to "natural language" terms such as true/false (see Maloney & Barnes-Holmes, 2016). The relational terms "similar" and "opposite" were considered appropriate in the current research context, which is essentially testing for prelearned equivalence relations between, for example, "men-STEM-similar/women-ARTS-opposite." The response option locations remained static on-screen in order to reduce complexity of responding and thus reduce participant attrition rates.
Four IRAP trial-types presented relations over 24 trials: two trial-types were "consistent" with a STEM gender bias (male-STEM-similar/female-STEM-opposite), and two trial-types were "inconsistent" with a STEM gender bias (female-STEM-similar/male-STEM-opposite).

Procedure
All experiments were carried out in a quiet experimental cubicle on the university campus. Participants were seated at a desk with a Pentium 4 personal computer with a Windows 7 operating system. They were provided with information sheets and written consent was obtained from all participants. The experimenter remained in the experimental cubicle while administering the information sheet, consent form, and the Career Suitability Scale. The experimenter waited outside the experimental cubicle during both IRAPs, instructing the participant to open the door when they were finished.
IRAP procedure Participants were required to complete two practice blocks for each IRAP: one consistent and one inconsistent. The instructor stayed in the room for these practice blocks in case the participants had any questions, but left before the participants started on the test blocks. All blocks began with an on-screen rule: either Rule A (the consistent rule, respond in a stereotype-confirming manner, e.g., male-STEM/female-arts) or Rule B (the inconsistent rule, respond in a stereotype-disconfirming manner, e.g., male-arts/female-STEM; see Fig. 2). Each block consisted of 24 trials, with each pictorial stimulus appearing three times and each target stimulus appearing four times (label stimuli and target stimuli were paired randomly by the IRAP). Stimuli were presented simultaneously and remained on-screen until the participant selected one of two response options: "d" for similar or "k" for opposite. For example, for the Rule A (consistent) blocks for both IRAPs, the participant was required to respond "similar" when a male image appeared with a STEM career label (e.g., male image, SCIENTIST, similar; see Fig. 1) and "opposite" when a female image appeared with a STEM label. Participants were also required to answer "opposite" when a male image appeared with an arts label (e.g., male image, CHILD MINDER, opposite) and "similar" for a female image with an arts label. If the participant was too slow, the message "Too Slow!" appeared in red on the screen. If the participant responded incorrectly, a red "X" would appear on-screen and remain until the participant made the correct response. If a participant failed to meet the performance criteria (accuracy greater than 80% and a response latency of less than 2,100 ms), two subsequent practice blocks were presented. Failure to meet the performance criteria for these practice blocks resulted in an on-screen notification thanking the participant for their participation and terminating the IRAP.
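The response contingencies under Rule A and Rule B can be expressed compactly. The following is a minimal sketch of the mapping described above, not the IRAP software itself; the function name and the string labels for gender and career category are hypothetical.

```python
# Key assignments as described in the text: "d" = similar, "k" = opposite.
RESPONSE_KEYS = {"similar": "d", "opposite": "k"}


def correct_response(rule, image_gender, career_category):
    """Return the response reinforced as correct on a given trial.

    rule: "A" (stereotype-consistent block) or "B" (stereotype-inconsistent block)
    image_gender: "male" or "female"
    career_category: "STEM" or "arts"
    """
    # A pairing is stereotype-consistent when it is male-STEM or female-arts.
    stereotype_consistent = (image_gender == "male") == (career_category == "STEM")
    if rule == "A":
        return "similar" if stereotype_consistent else "opposite"
    if rule == "B":
        return "opposite" if stereotype_consistent else "similar"
    raise ValueError("rule must be 'A' or 'B'")


# Rule A, male image with a STEM career label: "similar", entered with the "d" key.
answer = correct_response("A", "male", "STEM")
print(answer, RESPONSE_KEYS[answer])
```

Under Rule B every contingency is reversed, so the same male-STEM trial requires "opposite".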
If the participant reached the required accuracy and response latency criterion, they then completed six test blocks (each constructed in the same manner as the practice blocks, with blocks 1, 3, and 5 as consistent and 2, 4, and 6 as inconsistent). The experimenter left the room while the participants completed the test IRAPs. It has been suggested that block order does not affect response latency in the IRAP and that it is unnecessary to examine when focusing on individual trial-type results (Barnes-Holmes, Barnes-Holmes, Stewart, & Boles, 2010a). Therefore, consistent and inconsistent block orders were not counterbalanced in the current study. Upon completion, an on-screen message appeared, informing the participant that the task was completed and to alert the researcher. The participant was then offered a 10-min break if they wished, before completing the second IRAP in the same manner as above. Once the second IRAP was completed, the participant was thanked, debriefed, and any queries they had were addressed.

Analytic Strategy
Participant rating scores on the Career Suitability Scale were combined and analyzed at group level for differences in participant gender using t-tests in SPSS. The IRAP data were analyzed according to standard calculation procedures outlined below. Single-sample t-tests were conducted to determine the significance of the D-IRAP scores from zero for each trial-type and the overall D-scores across both IRAPs. The data from each IRAP were analyzed using a 2 × 4 repeated measures ANOVA to investigate participant gender and IRAP responding across the four trial-types. A series of t-tests was conducted to determine if there were differences between D-IRAP scores on each trial-type across the two IRAPs (adult vs. child). The relationship between the scores on the Career Suitability Rating Scale and the overall D-IRAP score was investigated using Spearman's rho correlation coefficient.

Career Suitability Rating Scale
Participant rating scores (N = 33) were combined and analyzed at the group level. For STEM careers, 51.5% of the sample rated males and females as equally suitable (M = 32.76, SD = 5.1), indicating a neutral view; 45.5% felt that males were more suitable for careers in STEM, whereas only 3% rated females as more suitable. In contrast, 72.7% of the sample reported females as more suited to careers in the Arts (M = 41.5, SD = 5.59) and 27.3% rated males and females as equally suited to this career area. No participants rated males as more suited to careers in Arts. A paired sample t-test found a significant difference for gender and career suitability to STEM and Arts subjects (p < .001, standard error = 1.8) where participants were mostly neutral in terms of gender suitability to STEM but biased towards females for Arts careers. A further independent t-test found no gender differences between the career suitability rating scores of male and female participants (both ps > .05).
Fig. 2 Directions for consistent (Rule A) and inconsistent (Rule B) trial-types used for both the Adult-IRAP and Child-IRAP

Implicit Relational Assessment Procedure (IRAP)
Participant response latencies, defined as the time (in milliseconds) from the start of a trial to the participant's correct response, are the primary data produced by the IRAP, and data are combined and analyzed at a group level. Response latencies were transformed for each participant using an adaptation of Cohen's d, the D-IRAP scoring algorithm (Barnes-Holmes, Murtagh, Barnes-Holmes, & Stewart, 2010b). Provided that the participant reached the performance criteria, D-IRAP scores were calculated using only data from the six test blocks. D-IRAP scores were calculated using the following five steps: (1) First, 12 standard deviations are calculated from the four individual trial-types; (2) Twenty-four mean latencies for each of the four trial-types in both the Adult- and Child-IRAP are calculated; (3) The mean latencies of the consistent blocks described as male-STEM-similar/female-STEM-opposite are subtracted from the mean latencies of the corresponding inconsistent blocks (male-STEM-opposite/female-STEM-similar) resulting in a difference score for each of the four trial-types; (4) These difference scores (D-scores) are then divided by the standard deviations calculated previously; (5) This results in 12 D-IRAP scores that are reduced to four trial-type D-IRAP scores by finding the average score for each individual trial-type across the six test blocks (see Barnes-Holmes et al., 2010a, b for an in-depth account on calculating D-IRAP scores). The four individual IRAP trial-types were male-STEM-similar, female-STEM-opposite, male-arts-opposite, and female-arts-similar for both the Adult-IRAP and Child-IRAP. An overall D-IRAP score was also calculated by averaging these four trial-type D-scores and subtracting the D-scores for consistent trial-types from the D-scores for inconsistent trial-types.
If the result was a positive D-score it represented faster responding to the consistent trial-types (Rule A) versus inconsistent trial-types (Rule B), indicating a stereotype bias in participant responding. On the other hand, a negative D-score represented faster responding to the inconsistent or anti-stereotype trial-types.
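The five scoring steps can be illustrated for a single trial-type. The sketch below is not the IRAP software's implementation. It assumes that the six test blocks are treated as three consecutive consistent/inconsistent pairs, that each standard deviation is the sample standard deviation of all latencies for the trial-type pooled across both blocks of a pair, and that no latency trimming is applied. The function name and the latencies are hypothetical.

```python
from statistics import mean, stdev


def d_irap_trial_type(block_pairs):
    """Compute one trial-type D-IRAP score.

    block_pairs: list of (consistent_latencies, inconsistent_latencies) tuples,
    one per pair of test blocks, with latencies in milliseconds.
    """
    pair_scores = []
    for consistent, inconsistent in block_pairs:
        # Assumption: SD is taken over the pooled latencies of both blocks.
        pooled_sd = stdev(list(consistent) + list(inconsistent))
        # Inconsistent minus consistent, so a positive score means faster
        # responding on stereotype-consistent blocks.
        difference = mean(inconsistent) - mean(consistent)
        pair_scores.append(difference / pooled_sd)
    # Average the per-pair scores into a single trial-type score.
    return mean(pair_scores)


# Hypothetical latencies for one trial-type across three block pairs.
pairs = [
    ([1000, 1200], [1400, 1600]),
    ([1000, 1200], [1400, 1600]),
    ([1000, 1200], [1400, 1600]),
]
print(round(d_irap_trial_type(pairs), 3))
```

With these illustrative latencies the score is positive, because responding is faster in the consistent blocks; swapping the two lists in each pair reverses the sign.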

Adult-IRAP
The D-scores for both IRAPs are presented in Table 2. Single-sample t-tests were conducted with the four D-IRAP trial-types. Results revealed a statistically significant difference from zero for the following trial-types: male-STEM-similar (p < .001), male-arts-opposite (p < .01; see Table 2). D-scores were positive for all individual trial-types, indicating that participants responded more rapidly during consistent male-STEM/female-arts trial-types compared with inconsistent male-arts/female-STEM trials. Although the four IRAP trial-types D-scores tended toward a pro-male bias, the D-scores on only two out of four trial-types were different from zero at a statistically significant level, whereas there was a nonsignificant difference on the other two trial-types.
The D-scores from each of the four IRAP trial-types on the Adult-IRAP were analyzed using a mixed between-within 2 × 4 repeated measures analysis of variance (ANOVA) with participant gender as the between-subjects IV, IRAP trial-type as the within-subjects IV, and trial-type D-scores as the DV. Results indicated a statistically significant main effect for trial-type (Wilks Λ = 0.752, F(3, 29) = 3.19, p = .04, partial η² = 0.23). However, there was no significant main effect for gender and no interaction effect (both ps > .22; see Fig. 3).

Child-IRAP
Results for the Child-IRAP found that the male-STEM-similar trial-type was the only trial-type to be statistically significant against zero (p < .001; see Table 2) whereas the other trial-types were not. A 2 × 4 mixed between-within ANOVA was conducted to assess whether participant gender influenced any stereotype evident in IRAP D-scores. There was a statistically significant effect for trial-type [Wilks Λ = 0.57, F(3, 29) = 7.27, p < .001, partial η² = 0.43] (see Fig. 4). However, there was no significant main effect for gender or interaction (both ps > .37).
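The F values reported for the trial-type effect in the Adult-IRAP and Child-IRAP can be cross-checked against the reported Wilks Λ values. The sketch below assumes the exact conversion that holds when the effect has a single multivariate dimension, F = ((1 − Λ) / Λ) × (error df / effect df); this is an assumption about how the statistics were derived, and the function name is illustrative rather than part of the original analysis.

```python
def f_from_wilks(wilks_lambda, df_effect, df_error):
    """Convert Wilks' lambda to F, assuming one multivariate dimension."""
    if not 0 < wilks_lambda <= 1:
        raise ValueError("Wilks' lambda must lie in (0, 1]")
    return ((1 - wilks_lambda) / wilks_lambda) * (df_error / df_effect)


# Values reported in the text, with df = (3, 29) in both analyses.
adult_f = f_from_wilks(0.752, 3, 29)
child_f = f_from_wilks(0.57, 3, 29)
print(round(adult_f, 2), round(child_f, 2))
```

Under this assumption the Adult-IRAP value reproduces the reported F of 3.19, and the Child-IRAP value comes to approximately 7.29, which agrees with the reported 7.27 to within the rounding of Λ to two decimal places.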

Comparisons Between the Adult- and Child-IRAPs
A series of t-tests was conducted to determine if there were differences between D-IRAP scores on each trial-type across the two IRAPs (adult vs. child). Results revealed a statistically significant difference between the overall D-IRAP scores for both IRAPs (p < .05), and a statistically significant difference on the male-Arts-opposite trial-type (p < .05; see Table 2).
Univariate analyses of variance, with each of the four IRAP trial-types as the dependent variable and participant gender and order as categorical variables, were conducted to determine if there was an effect for gender, order of IRAP presentations (i.e., Child-IRAP first and Adult-IRAP second or vice versa), or the interaction of the two. The significance level was set to .01 to counter the potential for Type I error arising from several ANOVAs. Results revealed no significant effect for gender, order of IRAP presentation, or the interaction between the two on any of the IRAP trial-types.

Implicit-Explicit Correlations
The relationship between the scores on the Career Suitability Rating Scale and the overall D-IRAP scores was investigated using Spearman's ρ correlation coefficient. There were no significant correlations between any D-IRAP scores and the explicit measure (p > .05 in all cases).

Discussion
The current study sought to examine STEM gender bias with adult participants using the IRAP methodology with facial images of adults and children (across two IRAPs). For the Adult-IRAP, overall results showed a positive statistically significant D-score indicating stereotyped responding. Analysis of the four trial-types on the Adult-IRAP showed that the direction of bias was in favor of male-STEM rather than negative toward female-STEM (two trial-types statistically significant). The Adult-IRAP results presented here indicate that participants not only showed an implicit preference for adult males in terms of STEM career suitability, but that they also showed an implicit negative bias for adult males in terms of career suitability to the Arts. The results were different on the Child-IRAP, which found that the male-STEM-similar trial-type was the only statistically significant trial-type. A further difference was that participants' D-scores did not show the same negative bias towards male-ARTS on the Child-IRAP. Gender of participants did not affect results in either IRAP. A Spearman's ρ correlation coefficient revealed no significant correlation effects between the explicit and implicit measures used in the current study. It remains unclear why the explicit data did not correlate with implicit results; however, the explicit and implicit procedures are not designed to measure exactly the same phenomena, and both may be subject to a different range of influences not excluding measurement artifact.
Overall, the IRAP findings are interesting, in that it seems that the implicit STEM-gender bias was less pronounced (i.e., pro-male-STEM but not anti-female-STEM and not anti-male-ARTS) when participants were evaluating child facial images in relation to STEM/ARTS, and more pronounced when participants were evaluating adult facial images in relation to STEM/ARTS (i.e., pro-male-STEM, anti-female-STEM, anti-male-ARTS). This research has extended the literature by examining directionality of gender-STEM bias using the IRAP's four trial-type methodology and illustrating various nuances in participant responding. Results of an exploratory feature pertaining to the variable of age of facial image stimuli across IRAPs (adult vs. child facial image stimuli) may suggest an additional area of interest that could be investigated via the IRAP. Indeed, it may be of interest to discover if age and gender combine to influence participant responding and stereotype in the STEM area. Findings of more pronounced IRAP bias toward adult versus child facial images may be spurious, however, and it should be noted that a potential confounding variable was that child face stimuli were depicted as smiling whereas adult face stimuli were nonsmiling. Future research should hold this variable constant across both face stimuli.
The results indicated that participants (regardless of their own gender) displayed a pro-male-STEM stereotype for younger children. Further analysis found significant differences between the D-scores for some IRAP trial-types on the Child-IRAP versus Adult-IRAP, indicating a significant difference between the groups on the male-Arts-opposite trial-type. In this case, responding on the Adult-IRAP indicated an anti-male-Arts bias, whereas conversely, there was a pro-male-Arts bias for the Child-IRAP. Thus, the IRAP four trial-type methodology facilitated an analysis of directionality of STEM-gender bias, which elucidated some interesting differences in results across both IRAPs.
[Figure: IRAP scores for the four individual trial-types on the Child-IRAP for male and female participants]
As hypothesized, a pro-male STEM bias was revealed across the Adult-IRAP and the Child-IRAP, and was also found on the explicit measure of career suitability regarding STEM/ARTS and gender. This pro-male STEM bias is consistent with findings in previous studies (Farrell et al., 2015; Farrell & McHugh, 2017), suggesting that males are preferred over females in terms of suitability to STEM careers.
The current research has also extended the research literature in that an implicit pro-male-STEM bias has now been shown with pictorial stimuli, whereas previous research used text/sentences as label stimuli (e.g., "men more suited to/men better at"; Farrell et al., 2015; Farrell & McHugh, 2017). The issue of whether pictorial or textual stimuli are more effective in detecting IRAP effects may need further clarification, and needs to be considered in light of recent research and theoretical discussions related to the orienting functions and other potentially relevant aspects of various stimuli used in IRAP research (Maloney & Barnes-Holmes, 2016).
A limitation of the current study is that specific rules (e.g., Rule A, select similar for male-STEM/Rule B, select similar for male-ARTS) were provided for consistent and inconsistent trial-blocks, a practice that has been shown to dramatically affect IRAP effects (Finn, Barnes-Holmes, Hussey, & Graddy, 2016), particularly with relations learned preexperimentally within the social community. Related to this, the level of coherence in relations presented may influence the IRAP effect and indeed the single trial-type dominance effect frequently found in IRAP studies. For example, relations such as knife with knife may have greater coherence than knife with fork (see Finn et al., 2016, for an expanded discussion); this is not necessarily a limitation of the IRAP but rather a legitimate aspect of a methodology aiming to detect prelearned stereotyped relations, as in the current context. Future research is recommended to use generic rules (e.g., relations in these trial-blocks are reversed), or to avoid providing rules and allow participants to learn to respond correctly across trial-blocks through feedback only.
Of note was the lack of statistically significant differences between the male and female participant responding on both the explicit and implicit measures. This is in contrast to previous research using the IRAP, which, for example, found that female participants demonstrated a significantly stronger male-arts bias compared to males (Farrell et al., 2015). In another study using the IRAP, researchers found that female participants who were enrolled in STEM courses also demonstrated a significantly stronger female-STEM bias compared to the other participant groups, which consisted of non-STEM females and STEM and non-STEM males (Farrell & McHugh, 2017). It seems likely that current as well as historical contextual variables may influence participant responding in studies of stereotyped responding; however, the precise conditions in which gender effects might be expected require further clarification. An exploratory aspect of the current IRAP research suggested an age dimension in that greater gender-science bias was shown toward adult image stimuli compared to young (child) image stimuli. These findings are preliminary and should be viewed with caution; multiple replications are needed, and there were potential confounds, as mentioned above. Although child stimuli were used in the current research to determine if a gender-science bias was shown toward children, future research may compare gender-science bias toward young and older adults, for ecological validity purposes related to age/gender bias in the field of science.
In conclusion, the importance of understanding the multiple, complex aspects of implicit gender-STEM bias cannot be overstated. As reported by Nosek et al. (2009), national differences in gender-science stereotypes predict national sex differences in science and math achievement, perpetuating the underrepresentation of women in STEM areas. This likely detracts from the field, as well as from the prospects of potential female scientists. It is important in behavior analysis to identify a problem with clarity and precision, and research in the area of STEM stereotypes may be beneficial toward this end.
Availability of Data and Materials All data and materials used in this study are available on request from the corresponding author.
Funding The second author is funded by the Marie Skłodowska-Curie Actions COFUND Collaborative Research Fellowships for a Responsive and Innovative Europe (CAROLINE).

Compliance with Ethical Standards
Conflict of Interest On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethical Approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent Informed consent was obtained from all individual participants included in the study.