CODING SKILLS ARE ACQUIRED GENDER-INDEPENDENTLY IN THE K-12 SYSTEM: THE TOOLBOX.ACADEMY EXPERIENCE

Gender inequality in access to STEM studies is a serious problem that higher education needs to solve, but its causes are not yet clear. Among the arguments considered, a prevalent one appeals to a supposed innate male competence for science and engineering. In a controlled experiment involving 356 female and 394 male participants, aged 6 to 18 years, subjects were asked to solve tasks by writing computer programs, and performance was measured as the number of tasks solved in a given time. Statistical analysis of the learning curves showed that female participants performed as well as male participants in acquiring coding and problem-solving skills. This result contradicts the supposed innate difference, showing that both genders are equally capable of computer programming, and suggests policies to gender-balance enrollment in STEM careers, since early exposure to computer science might combat stereotypes like "coding is only for boys". In this paper we analyze in detail the main results of this experiment, as well as direct feedback from female and male subjects on solving the tasks.


INTRODUCTION
The underrepresentation of girls and women in science, technology, engineering, and mathematics (STEM) fields is a worldwide phenomenon. Female participation is falling in a field that is expanding globally as its importance for national economies grows [1]. In particular, the field of computer science is expected to grow by 13 percent over the next eight years [2]. By 2026, it is predicted that there will be approximately 3.5 million new computer science jobs in America. Based on current graduation rates, approximately 17 percent of these jobs will be filled by computer science graduates [3]. This anticipated growth in the field, and the related increase in job opportunities, is promising for those with computer science degrees. In terms of salary, work-life balance, and expected employment growth, computer science related jobs rank among the top jobs in the world [4]. Unfortunately, the growth of the field has not been accompanied by an equally compelling improvement in the role of women within it. Despite considerable efforts toward understanding and changing this pattern, the gender gap in STEM engagement has remained stable for decades [5]. Although women earned 41.98% of all S&E (science and engineering) bachelor's degrees awarded worldwide in 2014, and 42.3% of those awarded in Spain [6], the proportion of degrees awarded to women varies across and within broad fields of study. Women's highest degree shares are in Psychology and the Biosciences; the lowest, in Computer Science and Engineering.
Various explanations for the underrepresentation of women in math-intensive fields have been given: (a) sex differences in mathematical and spatial ability; (b) sex discrimination in publishing, funding, and hiring; and (c) occupational/lifestyle preferences and choices that reduce women's participation in math-intensive fields. Roli Varma [7] finds bias in early socialization and anxiety toward technology to be the two main factors responsible for the underrepresentation of women in CS/CE education. Sylvia Beyer [6] indicates that gender differences exist in computer self-efficacy, stereotypes, interests, values, interpersonal orientation, and personality. Lack of opportunities for early familiarization with computing at home and at school is the main factor that discourages both boys and girls from studying CS, with a greater impact on girls [8]. In this paper, we study the performance of a group of boys and girls aged 6 to 18 in solving computer programming tasks with ToolboX.Academy.

METHODOLOGY
An ex post facto study was performed in order to assess how much gender affects the degree of knowledge acquisition in a coding task. The study was carried out in May 2018 with students aged 6 to 18 from primary and secondary education centers in Andalusia (Spain). Convenience sampling was adopted to ensure that any center interested in participating in the experiment would have the chance to join. In order to compare performance by gender, the minimum sample size was set to 480 participants (20 subjects per group × 12 courses × 2 genders). Further details of the experimental design are described in [9].
A descriptive analysis was performed, using measures of central tendency for the quantitative variables and frequency distributions for the qualitative ones. A multivariate analysis of variance was performed using a univariate generalized linear model, with gender and academic course as fixed factors, and taking as the outcome variable the concepts learned (measured as the number of tasks that each subject solved during the first 30 minutes of the experiment). To address the aim of this study, the estimated marginal means by gender are reported with 95% confidence intervals. The level of statistical significance was set at p<0.05.
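As an illustration of the last step, the following Python sketch shows one way to compute estimated marginal means by gender with 95% confidence intervals. It is not the analysis code used in the study. It assumes a full factorial model (gender, course, and their interaction), so that each gender's marginal mean is the unweighted average of its per-course cell means, and it uses a normal approximation for the interval, which is reasonable only for large samples. The function name, the record layout, and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist, fmean


def marginal_means_by_gender(records, level=0.95):
    """Return {gender: (mean, lower, upper)} for tasks solved.

    records: iterable of (gender, course, tasks_solved) tuples.
    Assumption: gender x course factorial model with interaction, so the
    marginal mean is the unweighted average of the per-course cell means.
    The interval uses a normal (large-sample) approximation.
    """
    cells = defaultdict(list)
    for gender, course, tasks in records:
        cells[(gender, course)].append(tasks)

    # Pooled within-cell variance (mean squared error of the model).
    sse = 0.0
    for values in cells.values():
        cell_mean = fmean(values)
        sse += sum((v - cell_mean) ** 2 for v in values)
    dof = sum(len(v) for v in cells.values()) - len(cells)
    mse = sse / dof

    z = NormalDist().inv_cdf(0.5 + level / 2)
    result = {}
    for gender in sorted({g for g, _ in cells}):
        own = [v for (g, _), v in cells.items() if g == gender]
        mean = fmean([fmean(v) for v in own])
        # Standard error of an unweighted average of independent cell means.
        se = (mse * sum(1 / len(v) for v in own)) ** 0.5 / len(own)
        result[gender] = (mean, mean - z * se, mean + z * se)
    return result


# Made-up numbers for illustration only; these are NOT the study's data.
toy_records = [
    ("F", "p1", 3), ("F", "p1", 5),
    ("F", "p2", 6), ("F", "p2", 8),
    ("M", "p1", 4), ("M", "p1", 4),
    ("M", "p2", 6), ("M", "p2", 7), ("M", "p2", 8),
]
estimates = marginal_means_by_gender(toy_records)
for gender, (mean, lower, upper) in estimates.items():
    print(f"{gender}: {mean:.2f} [{lower:.2f}, {upper:.2f}]")
```

Averaging cell means without weights keeps courses with many recruited subjects from dominating the gender comparison, which matters here because course sizes were unequal. With small samples, a Student's t quantile would be the more appropriate choice for the interval.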

RESULTS
After filtering a number of sessions according to duration, data for 750 subjects were obtained from the 12 educational grades, from the 1st course of primary education to the 2nd course of the Spanish Bachillerato (Table 1). The course with the fewest subjects (labeled b2) had 20 students, and the course with the most (labeled s2) had 112. Overall, 52.5% of the sample were male.

CONCLUSIONS
In a sample of students representative of the twelve courses of Primary and Secondary education, it has been shown that the capacity to develop coding skills is independent of gender. In only one course (b1) was there a slight difference in task-solving performance, in favor of male students.
A limitation of this study is the overall duration of the experiment, which had to be restricted to 30 minutes due to technical constraints (although the original plan was to evaluate performance in computer programming during one Hour of Code). However, this does not introduce a differential misclassification bias, given that all students were evaluated under the same controlled conditions. On the one hand, shortening the overall duration of the study favors the comparison of task execution among courses, since a higher degree of exhaustion (and consequent abandonment of the experiment) is expected in young students over a complete hour of code. On the other hand, a strength of the study is that it achieved adequate statistical power across different educational centers and levels, with a standardized evaluation methodology, to answer a question that is practically unprecedented in the educational field.