MANAGING E-LEARNING METADATA WITHIN TOOLBOX.ACADEMY: KNOWLEDGE ACQUISITION WITH BIG DATA ALGORITHMS AT THE STUDENT AND GROUP LEVELS

E-learning not only provides new means of accessing knowledge and changes the way teaching takes place at school and at home; it also paves the way to learning more about students' skills, competences, attitudes and disorders. Statistics and AI-based big data analysis are the methods used to acquire this knowledge, which is of great value to educators and management teams. Indeed, what looks like a simple interaction between the student and yet another application interface actually generates a considerable amount of metadata that holds clues about how the student's cognitive processes deal with the task in question. This metadata, when treated properly, reveals knowledge that is useful for improving education, as a recent experience with the ToolboX Academy environment has shown. Atypical slopes in learning curves might point toward ADHD or intellectual giftedness, recurrent typing errors might reveal dyslexia, and an above-average number of numeric errors might indicate dyscalculia. What is more, some tasks in this environment are specially designed to detect other conditions, such as color blindness (Daltonism). These are but examples of what coding environments can reveal with basic big data analysis; the fine-grained interaction actually stores even more data, and group-level processing can also disclose deficiencies in the teaching process, either in general or in particular subjects, mainly in the STEM area. In this paper, we describe the approach followed by ToolboX Academy to disclose this information.


INTRODUCTION
The use of the Internet in education has created a new context, known as e-learning or web-based education, in which large amounts of information about teaching-learning interaction are endlessly generated and ubiquitously available [1]. E-learning enables people to access online learning resources and provides a flexible learning environment in which they can learn without restrictions of time and space [2]. E-learning can be an alternative to traditional education, and it can also complement it. In most Information Technology systems, the students' interactions with their online learning activities are captured and stored. These digital traces (log data) can then be 'mined' and analysed to identify patterns of learning behaviour that provide insights into educational practice. This process has been described as learning analytics [3].
Learning analytics is being used within the field of teaching and learning for many purposes: identifying students at risk in their academic programming [4], early detection of ADHD [5], or providing personalized and adapted support to university students with dyslexia [6]. Learning curves have also been used for some time to perform summative evaluations of educational systems, including comparing multiple versions of a system to evaluate whether new features are beneficial or detrimental to learning performance [7]. In this paper, we show how the metadata collected with ToolboX.Academy can help identify students who may have learning difficulties, as well as students who are ahead of their peers.

METHODOLOGY AND RESULTS
A controlled experiment was conducted to validate the use of ToolboX Academy [1] as a platform to teach coding in primary and secondary education. The design, sample and results of this trial have been described and discussed extensively in [1]. As an educational activity, participation in this experiment did not require informed consent from parents or tutors, since it was not intended to scrutinize the special skills, disorders or any other aspect of the students, apart from the platform's capacity to teach coding. Even so, the usage data were scanned thoroughly in order to identify potential technical problems in using ToolboX Academy on a massive scale in the future. In fact, the platform records every single event in the interaction between the user and the interface, providing rich information on the particular manner in which the tasks are solved.
The first descriptor of students' performance is, of course, the learning curve. This is the curve described by the set of points in time at which a task has been successfully completed (i.e. the student has developed and run a program that solves the task). Figs. 2 to 5 show learning curves for different groups of students, each one coloured and labelled for easier identification. The slope of this curve indicates the speed at which the student is learning new concepts, while specific inflexions in the curve provide useful information about the problems that the students have found in solving the tasks (e.g. an atypically high average duration to solve a given task might indicate that it was not described properly, or that the complexity of the task increased abruptly).
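As an illustration of the description above, the following minimal sketch builds a learning curve from a list of task-completion timestamps and estimates its overall slope by least squares. The function names, the use of minutes as the time unit and the least-squares fit are assumptions made for this example; they are not taken from the ToolboX Academy implementation.

```python
from statistics import mean


def learning_curve(completion_times):
    """Return (time, cumulative tasks completed) points.

    `completion_times` holds the instants (here assumed to be minutes from
    the start of the session) at which a task was successfully solved.
    """
    times = sorted(completion_times)
    return [(t, i + 1) for i, t in enumerate(times)]


def slope(points):
    """Least-squares slope of the curve, in tasks per time unit."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def slowest_task(points):
    """Ordinal of the task preceded by the longest gap (an inflexion candidate)."""
    gaps = [(points[i][0] - points[i - 1][0], points[i][1])
            for i in range(1, len(points))]
    return max(gaps)[1]


# Hypothetical session: the fourth task took much longer than the others.
curve = learning_curve([1.0, 2.0, 3.0, 7.0, 8.0])
print(slope(curve), slowest_task(curve))
```

Averaging the per-task gaps over a whole group, rather than for a single student, would point to tasks that are poorly described or abruptly harder, as discussed above.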
When the data are grouped by grade, it can be seen that some students were significantly ahead of or behind the rest of the subjects. Fig. 1 shows the box-and-whisker plot of the data separated by grades in the K-12 system (p1 to p6 corresponding to primary education, and s1 to b2 to secondary education), where the outliers (beyond Q3 + 1.5·IQR) and extreme values (beyond Q3 + 3·IQR, with IQR = Q3 − Q1 being the interquartile range) have also been represented.
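The classification into outliers and extreme values can be sketched as follows. The upper fences are those given in the text; the lower fences (Q1 − 1.5·IQR and Q1 − 3·IQR) are the usual symmetric counterpart and are assumed here, as is the quartile method (Python's "inclusive" interpolation), since the paper does not specify one. Student identifiers and task counts are invented for the example.

```python
from statistics import quantiles


def quartiles(values):
    """Return (Q1, Q3, IQR) using inclusive interpolation (an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


def classify(value, q1, q3, iqr):
    """Label a value as 'extreme', 'outlier' or 'typical'."""
    if value > q3 + 3 * iqr or value < q1 - 3 * iqr:
        return "extreme"
    if value > q3 + 1.5 * iqr or value < q1 - 1.5 * iqr:
        return "outlier"
    return "typical"


# Hypothetical number of tasks completed by the students of one grade.
tasks_done = {"s1": 20, "s2": 22, "s3": 24, "s4": 25, "s5": 26,
              "s6": 28, "s7": 30, "s8": 45, "s9": 70}

q1, q3, iqr = quartiles(list(tasks_done.values()))
flagged = {sid: classify(n, q1, q3, iqr)
           for sid, n in tasks_done.items()
           if classify(n, q1, q3, iqr) != "typical"}
print(flagged)  # {'s8': 'outlier', 's9': 'extreme'}
```

Here Q1 = 24 and Q3 = 30, so the upper fences fall at 39 and 48 tasks; the same function flags values on the lower side without modification.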
These extreme values, both on the upper side, describe learning curves with slopes much higher than those in the main group (Figs. 2 and 3) and are clear candidates for an assessment of exceptional intellectual ability, most probably giftedness. Outliers on the upper side might also be good candidates, although in their case the result could also be explained by previous experience with coding tools. On the other hand, on the lower side (Figs. 4 and 5), outliers and extreme values (the latter not detected in this experiment) could be good candidates for some kind of disorder, such as ADHD, as they show learning curves with extremely low slopes.
Fig. 3 shows the learning curve of the extreme value for the s3 grade, which evolves atypically for the first 15 minutes, after which the slope changes dramatically for some reason. This change does not seem to be related to an increase in difficulty, since other students (D04 and B33) pass the 50-task limit and complete a higher number of tasks. Finally, Figs. 4 and 5 show outliers on the lower side of the boxes, which could be students with special needs (Down syndrome in Fig. 4) or attentional disorders.
But learning curves are not the only means of detecting special cases. The metadata generated by a platform like ToolboX.Academy reveal, for example, problems in writing certain words, such as repeated compilation errors when the user tries to write a specific command. For instance, a student of the s3 grade ran scripts with the word 'letf' (for 'left') 8 times, and also tried 'rigth' (for 'right'), far above the mean level of orthographic mistakes made by students in the same group, which could point towards a case of dyslexia. Similarly, errors in counting the number of iterations in loops, or in other computing tasks, could reveal cases of dyscalculia. Even visual disorders like color blindness can be detected in tasks that use certain combinations (e.g. an object of one color on a given background color) that cannot be distinguished by children with some type of Daltonism. Repeated errors in which these objects have not been detected might suggest a diagnosis.
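The orthographic screening mentioned above can be sketched as follows. This is only an illustration under strong assumptions: the command vocabulary, the restriction to swaps of two adjacent letters (which covers 'letf' and 'rigth'), the student identifiers and the threshold of three times the group mean are all hypothetical choices, not the procedure used in the experiment.

```python
import re
from statistics import mean

# Hypothetical command vocabulary; the real platform's commands may differ.
COMMANDS = {"left", "right", "forward"}


def swap_typo_of(token, commands=COMMANDS):
    """Return the command that `token` misspells by one adjacent swap, if any."""
    if token in commands:
        return None
    for i in range(len(token) - 1):
        swapped = token[:i] + token[i + 1] + token[i] + token[i + 2:]
        if swapped in commands:
            return swapped
    return None


def count_swap_typos(scripts):
    """Count adjacent-swap misspellings over all scripts run by one student."""
    total = 0
    for script in scripts:
        for token in re.findall(r"[a-z]+", script.lower()):
            if swap_typo_of(token) is not None:
                total += 1
    return total


def flag_students(scripts_by_student, factor=3.0):
    """Flag students whose count exceeds `factor` times the group mean."""
    counts = {s: count_swap_typos(scr) for s, scr in scripts_by_student.items()}
    group_mean = mean(counts.values())
    flagged = sorted(s for s, c in counts.items()
                     if group_mean > 0 and c > factor * group_mean)
    return counts, flagged


# Invented run logs for a small group.
runs = {
    "stu1": ["left()\nforward()", "right()"],
    "stu2": ["letf()", "letf()\nforward()", "rigth()", "letf()"],
    "stu3": ["forward()", "left()"],
    "stu4": ["rigth()", "right()"],
}
counts, flagged = flag_students(runs)
print(counts, flagged)
```

A flag of this kind is only a prompt for further assessment by a professional; the count itself says nothing about the cause of the errors.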

CONCLUSIONS
Children are using e-learning platforms in more and more educational settings. Educators, who are able to detect special skills and disorders because they interact continuously with children at school and know their progress and behavior, can benefit from big data analysis of the metadata generated by such tools. This paper has shown how statistical analysis of learning curves, together with the collection of usage data, can become a means of screening for potential cases.
The use of educational tools like ToolboX.Academy can help to detect these disorders much earlier, so that treatment or special support can be provided at the right time. In the case of giftedness, formal assessment can start at the age of 5 or 6, which is also a recommended age to start coding experiences, and early testing can nurture exceptional talents from the very beginning of schooling. Similarly, students with ADHD or dyslexia benefit from early screening at basically no extra cost.
Big data algorithms can operate on massive amounts of usage data, considering not just learning rates or typos but also the fine-grained relations between behavior and symptoms, such as slightly deviated response times or subtle habits in writing or reading processes. This metadata, compared against that of already diagnosed children, could provide reliable screening methods (which, in some cases, may also be diagnostic).