A Framework towards Educational Scalability of Open Online Courses

Abstract: Although the terms scale and scalable are often used in the context of Open Online Education (OOE), there is no clear definition of these concepts from an educational perspective at the course level. This paper critically discusses the origins of these concepts and provides a working definition of educational scalability. A heuristic framework, which integrates four common educational design principles, is introduced in order to study support and formative assessment and feedback at large scale. The proposed framework is presented, discussed and applied to five case studies. First qualitative results of the case studies show that the designs are relatively similar. The detailed study of their units of learning, however, reveals practices which can potentially be interesting for other MOOC developers to enhance their design and its scalability. Further research will apply the framework to zoom in on scalable best practices in MOOCs, with a focus on scalable practices of formative assessment and feedback.


Introduction
Since the offering of the first Massive Open Online Course (MOOC) in 2008, the originally positive attitude towards MOOCs has been challenged by low completion rates and criticism of their (educational) quality. Although MOOCs did impress with their massive reach and the potential for testing educational innovations within a course, voices could be heard that, from an instructional design perspective, MOOCs are not innovative and merely implement norms of existing classroom education [Kalz & Specht, 13; Anderson, 01]. In addition, authors have recently started to discuss and criticize the assumption that MOOCs are able to provide learning at scale. When using the term 'MOOC' we refer to all online courses that are open, free and have participant numbers beyond those of traditional courses. This paper focuses on the underlying assumptions of learning at scale and relates them to economic as well as educational models and theories. As [Terwiesch & Ulrich, 14] argue, the massiveness and scale paradigm closely resembles economies-of-scale principles earlier used in the context of educational administration, for example with regard to optimal class and school size. The authors argue that this approach is based on "repeatable processes delivering identical products or services with an emphasis on efficiency and effectiveness assuming a simple in-out delivery model of learning". Exactly the same arguments are often used by proponents of the MOOC movement in connection with the idea of "zero marginal costs" [Rifkin, 14]. The term "marginal costs" stems from business and economics and refers to a situation in which producing one additional unit of an item is possible without a substantial increase in costs. If these additional costs are close to zero, the concept of "zero marginal costs" is used. [Rifkin, 14] provides several examples of economies in which this phenomenon can play an important role in the future and explicitly refers to "MOOCs and Zero Marginal Costs Education". Paradoxically, [Rifkin, 14]
argues that in this type of education the "authoritarian, top down model of instruction is beginning to give way to a more collaborative learning experience", while later providing examples of actual classical top-down teaching approaches in MOOCs. Several authors [e.g. Wulf, Blohm, Brenner & Leimeister, 14; Belleflamme & Jacqmin, 15] have reproduced this argument without taking into account that it equates education with a simplified delivery model and that higher-order learning objectives will lead to an increase in (human) resources.
In this paper, therefore, we introduce an alternative view on learning at scale that takes a more holistic approach to scalability and includes quantitative aspects of education (delivery at low cost) as well as qualitative aspects regarding the complexity of educational processes and instructional design choices (sections 2 to 4). Following the introduction of the concept of educational scalability in section 3, we present in section 5 a heuristic framework to assess MOOCs and their scalability at the course level. Building on and combining four umbrella concepts (constructive alignment, complexity, interaction, and formative assessment and feedback) enables us not only to gain insight into the scalability of the educational design but also to identify interesting or best practices to steer improvement. Following these concepts, section 6 introduces our design analysis instrument, which translates the educational design theories into practice. The final sections describe and discuss the findings of the validation of the instrument, our conclusions and future work.

Scalability and scale in conference papers
To assess the current state of scientific discourse on scale and scalability, we have reviewed the full papers of the two most prominent conferences in the field (Learning at Scale and eMOOCs). To analyse the conceptualization of scale in the Learning at Scale conference series, we downloaded all 48 full papers of the years 2014-2016 and manually inspected (a) whether the terms scale or scalability are mentioned, and (b) what underlying definition of scale is used in the papers. In nearly half of the papers the concepts of scale or scalability are not explained beyond a reference to the name of the conference. The majority of the papers that do explain their understanding of scale present a purely quantitative perspective. These concepts fall into the categories of large numbers of students, size of dataset, study size, and number of users in classes, courses or learning environments. The term scalability is not used in any paper. Only two papers actually mention limits to scale or refer to qualitative aspects. This points to an understanding of scale that might also be based on a simple delivery model of education, without taking into account different qualities and complexity levels of education.
The same analysis was conducted on the proceedings of eMOOCs for the years 2014-2016. Out of 124 papers, 31 refer to the concept of scale. The majority of these papers referred to scale as the number of students, the amount of data or the massiveness of the course. Some papers used the term scale for the capability of a system to be scaled. The term scalability was mentioned in seven papers without an explanation of its underlying meaning. In most cases scalability was seen as a given or desirable characteristic of a MOOC in terms of quantity; the quality aspect was not included.
This brief scan of the two core conferences about MOOCs shows that the concept of scale is not operationalised and that an implicit quantitative understanding dominates. The term scalability, although used, is not explained in any of the papers. This is one of the important blind spots in the discussion about MOOCs. To clarify the terminology in this discussion, we propose to differentiate between the concepts of scale and (educational) scalability.

Scale versus scalability
The terms scale and scalability are not used consistently in the literature about open education and MOOCs. Most authors define scale in a purely quantitative way, equal to the number of participants and the massiveness aspect of open online courses. In the Information Technology domain, for example, scalability is defined as the characteristic of a system to cope with an increasing workload as its context grows in volume. Scholarly thinking about scale and scalability has been strongly influenced by an economic perspective on scalability. [Spencer, 74] stated, with his "economies of scale" concept, that a cost advantage can be obtained due to large scale. "The idea of economies of scale references a curvilinear relationship between management costs and items produced, with cost advantages obtained due to increased size" [Stewart, 13]. Economies of scale are achieved if production at large scale can take place with less/low input costs. An increase in production leads to a decrease in costs, since fixed costs can be spread over a larger number of products. A focus on costs and scale can also be seen in previous research in the educational sector on class size and school size [Hirsch, 95; Cohn, 68] or the provision of content [Mulder, 13].
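The economies-of-scale mechanism described above can be made concrete with a minimal sketch (the numbers are hypothetical illustrations, not taken from the literature): the average cost per unit falls as a one-time fixed cost is spread over more units, while the variable cost per unit stays constant.

```python
def average_cost(units, fixed_cost, variable_cost_per_unit):
    """Average cost per unit: fixed costs are spread over all units,
    variable costs accrue once per unit."""
    return fixed_cost / units + variable_cost_per_unit

# Hypothetical course production: 10,000 (cost units) to produce materials,
# 2.0 per enrolled student for distribution.
small_run = average_cost(100, fixed_cost=10_000, variable_cost_per_unit=2.0)
large_run = average_cost(10_000, fixed_cost=10_000, variable_cost_per_unit=2.0)
print(small_run, large_run)  # 102.0 3.0
```

At a hundred students the fixed production cost dominates the average cost; at ten thousand it is almost negligible, which is exactly the effect that proponents of learning at scale appeal to.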
The limitations of applying these economic concepts to Technology-Enhanced Learning have been discussed earlier by [Morris, 08]. The author differentiates between internal and external economies of scale and describes several conditions under which such effects can be achieved (for example through inter-institutional sharing of staff, technical solutions, etc.). However, there are also doubts about economies-of-scale effects at the course level. Although the production costs for learning materials can be kept low through standardization, some authors point out that the actual potential for economies of scale at the course level depends on the course design and, more specifically, on the methods of student support and assessment. Thus, there is a natural limit to economies of scale due to the implemented (or required) learning design. For example, a MOOC that mainly contains reusable and sharable content such as videos, texts and multiple-choice quizzes can be scaled up more easily than a MOOC in which educational services such as personalized feedback and interaction are provided.
A simple translation of the economies-of-scale model to education does not take into account the complexity of educational processes and the different cognitive and meta-cognitive levels of education. While (pure) knowledge transfer can be applied at large scale, complex problem-solving and written argumentation require procedural knowledge, skills and competences, which are much harder to realize in a MOOC due to the different feedback and support needs of large numbers of students [O'Toole, 13]. [Clarke and Dede, 09] describe this reductionist understanding of education as treating "learning as if it were akin to designing a fast food restaurant, with very limited menu of pedagogical alternatives". The same reductionist view of education underlies a purely quantitative perspective on scale. There is a correlation between the complexity level and intensity of education and the need for personalized feedback. The so-called teacher bandwidth is challenged by a high need for personalized feedback, which often involves a 1-to-1 feedback situation [Yousef, Chatti, Schroeder & Wosnitza, 14].
Translating the economies-of-scale approach to the educational context would mean that a teacher can support an unlimited number of students without an increase in time investment or a decrease in quality. To differentiate between the quantitative understanding of scale and an approach that takes the complexity of education into account, we propose the term 'educational scalability'. Educational scalability is the capacity of an educational format to maintain high quality despite increasing or large numbers of learners at a stable level of total costs. This approach combines a quantitative and a qualitative perspective.
Earlier, [Laws, Howell and Lindsey, 03] discussed the concept of scalability of distance education programs. The authors differentiate between high-tech and high-touch approaches to distance education to take into account the different qualities of educational formats, which can partially be scaled easily by means of technology but sometimes also have limitations in scalability due to the unavoidable involvement of teachers.
That scalability is not only an issue in open online/distance education becomes clear when one looks at the higher education system in general, where costs are increasing tremendously through the large inflow of students [Bowen, 12; Hülsmann, 16; UNESCO, 09; Piech et al., 13; Watson, 14], and where scalability challenges are nowadays a daily struggle for lecturers. Some bachelor-level courses in the social sciences hold hundreds of students, which makes it nearly impossible for the teacher to provide personal feedback. The course design then consists of individual learning activities and passive lectures, and focuses on knowledge transfer. Assessment at large scale is often done in a summative way, such as testing simple factual knowledge with multiple-choice questions. It is important to note that although factual knowledge forms a base and is needed at the beginning of a course, course designs should also include more complex learning and personalized feedback at large scale. In this sense, the analysis of best practices of scalable educational design can also impact future models of higher education institutions, in which digital and face-to-face situations are combined in an intelligent way. To better understand educational scalability, we introduce in the next section the Iron Triangle, an important concept that relates costs, quality and scale to each other and allows a more systematic perspective on scalability.

4 The iron triangle in education
Originally, the Iron Triangle was introduced in the healthcare sector by [Kissick, 94].
Compared to the economies-of-scale idea, the Iron Triangle includes both the qualitative and the quantitative perspective. It is a model with which the goal of widening access (scale) to high-quality (higher) education at low(er) costs can be visualized [11; Lane, 14]. The original Iron Triangle consists of three equal sides and is based on regular classroom education, where the possibilities for improvement along the three sides are limited due to the relatively fixed costs (classroom size) and scale (teacher-student ratio) (Figure 1). In classroom education, an improvement in scale (more students in one class) would possibly result in lower educational quality, because teacher time has to be divided among more students. While the Iron Triangle is a good working metaphor, in practice it is less rigid, in particular for educational innovations. Examples are the digitalization of education, including online access, open content and open education, which can lead to an increase in scale without necessarily resulting in a linear increase in cost. This has led some authors to the expectation that technology and open education can make the education system as a whole more accessible [11; Nye, 14]. [11] state that "technology is able to stretch this triangle so that you can achieve the revolution of wider access, higher quality and lower cost".
Next to the idea that online education is able to break the Iron Triangle, a widespread idea is that of MOOCs transforming education and being a disruptive innovation [Christensen, 17; Conole, 13; Horn, 14; Bouwer, 95]. The term "disruptive innovation" indicates a product or service innovation that competes with existing products or services by creating a new, more appealing market, for example through lower costs [Bouwer, 95]. Disruptive innovations often start at the low(er) end of the market, with a lower quality than what already exists, and slowly move up to the high(er) end of the market until the established products or services are replaced by the formerly disruptive innovation. In the literature the comparison is often made between MOOCs and disruptive innovations. Several authors have formulated critiques of why MOOCs cannot be regarded as a disruptive innovation [Rabin, Kalman & Kalz, submitted; De Langen & Van den Bosch, 13]. Although MOOCs do have characteristics of disruptive innovations, they do not, and probably never will, completely replace traditional education [Yuan & Powell, 13]. Furthermore, a MOOC in itself is not an educational unit/course with fixed rules and guidelines, which makes it more complex to state that a MOOC in itself is a disruptive innovation.
As the Iron Triangle states, quality, costs and scale cannot all be optimized at the same time (Figure 1). There is always a trade-off between these three dimensions, and educational scalability is influenced by the same tripartite relationship of scale, quality and costs as in sectors like economics or health care. When it comes to education at large scale, Ferguson and Sharples [14] refer to a "two-sided network" in which both teachers and students should benefit from scale. Yet the benefits of increasing access and providing education on a large scale are often of an economic rather than a pedagogical nature. Although courses for the masses enable students to access various resources and perspectives, they are not (fully) able to support students and teachers in benefiting from that access [Ferguson & Sharples, 14]. In practice, it is unclear under which conditions large numbers of students can receive high-quality education with a low time investment by teachers. Translated to the open online education context, in which the scale dimension is the dominating increasing factor, this means that challenges are expected on the cost and quality dimensions.
However, as mentioned before, increasing the quantity, i.e. the access to high-quality education material at low cost, is only half the question. What matters (at the course level) is the educational approach with which these large amounts of educational materials are applied by teachers and students. However, there are no guidelines for MOOC designers and teachers on how to address scale, cost and quality issues [14]. The most important argument against the idea that MOOCs have broken the Iron Triangle and are fully developed disruptive innovations has to do with the role of the teacher and the knowledge, skills or competences to be studied and developed.
Additionally, it is not clear whether and to what extent the idea of 'zero marginal costs' can be achieved at the course level in MOOCs. Marginal costs are the costs that arise if, for instance, a company produces one extra product. Zero marginal costs in MOOCs would imply that adding one student to the course comes without any increase in the total costs. At the macro level, high one-time production costs of expensive videos, quizzes and assignments pay off in the end if used by thousands of students. Yet, to achieve zero marginal costs for teachers at the course level, teacher time-intensive activities, such as providing (personal) feedback and formative assessment, need to be automated or alternatives need to be developed. We propose in the next section a heuristic framework to enable the analysis of MOOCs regarding the scalability of their design.
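The distinction between one-time production costs and recurring teacher costs can be sketched in a toy model (the figures are hypothetical, not empirical): the marginal cost of one extra student equals the teacher time that student requires, so it only approaches zero when personal feedback is automated or replaced.

```python
def marginal_teacher_cost(feedback_minutes_per_student):
    """Toy model: the marginal cost of enrolling one extra student is the
    teacher time (in minutes) that student's feedback requires; one-time
    production costs do not enter the marginal cost at all."""
    return feedback_minutes_per_student

def total_teacher_cost(n_students, feedback_minutes_per_student):
    """Total recurring teacher cost grows linearly with enrolment
    unless feedback is automated (minutes per student is 0)."""
    return n_students * feedback_minutes_per_student

# Content-only MOOC (automated quizzes, no personal feedback):
automated = total_teacher_cost(10_000, feedback_minutes_per_student=0)   # 0
# MOOC with 15 minutes of personal feedback per student:
personal = total_teacher_cost(10_000, feedback_minutes_per_student=15)   # 150,000 minutes
```

The sketch makes the argument explicit: scaling the first design adds nothing to teacher workload, while scaling the second multiplies it, which is why automation or peer-based alternatives are a precondition for zero marginal costs at the course level.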

A heuristic framework for scalability analysis
The framework has been developed to provide insight into the educational scalability of the educational designs of MOOCs in particular. While massive open online courses are expected to allow learning at scale, the framework is used to analyse the potential bottlenecks for scalability and to identify best practices. Literature that combines economic aspects (such as the Iron Triangle) with educational aspects (course design) in the context of scale or scalability appears to be sparse. Our framework incorporates both perspectives (the scalability aspect from an economic perspective and the course design aspect from a teacher/course designer perspective) and is used as the theoretical basis for the design analysis instrument. In theory, there are no limits to the number of students that can enroll in a MOOC. Just as large scale has an impact on quality and cost, it has an impact on the course design, since teaching large numbers of students asks for different measures regarding learning activities and feedback. The design analysis instrument was developed to detect scalable best practices in MOOC design and to give insight into the design elements that play a role in scalable design. Identifying scalable best practices not only enriches online or blended course design but is also of value for face-to-face education.
The heuristic framework (see Figure 2) has been developed in an iterative process over several months, in which we gathered information from the literature and from experts in the field of educational design and MOOCs. A literature scan of relevant course design elements brought us to related research on (MOOC) design [Anderson, 04; Rosewell, 14; Margaryan, Bianco & Littlejohn, 15], which we adapted to our research purpose by clustering the elements into four umbrella categories: constructive alignment, complexity, interaction, and formative assessment and feedback. Conference workshops (EMOOCs and the Welten conference) on elements of the framework [Kasch, Rosmalen & Kalz, 16] and an initial trial in a MOOC on the EMMA platform called 'Assessment for learning in practice' were used to gather more information and feedback on our framework. In all cases, the experts agreed with the selection of the four main design categories and the definition of educational scalability. Further refinement of the items via a pilot study has been conducted as well [Van Rosmalen, Kasch, Kalz, Firssova & Brouns, 17].
Since the interpretation of the dimensions depends on the context in which the Iron Triangle is applied, we first discuss the dimensions at the course level and then relate them to our heuristic framework.

Educational Scalability: Scale, Costs and Quality
In an ideal situation, high quality is linked with low cost at large scale. How to balance these dimensions should be considered during the planning and design of a course. It is expected that large scale influences the educational design of a MOOC and has an impact on teachers' as well as students' roles and responsibilities [Downes, 11; Kop et al., 11; Masters, 11]. In the majority of the literature, the impact of scale on costs has been approached in relation to the use of information and communication technology and the development costs of learning materials [Nye, 14; Stewart, 13; UNESCO, 09]. However, little is known about the influence of scale on costs regarding the support that teachers need to provide to MOOC students. Within our framework an important distinction is made between scale and scalability (i.e. educational scalability). While scale (at the course level) refers to the number of students, scalability refers to the desired capacity of an educational format to maintain high quality at large scale with stable teacher costs. MOOCs are said to be scalable by their very nature due to their potential of providing education at large scale. "Massiveness is frequently used as a synonym for the concept of scale, or the vast growth potential reputedly offered by digital technologies" [Stewart, 13]. However, as mentioned earlier, it is not only the quantity that matters but, equally important, the educational quality that is provided at large scale.
The costs of MOOCs can, for example, be divided into costs for content, ICT, tools and staff. While these costs have been studied and practices have been developed, as discussed above, controlling the cost of staff requires further attention. We define these costs as the time a teacher has to put into teaching practices, such as supporting students, providing feedback, and assigning grades. They are influenced by the other two scalability dimensions: scale (number of students) and quality (educational design aspects). Depending on the scale, costs can increase; the quality aspect, however, can decrease costs or shift them from teacher time to technological support [Nye, 14], or promote the use of other resources such as peers [Piech et al., 13; Van Rosmalen et al., 08]. Technological support of students, such as automated feedback or intelligent tutors, can also decrease costs in teacher time. Structured discussion forums such as the "discussion bus" in the FutureLearn platform enable students to comment, reflect and respond to shared documents or proposed questions without needing a teacher to regulate the discussions [Ferguson & Sharples, 14].
However, since it is not feasible to determine or observe the actual time a teacher invests in providing students with support and feedback, we estimate the teacher costs by means of several factors, in particular the feedback role a teacher has and how sensitive this role is to the number of students.
We use categories similar to those in recent MOOC design research [Hood & Littlejohn, 15; Margaryan et al., 15] and in the literature on design guidelines for online education [Anderson, 01, 04; Rosewell, 14; Rosewell & Jansen, 14; Saltzman, 14]. The four categories we propose should help us to focus on the question of which educational methods, which means of assessment and feedback, and which interaction possibilities are able to scale. Of particular interest are methods for scalable interaction and feedback in MOOCs. Looking at the complexity of learning activities and how they embed support and feedback helps us to find out whether and how there is sufficient spread and opportunity to engage in more complex learning. Finally, course design alignment ensures that there is 'common' (or educational) sense in the design.
The literature on the quality of and design guidelines for MOOCs extensively takes into account categories that start from regular educational practice, expecting policies similar to those expected by accreditation bodies with regard to institutional support, a fixed staff-student ratio, and the like [Cohn, 68; Rosewell, 14; Rosewell & Jansen, 14; Hood & Littlejohn, 15; Lackner et al., 14; Yuan & Powell, 13; Laws, 03]. In addition to these studies and guidelines, this study takes a slightly different focus by analysing MOOC design in the context of educational scalability and four basic educational design categories. In the following section the four design categories are discussed in more detail.

5.2 Educational Design Categories: Constructive Alignment, Complexity, Interaction, Formative Assessment and Feedback
Our framework focuses on four educational design categories, discussed below (constructive alignment, complexity, interaction, and formative assessment and feedback), which are expected to influence the design quality of MOOCs and which can be influenced by scale and costs. All four design categories originate from regular education and are equally important in (open) online education [Margaryan et al., 15; Carless et al., 17]. Despite all the differences between (open) online and offline courses (such as duration, students' demographics, technological support), in each case there is a need, the extent of which has yet to be defined, for support and formative assessment and feedback. The struggle of interacting with large numbers of students and providing them with the needed support and feedback without increasing teacher time involvement is not solved in offline education either. Therefore, new educational design approaches are needed [15]. This heuristic framework differs from other frameworks, models and approaches presented in [Margaryan et al., 15; Rosewell and Jansen, 14] in the sense that it does not aim to evaluate the educational design as such, nor is it prescriptive in favouring a particular design approach for open online courses. In addition, it is not prescriptive towards policies similar to those expected by accreditation bodies with regard to institutional support, a fixed staff-student ratio and the like. The framework is used primarily to identify best practices in scalable open online courses. It brings four fundamental design categories together and enables teachers, researchers and students to analyse and improve the scalability of their course design.
Constructive alignment is an important quality aspect of scalability, since it ensures the quality of the educational design; it forms one dimension of the framework at hand. According to this concept, learning goals should be aligned with learning activities and assessment [Anderson, 04; Biggs, 03; Blumberg, 09]. There should be consistency between what students ought to learn and the way students are taught and assessed.
The constructive aspect refers to the students' part: students should have access to learning activities that are relevant for constructing meaning in line with the pursued goals [Biggs, 03]. The alignment aspect refers to the teachers' part: teachers have to ensure an active learning environment that supports this [Biggs, 03]. For example, when striving for learning goals at a high complexity level, such as applying skills in (simulated) real-life situations, students should be offered learning activities that match such a level in order to achieve these learning goals. When providing education on a large scale, it is challenging for teachers not only to support students in achieving their learning goals but also to assess whether they did (not) achieve them. As becomes clear, constructive alignment is an overarching quality aspect of the course design in its entirety. It is important to focus on the alignment between intended learning outcomes, learning activities and assessment because it is the backbone of the course design: it enables learners to choose a MOOC in line with their goals and helps to keep them motivated and to regulate their learning.
The interplay between the complexity levels of learning goals, learning activities, assessment, feedback and interaction is another dimension of this heuristic framework. Miller's Pyramid is used to indicate the complexity level at which assessment, learning activities and goals are provided [Miller, 90]. The idea behind Miller's Pyramid is that education should be provided at several complexity levels [Van Berkel et al., 14]. The need for authentic learning activities and for a variety of learning activities is also stated in the literature [Anderson, 04; Merrill, 13]. While there are several alternative models for assessing complexity levels, such as Bloom's taxonomy [Krathwohl, 02; Bloom, 56], the pyramid is, from our perspective, the least complex way to assess learning tasks and their complexity. The pyramid does not suggest any order in which the levels should be provided but pleads for a balance among the levels. Although [Miller, 90] focused on professional competence, the complexity levels of his pyramid can also be translated to regular education. According to [Miller, 90], professional competence can be differentiated on four levels. On the lowest complexity level, the "knows" level, the focus lies on reproducing factual knowledge, whereas on the "knows how" level students are asked to apply their factual knowledge in a specific context. On the "shows how" level a student demonstrates learning and shows that (s)he can apply knowledge and skills within a defined context. On the highest level, "does", the competence to perform in a real-life situation is learned [Miller, 90]: students perform in practice and apply knowledge and skills in (related) real-world problems that may even be ill-structured. We aim to examine at which complexity level learning activities are provided and how this correlates with the interaction in the course. How and to what extent do MOOCs offer highly complex learning activities, and how are these supported by the teacher, technology
and/or peers?
The interaction model of [Anderson and Garrison, 98] is used to examine what types of interaction are offered in MOOCs. According to [Anderson, 02, 04] the three most common interactions in distance education are student-student (S-S), student-teacher (S-T) and student-content (S-C) interaction. The type and implementation of these educational interactions influence the scale, cost and quality dimensions. They can, for instance, make large numbers of students manageable for teachers through various peer assessment activities. According to Anderson's equivalency theorem: "Sufficient levels of deep and meaningful learning can be developed as long as one of the three forms of interaction (S-S, S-T, S-C) are at very high levels. The other two may be offered at minimal levels or even eliminated without degrading the educational experience. High levels of more than one of these three modes will likely deliver a more satisfying educational experience, though these experiences may not be as cost or time effective as less interactive learning sequences." [Anderson, 02] Translated to the MOOC context, this could mean that limited S-T interaction can be compensated by high S-S and S-C interaction. In MOOCs, where efficient S-T interaction is limited anyway, the design does not have to provide all interaction types in order to be of high quality. Since the teacher bandwidth is limited, we are interested in improving S-S and S-C interaction. However, instead of merely focusing on ways to limit S-T interaction, it is equally interesting to investigate how it could be optimized by, for instance, technological means or a change in the teacher's role/function [Ferguson & Sharples, 14; Kop et al., 11; Masters, 11]. Moreover, for each of the three forms of student interaction, this study specifically focuses on formative assessment and feedback. Within this study, we define formative assessment and feedback as a process in which the students' learning process is supported by feedback (loops) [Sluijsmans et al., 13]. The
provided feedback type and quality has a great impact on students' learning process and needs to be provided during or in between learning activities.To what extent and how do students receive support and formative assessment and feedback by teachers, peers and/or the learning environment?Formative assessment provides learners with information that allows them to improve their performance and learning [Hattie & Timperley,07].Formative assessment and feedback is time consuming and -already in regular education -easily leads to a work overload for teachers [Berlanga et al.,12].The core question is which methods for support and for formative assessment and feedback are suitable without impacting the teacher bandwidth [Hattie & Timperley,07;Mory,04].

Methodology -testing the design analysis instrument
We have designed an instrument that translates the theoretical framework to practice. It aims to bring to the surface how open courses instantiate the aforementioned design categories (constructive alignment, task complexity, interaction, formative assessment and feedback) and to put forward best or interesting practices. An initial version of the instrument was constructed from items derived from the theoretical dimensions. Before the instrument was applied in this study it was tested and revised during two pilot studies: first by colleagues, and second as an assignment in the context of an open course for educational science professionals [Van Rosmalen, Kasch, Kalz, Firssova & Brouns,17]. The responses and feedback of the professionals were used to revise the instrument in terms of item clarity and length. The final instrument consists of 48 items (16 closed, 18 open and 14 mixed, i.e. both open and closed, questions) with which the selected educational design aspects of MOOCs can be analysed. All of the above-mentioned design categories were incorporated in the instrument (see Appendix).
To ensure that the instrument is reliable and consistent in analysing the design categories, it was tested by two raters. Via a small stratified random sampling strategy, we selected five MOOCs from the following domains: Teaching & Education, Programming, Health & Medicine, Business & Management and Humanities. The MOOCs were provided on the Coursera and FutureLearn platforms. The five courses were selected according to the following criteria: listed in the repository www.classcentral.com, delivered in English, containing some form of formative assessment and feedback, and available during the time of the analysis. The first MOOC that matched the criteria in each of the selected domains was analysed.
To make the analysis manageable and to enable a fairly detailed analysis, we decided to focus on a single Unit of Learning (one week or one module) in the middle of each of the selected MOOCs; the Unit of Learning halfway through each MOOC was analysed. Before the raters used the instrument, we ensured that both were familiar with the concepts used and the meaning of all items. We tested the inter-rater agreement between the two raters for the initial version of the instrument, leading to a Cohen's kappa of 0.32. The instrument was then revised again, this time adding additional information on how to interpret certain answer options. We also clarified the three interaction types S-S, S-T and S-C and in which cases they should be labelled. All discrepancies between the raters were resolved by discussing the results of the first analyses in detail and clarifying some of the items and answer descriptions. In the end, 28 of the 48 items of the design analysis instrument were used to calculate Cohen's kappa. The remaining items were not suitable for a Cohen's kappa calculation, e.g. open questions and general questions such as the name of the rater and overall background information on the MOOC. For the sample of five MOOCs, a Cohen's kappa of 0.94 was achieved. The findings of the five case studies are discussed in the following sections.
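For readers unfamiliar with the statistic, Cohen's kappa corrects the raw agreement between two raters for the agreement expected by chance. A minimal sketch of the computation; the rater labels below are invented for illustration only, not our actual rating data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Invented example: two raters answering 10 closed items with 'yes'/'no'.
a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
b = ["yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
print(round(cohens_kappa(a, b), 2))  # 0.78
```

A kappa of 0 would indicate agreement no better than chance; values above roughly 0.8, such as the 0.94 reached after revision, are conventionally read as near-perfect agreement.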
During the MOOC design analysis, all items of the instrument were answered by surveying the corresponding UoL and the general course description of the MOOC. The analysis took an average of one hour per course. For the qualitative analysis (see Figure 3) we selected those design analysis items that would give a rough indication of scalability; all remaining items were used as background information. Detail questions were added to the instrument to provide more insight into underlying design choices and thereby more context for the analysis (see Appendix). A quantitative perspective, mirroring the educational scalability of the UoLs studied, i.e. a spider web chart (Figure 3, section 7.1), was created based on a selection of items: learning goals (item Q6b), learning activity (item Q7), S-T interaction (items Q9, Q10 and Q11), S-S interaction (items Q16, Q17 and Q18) and S-C interaction (items Q25 and Q26); see the instrument document for more information (see Appendix).

Findings and Discussion
Overall, the educational design of the five UoLs was very similar regarding learning activities, interaction possibilities and feedback methods. All UoLs provided videos, reading materials and quizzes. In two of the five UoLs, we found (potentially) scalable support and feedback methods able to serve large numbers of students without increasing, or depending on, a high teacher time investment.

Short overview results
The spider web chart (see Figure 3) provides an overview of the educational scalability of the interactions in the five UoLs studied. For the learning goals and the learning activities, the levels correspond directly to the four Miller levels. In principle, there is constructive alignment between goals and activities if the values are equal. For each of the interaction types, i.e. S-C, S-S and S-T, we calculated a value between 1 and 4 representing its level of educational scalability. The following example illustrates how we derive the educational scalability of S-S interaction. It starts with the observation whether there are activities or assignments on which students are requested to give feedback. If so, it then takes into account the degree to which the feedback process is structured and supported, e.g. with rubrics and an exercise to train their use (4 points max), and the level of the feedback, ranging from a simple correct/incorrect to an elaborated response (4 points max). The values are added and divided by the number of scores. The resulting values range from 0, if there is no feedback, to 4, if the feedback process is well structured and supported and the feedback is expected to be elaborated.

Learning goals and learning activities: First of all, we examined whether learning goals were stated in the UoL and, if so, at which complexity level. As can be seen in Figure 3, learning goals were provided in all but one UoL. Overall, however, learning goals were formulated in a very provisional manner, which makes it difficult for the student (and for us) to understand what (s)he will learn in the UoL. We therefore analysed the complexity of the learning goals in the context of the given learning activities. Although this enabled us to label their complexity level, it also gives a distorted image of the constructive alignment in a course. Ideally, one would label learning goal complexity independently of the learning activities; however, this proved not feasible given the vague formulations of the learning goals. More information on how the learning goal complexity is analysed and labelled can be found in the instrument (Appendix). Unspecified learning goals have a negative influence on student learning because students do not know what to expect and are therefore not able to prepare sufficiently [14;Carless et al.,17]. These findings support previous findings of [Margaryan et al.,15], who found that nearly half of the analysed MOOCs did not specify learning goals and most of the remaining MOOCs provided immeasurable learning goals.

Interaction & Feedback: We have examined whether and to what extent the teacher(s), peers and learning environment had the role of providing (formative) feedback. As Figure 3 shows, all three interaction types (student-teacher, student-student, student-content) were provided in the educational design of the five UoLs. Interaction was provided via discussion fora, discussion prompts and automated answers, hints and comments in quizzes. With regard to the interaction quality, we found that the underlying organization and clarity of all three interaction types was poor, hence the low levels in Figure 3. The feedback role of the teachers, peers and learning environment was unspecified, and students were not informed whether, when, how and on what criteria they would receive and/or provide feedback. Additionally, the design did not provide information about the feedback type, intensity and directness that would be provided. The feedback itself was corrective and did not elaborate on weak and strong points. Only in one case was the use of fora supported by a pre-structure. The educational design did not provide any information on which criteria students had to discuss with each other, nor were they informed about why the discussion forum was useful. With one exception, the one guided by a
structure, a general lack of interactivity between students and between students and teachers was found in the discussion fora. All MOOCs provided simple multiple-choice quizzes that the students could use to test their factual knowledge. Students had several attempts (at least 3 in one hour) to take each quiz, which often counted towards the final grade. The number of items in the quizzes varied across the MOOCs, ranging from 3 to 15.
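The S-S scalability value described at the start of this section can be sketched as a small scoring routine. The function name and the two component scores below follow our verbal description; the exact item weights in the instrument may differ, so this is an illustrative assumption rather than the instrument's precise coding:

```python
def ss_scalability(has_peer_feedback, structure_score, feedback_level_score):
    """
    Educational scalability of student-student interaction, on a 0-4 scale.

    structure_score: 0-4, how well the feedback process is structured and
                     supported (e.g. rubrics plus an exercise to train their use).
    feedback_level_score: 0-4, from a simple correct/incorrect notification
                          up to an elaborated response.
    """
    if not has_peer_feedback:
        return 0.0  # no activities on which students give feedback at all
    # Add the component scores and divide by the number of scores.
    scores = [structure_score, feedback_level_score]
    return sum(scores) / len(scores)

# Illustrative UoL: a rubric is provided but without a training exercise (3),
# and the feedback is expected to be elaborated (4).
print(ss_scalability(True, 3, 4))  # 3.5
```

The same additive-then-average pattern applies to the S-T and S-C values, each with its own component scores.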

Detail items results
The next section of the instrument focused on the feedback quality of the three interaction types and the overall information provided to support the students' learning process. In general, the MOOCs gave students only very limited information about the prior knowledge they were expected to have in order to follow the MOOC and, in some cases, about the target group of the MOOC. By informing students about required prior knowledge, they get the chance to prepare themselves for the MOOC, which enhances learning [Anderson,04]. None of the MOOCs informed students about how to use and reflect on the feedback they would receive. Providing and receiving feedback is a process in which students not only receive feedback with regard to the learning goals but also look back on the discrepancy between the learning goals and their current learning outcomes [Sluijsmans et al.,13]. Without closing the feedback loop, the student does not get the chance to reflect on and improve his/her learning process [Sluijsmans et al.,13]. Additionally, students did not receive any guidelines on how to provide formative feedback. Feedback can only be given efficiently if it is supported by general criteria that are clearly communicated and explained, not only to the feedback provider but also to the receiver [Carless et al.,17;Hattie & Timperley,07].

Student-teacher interaction: Teacher feedback quality
We focused on the sensitivity of the teacher feedback to the number of students, its structuredness, the goal of the feedback and the information density. In one UoL, teachers were highly engaged in the forum and supported students during the learning activity. Students had to install an app and perform several programming tasks. In this UoL, students were highly active in the discussion forum and posted comments and questions about the tasks. The teacher provided personalized feedback on individual students' problems with their tasks and in this way supported the students during the learning process. The feedback contained suggestions to address the problem indicated. In this UoL, we found an example of scalable good practice: the teachers made a sum-up video at the end of the week, which provided students with support and feedback about the learned material and activities. Providing students with summaries is not only an effective way of ending a Unit of Learning but also helps students to process the learning material better [14].
Students could ask questions about the learning materials and activities by writing them down in the discussion forum. They were also asked to read their peers' questions and to 'like' those questions that they found interesting too. The most liked, and therefore most requested, questions were discussed by the teacher in the sum-up video. That way, the students had input on the content of the teacher feedback, which enabled the teacher to focus and tailor his/her feedback to the students' needs despite the large number of students. This feedback method differs from the common approach in which MOOCs are pre-produced, i.e. without adding anything based on learner feedback during the runtime of the MOOC. Student-teacher interaction in the other UoLs depended merely on the fora and was almost nonexistent.

Student-student interaction: Peer-feedback quality
All UoLs facilitated student-student interaction via fora and discussion prompts. However, with the exception of the 'liking' of the most pertinent questions in the example discussed above, students were not specifically instructed or assigned to provide peers with hints, comments or questions. There were no guidelines on how to discuss in the forum, which led to rather unrelated comments. As mentioned earlier, in four of the five UoLs the fora were hardly used; the lack of a relevant purpose might be related to this.

Student-content interaction: Automated feedback quality
In all UoLs, the learning environment had the most dominant feedback role. Automated answers and hints were provided in reaction to students' submitted answers in multiple-choice quizzes. The feedback was given at the task level for each item of the multiple-choice quiz. The specificity of the feedback varied across the UoLs, from specific content hints to 'correct/incorrect' notifications. The formative feedback given during the quizzes was not provided via specific tools nor supported by worked examples or rubrics. A scalable good practice was found in the UoL of the Business & Management MOOC. In a series of video lectures, the students were introduced to the 'time value of money' and its underlying concepts and calculations in different contexts. For each context, the corresponding calculation was demonstrated in the video lectures with the help of an Excel sheet. The video lectures were interactive and "communicated" with the students by guiding them through the calculations and their meanings. In the quizzes, the students were confronted with a problem which they could solve by analysing it and applying the spreadsheet. The feedback given briefly explained the right way to use the spreadsheet.
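The MOOC's actual spreadsheet is not available to us; the underlying 'time value of money' calculation the students practised is, however, standard. A minimal sketch (the example figures are invented for illustration):

```python
def present_value(future_amount, rate, periods):
    """Discount a future cash amount back to today at a per-period rate."""
    return future_amount / (1 + rate) ** periods

def future_value(present_amount, rate, periods):
    """Compound a present cash amount forward at a per-period rate."""
    return present_amount * (1 + rate) ** periods

# Invented quiz-style problem: what is 1000 received in 3 years
# worth today at a 5% annual rate?
pv = present_value(1000, 0.05, 3)
print(round(pv, 2))  # 863.84
```

Elaborated automated feedback for such a quiz item could then explain not just the numeric answer but which of the two directions (discounting vs. compounding) the problem called for.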

General Discussion
The instrument applied provided insight into the educational design of MOOCs, in particular into the support and formative assessment methods applied. The selected sample showed that four of the five UoLs pursued learning goals and activities of low(er) complexity that focus on students acquiring factual knowledge. These results match related research on MOOC design, which points out that many MOOCs aim to educate students through video lectures and multiple-choice quizzes [Conole,13;Margaryan et al.,15;Terwiesch & Ulrich,14]. The use of video lectures, multiple-choice quizzes and articles is not in itself an indication of low quality but rather a common design practice in MOOCs. Although norms of regular classroom and online education can provide guidance in determining design quality [Margaryan et al.,15;Ossiannilsson et al.,15;Rosewell & Jansen,14], it is not likely that MOOCs can adhere to the same criteria. For finding best practices, a better way is to analyse the educational design of MOOCs and to compare them with other MOOCs rather than with classroom education [Weller,14]. As stated by Philippa Levy in [Bayne and Ross,14]: "in thinking about the pedagogy of MOOCs, it will be important to continue to avoid preconceptions, in particular about what teachers in higher education 'should' and 'should not' be doing, as these assumptions may not be helpful in new environments". The results of this study indicate that the interactivity, collaboration and support in the five analysed MOOCs are rather low, although the analysed MOOCs provide the possibilities for student-student and student-teacher interaction. Students did not engage in the discussion fora, nor did teachers activate students to ask questions. Poorly instructed discussion fora lead to unorganized and unstructured interaction. The qualitative findings of the five case studies confirm that a purely quantitative approach in MOOC design does not necessarily lead to high quality education in terms of support and formative assessment and feedback. Simply providing discussion fora, which in theory support large scale interaction, does not lead to high quality interaction that students can benefit from, unless the educational design provides information and feedback criteria. Tools such as discussion fora need to be structured and explained in order to support interaction and learning.
The five case studies showed us that designing a course at large scale and with low teacher costs does not seem to be the challenge. What our analysis did show is that the challenge of providing quality at large scale with low teacher costs already appears with learning activities of low complexity, such as simple multiple-choice quizzes. The content and organisational quality of interaction and feedback can be improved by adding more depth and variety. Automated feedback can be made more elaborate by, for example, providing students with explanations of why their answers were correct or incorrect and/or strong or weak. The scalability of feedback can be enhanced by providing a variety of feedback types (feed up, feedback, feed forward) that are more elaborate and focus on several levels (task level, performance level, etc.). Scalable feedback methods such as the sum-up video, which took into account student needs, and the lecture video, which guided students through several scenarios and exercises, can be applied in all MOOCs independent of their domain. These two examples enable teachers to provide large numbers of students with feedback in a cost efficient way. Overall, multiple-choice quizzes were used in a very simplistic way, enabling students to assess their factual knowledge. A study on the quality of multiple-choice quizzes in MOOCs showed that nearly half of the analysed questions contained item-writing flaws [Costello, Brown & Holland,16]. Although multiple-choice questions are commonly used in MOOCs for formative assessment and feedback, improvement is needed. Instead of providing students with factual questions, the content and feedback quality of multiple-choice quizzes could easily be improved by increasing the diversity of question types. The same applies to the answer options offered.
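The more elaborate automated feedback suggested above can be sketched as per-option explanations combined with feed up and feed forward messages. The item content and wording below are entirely invented for illustration; the point is the structure, which scales to any number of students at no extra teacher cost once authored:

```python
# Hypothetical quiz item with feedback at three levels (feed up, feedback,
# feed forward) instead of a bare correct/incorrect notification.
ITEM = {
    "question": "Which interaction type is most limited by teacher bandwidth?",
    "feed_up": "Goal: relate interaction types to educational scalability.",
    "options": {
        "S-T": ("correct", "Teacher time does not grow with enrolment."),
        "S-S": ("incorrect", "Peer interaction can grow with the cohort; "
                             "reread the section on the equivalency theorem."),
        "S-C": ("incorrect", "Automated content feedback scales well; "
                             "compare it with the teacher's role."),
    },
    "feed_forward": "Next, check how your own course design distributes "
                    "S-S, S-T and S-C interaction.",
}

def give_feedback(item, answer):
    """Assemble a three-part formative response to a selected answer."""
    verdict, explanation = item["options"][answer]
    return "\n".join([item["feed_up"],
                      f"{verdict}: {explanation}",
                      item["feed_forward"]])

print(give_feedback(ITEM, "S-S"))
```

Authoring such per-option explanations costs the teacher a fixed amount of time once, after which every student receives elaborated feedback automatically, which is exactly the scalability property discussed above.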
The instrument was designed to provide insights into important educational design aspects regarding student support and formative assessment and feedback. The results of this study are clearly not representative of MOOCs in general, but they show the value and reliability of our instrument when applied. An advantage of this instrument is that it enables us to detect scalable best practices at a detailed level. A MOOC that in general has a weak design can still contain interesting and/or best practices in certain design elements; the instrument enables us to detect these practices. Teachers, designers and also students can use the framework and instrument to reflect on the educational design of (open online) courses. The instrument enables us not only to map the different design aspects of a MOOC, but also sheds light on their underlying structure and connectedness.

Conclusions and future work
The purpose of the current study was to introduce and validate a framework for the scalability of educational designs for Massive Open Online Courses and to determine scalable practices in MOOC design regarding support and formative assessment and feedback. The heuristic framework and the design analysis instrument enabled us to analyse educational design aspects of MOOCs in detail. The instrument proved to be applicable in a reliable way. More importantly, even the small sample of MOOCs studied for validation of the instrument proved useful for identifying scalable practice. We identified two interesting design examples that show how large numbers of students can be provided with high quality feedback in a teacher-time efficient way: a sum-up video made by the teacher that was tailored to the needs of the students, and an instructional video that provided several examples and included exercises and guidelines for the students. While MOOCs were initially designed and implemented without any adaptations during the runtime of the course, the first example shows that some designers deviate from this initial practice and adapt the course design based on the feedback and activities of participants. The second example represents an interesting and transferable practice for managing the expectations of learners in MOOCs and reacting to different levels of academic study experience and self-regulation. Non content-related hints and support mechanisms contribute to a more balanced course design for more diverse participant profiles.
Identifying such examples can help other teachers and designers to improve their own course design. The instrument not only identifies best practice examples but also provides the underlying information and design patterns, which can then be implemented or adapted in other settings.
In general, the educational scalability of the formative assessment and feedback methods provided was low, i.e. the analysed designs can deal with scale but the quality was low. Here too the framework might be useful: in the general discussion, based on the analysis, we indicated some fairly straightforward ways to improve the designs.
This study has valuable implications for educational design practice by introducing a framework and instrument that takes into account common course design criteria and relates them to educational scalability issues. By means of the instrument, teachers, designers and students can analyse the scalability of their course design, gain insights into the quality of the interaction and the formative assessment and feedback and, hopefully, use these insights to improve their design.
After having tested the instrument, we will apply it in a future study, taking into account two important limitations of the current study. First, in our next study we will take a larger, more representative sample. Second, we will ensure that there is a balance between the different MOOC platforms. In this study, due to the stratified sampling strategy and the relatively small number of MOOCs, only two platforms were included. Although we do not focus on platform characteristics, we are aware that platforms differ in their possibilities and limitations, which of course influences the educational design. Finally, we aim to use the instrument both for a study covering various domains and for an in-depth study of one domain, to analyse to what degree the best practices identified are domain dependent or can be generalized over domains.

Figure 1: The Iron Triangle. "Placing Students at the Heart of the Iron Triangle and the Interaction Equivalence Theorem Models", Lane, A., 2014, p. 2.

Figure 3: Score of each UoL regarding the complexity of the learning goals and learning activities and the educational scalability of the interactions provided.

Figure 3 also depicts the highest complexity level of the learning goals and activities that could be found in each UoL. Learning goals and activities of the highest complexity level were given at level 3, implying a medium complexity in which students demonstrate learning by applying knowledge and skills within a defined context. The learning activities at complexity level 3 were quizzes with open and closed questions and the application of a software tool.