The impact of direct instruction in a problem-based learning setting. Effects of a video-based training program to foster preservice teachers’ professional vision of critical incidents in the classroom

Classroom disruptions are challenging. Problem-based learning (PBL) may help preservice teachers prepare for these situations through self-directed knowledge acquisition or direct instruction. In a first study, we applied a two-group design where students acquired knowledge through either self-directed learning (CG) or direct instruction (EG). Depending on the treatment, we examined differences in knowledge about classroom disruptions and in professional vision (noticing and knowledge-based reasoning). Knowledge was assessed with a multiple-choice test, and professional vision through video case analysis. EG showed higher scores in knowledge than CG and mentioned more knowledge-based reasons. In a second study, pre-post comparison showed increased knowledge and reasoning over time. Noticing did not differ between groups in Study 1 and remained stable in Study 2.


Introduction
Preservice teachers struggle to manage student misbehavior and to implement effective strategies for preventing and coping with classroom disruptions when beginning their teaching careers (Aloe, Amo, & Shanahan, 2014; Gallagher, 2009; Melnick & Meister, 2008). A key precondition for preventing and intervening in disruptions is that teachers are able to identify relevant characteristics of the situation (noticing) and draw conclusions regarding their own actions (knowledge-based reasoning) in highly complex classroom situations (Doyle, 1986). If a teacher notices, for instance, that students are engaged in activities unrelated to the lesson, they can intervene quickly to recapture students' attention and get them involved in the lesson again. The skills of noticing and knowledge-based reasoning are subsumed under the term "professional vision" (Sherin, 2001).

Fostering preservice teachers' professional vision of critical incidents in the classroom
Dealing with classroom disruptions is a key component of classroom management, a competency that can be understood as comprising "all of a teacher's activities that are aimed at establishing and maintaining social order within the social system of the school class" (Ophardt & Thiel, 2013, 46). Building on the work of Doyle (1984), Ophardt and Thiel (2013) describe how these activities are outlined in a lesson plan, which covers learning goals and methods and a program of action that has to be adapted situationally to the concrete teaching context. A number of strategies can be used to prevent and intervene in disruptions (see Thiel, 2016). First, teachers have to establish an interactional order. This means introducing rules and procedures, encouraging and modifying behavior to conform with the rules, and establishing a working relationship. Within this interactional order, teachers establish a clear program of action (e.g., introducing an assignment) and use verbal, paraverbal, and nonverbal signals to control students' attention and behavior. An important precondition for effective control is that teachers continually have an overview of the entire class and that they demonstrate this to their students (see Kounin, 1970). Guiding lesson flow effectively means maintaining continuous group activation, which includes (Kounin, 1970): keeping all students activated (group alerting), keeping all non-active students on standby (high participation formats), and conveying the impression that learning outcomes could be monitored at any time (encouraging accountability). For effective intervention, it is fundamental that the teacher is able to correctly assess the reason for the disruption, the scope of the disruption, and the potential dynamics of the disruption (see Thiel, 2016). 
Here, it should be kept in mind that, on the one hand, disruptive behavior of the student has to be stopped (individual focus), and at the same time, the program of action for the class must be maintained (class focus). To effectively stop the disruptive behavior, it is crucial that the teacher selects the appropriate strategy. In the case of a minor rule violation (e.g., whispering to a neighbor) it may make sense not to address it directly but to demonstrate awareness of the incident through eye contact. At the same time, the program of action has to be strengthened by intensifying the attentional control of the entire learning group. As soon as a threat to the program of action arises, the teacher has to intervene, or the disruption may create ripple effects. A minimal intervention (e.g., a brief reminder of the rules) can counter this. Teachers' professional vision is a key component of a successful approach to disruptions (see Wettstein, 2013).
The multidimensionality of the classroom, with different events occurring simultaneously, and the immediate need for classroom management are particularly challenging for teachers (Doyle, 2006). A majority of classroom management issues are also relatively ill-structured, with no single best way to handle them effectively. In a study by LePage et al. (2005), an expert-novice comparison showed that the relevant skills and knowledge to manage classrooms develop predominantly during practical experience. As compared to expert teachers, less experienced teachers have trouble balancing care and control (Nie & Lau, 2009; Weinstein, 1998); they tend to lose the group focus of the whole class when reacting to misbehavior (Thiel, Richter, & Ophardt, 2012; Westerman, 1991), and their attention is more scattered across the classroom (Wolff, Jarodzka, van den Bogert, & Boshuizen, 2016). These differences are ascribed to differences in cognitive structures: Experts have more case-based, better connected, and thus more flexible knowledge (which is organized in schemata) than novices. This allows them to notice the critical incidents in the classroom and choose from a broad variety of possible strategies, adapting sensitively to the context at hand (Berliner, 2001; Carter, Cushing, Sabers, Stein, & Berliner, 1988; Spiro, Feltovich, Jacobson, & Coulson, 1991).
Therefore, it seems crucial to initially create preconditions that foster preservice teachers' professional vision of critical incidents in the classroom. This may pertain to 1) building schemata that enable teachers to identify relevant events in the classroom and 2) enhancing the ability to use this knowledge to interpret these events correctly, to anticipate the further course of action, and to develop suitable action strategies. Professional vision (Sherin & van Es, 2009; Sherin, 2001) comprises the two processes of noticing and knowledge-based reasoning: Noticing describes whether teachers pay attention to important events and ignore those that are irrelevant to maintaining the activity flow and guaranteeing learning (e.g., small disturbances that do not jeopardize the course of action and may not cause ripple effects). Knowledge-based reasoning refers to the ways in which teachers, on the one hand, reason about what they notice and, on the other hand, develop appropriate strategies to handle situations based on their knowledge and understanding (Sherin & van Es, 2009; e.g., a teacher praises the effort of a student who failed in order to reinforce motivation), indicating differentiated, integrated, and flexible knowledge (Seidel & Stürmer, 2014). In fact, recent research has demonstrated that working with videos is an effective means of teaching professional vision (Gold et al., 2013; Stürmer, Seidel, & Holzberger, 2016; van den Bogert et al., 2014).

Problem-based learning in teacher education
PBL is an umbrella term that encompasses many different instructional approaches and applications. A key underlying principle of PBL is that it anchors learning and teaching in concrete, realistic problems (Hmelo-Silver, 2004; Schmidt, 1994). More specifically, Barrows (1996, p. 5f) identified six core characteristics of PBL learning environments: 1) learning is student-centered, 2) learning occurs in small student groups, 3) teachers are facilitators or guides, 4) problems form the organizing focus and stimulus for learning, 5) problems are a vehicle for the development of problem-solving skills, and 6) new information is acquired through self-directed learning.
According to these characteristics, PBL is supposed to meet a broad spectrum of instructional goals such as promoting the construction of extensive and flexible knowledge, fostering effective skills in problem solving and collaboration, and influencing intrinsic interest in the subject matter (Barrows, 1996; Hmelo-Silver, 2004; Norman & Schmidt, 1992). A number of meta-analyses indeed identified positive effects favoring PBL when compared with traditional instruction: PBL leads to better program evaluation, to superior skills and clinical performance, and to increased retention of knowledge. However, PBL was found to be equally or even less effective than traditional instruction with regard to academic achievement and acquired knowledge (Albanese & Mitchell, 1993; Dochy et al., 2003; Hmelo-Silver, 2004; Norman & Schmidt, 1992; Vernon & Blake, 1993).
Of particular interest to the present study is Dochy et al.'s (2003) meta-analysis, which was the first study that extended the literature base to domains other than medical education and also constrained its selection of studies to solely (quasi-)experimental designs. In line with previous evidence, they found a small but significant effect on student knowledge favoring traditional instruction. Yet, subsequent moderator analyses revealed that the student expertise level was associated with the variation in effect sizes: Traditional instruction led to a more structured knowledge base than PBL in studies with first- and second-year students, but to a less structured knowledge base for third-year and advanced students. These findings are in line with a theoretical debate initiated by Kirschner et al. (2006). Based on the "knowledge of human cognitive architecture, expert-novice differences, and cognitive load theory" (p. 75), they argue that PBL is not suitable for the knowledge acquisition of novices, even though it may suit advanced learners or experts, who have already stored relevant information in their long-term memory, which in turn frees up capacity in the working memory that is needed for the problem-solving process. In contrast, learners who lack the "extensive experience" that experts draw on to "quickly select and apply the best procedures for solving problems" (p. 76) are overwhelmed when meeting the high demands of unguided PBL.
Consequently, the construction of a PBL environment involves a careful analysis of both the learning goals (addressed outcomes) and the target group (addressed learners). In order to tailor the PBL model to the developmental level of preservice teachers, additional support in the tradition of direct instruction may be implemented, such as scaffolding, elicited explanations, process worksheets, and worked examples (Alfieri et al., 2011; Kirschner et al., 2006; Wijnia et al., 2014). In addition, it seems promising to supply preservice teachers with a "framework for reflecting on what was observed [and] concrete images of innovative and alternative teaching strategies" (Santagata, Zannoni, & Stigler, 2007, p. 125).

A newly developed training program to foster preservice teachers' professional vision of critical incidents in the classroom
Based on the current state of the research and previous experience with professional learning about classroom management among inservice teachers (Piwowar et al., 2013), we developed a PBL training program for preservice teachers starting with an introduction to classroom management taught by an instructor. The training program consisted of four sessions in total that lasted 90 min each. At the core of the treatment were two ill-structured (video-based) problem cases for the participants to solve collaboratively in groups of four to five students. The problem cases exemplified severely disrupted classes, unfavorable classroom situations, and poor classroom management in a video sequence: the observed teacher applied common, but rather ineffective, strategies to both prevent and handle misbehavior, maintain a conducive teacher-student relationship, and proactively steer the activity flow; the situations escalated (i.e., a student left the classroom in case 1, a student refused to participate in class in case 2). For each problem case, we also had a corresponding best practice example that served as a case exemplar at the end of the treatment (Moreno & Valdez, 2007; Wijnia et al., 2014). In order to be able to provide matched cases of the same initial classroom scenario (problem case vs. case exemplar), we used scripted video vignettes. In a previous study, these scripted video vignettes were rated by preservice teachers (N = 81). More than 90% valued the videos both as a good basis for discussion and reflection and as credible and authentic. Finally, because we had to assume that participants were unfamiliar with PBL, the problem-solving process was structured to follow a process worksheet (Alfieri et al., 2011).
The training program started with two lecture-like sessions that provided a systematic framework on heuristics, including theoretical terms and schemata on questions of classroom management and dealing with disruptions. In the third session, the students formed small groups and watched the two problem ("dysfunctional") video cases. For each case, they proceeded with a three-step worksheet, which included 1) recording relevant events, 2) evaluating the teacher behavior with reference to theoretical and empirical research, and 3) developing alternative, favorable teacher strategies. In the fourth and final session, participants had the opportunity to compare their identified alternative teacher behaviors to those in the respective case exemplars ("functional" cases) and discuss the acquired strategies.

Objectives
Since PBL has been shown to be a useful approach in higher education and, more specifically, teacher preparation (e.g., Choi & Yang, 2011; Edwards & Hammer, 2006; Golightly & Raath, 2015; Hushman & Napper-Owen, 2011; Yoon, Woo, Treagust, & Chandrasegaran, 2014; Zhang, Lundeberg, Koehler, & Eberhardt, 2011), we developed the described PBL environment using video cases in a secondary teacher preparation course on the topic of classroom disruptions. PBL also has elements of direct instruction, so although we were convinced of its usefulness in preparing teachers for the classroom, we assumed that it should not be implemented without specific adaptations in this phase of teacher training. Novice teachers, who are not familiar with the PBL cycle, may experience difficulties in acquiring new information through self-directed learning and may therefore be unsuccessful in handling their problem case (Wijnia et al., 2014). Thus, the overarching goal was to gain a broad understanding of the usefulness of this newly developed PBL environment, which entailed two research questions. First: Is a training program that merges direct instruction and PBL effective with regard to preservice teachers' professional vision of critical incidents in the classroom? And second: Is this training program superior to a more traditional PBL treatment starting with a self-study phase?
Therefore, we conducted two experimental studies (see Fig. 1). Study 1 served to explicitly examine the second question and compared two treatment conditions: Participants in the control group (CG) acquired knowledge through self-study in line with Barrows' (1996) core characteristics of PBL settings. In the experimental group (EG), on the other hand, relevant theories, strategies, and principles of classroom management were systematically introduced by a university teacher using direct instruction (contradicting Barrows' sixth core characteristic). While the results of a similar study published in Kumschick et al. (2017) revealed higher content knowledge, higher identified motivation, and higher teacher self-efficacy in classroom management in the EG as opposed to the CG, the present study sought to further complement those results with an in-depth, quantified qualitative analysis of aspects of professional vision regarding critical incidents in the classroom. We expected that participants in the EG would outperform participants in the CG not only with regard to knowledge but also with regard to noticing and reasoning (group difference hypothesis).
V.L. Barth, et al. International Journal of Educational Research 95 (2019) 1-12
In Study 2, we sought to further study changes occurring after training participation through pre-post comparison. Based on the results of Study 1, we only conducted the new treatment in the EG described above. However, we used an experimental design in which group A filled out the questionnaires to assess knowledge and professional vision pre-post, and group B filled out the questionnaires only after intervention. This research design allowed us to study enhancement while controlling for potential pretesting effects (affecting the way the treatment works) and retest effects (participants remember their answers and improve performance independently of participation in a treatment, Kulik, Kulik, & Bangert, 1984). We expected that both knowledge and skills (that is, noticing and reasoning) would increase over time (change hypothesis).

Sample and study design
The two classroom management trainings were implemented in a university course for N = 237 preservice secondary teachers in their first year of master's-level study. Eight study groups were randomly assigned to either the experimental group (EG) or the control group (CG). For this analysis, we only used the data from participants who completed all post-test measures (N = 198; n_EG = 108 and n_CG = 90). Nine participants who dropped out of the original sample did not take part in video analysis (participation in testing was not mandatory). The EG and the CG were comparable with regard to their sex (58% female) and their age (M = 26.03 years, SD = 4.01).

Treatments
Whereas the EG took part in the PBL-training described above, the treatment of the CG was rather traditionally problem-based, as the participants gained the relevant knowledge in a more self-directed way. In the first session of the CG, the student groups first watched the problem cases and were then required to record relevant incidents of the video cases collaboratively (step 1 of the process worksheet). Based on this, the groups identified learning topics for the subsequent problem-solving process. In the second session, they learned independently about key theoretical concepts and empirical findings on strategies to prevent and deal effectively with disruptions by reading relevant literature provided in a mandatory reader (contents of the literature were identical to the contents of the systematic introduction in the EG). In the third session, group members first shared and discussed the literature and then proceeded according to the process worksheet (steps 1 and 2). The fourth session was identical to the session in the EG.
Three university teachers were randomly assigned to the seminars; each attended the same number of EG and CG. Thus, teachers were not blind to treatment conditions. The role of the teacher was to facilitate the learning process during problem solving and give the introductory sessions in the EG.

Measures and data analyses

Multiple-choice test on classroom management knowledge (MC-test).
We conducted a self-developed multiple-choice test with 20 closed-ended items that covered factual knowledge, understanding of linking principles, and an application of both aspects to written practical examples. Participants had to choose one correct out of six possible answers, for example: "Which teacher behavior is most appropriate for handling minor disruptions? Pick a student to repeat what has been said; Engage other students to regulate misbehavior; Reprimand in a timely way; Use gestures and make eye contact (correct); Refer to rules." The total score resulted from adding up the number of correct answers. Cronbach's alpha was satisfactory at the lower bound with α = .60. The development of the instrument was based on a validated observer rating and a student rating for secondary teachers to assess classroom management competencies (Piwowar, 2013). The rating instruments comprise nine theoretical facets of classroom management strategies whose factor structure was tested with exploratory factor analysis (Piwowar, 2013). For test economic reasons, each theoretically modelled facet was represented with about two multiple-choice items. No re-examination of the factor structure was carried out. The development of the MC test contained several pilot phases. First, the items developed were discussed intensively within a group of five experts in this domain. Second, think aloud analyses were applied with two preservice teachers to test item difficulty, the adequacy of phrasing, and the connotations of the distractors. Third, a quantitative indication of congruent test validity was deduced from applying the MC test in a Bachelor seminar that also focused on issues of classroom management (N = 74). 
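The scoring and reliability steps described above can be illustrated with a minimal sketch. It assumes dichotomously scored items (1 = correct, 0 = incorrect) and uses the standard formula for Cronbach's alpha; the function name `cronbach_alpha` and the toy responses are hypothetical and are not data or code from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item-score lists."""
    n_items = len(responses[0])
    # Sample variance (n - 1 denominator) of each item across participants.
    item_variances = [
        variance([person[i] for person in responses]) for i in range(n_items)
    ]
    # Sample variance of the total (sum) score across participants.
    total_variance = variance([sum(person) for person in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five participants to four items (illustration only).
toy_responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

# Total score = number of correct answers per participant.
total_scores = [sum(person) for person in toy_responses]
print(total_scores)
print(round(cronbach_alpha(toy_responses), 2))
```

The sketch does not guard against a total-score variance of zero (all participants with identical sums), which would make alpha undefined.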
Since previous studies showed that an increase in knowledge of classroom management is accompanied by an increase in both self-reported teacher self-efficacy in classroom management (Pfitzner-Eden, Thiel, & Horsley, 2014) and self-reported knowledge (Böttcher-Oschmann, Groß-Ophoff, & Thiel, 2018), we also studied these correlations. We used Pearson's product-moment correlation coefficient to show that the MC test score was correlated with teacher self-efficacy in classroom management (r = .19) and two subjective change rating items ("In this seminar I have learned a lot about effective classroom management", r = .33; "In this seminar I did not learn anything really new about classroom management", r = -.43).
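As a brief illustration of the correlation analysis, the following sketch computes Pearson's product-moment correlation from its definition. The function `pearson_r` and the two toy variables are hypothetical; they only mimic the kind of negative relation reported for the "did not learn anything really new" item and are not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical MC test scores and agreement ratings (illustration only).
mc_scores = [8, 12, 15, 10, 17]
nothing_new_rating = [4, 3, 2, 4, 1]
print(round(pearson_r(mc_scores, nothing_new_rating), 2))
```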

Video-based case analysis on professional vision.
Teachers have to respond quickly to spontaneous (disruptive) events by recognizing (noticing) relevant situations and evaluating them (reasoning) under pressure to act. To measure the competencies of professional vision in as realistic a way as possible, we took three conditions into account in our measurement method: 1. To visualize the complexity of the teaching situation, we used a video-based case; 2. To simulate the pressure to act, we gave participants a limited period of time to complete the assignment; 3. Since events also occur just once in situ, participants could view the video case only once. The dependent variables noticing and reasoning were assessed through video-based case analysis with an open-ended question (Barth, 2017). The video prompt was an eight-minute scripted video about coping with classroom disruptions. It was similar to the problem cases used in the training program but displayed a different class and a different teacher. The participants had 40 min to both watch the video clip and then describe what they noticed and how they reasoned about the classroom situation: "In the video you have seen some classroom disruptions. Describe all important aspects of the lesson which have led to the unfavorable course of action. Please justify your choice (if possible, in a theoretically well-founded way) and give for each aspect an example from the teaching situation shown."
We used qualitative structuring content analysis (Mayring, 2014) to analyze the data. In a first step, the two assessment dimensions noticing and reasoning were defined:
- Noticing: Descriptions of behavior of the teacher or the students
- Reasoning: Evaluations of observed incidents
With the help of four experts in classroom management, we then deductively developed categories to score the participants' answers: 184 different behavioral characteristics for noticing (e.g., teacher overlooks that a student raised his hand) and 63 theoretical references that are relevant to reasoning about classroom disruptions (e.g., "the monitoring of the teacher is insufficient", coded as withitness, Kounin, 1970). The number of correctly described behavioral characteristics formed the score in noticing, and the number of correctly described theoretical concepts formed the score in reasoning. Finally, we developed a coding manual containing a definition, anchor samples, and coding rules for each category (Mayring, 2014, p. 95). In a three-day workshop, eight coders (preservice teachers who had attended the seminar previously and earned high grades in the final exams) were trained intensively to apply the coding manual to all given answers. To calculate inter-coder reliability (Cohen's kappa κ), 10% of the sample (n = 20) was double-coded. A mean Cohen's kappa κ of 0.72 indicated satisfactory inter-coder reliability (cf. Fleiss & Cohen, 1973).
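The inter-coder reliability check can be sketched as follows. This is a minimal illustration of Cohen's kappa for two coders, assuming each double-coded answer receives one code per coder for a given category (here 1 = category coded, 0 = not coded). How kappa values were averaged across categories is not specified in the text, so the sketch covers a single category only; `cohens_kappa` and the toy codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each rated the same units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of units")
    n = len(coder_a)
    # Observed agreement: share of units with identical codes.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions per code.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / n**2
    if expected == 1:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten double-coded answers (illustration only).
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
coder_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
print(round(cohens_kappa(coder_1, coder_2), 2))
```

In this toy example the coders agree on nine of ten answers, while chance agreement is .50, so kappa is noticeably lower than the raw agreement rate.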
As was to be expected, professional vision and knowledge showed significant medium-sized correlations (r = .34 and r = .44 for noticing and reasoning, respectively; see Table 1); the significant correlation between noticing and reasoning was strong (r = .65). Age and sex were not meaningfully related to the assessed outcomes (-.18 < r < .15).

Results
Item difficulties of the knowledge MC test were balanced with a minimum of 30% and a maximum of 90% correct answers (Md = 57%, M = 61%). In the video analyses, participants wrote down about 11 correct incidents they had noticed, varying between zero and 27, and listed about five correct different theoretical reasons, varying between zero and 19 (Table 2).
We applied Multivariate Analysis of Variance (MANOVA) with SPSS 24 to test group differences; the alpha level was set at p < .05. To test between-subjects effects, we used the Bonferroni method; the level of significance was corrected to account for the multiple tests (alpha level was set at p < .02). Rejection or confirmation of the hypotheses was oriented towards the critical alpha level (that is, statistical significance). However, we also reported the effect size partial eta squared (η_p²) as an index for the practical relevance of the effects (Richardson, 2011), which is judged to be small at .01, medium at .06, and large at .14 (Cohen, 1988). Using Pillai's trace, there was a significant effect of the treatment on knowledge, noticing, and reasoning, V = 0.053, F(3, 190) = 3.542, p = .016, η_p² = .053.
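As a plausibility check on the reported values, the sketch below recomputes the effect size from the F statistic and its degrees of freedom using the standard conversion η_p² = (F · df_effect) / (F · df_effect + df_error), and derives the Bonferroni-corrected alpha for three follow-up tests. The function names are hypothetical; the analysis itself was run in SPSS, not with this code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Multivariate test reported in the text: F(3, 190) = 3.542.
print(round(partial_eta_squared(3.542, 3, 190), 3))  # 0.053

# Three dependent variables (knowledge, noticing, reasoning) tested at alpha = .05.
print(round(bonferroni_alpha(0.05, 3), 3))  # 0.017
```

The recomputed effect size matches the reported value of .053, and the corrected alpha of about .017 corresponds to the rounded threshold of p < .02 stated above.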

Sample and study design
In Study 2, the treatment was identical for the whole sample and was like that described for the EG in Study 1. Yet, we ran two evaluation conditions. Because competence testing may lead to pretest effects and thus limit internal validity, we decided to experimentally vary the survey design: Group A filled out questionnaires before and after intervention (pre-post comparison). Group B served as a control group and filled out the questionnaires only in post-testing.
Again, participants were preservice secondary teachers in their first year of a master's program (N = 179), with eight study groups that were randomly assigned to either group A (n_A = 59) or group B (n_B = 120). There was one university teacher for all groups, so the eight study groups took place at different times on different days. After the students were assigned to the individual groups, some students had problems planning their semester courses, so some had to switch to a course on another day. Some study groups therefore had much larger numbers of attendees (e.g., Wednesday courses) than others (e.g., Friday courses), leading to the unequal sample sizes of group A and group B. The groups were comparable with regard to their sex (59% female) and cognitive abilities (assessed by a short form of the Raven Advanced Progressive Matrices test by Arthur & Day, 1994, with 8.52 out of 12 possible answers). However, the mean age was higher in group A than in group B (M_A = 28.48, SD = 6.59; M_B = 26.01, SD = 4.13; F[1, 169] = 9.006, p = .003).

Measures and data analyses

Revised multiple-choice test on classroom management knowledge (MC-test-R).
We again used the MC test described above, but in a revised version. First, based on an item analysis regarding the item difficulty, we selected 18 (partly modified) items (instead of 20 items previously). Second, we examined the distractors that worked best and reduced the possible answers (from six previously) to five. Cronbach's alpha was satisfactory at α_T1 = .60 and α_T2 = .66.

Video-based case analysis on professional vision.
The overall procedure to assess noticing and reasoning was identical to the procedure in Study 1. A mean Cohen's kappa κ of 0.74 (n = 18) indicated satisfactory inter-coder reliability.
Correlations of knowledge and professional vision were close to zero in pre-testing and approximately medium in post-testing (r = .27 and r = .35 for noticing and reasoning, respectively; see Table 1). Noticing and reasoning revealed strong correlations at both measurement points (r_T1 = .47, r_T2 = .56). We collected further data in post-testing to check divergent validity of the three dependent variables with cognitive abilities (Arthur & Day, 1994) and found small correlations that were about even for knowledge (r = .24) and reasoning (r = .23) and slightly lower for noticing (r = .19), as was to be expected. Again, the outcomes were not meaningfully related to the participants' sex or age (-.18 < r < .15).

Results
In a first step, we used data from the experimental design to test whether pre-testing (group A) influenced performance in post-testing by comparison to group B. ANOVAs revealed no group differences in post-testing for knowledge ( These results were similar and still non-significant when controlling for age, which differed between the two groups. Consequently, pretesting did not significantly affect scoring in post-testing, and pre-post comparison may be a valid indicator for potential improvement in the studied variables. Thus, we used the data from group A for subsequent change analyses.
Note. EG = experimental group with instructed knowledge acquisition; CG = control group with self-study phase for knowledge acquisition. Knowledge = number of correct answers in the MC test in %; noticing = number of mentioned events; reasoning = number of mentioned theory-based reasons.
In both pre- and post-testing, item difficulties of the multiple-choice knowledge test were balanced (percent of correct answers T1: Min = 14%, Max = 92%, Md = 50%; T2: Min = 32%, Max = 93%, Md = 70%) and total scores varied (T1: Min = 15%, Max = 80%, M = 51%; T2: Min = 6%, Max = 100%, M = 72%). In the video analyses, participants noticed between four and 35 aspects in pre-testing (M = 11.24) and between three and 22 aspects in post-testing (M = 11.44). On average, participants named about three different theoretical reasons at pre-testing (Min = 0, Max = 10) and about five in post-testing (Min = 0, Max = 13).

Discussion
Professional vision is a prerequisite for successful teaching (Sherin, 2001) that can be addressed in training at the university level (Gold et al., 2013). But which learning settings are best for preservice teachers with little classroom experience? The present study sought both to implement a training program to foster preservice teachers' professional vision of critical incidents in the classroom and to further study the impact of self-directed learning vs. direct instruction of the relevant material in a problem-based learning environment on three cognitive outcomes: knowledge on classroom management, noticing critical incidents in the classroom, and knowledge-based reasoning.
Indeed, the preservice teachers who participated in the newly developed training program enhanced their knowledge and also improved their ability to apply this knowledge in a simulated teaching situation (the video) through knowledge-based reasoning, as shown in Study 2. In addition, comparing the two conditions of the PBL setting in Study 1 revealed that the preservice teachers benefited from our target-group-specific adaptation of the self-study phase: The experimental group, which received a systematic introduction to the relevant knowledge base, showed statistically significantly higher declarative knowledge and knowledge-based reasoning than the control group, which acquired the knowledge in a self-directed way. Thus, we were able to achieve our goal of offering preservice teachers a PBL training program in classroom management that facilitated declarative knowledge, theory-based reasoning, and, as demonstrated in Kumschick et al. (2017), teacher self-efficacy in a motivating learning environment. Given the outlined challenges to offering suitable learning opportunities, the results reported here are particularly encouraging: we identified small to large effects even though the training consisted of just four sessions. In contrast, the majority of teacher education programs in classroom management and related skills that lead to notable improvements are a full semester in length (Gold et al., 2013; Stough & Montague, 2015). With regard to this economic aspect, and because professional vision is a prerequisite for successful teaching (Sherin, 2001), we implemented the training in seminars that prepare preservice teachers for their first school internship. As classroom disruptions are a special case, it is also conceivable that one could conduct a similar video-based training program with a focus on nonverbal communication to promote classroom management skills.
Contrary to our hypothesis, the training program developed here did not measurably affect noticing: The two training groups did not differ (Study 1), and noticing did not improve over time (Study 2). One possible reason for this finding may be that the training itself did not deal specifically with noticing. While the process worksheet that guided the problem-solving cycle contained the prompt to record noticed incidents in groups, the results of this step were not discussed or evaluated in the plenary session. Our hypotheses were grounded in the assumption that increased knowledge would entail an increase in noticing: various authors assume that professional vision depends on the quantity and quality of the (domain-specific) knowledge (Seidel & Stürmer, 2014; Sherin & van Es, 2009; Sherin, 2007).
Obviously, even though knowledge, knowledge-based reasoning, and noticing were substantially interrelated, further subskills seem to play an important role in the ability of noticing, such as selective attention and situation awareness, which are further determined by contextual factors such as personal experiences or beliefs and thus lead to interindividual differences in information processing (Endsley, 1995; Sherin & van Es, 2009). A second explanation could refer to the video prompts used: They may have been too easy (in terms of test administration) to be able to assess changes or group differences (ceiling effects). Due to the scripted nature of the video prompts, the classroom management issues were particularly salient. In a real setting, simultaneity and complexity of events are probably even greater. Moreover, the preservice teachers already had classroom management knowledge pre-intervention (about 50% correct answers in the MC-test-R, see Study 2). Looking at the results as a whole, recognizing relevant events may have been relatively easy for our participants, and the (differences in) knowledge acquisition may have been too small to substantially affect the ability to notice. These two possible explanations may provide interpretations of our results, particularly our finding that the training did not substantially affect noticing, in contrast to the results of other training studies (Santagata et al., 2007; Star & Strickland, 2008; van Es & Sherin, 2008).
Note. Knowledge = number of correct answers in MC-test-R in %; Noticing = number of mentioned events; Reasoning = number of mentioned theory-based reasons.

Strengths and limitations
This study points to a number of both practical and theoretical issues that are important to teacher educators who are responsible for training preservice teachers. We implemented PBL in teacher education in classroom management and took a critical view of its potential to foster knowledge and professional vision. We therefore carried out two experimental studies that incorporated both quantitative and qualitative measures. In addition, we addressed Kirschner et al.'s (2006) critique that PBL should be adapted to the developmental needs of the learners and that it is usually studied as a whole rather than by systematically varying its single components (see Sweller et al., 2007). Still, several shortcomings may limit the generalizability and scope of our findings. A first limitation is due to the research design applied here. In order to guarantee internal validity, we ran two consecutive studies in which we only varied a single aspect between the two groups: in Study 1, this was a single element of the learning environment, and in Study 2, it was the evaluation design to provide a valid assessment of change. However, this approach does not allow us to draw inferences about the (primarily) PBL nature of the treatment. We did not have a "real" control group such as a traditional PBL treatment in Study 1, or a further control group with a different treatment in Study 2. In addition, for curricular reasons, it was not possible to apply a pretest measure in Study 1 to determine the participants' preconditions. We were therefore able to measure the effectiveness of the training only in an unspecific way. In future studies, to safely exclude practice effects, a parallel test form should be created or a highly correlated measure of knowledge on classroom management should be applied. Although it seems valuable to further implement research designs like the one used here, we justified our decision with regard to internal validity. Nevertheless, the aforementioned flaws impede comparability with previous research on PBL and limit the significance of our findings for the construction of suitable learning environments.
A second limitation should be noted with regard to the measurement occasions and the chosen instruments. First, research has demonstrated that participants in traditional PBL gain slightly less knowledge, but remember more of the acquired knowledge (Dochy et al., 2003). Due to accessibility (the preservice teachers finished their university exams just a few months later), we unfortunately had no opportunity to conduct a follow-up survey to assess the retention of the acquired knowledge and examine long-term effects on professional vision. Second, there were no established instruments available that could have been applied to measure our dependent variables accurately. Thus, the implemented measures were all newly developed. While correlations between the assessed constructs and external criteria were as expected, the reliability of the MC-test scores was at the lower end, and the assessment of professional vision was only qualitative in nature. Our attempt to quantify and condense the results of the video analyses may not have fully utilized the potential of the data and may also have limited the validity of the scores on professional vision. Especially with regard to knowledge-based reasoning, the two theoretical facets, interpreting observed incidents and developing appropriate strategies, were not measured and analyzed separately. In a future study, both facets should be assessed separately in order to obtain more information about students' competences. Moreover, the mere counting of mentioned incidents as an indicator for noticing probably did not cover this skill optimally, as it did not take into account selective attention and the relevance of the incidents mentioned (Sherin, 2007). With regard to the latter, a criterion-oriented reference standard (e.g., an expert rating) would be desirable.
It might also have been valuable to analyze the data with respect to content, e.g., whether participants in the CG overlooked or mentioned different events than the EG, or whether mentioned events became more relevant over time (see, e.g., Wolff et al., 2016). Although it goes beyond the scope of our study, it might have been useful to analyze the output from the process worksheets as well, particularly regarding the quantity and quality of generated alternative teacher strategies (see, e.g., Santagata & Angelici, 2010), which will be incorporated into our future research.
Finally, factors other than the direct instruction in the EG may have caused (or masked) group differences in Study 1. For example, CG students might have needed more time to become familiar with the literature and may also have wanted the opportunity to add literature they had researched on their own. Although we tried to keep context features of the two groups as comparable as possible, which included keeping an even workload, we cannot rule out the possibility that other factors may have affected our dependent variables.

Conclusion and directions for future research
The current results provide support for the effectiveness of our training program, which was based on PBL and adapted to preservice teachers through elements of direct instruction. This is a first step towards systematic research on PBL in teacher education and instructional design. We demonstrated that it is worthwhile to explore the composition of learning environments in detail, in contrast to previous research that focused on the comparison of entire instructional approaches, that is, PBL versus direct instruction curricula. It seems promising to keep employing experimental methodologies to gather empirical evidence and further elaborate on the impact of single elements of instructional settings (such as the amount of scaffolding, the composition of process worksheets, the problem design, etc.; see Kirschner et al., 2006) for different target groups and different target variables. An innovative direction for improving the learning setting would be to further broaden sources of information for the problem-solving process, such as teacher or student interviews, which could add relevant information to appraise the classroom interactions and thus improve preservice teachers' emotional regulation, as well as their reflection and interpretation of upcoming classroom events.
To foster preservice teachers' professional vision of critical incidents in the classroom at the university level, more extensive training programs would also be viable. Curricula that combine the PBL cycle with a coherent set of opportunities to learn could be studied in more detail, for example in combination with microteaching or "video clubs" (Sherin & van Es, 2009; van Es, 2009), where students can start to explore the newly learned strategies in a shared environment under conditions of reduced complexity (Piwowar et al., 2013). This should also include addressing a variety of non-cognitive outcomes, such as emotional regulation and performance in (simulated) teaching scenarios.