Learning With Concept and Knowledge Maps: A Meta-Analysis

This meta-analysis reviews experimental and quasi-experimental studies in which students learned by constructing, modifying, or viewing node-link diagrams. Following an exhaustive search for studies meeting specified design criteria, 67 standardized mean difference effect sizes were extracted from 55 studies involving 5,818 participants. Students at levels ranging from Grade 4 to postsecondary used concept maps to learn in domains such as science, psychology, statistics, and nursing. Posttests measured recall and transfer. Across several instructional conditions, settings, and methodological features, the use of concept maps was associated with increased knowledge retention. Mean effect sizes varied from small to large depending on how concept maps were used and on the type of comparison treatment. Significant heterogeneity was found in most subsets.

Concept maps (Novak & Gowin, 1984) and knowledge maps (O'Donnell, Dansereau, & Hall, 2002) are diagrams that represent ideas as node-link assemblies. They are often used as media for constructive learning activities and as communication aids in lectures, study materials, and collaborative learning (Cañas et al., 2003). Over the last two decades there has been significant interest among educational researchers in the instructional use of node-link diagrams. Figure 1 shows that the number of publications referring to concept maps, knowledge maps, or node-link maps has greatly expanded since 1985. Through selective searches of the ERIC and PsycINFO databases, we estimate that more than 500 peer-reviewed articles, most published since 1997, have made substantial reference to the educational application of concept or knowledge maps.
The term graphic organizer commonly describes two-dimensional visual knowledge representations, including flowcharts, timelines, and tables, that show relationships among concepts or processes by means of spatial position, connecting lines, and intersecting figures (Alvermann, 1981; Ives & Hoy, 2003; Winn, 1991). As described by Estes, Mills, and Barron (1969), graphic organizers were conceived as an entailment of Ausubel's theory of meaningful learning, according to which learners actively subsume new concepts within preexisting, superordinate cognitive structures (Ausubel, 1968). Graphic organizers were first designed to function as advance organizers, priming students for learning by activating prior knowledge and illustrating its relationship with new concepts (Hawk, 1986).

Note to Figure 1. The decrease in results from ERIC after 2003 reflects a change in the indexing policies of that database.
A concept map can be regarded as a type of graphic organizer that is distinguished by the use of labeled nodes denoting concepts and links denoting relationships among concepts. The links in a concept map may be labeled or unlabeled, directional or nondirectional. Although the modern flourishing of node-link diagrams is often viewed as an entailment of Quillian's semantic networks (Quillian, 1967), similar diagrams have been used for communication and learning since at least the 13th century (Sowa, 2000). Dansereau and colleagues (e.g., Hall, Dansereau, & Skaggs, 1992; Lambiotte & Dansereau, 1992; McCagg & Dansereau, 1991) used the term knowledge map to refer to a concept map with directional links labeled by a fixed vocabulary of symbols such as P (part) or C (characteristic). Throughout this review, the terms concept map and map refer to node-link diagrams, including knowledge maps. Figure 2 was adapted from a concept map created by an undergraduate student in an educational psychology course taught by Nesbit. For the most part, it follows a radial tree structure in which each node, except the central root node, has one incoming link. Teachers and students also use concept maps that differ considerably from radial trees in graphical layout and numbers of incoming links.
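The node-link structure described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the `Link` class, the function name, and the example content are hypothetical and do not come from any study reviewed here. The check covers only the in-degree property mentioned in the text (no incoming link at the root, exactly one at every other node); it does not test layout or connectivity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A directional link between two labeled nodes (hypothetical representation)."""
    source: str
    target: str
    label: str = ""  # empty string stands for an unlabeled link


def has_single_incoming_links(nodes, links, root):
    """True if the root has no incoming link and every other node has exactly one."""
    incoming = Counter(link.target for link in links)
    return incoming[root] == 0 and all(
        incoming[node] == 1 for node in nodes if node != root
    )


# Hypothetical content. "P" (part) and "C" (characteristic) follow the
# knowledge map link vocabulary mentioned in the text.
nodes = ["tree", "leaf", "trunk", "green"]
links = [
    Link("tree", "leaf", "P"),
    Link("tree", "trunk", "P"),
    Link("leaf", "green", "C"),
]

print(has_single_incoming_links(nodes, links, "tree"))  # True
```

Adding a second link into an existing node (for example, from "trunk" to "leaf") makes the check return False, which corresponds to the maps that depart from the radial tree form.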
Researchers have investigated many different ways of using concept maps. One branch of research, pioneered by Novak and colleagues, has examined student construction or modification of concept maps. This research includes studies in which students worked individually or in groups to construct maps using information sources such as lectures or printed materials. Another significant research branch, conducted mainly by Dansereau and colleagues using knowledge maps, has investigated the cognitive effects of studying or communicating with preconstructed maps. This research uses maps as advance organizers, collaboration tools, and stand-alone information sources for individual learning.
In recent years, scholars have suggested that concept mapping may be especially effective in interactive software environments (Cañas et al., 2003; Novak, 1998, 2002). Software can reduce many of the mechanical obstacles to editing complex maps, provide feedback on the correctness of student-constructed maps (Chang, Sung, & Chen, 2002), and present maps in learner-controlled, animated formats that guide learners through visually complex structures (Nesbit & Adesope, 2005).

Meta-Analyses of Graphic Organizer and Concept Map Research
Because graphic organizers and concept maps have some shared features and are used in similar ways, graphic organizer research is relevant to analyzing the effects of concept maps. In a meta-analysis of graphic organizer research, Moore and Readence (1984) found that initiating learning activities with graphic organizers led to small, positive effects on comprehension; their use as activity-closing summaries produced somewhat larger effects. Moore and Readence cautiously attributed this difference to greater "student involvement" (p. 15) in activity-closing uses. Although Moore and Readence did not obtain separate effect sizes for preconstructed organizers and student-constructed organizers, activity summarizing more often used student-constructed organizers. Horton et al. (1993) conducted a meta-analysis of 18 classroom-based concept map studies. They reported (over 14 studies) that concept mapping by students raised posttest achievement scores by a mean of .42 standard deviations. The mean effect size for 3 studies using teacher-prepared maps was .59. Concept mapping by students in groups (2 studies) produced a mean effect size of .88. Four studies that measured attitudinal effects demonstrated a mean effect size of 1.57. Five of the studies analyzed by Horton et al. used comparison treatments in which no valid instruction was provided; the remainder provided some form of "conventional or didactic type of instruction" (p. 104) as a comparison treatment.
FIGURE 2. Portion of a concept map on bullying constructed by a university undergraduate while studying educational psychology. Adapted with permission from Melissa Arcari.

Hypothesized Cognitive and Self-Regulatory Effects
Several explanations have been advanced to account for the beneficial effects of concept maps. Some explanations point to unique, intrinsic properties of the concept map medium. Others point to properties that concept maps share with other summary formats, such as outlines and lists. Building on these explanations, theorists have also proposed interactions between the properties of concept maps and individual differences among learners. In this section we review the theoretical background of the research questions that guided the meta-analysis.

Dual Coding and Conjoint Retention
Viewing or constructing concept maps in conjunction with semantically equivalent text or spoken presentations may facilitate cognitive representation of the information in both verbal and visuospatial memory. According to Paivio's dual coding theory (Paivio, 1986), verbal knowledge and mental images reside in separate but potentially interlinked memory codes. Links between verbal and visuospatial codes provide additional retrieval paths for both types of information. Furthermore, because verbal and visuospatial memories draw from different cognitive resources, simultaneous verbal and visuospatial processing can be efficiently performed (Baddeley, 1992). Dual coding theory has been used to explain enhanced retention and transfer when learners study pictures accompanied by speech or text (Mayer, 2001). Images can, however, be cognitively generated from verbal information (Denis & Cocude, 1992). Therefore, dual coding does not necessarily require both pictorial and verbal input; and, in some situations, presenting only verbal information may be as effective as presenting both forms.
Theories of how students learn from geographic maps may help us to understand the effects of concept maps. It is well established that introducing a geographic map as an adjunct to verbal information presented as text (or speech) increases recall of information referenced in both the map and the verbal presentations (Diana & Webb, 1997; Griffin & Robinson, 2005; Stock et al., 1995). Of particular interest, however, is whether geographic maps are more effective as supplementary materials than lists and other text formats and, if so, what specific characteristics and usages of the maps contribute to the enhancement. One replicated finding is that recall of facts about places described in a speech passage is greater when presentation of the passage is preceded by a relevant geographic map rather than a verbal description of the map (Stock et al., 1995; Winn, 1991). Kulhavy's conjoint retention theory, an extension of dual coding theory, postulates that visuospatial memory encodes local features, such as landmarks, mimetic icons, and symbols, as well as structural information representing the spatial configuration of features, the relative distances between features, and the positions of features relative to the map boundary (Verdi & Kulhavy, 2002). According to conjoint retention theory, feature information and structural information both facilitate the coding and later recall of accompanying texts. Visual features aid later recall if they are visually distinct and drawn in such a way that they activate prior knowledge about the represented place (e.g., are mimetic). Structural knowledge establishes a spatial frame that references visual features and verbal knowledge to enable efficient, spatially indexed memory searches.
Recent research (Griffin & Robinson, 2000) casts doubt on the role of structural information in aiding recall of text information and suggests that the benefits of presenting geographic maps might be entirely attributable to the knowledge-activating properties of localized features. In this view, consistent with dual coding theory but not with conjoint retention theory, supplementary material consisting of a list of labeled icons representing local features would be just as effective as a map showing the geographic distribution of the labeled icons.
Dual coding theory suggests that concept maps can facilitate learning if they incorporate labeled nodes drawn as distinct, mimetic icons, that is, pictures that help the learner to retrieve prior knowledge about the concept and code the concept as an image. Although concept mapping software is available that allows map authors to represent each node by a labeled image, none of the research reviewed in this meta-analysis used maps incorporating mimetic icons. Conjoint retention theory, admittedly applied outside the domain for which it was conceived, suggests that concept maps may facilitate learning by enabling the learner to code a spatial frame for indexing and efficiently retrieving concepts.

Verbal Coding
Standing as a distinct alternative to dual coding and conjoint retention is the notion that the graphical conventions of concept maps are coded more like texts than pictures. In this view, any visuospatial coding of the map image in long-term memory is less important than the processes by which the map image facilitates verbal coding. There are several plausible means by which concept maps might be more effective than text in facilitating verbal coding.
In maps, a concept is represented by a single node regardless of how many relationships it has with other concepts. That is, maps visually integrate propositions dealing with the same concept. In contrast, a concept may be represented at several places scattered throughout a text passage, and it may be represented by different words. Larkin and Simon (1987) described how diagrams, in comparison with text, can offer more efficient support to comprehension and problem solving. Maps may lower the cognitive load needed to add new associations to those already linked with previously encountered concepts by allowing a more efficient visual search than text passages, a more efficient search of long-term memory, or both.
Maps may facilitate verbal coding by co-locating concepts that have similar meanings or that are subsumed by the same higher-order concept, thus signaling the information's macrostructure. Placement of nodes may reduce cognitive load by reducing the visual or memory search required to distinguish or associate similar concepts. Winn (1991) reviewed research suggesting that pre-attentive visual processing of diagrams, such as visual chunking of co-located objects, lends efficiencies that cannot be obtained from text. The pre-attentive visual processes draw from cognitive resources that do not interfere with those required for attentional processing. A similar advantage may be obtained when concept maps use distinctive shapes and colors for nodes representing different types of concepts. Indeed, there is evidence that learning is enhanced by studying maps in which meaning is signaled by node proximity, shape, and color (Wallace, West, Ware, & Dansereau, 1998; Wiegmann, Dansereau, & McCagg, 1992). Weinstein and Mayer (1986) described the use of concept maps (they called it "networking") as an organizational strategy for complex learning tasks such as comprehending text. Students who have learned to read or construct concept maps may be better able to identify the internal connections among concepts presented in text. The act of translating information from a text format to a node-link format may require that learners process meaning more deeply than they normally do when reading text or listening to a lecture. According to this view, learners benefit from receiving information in a text format and converting it to a map format, or vice versa. For example, as learners construct a map from a text passage, there may be advantages in having them group nodes spatially according to semantic similarity, because in doing so they must make decisions about information structure that is latent in the text.
In constructing a hierarchical concept map, learners must judge the relative inclusivity or specificity of concepts, a process that demands cognitive engagement (Novak & Gowin, 1984). Because it is often possible to convert between map and text with a fairly simple and automatic procedure, greater benefit may accrue if the learner is required to translate from the relationship terms found in text passages to a restricted, standardized vocabulary for node labels, such as those used in knowledge maps (Holley & Dansereau, 1984).

Learning Strategies
Concept maps may enhance learning when they are used to summarize information. There is a great deal of evidence that creating or studying summaries boosts recall of summarized ideas (Foos, 1995). Compared with prose-form summaries, concept maps may be reviewed more quickly, allowing research participants to complete more passes through the presented information in a fixed amount of time.
In this respect concept maps may be similar to other concise summary formats such as lists and indented outlines (O'Donnell, Dansereau, & Hall, 2002). If concept maps are inherently biased toward the representation of summary information, it may be that they are particularly good for acquiring main ideas, but poor for acquiring detailed, nuance-laden knowledge.
Unlike prose, concept maps have no conventional reading order and may thereby encourage a range of deep learning strategies that depart from the surface strategy of repeated reading. For example, when deciding which node to process first, a student may decide to select the most important or most central concept. The act of judging concept importance requires deeper processing than the student might normally exercise when reading text (Novak & Gowin, 1984). In general, processing the meaning of concept maps may be a less routinized cognitive activity than reading text passages, and therefore more likely to trigger metacognitive engagement. Learning to construct or read concept maps may increase students' ability to construct knowledge from information resources even when they are performing tasks that do not involve concept maps (Chmielewski & Dansereau, 1998). It may be that students who have worked with concept maps are more likely to explicitly identify concepts found in text, and the relations among them.
Learning to work with concept maps may help students to parse the meaning of text and other information sources. Unlike outlines, lists, and other graphical organizers, concept maps are built from concept-relationship-concept triplets that constitute complete propositions. Therefore, to construct concept maps from a text, students must more thoroughly and precisely extract the text's meaning.

Individual Differences
There is some evidence that low-ability students, perhaps specifically those with low verbal ability, obtain greater benefit from instructional diagrams than do high-ability learners (Holliday, Brunner, & Donais, 1977; Moyer, Sowder, Threadgill-Sowder, & Moyer, 1984; Stensvold & Wilson, 1990). Several theorists (O'Donnell, Dansereau, & Hall, 2002; Patterson, Dansereau, & Wiegmann, 1993; Lambiotte & Dansereau, 1992) have hypothesized that students with low verbal ability can more easily understand and construct concept maps than they can decipher and write scholarly text, and that these students may benefit more from the use of concept maps than students with high verbal ability. The format of concept maps, specifically the use of brief labels and simple node-link-node syntax to represent propositions, may be more easily comprehended and constructed by learners who have lower verbal ability. Compared with the language presented in textbooks, concept maps offer a relatively regular and simple syntax. Maps may also be easier to comprehend for learners studying in a second language (Amer, 1994). Lambiotte and Dansereau (1992) proposed that students with low prior knowledge benefit more from concept maps than those with high prior knowledge. Citing Mayer's assimilation theory (Mayer, 1979), they hypothesized that the specific macrostructure signaled by a map might guide the knowledge construction of less knowledgeable students but conflict with the cognitive structures already established in more knowledgeable students.

Collaborative and Cooperative Learning
Preconstructed concept maps have been used in a variety of small group learning activities such as scripted cooperation (Rewey, Dansereau, Dees, Skaggs, & Pitre, 1992) and peer teaching (Patterson, Dansereau, & Wiegmann, 1993). Analyses of student interactions during collaborative concept mapping in science education have demonstrated that this activity can sustain meaningful discourse and co-construction of key concepts (Roth & Roychoudhury, 1993; Stoyanova & Kommers, 2002; van Boxtel, van der Linden, Roelofs, & Erkens, 2002). For example, van Boxtel et al. found that secondary science students who constructed maps in dyads uttered approximately three on-topic propositions per minute, with similar levels of participation from both members of the dyad.
Concept maps seem to suit collaborative and cooperative learning because, like lists and outlines, they make economical use of text and can be written with letters that are large enough to be viewed by a small group. When drawn on large paper sheets or whiteboards, concept maps can often be extended with less need for reorganization and erasure than lists and outlines. Because semantic dependencies are more explicitly represented in concept maps than in text formats, they may be more amenable to concurrent editing in which different group members simultaneously modify the product. As noted by van Boxtel et al. (2002), concept mapping does not require detailed writing activities that can take time away from discussion of substantive concepts.

Research Questions
The goal of the meta-analysis was to review all experimental and quasi-experimental studies on the learning effects of concept maps that met specific methodological criteria. A reexamination of this research base is due because many concept map studies have appeared since 1993, the year in which the meta-analysis by Horton et al. was published. About half the studies coded in the present meta-analysis were published during or after 1993. Unlike Horton et al., we included laboratory studies in which the learning activities were not part of a formal course of study or in which the learning outcomes were not assessed for academic credit. By expanding the scope to include laboratory research, we added many experiments that had participants study maps rather than construct them.
Recognizing the considerable diversity in the concept map research base, and the plurality of the research questions that it addresses, our goal was to estimate the effects of specific approaches to using maps and to qualify those effects according to the conditions under which they were investigated. Our initial intent was to include and categorize any learning-related outcomes from this literature that could be aggregated for meta-analytic treatment, potentially including knowledge comprehension, retention, and transfer, as well as changes in learning skills, motivation, and attitudes.
The meta-analysis addressed the following research questions:
• What are the effects of learning activities in which learners construct or modify maps in comparison with other, nonmapping learning activities?
• What are the effects of studying maps in comparison with studying other materials such as text passages, outlines, and lists?
• How do these effects vary when maps are used in different knowledge domains, educational levels, and instructional designs?
• How does the use of concept maps affect constructs such as central knowledge, detailed knowledge, knowledge transfer, learning skills, and attitudes toward learning?
• What are the effects of using maps in cooperative and collaborative learning?
• How do different levels of verbal ability and prior knowledge affect learning from concept maps?
• How are concept map effect sizes conditioned by methodological features of the research?

Method
Study Selection Criteria
Following a preliminary examination of empirical studies and reviews, we formed criteria that would capture research designed to assess the educational and learning effects of concept maps. We included in the meta-analysis studies that (a) contrasted the effects of map study, construction, or manipulation with the effects of other learning activities; (b) measured cognitive or motivational outcomes such as recall, problem-solving transfer, learning skills, interest, and attitude; (c) reported sufficient data to allow an estimate of standardized mean difference effect size; (d) assigned participants to groups prior to differing treatments; and (e) randomly assigned participants to groups, or used a pretest or other prior variable correlated with outcome to control for preexisting differences among groups. Studies reporting a pretest effect size outside the range −.40 < d < .40 were excluded from the meta-analysis. When a study was reported in more than one source (e.g., dissertation and journal article), the version published in a journal article was used for coding. Reports available in languages other than English were considered. Specifically, two studies written in Japanese (Minagawa, 1999; Takumi, 2001) were translated into English but were not coded because they did not meet criteria (a) and (b), above.
Implicit in the first criterion (a) is that sufficient information must have been supplied about comparison treatments to ensure that they constituted legitimate and reasonable learning activities. More generally, both comparison and experimental treatments must have been designed to promote learning. Effect sizes were excluded if the authors stated that one of the treatments was intentionally designed to be confusing or difficult to learn from. For example, we excluded one finding from a study by Blankenship and Dansereau (2000, p. 297) in which the treatment presented a map intentionally designed to have "poor structural properties."

Search, Retrieval, and Selection of Studies
On May 4, 2005, we searched six databases using the query concept map* OR knowledge map* OR node-link map*. The databases and number of studies returned by each database (in parentheses) were ERIC (847), Web of Science (564), PsycINFO empirical studies (396), PsycARTICLES (397), Academic Search Elite (281), and Dissertation Abstracts (170). We searched titles and abstracts of presentations at annual meetings of the American Educational Research Association and the National Association for Research in Science Teaching held after 1990. Finally, we searched the reference sections of a few comprehensive review papers (Cañas et al., 2003; Horton et al., 1993; Novak, 1990a, 1990b; O'Donnell, Dansereau, & Hall, 2002).
In the selection phase, one researcher read the abstract or online text of each study found in the search. If the abstract did not provide sufficient information to exclude the study according to our selection criteria, the researcher scanned the methods, procedure, and data collection parts of the paper to retain or exclude the paper. Borderline cases were retained for full text inspection. For each thesis found through the Dissertation Abstracts database, the researcher read the first 24 pages to determine eligibility for inclusion. Studies identified as not meeting the selection criteria were eliminated, resulting in a list of 122 studies for which full text copies were obtained.

Coding Study Characteristics and Effect Sizes
Two researchers independently read each study retained in the previous phase to (a) eliminate those found not to meet the selection criteria, (b) select group comparisons consistent with the research questions of the meta-analysis, and (c) code each comparison according to a predefined coding form and coding instructions. The coding form consisted of 26 menu items and 38 brief comment items. The items included source (e.g., journal or dissertation), grade level, participant gender, setting and task, structure of adjunct materials, type of student interaction, content domain, country, how the map was used and designed, duration of study and treatment, comparison treatment, participant attrition, control for pretreatment differences (e.g., random assignment), and outcome construct.
The coders also rated treatment fidelity, that is, the thoroughness with which the treatment conditions were applied and monitored. This included consideration of whether participants were properly trained in the treatment activities, and to what extent their engagement in learning activities was monitored. For example, in a concept mapping study, a high treatment fidelity rating could be assigned if researchers observed the participants while they were constructing maps. A medium rating could be assigned if mapping activities were not observed but the maps were later assessed by the researchers. A low rating could be assigned if participants were asked to construct maps but there was no monitoring or assessment.

Effect Size Extraction
One important principle of meta-analysis is to avoid entering effect sizes that are statistically dependent (Lipsey & Wilson, 2001). For example, if a study has one control group and two treatment groups exposed to different types of maps, it is inappropriate to enter the two map-versus-control findings as different effect sizes because they share the same control group. Doing so would inflate the overall weight attributed to the study by counting the control group twice. We devised a coding scheme that would allow us to preserve interesting but statistically dependent comparisons from the same study. By using the coding scheme we avoided inappropriately combining statistically dependent comparisons in calculating average effect sizes, while still using them in separate analyses.
When repeated outcomes were reported (e.g., immediate and delayed achievement tests), only the later outcome was used. When data from multiple experimental or comparison treatments were reported, and the distinctions among the treatments did not address the research questions or could not be aligned with differing treatments from other studies, the weighted average of the multiple treatment groups was coded.
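The two reduction rules above can be sketched in Python. This is a minimal illustration, not the coding procedure itself: the function names are hypothetical, and weighting by group sample size is an assumption, since the text says only that a weighted average was coded.

```python
def latest_outcome(outcomes):
    """Return the value measured last.

    `outcomes` is a list of (days_after_treatment, value) pairs
    (a hypothetical representation of repeated posttests).
    """
    return max(outcomes, key=lambda pair: pair[0])[1]


def weighted_group_mean(means, sizes):
    """Sample-size-weighted average of several group means (assumed weighting)."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


# Immediate (day 0) and delayed (day 14) posttests: the delayed score is kept.
print(latest_outcome([(0, 0.61), (14, 0.48)]))

# Two treatment groups collapsed into one before computing an effect size.
print(weighted_group_mean([70.0, 80.0], [10, 30]))
```

With these hypothetical numbers the larger group dominates the combined mean, which is the intended effect of weighting.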
Using group means and standard deviations, a standardized mean difference effect size (Cohen's d) was obtained for each finding:
$$d = \frac{\bar{X}_e - \bar{X}_c}{s_{pooled}}$$
where $\bar{X}_e$ is the mean of the (experimental) group using concept maps, $\bar{X}_c$ is the mean of the comparison group, and $s_{pooled}$ is their pooled standard deviation. Cohen's d was then used to calculate Hedges's unbiased estimate of the standardized mean difference effect size (Hedges & Olkin, 1985, p. 81):
$$g = d\left(1 - \frac{3}{4N - 9}\right)$$
where N is the total number of participants in the experimental and comparison groups. The inverse variance weight, important for aggregating effect sizes, was coded for each finding as
$$w = \frac{2 n_e n_c (n_e + n_c)}{2 (n_e + n_c)^2 + n_e n_c g^2}$$
where $n_e$ is the sample size of the experimental group and $n_c$ is the sample size of the comparison group (Lipsey & Wilson, 2001). As recommended by Lipsey and Wilson, the coders estimated their confidence in the data with which the effect size was calculated. The effect size confidence was rated as low, medium, or high depending on reliability of the posttest scores, pretest effect sizes, and whether sufficient data were available for an accurate calculation.
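The effect size calculation for a single finding can be sketched in Python. This is an illustration rather than the authors' procedure: the function names and the example numbers are hypothetical, and the formulas are the standard ones given by Hedges and Olkin (1985) and Lipsey and Wilson (2001), which we assume are those referred to in the text.

```python
import math


def pooled_sd(sd_e, n_e, sd_c, n_c):
    """Pooled standard deviation of the experimental and comparison groups."""
    return math.sqrt(
        ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    )


def cohens_d(mean_e, mean_c, s_pooled):
    """Standardized mean difference (experimental minus comparison)."""
    return (mean_e - mean_c) / s_pooled


def hedges_g(d, n_total):
    """Small-sample correction of d; n_total is the combined sample size."""
    return d * (1 - 3 / (4 * n_total - 9))


def inverse_variance_weight(g, n_e, n_c):
    """Reciprocal of the squared standard error of g."""
    n = n_e + n_c
    return (2 * n_e * n_c * n) / (2 * n ** 2 + n_e * n_c * g ** 2)


# Hypothetical finding: 30 participants per group, equal standard deviations.
s = pooled_sd(10.0, 30, 10.0, 30)
d = cohens_d(75.0, 70.0, s)
g = hedges_g(d, 60)
w = inverse_variance_weight(g, 30, 30)
print(round(d, 3), round(g, 3), round(w, 2))
```

In this hypothetical case the correction shrinks d only slightly, because the total sample of 60 is not small; the correction matters more for studies with few participants.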

Data Analysis
So that the contribution of a finding would be commensurate with its sample size, weighted mean effect sizes were calculated as
$$\overline{ES} = \frac{\sum_i w_i ES_i}{\sum_i w_i}$$
where $ES_i$ is Hedges's unbiased estimate (g) of the ith effect size, and $w_i$ is the inverse variance weight coded for $ES_i$ (Lipsey & Wilson, 2001). The standard error of the mean was calculated as
$$SE_{\overline{ES}} = \sqrt{\frac{1}{\sum_i w_i}}$$
To determine statistical significance, a 95% confidence interval was constructed around each weighted mean effect size. The lower limit ($\overline{ES}_L$) and upper limit ($\overline{ES}_U$) of the confidence interval were calculated as
$$\overline{ES}_L = \overline{ES} - 1.96\, SE_{\overline{ES}}, \qquad \overline{ES}_U = \overline{ES} + 1.96\, SE_{\overline{ES}}$$
where $\overline{ES}$ is the mean effect size, 1.96 is the critical value for the z-distribution (α = .05), and $SE_{\overline{ES}}$ is the standard error of the mean effect size. When the lower limit of a confidence interval was greater than zero, the mean effect size was interpreted as indicating a statistically detectable result favoring the use of concept maps. One of the assumptions of the significance test is that all findings aggregated into a weighted mean effect size share the same population effect size and that the observed dispersion of effect sizes around the mean results only from the random sampling of participants from the population, that is, sampling error. This assumption was tested by the homogeneity of variance statistic
$$Q = \sum_i w_i \left(ES_i - \overline{ES}\right)^2$$
When all k effect sizes comprising a mean effect size are derived from the same population effect size, Q has a chi-square distribution with k − 1 degrees of freedom. When Q exceeded the critical value of the chi-square distribution (p < .05), the mean effect size was judged to be significantly heterogeneous (Lipsey & Wilson, 2001).
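The aggregation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the returned keys, and the three example effect sizes and weights are hypothetical, and the formulas are the standard fixed-effect ones from Lipsey and Wilson (2001). The sketch returns Q and its degrees of freedom but leaves the comparison with the chi-square critical value to a statistical table or library.

```python
import math


def aggregate(effect_sizes, weights):
    """Weighted mean effect size, its 95% confidence interval, and Q."""
    sum_w = sum(weights)
    mean_es = sum(w * es for w, es in zip(weights, effect_sizes)) / sum_w
    se = math.sqrt(1 / sum_w)  # standard error of the weighted mean
    q = sum(w * (es - mean_es) ** 2 for w, es in zip(weights, effect_sizes))
    return {
        "mean": mean_es,
        "se": se,
        "lower": mean_es - 1.96 * se,
        "upper": mean_es + 1.96 * se,
        "q": q,
        "df": len(effect_sizes) - 1,
    }


# Hypothetical set of three independent findings.
result = aggregate([0.2, 0.5, 0.8], [10.0, 20.0, 10.0])
print(result)

# A lower limit above zero is read as a statistically detectable benefit.
print(result["lower"] > 0)
```

In this hypothetical example Q is about 1.8 on 2 degrees of freedom, below the .05 critical value of 5.99, so the set would not be judged heterogeneous.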

Results
Out of the 122 studies retained in the initial selection phase, 55 studies involving 5,818 participants were found to meet the specified selection criteria and were coded. The most frequent reason for rejecting studies was failure to control for prior differences among treatment groups. The mean intercoder agreement was 96.2%.
Six admitted studies, listed in Table 1, measured self-report outcomes that we categorized as affect (anxiety, frustration, satisfaction), self-efficacy, motivation, and perceived use of learning strategies. As in other sections of this report, the signs of effect sizes were switched so that beneficial effects of using maps (e.g., anxiety reduction) would be indicated by positive values. The studies that measured self-report outcomes investigated disparate uses of concept maps. Considering the small number of these studies and the diverse set of treatments and outcome constructs, we decided not to obtain mean effect sizes for self-report outcomes. We note, however, that using concept maps was associated with positive effect sizes in all cases where self-reports were measured. The remainder of the results section deals with studies reporting performance outcomes.
Appendix A (page 441) shows a summary of the 96 effect sizes extracted for performance outcomes prior to resolving statistical dependencies. For some studies, more than one effect size was extracted. Within a study, statistically dependent and independent findings were differentiated according to whether they used separate subsamples, multiple outcome constructs for the same participants, or multiple treatment groups compared with a single control group. Effect sizes in the same study obtained from wholly separate participant subsamples were regarded as statistically independent (Lipsey & Wilson, 2001, p. 112). Effect sizes obtained from overlapping subsamples (e.g., same participants but different outcome measures) were regarded as statistically dependent. Two admitted studies measured knowledge of relationships between concepts while also assessing, in the same participants, broader conceptual knowledge presented by the treatment. For knowledge of concept relationships, Schmid and Telaro (1990) found an effect size of g = .89, and Lehman, Carter, and Kahle (1985) found an effect size of g = .28. Both effect sizes were excluded from further analysis because they were not statistically independent of the effect sizes reported for broader conceptual knowledge outcomes. The remaining outcome constructs were coded as retention, transfer, mixed retention and transfer, and learning skills.
To generate a distribution of statistically independent effect sizes, a single effect size was obtained for each set of statistically dependent effect sizes by calculating a weighted average over different outcome constructs and treatment groups. One study with an effect size of g = 5.94 (Guastello, Beasley, & Sinatra, 2000) was judged to be an outlier (z = +6.4, p < .001). Because a reexamination of the study could not attribute the exceptional effect size to methodological flaws or artifacts and because the study observed participants who had characteristics apparently similar to other samples in this analysis, the effect size was not deleted but, rather, was adjusted downward to a value (g = +2.2) slightly greater than the next-largest effect size, as recommended by Tabachnick and Fidell (2001).
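The two adjustments described above, collapsing dependent effect sizes into one value and pulling an outlier down to just above the next-largest effect size, might be sketched in Python as follows. This is only an illustration: the function names, the margin of 0.1, and all numbers are hypothetical, and the use of inverse variance weights for the within-study average is an assumption, since the text says only that a weighted average was calculated.

```python
def combine_dependent(effect_sizes, weights):
    """Collapse statistically dependent effect sizes into one weighted average."""
    # Assumption: the weights are the inverse variance weights of each finding.
    return sum(w * g for g, w in zip(effect_sizes, weights)) / sum(weights)


def cap_outlier(effect_sizes, outlier_index, margin):
    """Replace an outlier with a value slightly above the next-largest effect size."""
    next_largest = max(g for i, g in enumerate(effect_sizes) if i != outlier_index)
    adjusted = list(effect_sizes)
    adjusted[outlier_index] = next_largest + margin
    return adjusted


# Hypothetical study reporting two outcomes for the same participants.
study_g = combine_dependent([0.2, 0.6], [10.0, 30.0])

# Hypothetical distribution in which the last effect size is an extreme outlier.
adjusted = cap_outlier([0.4, 1.5, 6.0], outlier_index=2, margin=0.1)
print(study_g, adjusted)
```

Capping rather than deleting keeps the study in the distribution while limiting its influence on the weighted mean.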

Effects of Constructing and Studying Concept Maps
Participants constructed or modified maps in 25 studies, from which were derived 27 statistically independent effect sizes. Participants studied concept maps in 30 studies, from which were derived 40 statistically independent effect sizes. Students in the comparison treatment groups participated in group discussions, attended lectures, or worked with outlines, lists, or text passages. Table 2 shows the weighted mean of all statistically independent effect sizes, split according to whether maps were constructed or studied, and the geographical location of the research. The table includes the number of participants (N) in each category, the number of findings (k), the weighted mean effect size (M) and its standard error (SE), the 95% confidence interval around the mean, and the results of a test of homogeneity (Q) with its associated degrees of freedom (df ). The effects of concept maps were statistically detectable in all categories except for the averaged results of two studies conducted in Taiwan, in which students constructed maps. One of the two Taiwanese studies (Chang, Sung, & Chen, 2002), coded as having high treatment fidelity and high effect size confidence, reported statistically significant benefits from concept mapping activities. The other Taiwanese study (Chang, 1994), coded as having low treatment fidelity and medium effect size confidence, reported no significant difference.
Homogeneity was rejected for all effect size means in Table 2, indicating that they represent widely varying individual effect sizes. Heterogeneity was noticeably higher across studies in which maps were constructed rather than presented, perhaps reflecting the greater diversity of treatments and lower experimental control in concept mapping research.
The mean effect size for concept mapping studies conducted in Africa (Nigeria and Egypt) was much higher than for other locations. All but one of the African studies were conducted in Nigeria by P. A. Okebukola and his colleagues. According to Okebukola (personal communication, July 11, 2004), concept mapping offers special benefits to Nigerian students when compared with conventional Nigerian teaching methods that rely on intensive lecturing to large classes (Edukugho, 2005). Although English was the language of instruction in all of the African studies, it was a second or third language for most of the participants. Nigerian and Egyptian participants may have found the syntactic structure of concept maps easier to parse than the English lectures. Faced with rather large differences in effect size across geographical locations for the concept mapping studies, we decided to restrict our focus for the remainder of the analysis to those conducted in the United States and Canada. All research in which participants studied maps was retained because there was no evidence that the mean effect sizes for that research differed substantially according to geographic location. Table 3 shows weighted mean effect sizes for concept mapping studies split by educational level, class setting, subject (knowledge domain), and study duration. There were no laboratory studies in which participants constructed concept maps. In all but three of the findings reported in Table 3, students performed the learning activity (constructed maps or did a comparison learning activity) entirely in a classroom under the supervision of an instructor. Unlike the research in which maps were studied, research in which maps were constructed inconsistently reported treatment duration. As a proxy for variation in treatment duration, Table 3 presents an approximate median split on study duration, which was reported in all but two findings.

Educational Level, Setting, Subject, Duration, Adjunct Materials, and Map Type
All categories listed in Table 3 show statistically detectable mean effect sizes, except studies that did not report study duration. For most categories, however, the effect size distributions were significantly heterogeneous, indicating that the variability among effect sizes was greater than that expected from sampling error. In such cases, we interpret variation across categories as suggesting but not confirming theoretical interpretation. For instance, the mean effect sizes and confidence intervals across the subject categories suggest that concept mapping offers greater benefit in subject areas that are more saturated with verbal knowledge. However, the certainty of this interpretation is limited by significant heterogeneity and the small number of findings in the humanities, law, and social studies category. In the tables that follow, different ways of subdividing the data are presented to identify the source of the excess variation among studies. Table 4 shows mean effect sizes for studying concept maps split by educational level, setting, subject, treatment duration, use of adjunct materials, and map type. Almost all of these studies used undergraduate university students as participants. Most of the investigations summarized in Table 4 were conducted in laboratory settings; that is, they used learning activities that did not contribute toward performance assessment in an academic program. An approximate median split was used to divide studies into those with treatment durations less than or greater than 1 hour. Almost all of the categories identified in Table 4 had statistically detectable mean effect sizes, but most were also significantly heterogeneous.
Most studies in Table 4 used text in adjunct or source materials for both experimental and comparison groups. For example, Wachter (1993) had an experimental group study a map before studying a text passage and had the comparison group immediately begin to study the text passage. A large minority of studies had learners in the experimental group study only a concept map and learners in the comparison group study only a nonmap information source. The moderate mean effect size in this latter category indicates that, in at least some situations, concept maps can work effectively as stand-alone information sources. Although the majority of the studies in Table 4 used static concept maps presented on paper, a few presented animated maps or hyperlinked maps that participants could use to access hypertext. The two studies that compared presentation of animated maps with animated text (Blankenship & Dansereau, 2000; Nesbit & Adesope, 2005) obtained substantial effect sizes favoring animated concept maps. Notably, however, the hypertext studies indicated no significant advantage for hyperlinked maps in comparison with hyperlinked outlines and other navigational devices.

Outcome Constructs and Test Types
Appendix B (page 447) shows the outcome constructs measured by the studies. The studies were split according to outcome construct (retention, transfer, mixed retention and transfer) and test format (free recall, objective items, short answer items, mixed item types). Most research in which learners studied maps used retention measures, typically a free recall test. In contrast, the classroom-based research, in which learners constructed maps, tended to use achievement measures, typically a multiple-choice test that mixed retention items with near transfer items. Mean effect sizes were statistically detectable in all categories, but most were significantly heterogeneous.

Methodological Quality
Appendix C (page 448) shows how effect sizes varied with the methodological quality of the research and whether it was published in a journal. The studies were split according to the coders' confidence in the calculated effect size, the coders' rating of treatment fidelity, whether the study randomly assigned participants to treatments, and whether the study appeared in a journal or dissertation. Mean effect sizes were statistically detectable in all categories. Among classroom studies in which students constructed or modified maps, low or medium coder confidence in the effect size was associated with low mean effect size, and high coder confidence in the effect size was associated with high mean effect size. In the same studies, random assignment of participants to treatment conditions was associated with a high mean effect size, and nonrandom assignment was associated with a lower mean effect size. The investigations in which maps were studied do not show a similar pattern, probably because they were mainly better-controlled laboratory studies with more consistent methodological quality.

Individual, Group, and Cooperative Settings
Table 5 shows mean effect sizes for learning with concept maps in individual, group, and cooperative settings. For the studies in which maps were constructed, it was necessary to establish a category called mixed group and individual, in which there was a combination of group and individual learning with concept maps. These studies, which showed a large and significant mean effect size, often involved students individually constructing maps and then discussing them in a whole-class activity. The category not applicable or unknown included studies that either did not report sufficient data to determine social interactions during learning, or used group learning for the mapping group and individual learning for the control group.
It would be misleading to identify studies in the mixed group and individual learning category as assessing the effectiveness of collaborative or cooperative concept mapping, because they often used relatively large class groups in which there was likely little or no contribution from many of the students, and the group interactions often consisted of reviewing maps rather than constructing them. Research in which materials were studied in individual learning settings produced a statistically detectable mean effect size favoring the use of concept maps. In contrast, there was a nonsignificant mean effect size from studies in which preconstructed materials were used in cooperative tasks. The tasks used in the latter studies were all structured as dyadic, scripted cooperation activities.
The large mean effect size found for the mixed group and individual studies may have more to do with the comparison treatments used in these studies than the type of interpersonal interaction. As we discuss in the following section, in studies that use lectures or discussions as comparison treatments, the relatively greater engagement provided by the concept-mapping activity may be the active ingredient that produces large, positive effects.
Despite the nonsignificant mean effect size for the dyadic cooperation studies, it is too soon to conclude that concept maps are no better than other formats for use as communication aids in cooperative learning. We believe that the potential advantages of concept maps in cooperative learning are sensitive to both the nature of the task and the training of participants in cooperative methods. Tasks that require rapid communication of complex, non-hierarchically structured information are more likely to benefit from the conceptually integrated propositions represented in the concept map format. Also, students who do not have strategies for cooperative problem solving with maps may be unable to exploit their advantages. For example, in one of the studies we reviewed (Patterson, Dansereau, & Newbern, 1992), dyads learning about the interrelated physiological effects of alcohol obtained greater benefit from concept maps when they were provided with a strategy for using them in cooperative learning (g = .65) than when no strategy was provided (g = .29).

Comparison Treatments
Effect sizes are likely to vary according to the type of comparison or control treatment. Table 6 shows that among the map construction findings, 10 were from studies that used lectures or whole-class discussions as comparison activities, and 7 were from studies that had students write text or outlines. In contrast, research that presented preconstructed maps had participants study text, outlines, or lists as a comparison activity. Statistically detectable benefits were demonstrated for studying maps rather than outlines or lists, and for constructing maps rather than reading text, attending lectures, or attending class discussions. Concept mapping appears to compare very favorably with teaching methods in which learners have diffuse responsibility for task completion (e.g., whole-class discussion), but it shows only a small advantage over other constructive tasks, such as individual notetaking or summarizing. As an effect size drops below .2 standard deviations, one may be justified in questioning its pedagogical significance and whether it might be attributed solely to experimenter bias.
The elevated effect size in studies that used lecture or discussion as the comparison treatment is important in interpreting the results of this meta-analysis because it suggests that variation in the constructive quality of the comparison task may underlie much of the effect size variation observed across other categories. In particular, there is a substantial overlap between studies that used lecture or discussion as a comparison treatment and those in which students did concept mapping in mixed group and individual modes. Of the 10 studies that used mixed group and individual concept mapping as an experimental treatment, 8 used lecture or discussion as a comparison activity. We conjecture that many of the elevated effect sizes generated by these studies are due to lower effectiveness of the comparison treatment rather than to any particular benefit of mixed group and individual concept mapping.

Individual Differences
We were unable to isolate theoretically relevant individual difference variables from the studies in which maps were constructed. However, there were a few studies that presented maps to students who were identified as relatively low or high in prior knowledge or verbal ability. Only relative, nonstandardized assessments of these individual difference variables were reported. In each of these investigations, students were categorized using a median split on tests of prior knowledge or verbal ability. Table 7 shows a statistically detectable mean effect size when maps were presented to students who had relatively low ability (either verbal ability or knowledge). When examined separately, only the effect for low verbal ability was statistically detectable. Because the statistical power of these comparisons is low and the confidence intervals are wide, it cannot be concluded that maps do not benefit higher-ability students.

Retaining Central and Detail Ideas
To examine how maps affected the recall of ideas at different levels of generality, we isolated studies that separately assessed central and detail knowledge. Table 8 shows the results from six studies that presented semantically equivalent text passages and maps and measured recall of central and detail ideas by the same participants. There was a statistically detectable mean effect size for each of the two types of knowledge outcomes, and the effect size for central ideas was larger.
Although derived from relatively few studies, this is an important result because it contradicts the hypothesis that concept maps are mere summary tools, achieving gains in general knowledge only at the expense of losses in detailed knowledge. At the same time, the greater benefit accorded to central ideas indicates that concept maps may have an inherent bias toward summary.

Conclusion
The meta-analysis found that, in comparison with activities such as reading text passages, attending lectures, and participating in class discussions, concept mapping activities are more effective for attaining knowledge retention and transfer. Concept mapping was found to benefit learners across a broad range of educational levels, subject areas, and settings. Much of this benefit may be due to greater learner engagement occasioned by concept mapping in comparison with reading and listening, rather than the properties of the concept map as an information medium. There is evidence that concept mapping is slightly more effective than other constructive activities such as writing summaries and outlines. But the small size of this effect raises doubts about its authenticity and pedagogical significance. The advantages of concept mapping were more pronounced in better-designed studies, particularly those that used random assignment of participants to treatment groups. The significant heterogeneity associated with most mean effect sizes for concept mapping indicates that the benefits discussed here are somewhat unreliable and that carefully designed research is needed to better identify mediating conditions. Across educational levels, subject areas, and settings, it was found that studying concept or knowledge maps is somewhat more effective for retaining knowledge than studying text passages, lists, and outlines. This effect was especially strong in two studies that used animated maps but absent for hyperlinked maps. The benefits of using preconstructed maps were evident in individual learning but not in dyadic, cooperative learning. From a few studies, it appears that preconstructed maps are particularly useful as a communication medium for students with lower verbal proficiency and may offer little or no advantage to those with high verbal proficiency. 
Studying maps rather than text passages assists in recall of both central ideas and detail ideas, but the effect may be stronger for central ideas. There is insufficient evidence to determine whether studying concept maps is particularly efficacious for knowledge transfer and development of learning skills.
The result that studying concept maps is somewhat more effective than studying lists and outlines contradicts the hypothesis that all summary formats confer equal benefits, and is consistent with theories claiming that concept maps lower extrinsic cognitive load by arranging nodes in two-dimensional space to represent relatedness, consolidating all references to a concept in a single symbol, and explicitly labeling links to identify relationships. The evidence that concept maps can be more effective than text passages for conveying detailed information reinforces the notion that concept maps have more to offer than the mere reduction of information.
These results help to identify gaps in the evidence and point to high-priority areas for further research. To elucidate cognitive processes, there is an immediate need for concept map research to assess learning outcomes beyond conventional free recall and researcher-constructed achievement tests. Instead, investigations should examine the processes by which students learn with concept maps and their effects on higher-level learning goals such as problem-solving transfer, application, and analysis (Anderson & Krathwohl, 2001); conceptual change (Novak, 2002); and the development of learning skills (Chang, Sung, & Chen, 2002). To explain how students learn from concept maps, research is needed that compares the use of concept maps, graphic organizers (Katayama & Robinson, 2000), and outlines.
High-quality research is needed on the use of concept maps in elementary and secondary education, especially with students learning in second languages or who are identified as having reading and language difficulties. There is a lack of research on pedagogical models for using concept maps in small group and whole-class settings. More research is also needed on the effectiveness of concept mapping as a notetaking and prewriting activity for developing reading and writing skills.
The evidence presented in this review should persuade teachers to make extensive, well-planned use of concept mapping activities and preconstructed concept maps. We found no categories or conditions in which concept maps produced significant negative effects, and, aside from the theoretical objection one might pose that frequent use of concept maps could reduce practice in reading and writing text, no potentially detrimental effects have been identified. Broadly, then, teachers and instructional designers can be encouraged to adopt concept mapping as a learning activity and to communicate ideas with preconstructed concept maps in a wide range of educational settings.