A quantitative integration of the military cohesion literature.

In this article, we review research on group cohesion in military units, using meta-analytic techniques that convert research findings to a common metric. After combining some of the 39 samples to ...

meta-analysis, integrating the results of 14 real-world work groups (of which 9 were military), found a positive relation (r = .32) between cohesion and performance. The Evans and Dion (1991) meta-analysis, which included 16 published studies (5 dealing with military groups) and was based on group-level analyses, also found a positive relation between cohesion and performance (r = .36 uncorrected for error of measurement and r = .42 corrected). Another meta-analysis, conducted by Mullen and Copper (1994), included 49 studies and reported a correlation of .23 for their subset of 10 "military" studies. The latest meta-analysis of cohesion and performance, conducted by Gully, Whitney, and Devine (1995), used 46 studies (5 of which were military) and obtained an effect size (r) of .166 uncorrected for error of measurement and .199 corrected. Removing one methodologically questionable military study¹ and six effect sizes that used self-report measures of productivity raised these values to .221 (uncorrected) and .265 (corrected).
The military believes that cohesive groups enhance combat effectiveness (e.g., Oliver, 1990). Because measures of combat effectiveness are rarely, if ever, available, researchers generally use measures of combat readiness, such as Army General Testing and Evaluation Program scores or ratings by commanders. Although the meta-analyses summarized earlier included military groups, none of them exhaustively sampled that population of studies. Many military studies are not in the refereed literature because they are institutional reports or conference papers. Hence, a thorough meta-analytic examination of group cohesion in military units has not yet been accomplished, and meta-analyses of group cohesion have not investigated outcomes other than performance.
The purpose of this research was to conduct a meta-analysis of the cohesion research involving military groups and to focus on outcomes valued by the military. The research questions were as follows:
1. What is the overall relation between group cohesion in military units and various outcomes of interest to the military?
2. Do these cohesion-outcome relations hold across various types of military units? That is, do military units from different branches, different countries, and so on, vary in the degree to which cohesion is related to various outcomes?
3. To what extent are different measures of group cohesion comparable across studies involving military units? Are different cohesion measures conceptually similar? Do different measures of military group cohesion yield essentially the same results, or does operationalizing the construct differently (in terms of formats, number of items, or sources of ratings) lead to different findings?
4. How do the results of this research compare to the results of previous meta-analyses of group cohesion?

¹We did not include this study (Hoiberg & Pugh, 1978) in our meta-analysis because the cohesion measure was the respondent's expectations for cohesion at the respondent's next duty station. We felt that expectations did not meet our definition of cohesion, as we have interpreted the construct.
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Identification of Pertinent Studies
The original nine military cohesion studies in Oliver's (1988a) meta-analysis were augmented by research that had been reported since that meta-analysis was conducted and by studies that were missed in the original search. We included both published and unpublished research. To identify pertinent studies, we used computer searches, previous reviews, reference lists of studies already identified, a notice in the American Psychological Association Division of Military Psychology newsletter, and personal contacts with scholars known to have conducted research on military cohesion.
Definitions. Many researchers investigating cohesion do not explain what they mean by the constructs they are investigating. What follows are clarifications of the terms cohesion, group, outcome, and effect size.
In discussing conceptual issues involved in measuring cohesion, Oliver (1990) noted that cohesion is usually considered a "group" variable and conceptualized in some way as the "stick-togetherness" of a group. Although there are many definitions of the cohesiveness construct in the literature, most authorities have agreed that cohesion is a multidimensional construct that may include an interpersonal aspect and a task orientation aspect. Carron (1982) defined group cohesiveness as "a dynamic process that is reflected in the tendency for a group to stick together and remain united in the pursuit of its goals and objectives" (p. 124). We adopted this statement as our working definition for cohesion.
Another term that needed to be clarified was group. For this construct, we accepted a definition proposed by Hogg (1992): "A group of two or more individuals in face-to-face interactions, each aware of the others who belong to the group, and each aware of their positive interdependence as they strive to achieve mutual goals" (p. 4).
An outcome is generally regarded as a consequence. Here we define outcome as a consequence or result (associated with military units) that is of interest to the military, for example, combat readiness. Chronologically, cohesion usually precedes the outcome, but we do not hold that cohesion causes the outcome, merely that the two may be associated.
Our effect size for this meta-analysis was the correlation coefficient (Pearson's r or rho), a measure of association that most of the studies reported. ... be followed. The guidance to the coders on how to code the variables was intended to be as specific as possible to reduce coding uncertainties.
Coder training and coding procedure. Each study was coded for pertinent variables by two coders: (a) a research psychologist, who had conducted cohesion research and had experience in quantitative research integration; and (b) an advanced graduate student, who had participated in the analysis of military cohesion data.
Coders coded one study together, discussing various aspects of the procedure as they went through the coding process. They then coded three studies independently and met again to discuss their results. Coders coded the remainder of the studies independently, although they occasionally conferred about ambiguous situations. These decisions usually involved which variable to code as cohesion (which sometimes was a subset of another variable, such as morale), which variables to use as outcomes, and when to combine variables. After coding was completed, we reviewed the database to ensure that the data had been entered accurately. All discrepancies were resolved by discussions between the two coders.
Ratings of study quality. We felt that it was important to ascertain the quality of each study in an objective manner. Accordingly, we constructed the Quality of Study Scale, which was based on six coded variables: publication form, reporting of the reliability of the cohesion measure, standardization of the cohesion measure, percentage of outcomes for which reliability was reported, percentage of outcomes that were standardized measures, and percentage of outcomes from the same source as the cohesion measure (reverse coded). Each of the six ratings was on a 3-point scale (coded 1, 2, and 3, from the lowest to highest degree of rigor or quality), with a possible summed range of 6.0 to 18.0. The Quality of Study Scale ratings ranged from 7.0 to 16.0, with a mean across the 39 samples of 11.02 and a standard deviation of 2.64. In addition to this objective measure, the coders made a subjective overall rating of the quality of each research effort on a 5-point scale ranging from 1 (very poor) to 5 (very good). The coders' ratings were averaged for each study or sample. The range of these averaged subjective ratings was 2.0 to 4.5, with a mean of 3.13 and a standard deviation of 0.65. The subjective overall ratings by the coders correlated .43 with the more objective Quality of Study Scale. Coders rated their confidence in their subjective ratings using a 3-point scale developed by Orwin and Cordray (1985), in which 3 was certain or almost certain, 2 was more likely than not, and 1 was guess. The mean confidence rating for the two coders on this variable was 2.0.
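The Quality of Study Scale is a simple summed rating, and the arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the coding procedure itself: the field names in the hypothetical `StudyCodes` record are ours, each item is assumed to arrive already coded 1 to 3, and the same-source item is assumed to be coded so that 3 means most outcomes share the cohesion measure's source, which is why it is reverse coded as 4 minus the code. How raw percentages were mapped onto the three levels is not specified here, so that step is omitted.

```python
from dataclasses import dataclass


@dataclass
class StudyCodes:
    """Hypothetical coding record; every field holds a code of 1, 2, or 3."""
    publication_form: int
    cohesion_reliability_reported: int
    cohesion_measure_standardized: int
    outcome_reliability_reported: int
    outcome_measures_standardized: int
    # Assumed coding: 3 = most outcomes come from the same source as the
    # cohesion measure (least rigorous), so this item is reverse coded.
    same_source_outcomes: int


def quality_of_study(codes: StudyCodes) -> int:
    """Sum six 1-3 ratings into a Quality of Study score between 6 and 18."""
    raw = [
        codes.publication_form,
        codes.cohesion_reliability_reported,
        codes.cohesion_measure_standardized,
        codes.outcome_reliability_reported,
        codes.outcome_measures_standardized,
        codes.same_source_outcomes,
    ]
    if any(value not in (1, 2, 3) for value in raw):
        raise ValueError("every item must be coded 1, 2, or 3")
    # Reverse code the last item (1 <-> 3, 2 stays 2) so that a higher
    # value always means more rigor.
    raw[-1] = 4 - raw[-1]
    return sum(raw)


# A hypothetical study whose outcomes mostly share the cohesion measure's source.
example = StudyCodes(2, 1, 3, 2, 2, 3)
print(quality_of_study(example))  # 11
```

Reverse coding before summing keeps every item pointing in the same direction, so the 6.0 to 18.0 range reported above is preserved.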

Analyses
As noted by many authorities, the lack of statistical independence among effect sizes may pose a problem for meta-analytic research integration. Lack of independence may arise when the same sample is used to calculate more than one effect size from different measures, when different samples in the same study are used to calculate separate effect sizes, or when the same researchers conduct several different studies (Matt & Cook, 1994). In our analyses, we followed two procedures to ensure greater independence of our data: (a) We combined samples for which independence seemed a problem, and (b) we conducted separate analyses for each class of outcome measure.
Samples and composites. Effect size analyses were based on 33 samples or composite samples. We combined the results from the two studies by Goodacre (1951, 1953), and we combined the studies conducted by Siebold and his colleagues (Julien & Siebold, 1989; Siebold, 1990; Siebold & Kelly, 1987, 1988a, 1988b). The Manning (1984, 1987) research included military units from two countries (United States and Israel) that were stationed in three different countries, and we treated these three groups as independent samples.
Combining outcomes. We clustered the 21 types of outcome measures enumerated on the code sheet into seven classes of outcomes. Within each study, we averaged effect sizes for outcomes falling into each of the seven categories. The outcome categories were individual performance (individual scores on a test or an activity such as rifle practice), group performance (group as a whole rated on some measure of performance), well-being (physical or psychological), job/military satisfaction, retention, readiness (operational or psychological), and indiscipline (e.g., AWOL rates). Then, we conducted a separate analysis for each of the seven outcome classes.
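The within-study averaging described above can be sketched as a two-step grouping in Python. The function names and the coded effect sizes are hypothetical; the sketch assumes each coded result is a (study, outcome class, r) triple and that a plain unweighted mean is taken within a study, which is how we read "averaged effect sizes" in the text.

```python
from collections import defaultdict
from statistics import mean


def average_within_studies(effect_sizes):
    """Collapse several rs per study into one r per (study, outcome class).

    effect_sizes: iterable of (study_id, outcome_class, r) tuples.
    Returns a dict mapping (study_id, outcome_class) to the mean r.
    """
    grouped = defaultdict(list)
    for study_id, outcome_class, r in effect_sizes:
        grouped[(study_id, outcome_class)].append(r)
    return {key: mean(rs) for key, rs in grouped.items()}


def cases_by_class(study_means):
    """Regroup per-study means so each outcome class can be analyzed separately."""
    by_class = defaultdict(list)
    for (study_id, outcome_class), r in study_means.items():
        by_class[outcome_class].append((study_id, r))
    return dict(by_class)


# Hypothetical coded results: study A reports two group-performance rs.
coded = [
    ("A", "group performance", 0.30),
    ("A", "group performance", 0.50),
    ("A", "retention", 0.20),
    ("B", "group performance", 0.40),
]
study_means = average_within_studies(coded)
print(cases_by_class(study_means)["group performance"])
```

After this step each study contributes at most one case per outcome class, which is what keeps a study with many outcome measures from dominating any single analysis.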
Weighting of effect sizes. Typically, effect sizes are weighted by the number of participants involved.² Because cohesion is a group-level construct, however, one can argue that the proper sample size is the number of groups. We report the results of both weighting procedures.
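To show how the two weighting schemes can diverge, here is a minimal Python sketch of a sample-size-weighted mean r, applied once with participant counts and once with group counts. The function name and all numbers are invented for illustration; the sketch uses plain count weights, as the text describes, rather than the inverse-variance weights some authors recommend.

```python
def weighted_mean_r(rs, weights):
    """Weighted mean correlation: sum(w_i * r_i) / sum(w_i)."""
    if not rs or len(rs) != len(weights):
        raise ValueError("need one weight per effect size")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    return sum(w * r for r, w in zip(rs, weights)) / sum(weights)


# Hypothetical outcome class with two cases.
rs = [0.40, 0.20]
participants = [900, 100]  # a large sample and a small one
groups = [10, 30]          # but the small sample contains more groups

print(round(weighted_mean_r(rs, participants), 3))  # 0.38
print(round(weighted_mean_r(rs, groups), 3))        # 0.25
```

Because the two counts need not rise together, a study with many soldiers in few units can dominate one weighting and count for little in the other, which is one reason the effect sizes in the two tables can move in either direction.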
²This procedure is in contrast to using the reciprocal of the sampling variance of r, as recommended by Shadish and Haddock (1994) in their chapter on combining estimates of effect size. Some authorities believe using sample size is preferable to using the variance of r, although the former procedure is technically more approximate. B. J. Becker (personal communication, April 3, 1996) has found that, in practice, sample size works better. We found that using sample size resulted in more conservative (i.e., broader) estimates of confidence intervals.

Corrections. The r-to-z transformation is controversial, and we did not use it. Some meta-analysts recommend its use before statistical manipulations involving correlation coefficients (e.g., Rosenthal, 1994). Other authorities (e.g., Hunter & Schmidt, 1990) argue against using the transformation because its bias is greater than the bias of the uncorrected r. We also did not correct for error of measurement, as some meta-analysts do (e.g., Gully et al., 1995). Because we found that the reliabilities of instruments were reported poorly in the studies whose results we were integrating, we felt that correcting for measurement error was problematic.
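The practical difference between the two positions can be seen in a small Python sketch that averages the same hypothetical correlations with and without Fisher's r-to-z transformation (z = atanh r, weights n - 3, back-transformed with tanh). The data and function names are illustrative assumptions; the first function corresponds to the untransformed averaging adopted here, and the second to the alternative that was not used.

```python
import math


def mean_r_raw(rs, ns):
    """Sample-size-weighted mean of untransformed rs (the approach used here)."""
    return sum(n * r for r, n in zip(rs, ns)) / sum(ns)


def mean_r_fisher_z(rs, ns):
    """Alternative not used in this review: average Fisher z = atanh(r)
    with weights n - 3, then back-transform the mean z with tanh."""
    weights = [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for r, w in zip(rs, weights)) / sum(weights)
    return math.tanh(z_bar)


rs, ns = [0.10, 0.60], [100, 100]
print(round(mean_r_raw(rs, ns), 3))       # 0.35
print(round(mean_r_fisher_z(rs, ns), 3))  # 0.377
```

The z metric stretches large correlations, so the back-transformed mean lands above the raw mean; whether that upward shift or the slight downward bias of the raw r is the smaller problem is the disagreement cited above.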
Because we did not apply this correction, our mean effect sizes were lower and, thus, more conservative estimates of the true effect sizes. Table 1 summarizes the studies we included in our meta-analysis, including descriptions of the 39 samples. Table 2 summarizes the characteristics of these samples.

Description of the Total Sample
Date and form of publication. The 39 individual samples spanned a period of 40 years, although almost two thirds (25) were from the 1980s. Some 17 studies (44%) were from published sources (journal articles, books, or a test manual), and another 18 (46%) were conference papers, institutional reports, or dissertations. The 4 remaining studies were miscellaneous unpublished papers.
Number of participants. The number of participants can be reported in different ways: the number of participants originally present in the military units involved, the number who responded to a survey, or the number left after those with missing data are deleted. We tried to use the samples associated with the analyses, but it was sometimes necessary to estimate or average to obtain a total sample.
In this meta-analysis, the mean number of participants per study or sample was about 955. However, this average is misleading because the range was from 24 to 6,724, with almost two thirds (25) of the samples under 1,000. The total number of participants across all 39 samples was 37,226.
Number and level of groups. We sometimes found it very difficult to determine the number of groups in a study. In 30 of the 39 cases, the authors reported the number of groups, or the coders estimated the number. In the other 9 cases, there was not enough information to make estimates. The number of groups ranged from 4 to 115, with a mean of 34.6 and a median of 24.5. The most frequently reported group level was squad, section, or platoon, which accounted for 12 (31%) of the 39 cases. Another 6 of the samples (15%) focused on the company or battalion level. Among the group levels classified as "other," 18 (46%) were squadrons, detachments, aircrews, tank teams, ship crews, and combinations of levels. Three studies did not report a group level.
[Table 1 excerpt, study not identified in this extraction. Cohesion: Group Atmosphere score (Fiedler, 1967; 10-item semantic differential). Outcome 1: officer judges rated ships on "field unit performance" (7-point scale in 11 areas). Outcome 2: 5-item work satisfaction scale (work factor). Further unattributed Table 1 entries: (2) r = .07, (3) r = .59, (4) r = .28, (1) r = .17, (1) r = .07, (2) r = .18, (1) r = .25 (see their Table 5).]
Outcome measures. The number of outcome measures per study ranged from one to seven. Of the 39 samples, about half (19, or 53%) used one outcome measure. The modal number of outcomes was 2.0, and the mean was 2.19. For analysis, the outcomes were grouped into the seven categories described previously.

Samples.
Among the set of outcome variables, eight were standardized measures. These included measures of psychological well-being, physical well-being, and an Army satisfaction scale. In addition, we classified as standardized several Army measures of performance that are used to evaluate individual and group performance, such as Physical Training and Army General Testing and Evaluation Program scores. Reliabilities were reported for 24 outcome measures, with a mean reliability of .76, and a range from .43 to .95.
About half the outcome measures came from the same source as the cohesion measure. Researchers reported using the same source (usually a survey) for collecting both cohesion and outcome data for 46% of the outcomes; the other 54% of the outcomes relied on different sources. Cohesion measures were based on soldier responses, but some of the outcomes came from records data or ratings by commanders or others. Outcome measures for variables such as job satisfaction or well-being (e.g., stress levels or marital satisfaction) were generally self-report measures.
Level of analysis of outcomes. We had level-of-analysis data for 82 outcomes. About half (43 of 82, or 52%) of these were group-level outcomes or individual data aggregated to the group level. The remaining outcomes (39 of 82, or 48%) involved individual data that were correlated across the entire sample without aggregation. Two studies reported data using both approaches.
Interrater reliability. Interrater agreement for 13 nominal variables averaged 89%. The percentage of agreement ranged from 28% to 100%. We obtained 100% agreement for publication date of studies, whereas the 28% rate was for type of sampling plan. The next lowest percentage of agreement for the two coders was 80% for publication form. Without the sampling plan variable, mean interrater agreement for the remaining 12 variables was 94%.
For 11 quantitative variables, excluding effect sizes, the correlation between the two coders was .82. The smallest correlation (.67) was for the reliability of the cohesion measure, whereas the correlation for total sample was near 1.00. Because total sample was an important variable, the coders settled which number to use as the sample during conferences throughout the coding process. The correlation for 85 effect sizes was .98. Differences in effect sizes were very small and were often due to rounding.

Effect Size Results
Tables 3 and 4 summarize the effect sizes for each outcome class. Table 3 is based on the number of participants, and Table 4 is based on the number of groups. Both tables show the number of cases on which each effect size is based, the total number of participants or groups involved, and the confidence intervals (CI) for the effect size estimates. The tables also contain the unweighted effect size means and the coefficient of robustness (CR; Rosenthal, 1995) for each outcome. In both tables, the number of cases for each effect size ranged from 4 (indiscipline) to 16 (group performance). However, for all but these two outcomes, the number of cases dropped when we weighted by the number of groups (Table 4). Weighting by the number of groups increased the effect size for five of the seven outcomes and broadened the CIs. In general, we considered results based on fewer than six or seven cases very questionable, although the pattern of contributing cases is also important in evaluating the result.
The CI indicates how much confidence one can place in a statistic: a narrower range of values suggests that greater confidence is warranted. Rosenthal (1995) considered the CR an additional aid in interpreting the results of meta-analytic reviews. This statistic, the mean effect size divided by its standard deviation, becomes larger as variability decreases, as the mean effect size increases, or both. Hence, the ratio can serve as an indication of the consistency of the effect size results.
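A Python sketch of both summary statistics follows. The CR follows the definition given in the text, with the added assumption that the unweighted mean and the sample (n - 1) standard deviation are used. The CI procedure behind Tables 3 and 4 is not spelled out, so the interval below uses one common bare-bones choice, a sample-size-weighted mean r plus or minus 1.96 standard errors with the standard error taken as the square root of the weighted observed variance of r divided by the number of cases. Treat it as an illustration, not a reconstruction of the tables; the data are hypothetical.

```python
import math
from statistics import mean, stdev


def coefficient_of_robustness(rs):
    """CR: mean effect size divided by the SD of the effect sizes.

    Assumption: unweighted mean and sample (n - 1) SD.
    """
    return mean(rs) / stdev(rs)


def weighted_mean_ci(rs, ns, z_crit=1.96):
    """Approximate 95% CI for a sample-size-weighted mean r.

    Assumption (one common choice, not necessarily the one used for the
    tables): SE = sqrt(weighted observed variance of r / k).
    """
    k, total = len(rs), sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total
    var_r = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total
    half_width = z_crit * math.sqrt(var_r / k)
    return r_bar - half_width, r_bar + half_width


rs, ns = [0.20, 0.30, 0.40], [100, 100, 100]
print(round(coefficient_of_robustness(rs), 2))  # 3.0
low, high = weighted_mean_ci(rs, ns)
print(round(low, 3), round(high, 3))            # 0.208 0.392
```

Shrinking the spread of the three rs while keeping their mean fixed raises the CR and narrows the CI together, which is the sense in which both statistics index consistency.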
Group performance. This effect size was based on the largest number of cases (16) for both weighting procedures. It was substantial when calculated using the number of participants (.400) but decreased when calculated using the number of groups (.331). The CI was .378-.422 for weighting by participants and .250-.412 for weighting by groups.
Individual performance. The effect size for individual performance suggests a somewhat weaker and less stable relation with cohesion than does the effect size for group performance. However, the effect size of .196 obtained from weighting by participants increased to .310 when weighted by groups, although the CI widened from .180-.211 to .132-.470. The number of cases dropped from nine to seven as a result of the weighting change.
Job/military satisfaction. Although the number of cases for this outcome also dropped when we weighted by groups, this effect size was the largest and most consistent of all the outcomes. The value weighted by participants (.470) decreased somewhat when weighted by groups (.453), and the CI broadened (from .451-.488 to .335-.572).
Retention. The effect size for this very stable outcome increased slightly as a result of differential weighting (from .221 to .255), although the CI became wider (from .205-.222 to .112-.397). Although the number of cases dropped from seven to four as a result of weighting by the number of groups, the already sizable CR of 2.42 increased to 4.37.
Well-being. The seven original cases for this variable dropped to three when we used weighting by groups, and the effect size rose to .334 from .245. As a result of the weighting change, the CI broadened from .204-.222 to .112-.397.
Readiness. The effect size for readiness was .286 weighted by number of participants and .317 weighted by number of groups. The former value was based on six cases, and the latter on three cases. The CI became considerably wider, moving from .252-.321 to .045-.589, because of the change in weighting.
Indiscipline. This variable was based on four cases for both types of weighting. As a result of weighting by groups, the effect size rose from .152 to .228, and the CI widened from .129-.176 to .077-.379.

Relation of Results to Research Questions
In reviewing our findings, note that the effect size used (r) assesses the degree of a relation and does not imply that one variable necessarily causes another.
Relation of cohesion to outcomes. None of the 95% CIs for the seven mean effect sizes included zero or any negative values, indicating that the relation between cohesion and each of the outcomes is positive. Because some findings are based on only a few cases, we are more confident about the results for certain outcomes than for others. We found, for example, that group cohesion was substantially related to soldier perceptions of job and military satisfaction. This robust finding held regardless of whether the mean effect size was weighted by number of participants or number of groups. Because the various cohesion and satisfaction ratings were all self-report measures, use of the same method may have affected these results.
Cohesion was also solidly associated with performance, with group performance more strongly correlated with cohesion than was individual performance. Group performance results also appeared to demonstrate greater consistency than individual performance results as indicated by the CRs for both outcomes.
The mean effect size for indiscipline was based on only four cases under either weighting procedure, and it involved infrequently occurring events. For these reasons, we consider the results for indiscipline inconclusive.
The other outcome variables were also based on relatively few cases, especially when weighted by groups; hence, our conclusions must be tentative. Nevertheless, we conclude that group cohesion was positively related to retention, well-being, and readiness. It is interesting to note that, when weighted by groups, the correlation with cohesion increased for these three outcome measures and that the CRs for retention suggested considerable stability for this outcome in particular.
Similarity of results across different types of military units. We observed no clear effect of military service (e.g., U.S. Army, Navy, etc.) or country (United States, Canada, or Israel), but because we had so few samples for such comparisons, our conclusions must be tentative.

Comparability of measures. As shown in Table 1, a wide variety of both cohesion and outcome measures has been used in military research. Some of the cohesion measures (especially the sociometric ones) focus on the interpersonal aspect of the phenomenon, but others also incorporate the task or teamwork dimension of cohesion. We attempted to code the studies for different dimensions but abandoned the task because few researchers provided this kind of information. Sometimes, they reported using a particular instrument, but our examination of the data revealed that they used only some items from the measure. Thus, we found it difficult to make comparisons of results supposedly involving the same instrument.
Future research may shed further light on the comparability issue. For now, we suggest that a varied set of measures appears to result in similarly inconclusive findings.

Relation of results to previous meta-analyses.
Our research resulted in effect sizes roughly comparable to those obtained in other meta-analyses. For example, our group performance results were close to the results of Evans and Dion (1991), whose r values were .36 uncorrected and .42 corrected. Gully et al. (1995) reported a range of effect sizes (.166 to .464) that varied according to whether they were corrected for unreliability of measurement, involved self-report measures of performance, or took into account the moderators of level of analysis³ or degree of task interdependence. Although our mean effect sizes tended to be lower than those reported by Gully et al., our values for group performance (.400 weighted by participants, .331 weighted by groups, and .428 unweighted) were fairly similar to their results for high levels of task interdependence (.387 uncorrected and .464 corrected). Although we did not code degree of task interdependence, we did separate performance on the basis of group and individual tasks. Thus, group tasks, such as platoon performance on a field problem, would be more interdependent than individual tasks, such as scores on a job knowledge task or rifle qualification exercise.
Our effect sizes for performance were higher than those Mullen and Copper (1994) obtained for their set of military groups. In addition to rejecting 44 of the 10 studies Mullen and Copper classified as military, we differentiated between group and individual performance, which Mullen and Copper did not. When we averaged the effect sizes for individual performance and group performance, we obtained an overall mean effect size for performance of .265 when weighted by number of participants and .325 when weighted by number of groups, bringing our results closer to those of Mullen and Copper, who obtained .23 for their "military" subgroup.
³We did not have enough cases for most outcomes to compare group- and individual-level analyses as Gully et al. (1995) did. However, we obtained an effect size of .439 for the 12 cases of group performance that used the group as the unit of analysis. The 4 cases of group performance that correlated individual cohesion measures directly with individual performance measures resulted in an effect size of .347.
Because previous meta-analyses have considered only performance as an outcome measure, we cannot compare our results for the other outcomes.

Implications of Results for Military Policy and Planning
In general, military planners and policymakers believe that cohesion is a desirable characteristic of military units and try to foster its development. Our findings provide substantial empirical support for such activities in terms of performance and job satisfaction outcomes. The relation of cohesion to other outcomes is not yet firmly established, but the available research suggests that cohesion enhances well-being, increases retention and readiness, and works against indiscipline in military groups. We encourage the services to continue their efforts to enhance cohesion and to consider developing (and documenting the effect of) interventions designed to increase cohesion.
For example, to enhance cohesion, some authorities (e.g., Blaufarb, 1989) have argued in favor of lengthening tours to diminish turbulence. Stabilizing personnel supposedly leads to increased cohesion and, thus, to greater combat effectiveness (see Griffith, 1986). Research by Oliver (1988b), however, suggests that intermediate levels of turbulence may be most desirable in enhancing cohesion and performance; Oliver noted that one of the problems with the COHORT (Cohesion, Operational Readiness, and Training) system was that the junior enlisted personnel were well stabilized but their officer leadership was not. When two values such as these (increasing personnel stability and enhancing officer career development) conflict, Army managers must make difficult decisions about how to resolve them.

Implication of Results for Future Research
Our results have many implications for future research, some of which are briefly noted here.
Reporting basic data. Meta-analysts are invariably amazed by how poorly research is reported when they face the task of coding a set of studies (Orwin, 1994). To produce accurate studies, researchers should report means, standard deviations, number of participants, and intercorrelations of all continuous variables. All results need to be included, not just significant ones, with exact probabilities of results reported. Researchers should describe their samples, instruments, and settings in sufficient detail to enable readers to judge the extent to which the research might generalize to other situations and to enable meta-analysts to code pertinent variables. Cohesion researchers also need to report both the total number of participants and the number of groups as well as nonresponse rates and attrition data.

Conceptualization.
We felt that the conceptualization in the studies we reviewed tended to be weak. Many researchers did not define what they meant by cohesion. They may have quoted classic authorities such as Lott and Lott (1965) but then neglected to specify their own definition. We feel it is essential for researchers to define such terms so that the operationalization of the variables can be assessed.

Selecting measures.
We believe that the military cohesion research contains good examples of instruments that have demonstrated their worth and would be useful in future research. There are several cohesion measures, for example, and a number of outcome measures that provide reliability and validity data. We urge researchers to incorporate such measures into their future investigations.
In addition, we recommend that researchers tap different sources for their cohesion and outcome measures. Measures of satisfaction and some of the standardized well-being scales will necessarily be from the same source as the cohesion measures; but in other cases, especially when collecting performance data, researchers should use different sources for their measures.

Level of analysis.
We found that cohesion researchers often ignored the level-of-analysis issue. Correlating individual data with outcomes across the entire sample ignores the group aspect of cohesion. The appropriate level will depend on the researcher's purpose and the mission of the units involved. In general, we recommend lower levels, such as squad or platoon, because larger units may go beyond the "small-group" aspect of cohesion and are less likely to meet definitions of the construct. Sometimes, of course, performance data are available at a higher level, for example, the company road-march scores of Siebold and Kelly (1987) or the ship performance data used by Allen (1986). In both these cases, the researchers used performance data for a higher organizational level that functioned as a group to accomplish a specific mission.
Both cohesion and outcomes need to represent the same level. For example, if the level chosen is the platoon, the cohesion instrument should query respondents about interactions at the platoon level. If the outcome measure is not a measure of platoon performance as a group, individual performance data should be aggregated to the group (platoon) level. Some authorities (e.g., Gully et al., 1995) also emphasize the need to examine the agreement of responses at the individual level before aggregating to a group level. According to Gully et al., aggregation bias becomes a "potentially severe problem" (p. 513) if the homogeneity of individual-level responses is not ensured.
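The check-then-aggregate step recommended above can be sketched in Python. The sketch uses r_wg, one widely used index of within-group agreement (James, Demaree, & Wolf, 1984), which compares the observed variance of members' ratings with the variance expected if they answered at random on an A-point scale, (A² - 1)/12. The choice of r_wg, the .70 cutoff, the function names, and the platoon ratings are illustrative assumptions; the text does not prescribe a particular homogeneity index.

```python
from statistics import mean, variance


def r_wg(ratings, n_options):
    """Single-item within-group agreement: 1 - observed / random variance.

    The random (uniform) variance on an n_options-point scale is
    (n_options**2 - 1) / 12. Negative values are truncated to 0.
    """
    expected = (n_options ** 2 - 1) / 12
    return max(0.0, 1 - variance(ratings) / expected)


def aggregate_to_groups(ratings_by_group, n_options=5, min_rwg=0.70):
    """Return group means only for groups whose members agree sufficiently.

    The .70 cutoff is a conventional rule of thumb, assumed for illustration.
    """
    return {
        group: mean(ratings)
        for group, ratings in ratings_by_group.items()
        if r_wg(ratings, n_options) >= min_rwg
    }


# Hypothetical 5-point cohesion ratings from members of two platoons.
platoons = {
    "1st": [4, 4, 5, 4],  # members largely agree
    "2nd": [1, 5, 1, 5],  # members disagree sharply
}
print(r_wg(platoons["1st"], 5))       # 0.875
print(aggregate_to_groups(platoons))  # {'1st': 4.25}
```

The second platoon's mean of 3.0 would look like moderate cohesion if aggregated blindly, even though no member actually reported a moderate level; screening for agreement first is what guards against that kind of aggregation bias.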
Other moderator variables. Although we coded potential moderators, too few studies used the same variables to make such analyses meaningful. We encourage cohesion researchers to include measures of leadership style, demographic characteristics, task interdependence, and other potential moderators to enable future meta-analysts to explore their relation to cohesion.

CONCLUSIONS
We conclude that the results of our meta-analysis demonstrate that group cohesion in military units is related in a positive manner to various desirable outcomes of interest to the military services. This meta-analysis is the first such effort limited to the cohesion of military groups and also the first to examine the relation of cohesion to outcomes other than performance. Our results demonstrate that looking at measures of individual performance separately from measures of group performance has merit because these variables are differentially related to cohesion. In addition, we have reported effect sizes weighted by number of groups as well as by number of participants. We also are convinced of the need to consider level of analysis in selecting or developing measures and in conducting data analysis. As noted previously, our findings have relevance both for military planning and for future research.