Forum RESIDENTIAL RADON EXPOSURE AND LUNG CANCER RISK: COMMENTARY ON COHEN’S COUNTY-BASED STUDY

The large United States county-based study (Cohen 1995, 2001) in which an inverse relationship has been suggested between residential low-dose radon levels and lung cancer mortality has been reviewed. While this study has been used to evaluate the validity of the linear nonthreshold theory, the grouped nature of its data limits the usefulness of this application. Our assessment of the study’s approach, including a reanalysis of its data, also indicates that the likelihood of strong, undetected confounding effects by cigarette smoking, coupled with approximations of data values and uncertainties in accuracy of data sources regarding levels of radon exposure and intensity of smoking, compromises the study’s analytic power. The clearest data for estimating lung cancer risk from low levels of radon exposure continue to rest with higher-dose studies of miner populations in which projections to zero dose are consistent with estimates arising from most case-control studies regarding residential exposure. Health Phys. 87(6):647–655; 2004


INTRODUCTION
In recent years many epidemiologic studies have sought to determine the extent of lung cancer risk arising from residential low dose radon exposure (NAS 1999). Data regarding higher dose exposures in miner populations, when extrapolated to low dose levels, suggest a linear no-threshold (LNT) relationship. Most studies in residential exposure settings are compatible with this suggestion, especially case-control studies in which data come from observations made on individual persons (Lubin and Boice 1997; NAS 1999; Darby et al. 2001a; Field 2001; Krewski et al. 2002; Darby and Hill 2003). Considerable heterogeneity exists among such studies, however, and none to date, while consistent with the LNT model, has yet had sufficient statistical power, alone or collectively, to demonstrate a clear dose-response relationship at low dose levels (below about 200 Bq m⁻³).
The large county-based study conducted by Cohen regarding population-based lung cancer rates and area-specific residential radon levels suggests that increasing levels in such settings may be associated with reduced lung cancer risk (Cohen 1995, 2001, 2002). While the study was designed specifically to test the validity of the LNT model and not to make direct measurements of risk for individuals, its value for estimating health risks at any level has been widely questioned since it relies entirely on grouped population data, without recourse to levels of exposure or disease risk among individuals (Stidley and Samet 1993; Greenland and Robins 1994a, 1994b; Piantadosi 1994; Field et al. 1998; Lubin 1998a, 1998b; NAS 1999; Greenland 2001; NCRP 2001). Such grouped data (so-called ecologic data) are useful for measuring individual risks only if a linear relationship exists between exposure and outcome. However, the grouped data on which Cohen's study is based, in the absence of supplemental information regarding individual risks or illnesses within counties, provide extensive opportunities for uncontrolled cross-level bias and confounding because of differing population characteristics among groups being compared. In defending his work, Cohen emphasizes that he is merely seeking to test the validity of the LNT model, that this is different from measuring health risk outcomes, and that therefore the usual reservations about grouped data do not apply (Cohen 1994, 1998a, 1998b, 2000). While Cohen's approach may be legitimate in principle, his exclusive use of aggregate population-based data does mean that his study is still open to ecologic bias. More importantly, it seems also subject to other difficulties related to its use of various data assumptions and to the manner in which it conducts certain calculations, especially as they relate to likely confounding by cigarette smoking.
To understand Cohen's work more fully, we obtained the several sets of county data upon which his study is based (radon levels, lung cancer mortality rates, and smoking prevalence values). Beyond recognizing potential for ecologic bias, we have chosen to review and repeat the analysis itself, giving attention to sources of data, to analytic results (with particular emphasis on the potential influence of cigarette smoking), and to the interpretation of those results in terms of possible dose-response patterns. In presenting the results of this review, we have chosen to focus on data for males only (analytic results for females did not differ from those for males) and to restrict our analysis of lung cancer mortality to the time period 1970–1979.

ANALYTIC MODEL AND ASSUMPTIONS
The analytic model by which Cohen seeks to test the LNT theory rests with the BEIR IV equation (NAS 1988) that describes individual risk (Cohen 1995):

$$m(i) = a(i)\,[1 + b(i)\,r(i)], \tag{1}$$

where m is lung cancer mortality, a and b are parameters, r is radon exposure, and i represents the individual and the variety of associated variables that may modify individual risk. If summed to represent risk over a population, eqn (1) becomes

$$\sum m(i) = \sum a(i) + \sum a(i)\,b(i)\,r(i). \tag{2}$$

Without information about each person's risk profile and radon exposure history, that equation is generally not solvable. However, if a and b are constants (A and B), the following equation can be formed:

$$\sum m(i) = NA + AB \sum r(i), \tag{3}$$

where N is the total population (here, county populations). When divided by N, eqn (3) becomes:

$$M = A\left[1 + B\,\frac{\sum r(i)}{N}\right], \tag{4}$$

where M is the crude lung cancer mortality rate in the population. By definition, the term multiplying B is the average radon exposure. Unfortunately, the assumption that A and B are constants is an incorrect simplification (Lagarde and Pershagen 1999; Lubin 2002). Cohen recognizes this in his calculation of expected age-adjusted values, but nonetheless uses eqn (4) as his basis for analyzing data. He then plots lung cancer mortality against average radon levels. While the average level can at least in principle be measured, the exposure value depends heavily on individual population characteristics (Smith et al. 1998). Despite such assumptions, Cohen proceeds to use eqn (4) to compare data for M with average radon levels and not with average radon exposure values.
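The consequence of treating a and b as constants can be illustrated with a small numerical sketch. It is not taken from Cohen's analysis: the four-person county, the slope value, and the choice to set A and B equal to the population means of a(i) and b(i) are all assumptions made for illustration. Only the factor of 12 between smokers and non-smokers echoes a value used later in this commentary.

```python
def crude_rate_individual(a, b, r):
    """Crude rate from eqn (2): average of a(i) * (1 + b(i) * r(i)) over persons."""
    return sum(ai * (1.0 + bi * ri) for ai, bi, ri in zip(a, b, r)) / len(a)


def crude_rate_constant(a, b, r):
    """Eqn (4), taking A and B as the population means of a(i) and b(i).

    Using the means is an assumption of this sketch, not a rule from the source.
    """
    n = len(a)
    A = sum(a) / n
    B = sum(b) / n
    r_bar = sum(r) / n
    return A * (1.0 + B * r_bar)


# Hypothetical county: two smokers (a = 12) and two non-smokers (a = 1).
a = [12.0, 12.0, 1.0, 1.0]
b = [0.01, 0.01, 0.01, 0.01]            # hypothetical slope per unit of exposure

r_uniform = [60.0, 60.0, 60.0, 60.0]    # everyone has the same exposure
r_sorted = [20.0, 20.0, 100.0, 100.0]   # smokers happen to live in low-radon homes

for label, r in (("uniform", r_uniform), ("sorted", r_sorted)):
    print(label, crude_rate_individual(a, b, r), crude_rate_constant(a, b, r))
```

With uniform exposure the two forms agree (about 10.4 in these arbitrary units). When the high-baseline individuals are concentrated at low exposure, the individual-level sum gives about 8.2 while eqn (4) still gives about 10.4, even though the average radon level is unchanged.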

Counties and radon levels
The counties included in the study were selected on the basis of available county-specific average indoor residential radon level measurements made in the time period 1986–1991 (Cohen and Colditz 1994; Cohen 1995). Initially, 1,729 United States counties were included (over half of all U.S. counties and about 90% of the country's population). Three different data sources were used: 1) measurements made through a special project at the University of Pittsburgh (1,151 counties); 2) measurements made by the U.S. Environmental Protection Agency (1,074 counties); and 3) measurements made by individual state agencies (358 counties). Average values were used for counties where more than one source of measurement existed. Concern about the possible effects of population mobility on total radon exposures led to the removal of all counties in Arizona, California, and Florida, states to which people often move for retirement. This reduced the study's final geographic coverage to 1,601 counties.

Lung cancer mortality
Average annual age-adjusted and sex-specific lung cancer mortality rates were initially obtained from national statistics for the time period 1970–1979 in each county (Riggan and Mason 1983). Later analyses used similar data for the more recent time period 1979–1994 (obtained from an internet website maintained by the Centers for Disease Control and Prevention). Analytic results were similar using the two different time periods.

Smoking levels
County-specific data regarding sex-specific prevalence of cigarette smoking were derived from state-specific data contained in the 1985 Bureau of the Census Current Population Survey (Marcus et al. 1989). To transform those state values into county values, Cohen made adjustments for selected county-specific socioeconomic variables likely to be related to smoking prevalence. Initial analyses encompassed 54 such variables, with principal attention given to urban-rural population distributions (Cohen 1995). Later analyses expanded the list to 472 variables (Cohen 2001). Smoking frequency levels were further adjusted back to their possible values in 1960–1970 by the use of state-specific frequency ratios that compared 1985 with that earlier period (Cohen and Colditz 1994).

Uncertainties
Of the various sets of data used in the study, most affected by uncertainty are the estimates of county-based smoking prevalence levels since they involve calculation from state values, allowance for changes over time, and adjustment for socioeconomic variables. Cohen discussed these estimates at length and tried different approaches to assess their effect on radon-related lung cancer risk, including grouping counties into six subsets according to similar radon levels and conducting state-specific analyses (Cohen 1995).
Uncertainties about radon levels involve limitations in physical measurement procedures. This, coupled with inability to account fully for multiple residences over time and for population mobility patterns, makes radon levels unreliable surrogates for actual radon exposures. Clearly, age and radon exposure are linked as shown in eqn (2) and as Cohen recognizes in his calculation of expected values (Cohen 1995). While these uncertainties could easily reduce or mask a positive correlation, they do not seem likely to produce a negative correlation, and less likely to do so than uncertainties regarding smoking frequencies. The county-specific lung cancer mortality rates raise some question with respect to the time interval between radon measurements (1986–1991) and dates of cancer occurrence (1970–1979). Although it seems unlikely that radon levels would change much over time, Cohen addressed this issue by using a later mortality time period in his most recent analyses (Cohen 2001). His recent analyses gave results similar to his earlier studies.
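The statement that errors in radon measurement tend to mask rather than reverse a correlation matches the standard errors-in-variables result. The sketch below assumes classical, non-differential error, an assumption that is not stated in the source: the recorded level x equals the true exposure r plus noise u that is independent of both r and the outcome. The symbols x, u, beta, and lambda are introduced here for illustration only.

```latex
x = r + u, \qquad \operatorname{Cov}(r, u) = 0, \qquad
M = \alpha + \beta\, r + \varepsilon
\;\;\Longrightarrow\;\;
\hat{\beta}_{\text{obs}} \;\to\; \lambda\,\beta,
\qquad
\lambda = \frac{\sigma_r^{2}}{\sigma_r^{2} + \sigma_u^{2}}, \qquad 0 < \lambda \le 1 .
```

Under these assumptions the fitted slope shrinks toward zero but keeps the sign of the true slope. A sign reversal would require error correlated with the outcome or with another risk factor such as smoking.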

ANALYSIS OF DATA
Figs. 1 and 2 show scatter plots of the data for lung cancer mortality and for smoking prevalence vs. radon levels. Similar declines with increasing radon, accompanied by considerable scatter of data points, are evident for both lung cancer mortality and smoking prevalence. This picture suggests either a protective effect of radon exposure against lung cancer or a negative confounding relationship between smoking prevalence and radon levels across counties. As expected, the data also show a strong direct relationship between smoking prevalence and lung cancer risk (Fig. 3).
[Fig. 3, caption truncated in source: …000 (1970–1979) by percent cigarette smokers among males. Each point represents data for one county.]
Cohen initially performed analyses using individual counties as separate data points but later, for display purposes, combined counties into groups within increasing ranges of radon levels and calculated average values for mortality rates and radon levels. We have calculated average values for groups of counties with increasing radon levels but have created 13 groups compared to Cohen's 18 (Table 1). Cohen's original data set recorded county-specific radon levels in pCi L⁻¹ units; we grouped counties within 0.50 pCi L⁻¹ intervals (later converting to Bq m⁻³) while Cohen used 0.25 pCi L⁻¹ intervals in 12 instances where numbers of counties were large and 1.00 pCi L⁻¹ in one instance where numbers were small. We calculated unweighted average values for cancer mortality, together with standard errors of the means and standard deviations of the distributions. Cohen also used the standard error of the mean within each county grouping, but in addition, indicated the first and third quartiles of mortality rate distributions as a reflection of how widely scattered rates for individual counties are in relation to the mean value for each county group. Fig. 4 shows the results of our analysis of lung cancer mortality in relation to radon levels, uncorrected for smoking but with smoking frequencies superimposed. Both Cohen's analysis and ours show an overall pattern of decreasing mortality with rising radon levels. In both sets of data, however, that decrease is largely confined to radon levels below about 100 Bq m⁻³, rates above about 175 Bq m⁻³ being too uncertain to permit interpretation due to the limited number of counties with such dose rates.
At intermediate levels, while the curves are essentially flat, the wide distribution of individual county rates does not rule out the possibility of either rising or falling risks. Smoking frequencies in Fig. 4 closely parallel lung cancer mortality except at the lowest radon levels where frequencies do not rise with declining radon. This suggests that confounding by smoking may be especially prominent among counties in that low radon range.
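The grouping procedure described above (fixed-width radon intervals, unweighted means, standard error of the mean, and standard deviation of the distribution) can be sketched as follows. The function name, the record layout, and the three example counties are hypothetical. The conversion factor 1 pCi L⁻¹ = 37 Bq m⁻³ is the standard one, and the sample (n − 1) form of the standard deviation is an assumption, since the text does not say which form was used.

```python
import math
from collections import defaultdict

PCI_PER_L_TO_BQ_PER_M3 = 37.0  # 1 pCi/L corresponds to 37 Bq/m^3


def group_counties(counties, width=0.5):
    """Group (radon_pCi_per_L, mortality) pairs into fixed-width radon intervals.

    Returns one summary dict per non-empty interval, in order of rising radon.
    Means are unweighted: every county counts equally, whatever its population.
    """
    bins = defaultdict(list)
    for radon, mortality in counties:
        bins[int(radon // width)].append((radon, mortality))

    rows = []
    for key in sorted(bins):
        members = bins[key]
        n = len(members)
        mean_radon = sum(r for r, _ in members) / n
        mean_mort = sum(m for _, m in members) / n
        if n > 1:
            # Sample standard deviation (n - 1 denominator), assumed here.
            sd = math.sqrt(sum((m - mean_mort) ** 2 for _, m in members) / (n - 1))
            sem = sd / math.sqrt(n)
        else:
            sd = sem = float("nan")  # spread is undefined for a single county
        rows.append({
            "n": n,
            "radon_bq_m3": mean_radon * PCI_PER_L_TO_BQ_PER_M3,
            "mean_mortality": mean_mort,
            "sd": sd,
            "sem": sem,
        })
    return rows


# Hypothetical records: (radon in pCi/L, mortality per 100,000).
example = [(0.6, 60.0), (0.8, 70.0), (1.2, 50.0)]
for row in group_counties(example):
    print(row)
```

Groups containing only a few counties yield large or undefined standard errors, which is the numerical counterpart of the remark that rates at the highest radon levels are too uncertain to interpret.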
To address more fully the issue of confounding by smoking, Cohen assumes, as BEIR IV does also (NAS 1988), that mortality rates M [see eqn (4) above] can be approximated as

$$M = [S\,a_s + (1 - S)\,a_n]\,(1 + b\,r), \tag{5}$$

where S is the fraction of smokers, factors $a_s$ and $a_n$ represent base mortality rates for smokers and non-smokers, b is a constant, and r is the average radon level. All of the difficulties discussed above in arriving at eqn (4) are exacerbated in using eqn (5). Nonetheless, we corrected for the effect of smoking by assuming $a_s = 12a_n$ and dividing the mortality rates by $1 + 11S$, county by county, which gives a result for mortality of non-smokers in the form of $a_n(1 + br)$. We then calculated the average mortality rate and its standard error of the mean and standard deviation of the distribution for each group of counties. Figs. 5a and 5b compare results before (also contained in Fig. 4) and after adjustment for smoking. Both show a modest overall decline in average mortality rates with increasing radon levels, a decline that is mostly seen at the lowest radon levels. Now, however, after adjustment for smoking, both in our analysis and in Cohen's, the slope at the lowest levels of radon is less steep and does not extend beyond about 50 Bq m⁻³. For radon levels above that point and extending to about 175 Bq m⁻³, the curve is essentially flat. Data beyond about 175 Bq m⁻³ are too uncertain to support conclusions. The difference in slope between low and mid-range radon levels, together with some reduction in the low-range pattern after smoking adjustment, suggests that, in counties where radon levels are low [more likely urban than rural parts of the country (Cohen 1991)], higher frequencies of smoking may result in stronger confounding than elsewhere. While the opposite effect might be expected in high-radon, low-smoking parts of the country (largely rural counties), the data there are too sparse for interpretation.
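The smoking correction just described, and the way it can fail, can be illustrated with a short sketch. Everything numerical here is hypothetical: the four synthetic county groups, the baseline rate, the positive radon coefficient, and the "effective" smoking fractions. The effective fractions serve as a crude stand-in for heavier smoking intensity in low-radon counties; the source discusses intensity, not prevalence error. Only the divisor 1 + 11S comes from the text.

```python
def adjust_for_smoking(mortality, smoker_fraction, relative_risk=12.0):
    """Divide M by 1 + (RR - 1) * S; with RR = 12 this is the 1 + 11S divisor."""
    return mortality / (1.0 + (relative_risk - 1.0) * smoker_fraction)


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def model_rate(s, r, a_n=5.0, b=0.002):
    """Rate with a_s = 12 * a_n: a_n * (1 + 11 * S) * (1 + b * r)."""
    return a_n * (1.0 + 11.0 * s) * (1.0 + b * r)


# Hypothetical county groups; smoking declines as radon rises.
radon = [20.0, 60.0, 100.0, 140.0]        # Bq m^-3
s_recorded = [0.50, 0.45, 0.40, 0.35]     # smoking fraction as recorded
s_effective = [0.60, 0.50, 0.40, 0.30]    # assumed true smoking burden

# Case 1: recorded smoking is exactly right.
raw = [model_rate(s, r) for s, r in zip(s_recorded, radon)]
adjusted = [adjust_for_smoking(m, s) for m, s in zip(raw, s_recorded)]
slope_raw = ols_slope(radon, raw)
slope_adjusted = ols_slope(radon, adjusted)

# Case 2: true burden differs from the recorded fraction used for adjustment.
raw_true = [model_rate(s, r) for s, r in zip(s_effective, radon)]
residual = [adjust_for_smoking(m, s) for m, s in zip(raw_true, s_recorded)]
slope_residual = ols_slope(radon, residual)

print(slope_raw, slope_adjusted, slope_residual)
```

In this construction the unadjusted slope is negative although b is positive, exact adjustment recovers the positive slope a_n b, and adjustment with understated smoking at low radon levels leaves a negative slope. The sketch shows only that such a pattern is possible, not that it explains the county data.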
In an effort to explore potential urban-rural differences more fully, we examined, as Cohen did also, data within individual states, some states being more urban than others. Such state-specific mortality patterns were not found to be consistently different from those seen for the full county-based data set. Likewise, no differences were seen in analyses of county data grouped by median values into joint categories of high or low smoking and high or low radon levels. As expected from the total data analyses (Figs. 5a and 5b), steeper mortality slopes were seen mostly in low radon counties where true smoking frequencies are likely to be especially high.

COMMENT
There has been substantial criticism of several aspects of Cohen's approach to data analysis, beyond the fundamental reservations about ecologic bias arising from his use of grouped data. Much of this criticism has centered on his belief that his analyses have adequately adjusted for the serious confounding effects of smoking. Other concerns, however, involve major uncertainties in sources of data, the use of radon levels instead of radon exposure, and smoking prevalence uncorrected for individual smoking patterns. These factors introduce uncertainties that the use of average values does not remove. The confounding effects of smoking are potentially so very large, given the very strong etiologic relationship between cigarette smoking and lung cancer and the complex interactions between smoking and socioeconomic conditions, that merely relying on average overall smoking prevalence, supplemented by average values for many related census variables, cannot be expected to provide adequate adjustment. Full control of smoking confounding is unfortunately not possible within the context of this study. Since its data come entirely from broad, often non-specific, population-based sources, they lack not only detailed county-specific information about smoking patterns, beyond total smoking prevalence values, but also any information from individual persons in the counties under study. Hence in some counties with high smoking percentages, perhaps urban counties in particular, actual smoking exposures (number of cigarettes per day, years smoked, etc.) may well be considerably greater than in other counties with different socioeconomic conditions, despite similar overall smoking percentages (Darby et al. 2001b). Both our analyses (Figs. 4 and 5) and Cohen's analysis imply that this may be so. Only at the lowest radon levels, in counties likely to be more urban than rural (Cohen 1991), do the data suggest a negative relationship with lung cancer mortality.
It is likely that this pattern results from strong confounding by smoking data that are particularly imprecise in urban settings. Similar confounding, although perhaps not as intense, is equally likely to exist in the data for counties at higher radon levels.
[Fig. 4, caption truncated in source: … (1970–1979) and average percent of cigarette smokers among males (■) by average radon levels within counties grouped by rising dose intervals. Confidence intervals are expressed as the standard deviation of the distribution for each county group.]
Fig. 5a. Average annual lung cancer mortality per 100,000 among males (1970–1979) unadjusted for percent of cigarette smokers among males in each group of counties. Confidence intervals expressed are 95% level of the mean (solid lines) and the standard deviation of the distribution (dotted lines) for each county group.
Fig. 5b. Average annual lung cancer mortality per 100,000 among males (1970–1979) adjusted for percent of cigarette smokers among males in each group of counties. Confidence intervals expressed are 95% level of the mean (solid lines) and the standard deviation of the distribution (dotted lines) for each county group.
A dominant role for smoking-related confounding is clearly reinforced by a recent study that used Cohen's county data sets to examine relationships between radon levels and mortality from several different forms of cancer (Puskin 2003). Strong negative relationships were seen for four cancers strongly associated with cigarette smoking (lung, oral-pharynx, larynx, and esophagus), clear but less marked relationships for two cancers less strongly associated with smoking (pancreas and urinary bladder), and no distinct relationship for three cancers not substantially linked to smoking (colon, prostate, and female breast). These findings closely resemble earlier findings reported by Cohen himself (Cohen 1993) and remarked upon by Gilbert (1994).
Since only for lung cancer can one postulate any significant direct effect of exposure to radon progeny, it seems most likely that for all smoking-related cancers, including lung cancer, the apparent negative relationship with radon levels merely reflects the confounding influence of smoking rather than any true biologic effect arising from radon exposure itself. In response to Puskin, Cohen maintains that his study's smoking data do not confound the results, and he proposes that radiation-induced biologic defense mechanisms may act to protect against cancer at tissue sites apart from those directly in contact with inhaled radon (Cohen 2004). In answer, Puskin stresses the biologic implausibility of radon progeny acting to stimulate cellular defenses (Puskin et al. 2004).
Similar questions regarding the possible confounding influence of smoking are suggested in a further analysis that uses Cohen's data sets and compares lung cancer mortality rates with radon levels at different county elevations above sea level (Van Pelt 2003). As altitude rises, radon levels rise, but lung cancer rates fall. Stratification by altitude reduces by about 50% the overall negative slope of lung cancer risk in relation to radon levels. While Van Pelt proposes that this reduction in risk may result from lowered oxygen levels at higher altitudes, resulting in greater oxidative DNA damage at lower elevations, it may also reflect incomplete adjustment for confounding by smoking. As suggested above, smoking prevalence at lower altitudes where more urban populations exist may be greater than county average values suggest, while the reverse may hold true for more rural counties at higher elevations. A similar disparity in smoking frequencies at different geographic locations, confounding observed relationships between radon and lung cancer, is suggested in data from Sweden (Lagarde and Pershagen 1999). With smoking being a strong determinant of lung cancer risk, and radon relatively weak, accurate and comparable measures of smoking prevalence are critical.
The conclusion we therefore draw, without reference to reservations about ecologic bias, is that systematic errors and uncertainties in Cohen's data and analysis, especially in relation to the influence of smoking on lung cancer risk, preclude estimating to what degree or in what direction lung cancer mortality is altered by exposure to very low doses of radon (less than about 175 Bq m⁻³). At the same time, however, we would also comment that the figure contained in the BEIR VI report (figure 3-2, page 89) (NAS 1999) is not an entirely accurate representation of Cohen's published results, shown there in comparison to results from miner cohort studies and residential case-control studies. That figure exaggerates the contrast between Cohen's work and the extrapolation of miner data by extending Cohen's data on lung cancer mortality to radon levels higher (up to about 350 Bq m⁻³) than those that were actually analyzed (up to about 200 Bq m⁻³) and by not including error bars for the data shown. Were the figure adjusted to reflect these facts, the apparent difference between Cohen's results and those of case-control studies would be less striking. We would suggest, however, that contrasting the results of these several kinds of studies in a single graph may itself be misleading in the face of underlying concern about ecologic bias in the Cohen data. Had those data suggested a positive relationship between radon exposure and lung cancer mortality, that same concern would apply.
However Cohen's data are displayed, their interpretation is clearly compromised by their reliance on grouped data. This potential for ecologic bias is especially great since the study seeks to measure a relatively weak relationship between radon exposure and lung cancer risk in the face of the exceptionally strong influence of smoking. In the absence of data for individual persons, especially data about individual smoking histories, and with only uncertain approximations of actual radon exposure, reliable conclusions are unlikely. Despite the potential statistical power of ecologic studies, arising from their use of large populations, their inability to overcome ecologic bias, and especially their limited capacity to counteract the strong confounding influence of smoking risk factors, greatly compromises their value in assessing the risk of lung cancer from residential low dose radon exposure.