Rating Locomotive Crew Diesel Emission Exposure Profiles Using Statistics and Bayesian Decision Analysis

ABSTRACT

For more than 20 years CSX Transportation (CSXT) has collected exposure measurements from locomotive engineers and conductors who are potentially exposed to diesel emissions. The database included measurements for elemental and total carbon, polycyclic aromatic hydrocarbons, aromatics, aldehydes, carbon monoxide, and nitrogen dioxide. This database was statistically analyzed and summarized, and the resulting statistics and exposure profiles were compared to relevant occupational exposure limits (OELs) using both parametric and non-parametric descriptive and compliance statistics. Exposure ratings, using the American Industrial Hygiene Association (AIHA) exposure categorization scheme, were determined using both the compliance statistics and Bayesian Decision Analysis (BDA). The statistical analysis of the elemental carbon data (a marker for diesel particulate) strongly suggests that the majority of levels in the cabs of the lead locomotives (n = 156) were less than the California guideline of 0.020 mg/m3. The sample 95th percentile was roughly half the guideline, resulting in an AIHA exposure rating of category 2/3 (determined using BDA). The elemental carbon (EC) levels in the trailing locomotives tended to be greater than those in the lead locomotive; however, locomotive crews rarely ride in the trailing locomotive. Lead locomotive EC levels were similar to those reported by other investigators studying locomotive crew exposures and to levels measured in urban areas. Lastly, both the EC sample mean and 95%UCL were less than the Environmental Protection Agency (EPA) reference concentration of 0.005 mg/m3. With the exception of nitrogen dioxide, the overwhelming majority of the measurements for total carbon, polycyclic aromatic hydrocarbons, aromatics, aldehydes, and combustion gases in the cabs of CSXT locomotives were either non-detects or considerably less than the working OELs for the years represented in the database.
When compared to the previous American Conference of Governmental Industrial Hygienists (ACGIH) threshold limit value (TLV) of 3 ppm, the nitrogen dioxide exposure profile merits an exposure rating of AIHA exposure category 1. However, using the newly adopted TLV of 0.2 ppm, the exposure profile receives an exposure rating of category 4. Further evaluation is recommended to determine the current status of nitrogen dioxide exposures. [Supplementary materials are available for this article. Go to the publisher's online edition of Journal of Occupational and Environmental Hygiene for the following free supplemental resource: additional text on OELs, methods, results, and additional figures and tables.]

METHODS

Data Acquisition and Validation

After obtaining the data sets, the data were copied to a “working” Excel spreadsheet and then imported into a statistical analysis program (Version 13, Systat Software, Inc., San Jose, Calif.). The sampling methods used are listed in Table I. The data set fields used in the analysis were inspected for odd or inconsistent entries.

A PDF copy of the “CSX Transportation Air Sampling Record” for each sampling result was provided (carbon monoxide and nitrogen dioxide were not included). A selection of the records was extracted and the exposure data and information were compared to the entries in the database. A few discrepancies were found regarding the sampling method or sampling time. For example, there were four benzene results listed as detects when it was known that no detects for benzene had been measured by CSXT during the 1990 to 2011 interval. Upon checking the CSXT records, including the laboratory reports, it was found that these data had been entered incorrectly into the database.

Data Standardization and Filtering

In principle, the sampling time of a measurement should be consistent with the averaging time of the exposure limit. For this study, sampling time equaled the time required for the locomotive run, which nearly always was greater than 120 min. Measurements were deleted from the analysis if the sampling time was less than 120 min. This resulted in the deletion of only three EC non-detects where the sample times were 6, 26, and 46 min. None of the benzene cases was deleted.

Nitrogen dioxide was measured using both direct reading instruments and long-term passive dosimeters. The direct reading measurements and the passive dosimeter measurements that were either recorded as zero or were collected for less than 120 min were discarded. Several hundred carbon monoxide measurements were collected using a direct reading instrument and recorded as either non-detects or not greater than 5 ppm. Of the 68 measurements collected using long-term samplers, 45 were recorded as zero and three had sampling times less than 120 min. This left 22 measurements where the sampling time was 120 min or greater and the recorded concentration was greater than zero.

The sampling times corresponded to the length of the locomotive run, which tended to vary daily depending upon the assignment for each engineer and conductor crew. For comparison to the OELs, the concentrations were adjusted to a 480-min sampling period: C480 = (c × t)/480, where c = average concentration and t = sampling time in minutes. The time spent waiting for an assignment, or being transported by automobile to the next assignment after a run, was not sampled.
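The filtering and time adjustment described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example values are hypothetical and are not taken from the CSXT database or from the software used in the study.

```python
MIN_SAMPLE_MIN = 120   # minimum sampling time retained in the analysis (from the text)
REFERENCE_MIN = 480    # reference period used for comparison to the OELs

def adjust_to_480(conc, sample_min):
    """Return C480 = (c * t) / 480 for a concentration c sampled over t minutes."""
    return conc * sample_min / REFERENCE_MIN

def filter_and_adjust(records):
    """Keep (conc, minutes) pairs with t >= 120 min and return their C480 values."""
    return [adjust_to_480(c, t) for c, t in records if t >= MIN_SAMPLE_MIN]

# Hypothetical (concentration in mg/m3, sampling time in min) pairs.
example = [(0.010, 360), (0.004, 46), (0.006, 600)]
adjusted = filter_and_adjust(example)  # the 46-min sample is dropped
```

Because the formula multiplies by t/480, a run shorter than 480 min is scaled down (the unsampled remainder contributes nothing to the adjusted value) and a run longer than 480 min is scaled up.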

Selection of OELs

Table II lists the various federal and authoritative OELs that were used as “working OELs” for this analysis. The working OEL was often the ACGIH TLV. The Occupational Safety and Health Administration (OSHA) permissible exposure limit (PEL) was used in those instances where the measurement sampling time was not consistent with that required for comparison to the ACGIH TLV. (The Supplemental Materials contain a table of additional OELs for each substance as well as additional text regarding the California guideline,(6) the U.S. Environmental Protection Agency (EPA) “Reference Concentration,”(12) and the Mine Safety and Health Administration (MSHA) limit for the total carbon in diesel emissions.(13)) To generate figures that contain the results for several substances, a “severity ratio” was calculated by dividing the measurement by the working OEL.
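As a small illustration of the severity ratio, the sketch below divides a value by a working OEL. The function name is hypothetical. The inputs are the nitrogen dioxide figures quoted elsewhere in this article (a parametric 95th percentile of 0.25 ppm, the new TLV of 0.2 ppm, and the previous TLV of 3 ppm), used here only to show how the choice of working OEL changes the ratio.

```python
def severity_ratio(measurement, working_oel):
    """Return measurement / working OEL (both in the same units)."""
    if working_oel <= 0:
        raise ValueError("working OEL must be positive")
    return measurement / working_oel

no2_p95_ppm = 0.25                                 # parametric 95th percentile for NO2
ratio_new_tlv = severity_ratio(no2_p95_ppm, 0.2)   # above 1: exceeds the new TLV
ratio_old_tlv = severity_ratio(no2_p95_ppm, 3.0)   # below 0.1 of the previous TLV
```

A ratio above 1 marks a value greater than the working OEL, so results for different substances can share one axis.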

Statistical Analysis

Each locomotive has an engineer and a conductor. They tend to be potentially exposed to similar substances (i.e., diesel emissions), at similar levels, and work at the same location and during identical periods. However, the conductor generally spends slightly less time in the cab due to the need to inspect the train prior to a run. The engineer and conductor will be considered here to be part of the same similar exposure group (SEG).(14) Furthermore, this SEG will be considered to span all locomotive models.

Descriptive and compliance statistics were produced using both Systat (Version 13) and the IHDataAnalyst (Version 1.27, Exposure Assessment Solutions, Inc, Morgantown, WV). The majority of the figures were generated using Systat. The IHDataAnalyst was used to (a) evaluate the goodness-of-fit for the lognormal distribution model, looking for egregious departures, (b) calculate descriptive and compliance statistics for censored data sets (i.e., a data set containing non-detects), (c) calculate non-parametric statistics, and (d) calculate BDA probabilities, which were used to assist in assigning the SEG exposure profile to the most appropriate exposure category.

Descriptive Statistics

In addition to the usual order statistics—sample size (n), minimum (min), maximum (max), and median—the following descriptive statistics are provided: sample mean and sample standard deviation (mean, SD) and the sample geometric mean (GM) and sample geometric standard deviation (GSD).

Most of the substance-specific data sets contained non-detects (i.e., the true concentration was less than the “minimum quantifiable concentration” (MQC) for the sampling and analytical method and laboratory combination). When non-detects were present, the percent censored was less than 80%, and the sample size was fairly large, the maximum likelihood estimation (MLE) method was used to calculate estimates of the lognormal distribution parameters.(15,16) These estimates were then used to calculate the mean, exceedance fraction, and the 95th percentile of the substance exposure profile. The Kaplan-Meier method, as recommended by Helsel(17) for censored data sets, was also used to estimate the mean EC level.
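To make the idea of maximum likelihood estimation with non-detects concrete, the following is a minimal sketch using only the Python standard library. It is not the IHDataAnalyst implementation. It uses an expectation-maximization (EM) iteration, one of several ways to reach the ML estimates for a left-censored lognormal sample, and the function names and example data are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def censored_loglik(mu, sigma, log_detects, log_limits):
    """Log-likelihood (up to a constant) on the log scale: density terms for
    detects, cumulative-probability terms for non-detects below their limit."""
    ll = sum(-math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2 for y in log_detects)
    ll += sum(math.log(_STD.cdf((l - mu) / sigma)) for l in log_limits)
    return ll

def censored_lognormal_mle(detects, limits, tol=1e-10, max_iter=10000):
    """Return (GM, GSD) for detects plus non-detects reported as '< limit'."""
    y = [math.log(v) for v in detects]
    lims = [math.log(v) for v in limits]
    n = len(y) + len(lims)
    # Starting values: substitute limit/2 for each non-detect (start only).
    start = y + [l - math.log(2.0) for l in lims]
    mu = sum(start) / n
    sigma = max(math.sqrt(sum((v - mu) ** 2 for v in start) / n), 1e-3)
    for _ in range(max_iter):
        s1 = sum(y)
        s2 = sum(v * v for v in y)
        for l in lims:
            # E-step: moments of a normal truncated above at the limit.
            z = (l - mu) / sigma
            lam = _STD.pdf(z) / _STD.cdf(z)
            m = mu - sigma * lam
            var = sigma ** 2 * (1.0 - z * lam - lam * lam)
            s1 += m
            s2 += var + m * m
        # M-step: update the log-scale mean and standard deviation.
        new_mu = s1 / n
        new_sigma = math.sqrt(max(s2 / n - new_mu ** 2, 1e-12))
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return math.exp(mu), math.exp(sigma)

# Hypothetical data: five detects and five non-detects reported as < 0.002 mg/m3.
detects = [0.002, 0.003, 0.005, 0.008, 0.012]
limits = [0.002] * 5
gm, gsd = censored_lognormal_mle(detects, limits)
```

The non-detects pull the estimated GM below the geometric mean of the detects alone, which is the behavior one would expect when half of the sample lies below the quantifiable limit.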

Compliance Statistics

The exceedance fraction and 95th percentile were calculated to assist in determining whether or not the exposure profile for each substance was generally in compliance with the working OEL (see Table II). The compliance statistics were estimated using non-parametric methods and using the lognormal distributional model. Because the statistics are estimates, and not the true values, the 95% lower and 95% upper confidence limits were calculated for each estimate. The statistics and confidence limits were calculated using standard methods.(14,18)
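The two compliance statistics follow directly from the lognormal parameters, as the sketch below shows. It covers point estimates only and leaves out the confidence limits described above. The function names and example values are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def lognormal_p95(gm, gsd):
    """95th percentile of a lognormal profile: GM * GSD ** z(0.95)."""
    return gm * gsd ** _STD.inv_cdf(0.95)

def lognormal_exceedance(gm, gsd, oel):
    """Fraction of the lognormal profile expected to exceed the OEL."""
    z = (math.log(oel) - math.log(gm)) / math.log(gsd)
    return 1.0 - _STD.cdf(z)

def nonparametric_exceedance(values, oel):
    """Observed fraction of measurements greater than the OEL."""
    return sum(v > oel for v in values) / len(values)

# Hypothetical profile: GM = 1, GSD = 2, OEL = 5 (same units throughout).
p95 = lognormal_p95(1.0, 2.0)
frac = lognormal_exceedance(1.0, 2.0, 5.0)
```

By construction, the parametric exceedance fraction evaluated at the 95th percentile itself is 0.05, which ties the two statistics together: the 95th percentile is below the OEL exactly when the exceedance fraction is below 5%.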

If the data were censored, i.e., contained one or more non-detects, the sample size used to calculate each confidence limit was the total sample size minus the number of non-detects.(14) There is no generally accepted method for calculating confidence intervals where the data set is censored. The ad hoc procedure used here results in somewhat wider confidence intervals, but compensates for the presence of non-detects in the data set.

Assigning an AIHA Exposure Rating

Exposure ratings of 0 to 4 are assigned using the rating scheme of the AIHA.(14) For example, an exposure rating of Category 3 or less indicates that the majority of the occupational exposures (that is, at least 95%) were less than the OEL. An exposure rating of Category 2, 1, or 0 indicates that the majority of the exposures were less than 50%, 10%, or 1% of the OEL, respectively. An exposure rating of Category 4 indicates that occupational exposures frequently exceeded the OEL; that is, greater than 5% of the exposures exceeded the OEL. Exposure ratings are useful in that a succinct phrase (for example, “category two, high certainty”) can be used to convey a considerable amount of information: the most likely range for the true 95th percentile exposure, the statistical confidence in the assessment, and the degree of risk (relative to the chosen OEL) most likely experienced by members of the SEG.
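The category cut points described above can be written as a simple lookup on the ratio of the 95th percentile to the OEL. The sketch below is illustrative only. The function name is hypothetical, and the handling of values that fall exactly on a boundary is an assumption, since the text does not specify it.

```python
def aiha_category(p95, oel):
    """Map a 95th percentile estimate to an AIHA exposure category (0 to 4)."""
    ratio = p95 / oel
    if ratio < 0.01:
        return 0   # 95th percentile below 1% of the OEL
    if ratio < 0.10:
        return 1   # below 10% of the OEL
    if ratio < 0.50:
        return 2   # below 50% of the OEL
    if ratio <= 1.00:
        return 3   # at or below the OEL (boundary handling assumed)
    return 4       # more than 5% of exposures exceed the OEL

# NO2 parametric 95th percentile (0.25 ppm) against the new and previous TLVs.
no2_new = aiha_category(0.25, 0.2)   # category 4
no2_old = aiha_category(0.25, 3.0)   # category 1
```

A point estimate that sits near a cut point, such as a 95th percentile of roughly half the OEL, can land on either side of the category 2/3 boundary. That is one reason the rating is reported together with a statement of certainty.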

While the sample 95th percentile (and its confidence interval) can be used to assign an exposure rating, the method of BDA was employed for most of the exposure rating assignments.(19) BDA is a statistical method for estimating the likelihood that the true 95th percentile exposure falls within the range associated with each of the AIHA exposure rating categories and is capable of handling both detects and non-detects. For this analysis, a flat, non-informative prior was used (see the “Methods” section in the Supplemental Materials and Hewett et al.(19) for more information).
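The general idea behind BDA can be illustrated with a coarse grid approximation. The sketch below is not the IHDataAnalyst algorithm. It assumes a prior that is flat over a rectangular grid of ln(GM) and ln(GSD), which may differ from how the cited BDA method defines its flat prior, and the parameter-space bounds, grid size, function name, and data are all hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()
Z95 = _STD.inv_cdf(0.95)

def bda_category_probabilities(detects, limits, oel, gm_bounds, gsd_bounds, steps=80):
    """Posterior probability that the true 95th percentile falls in each
    AIHA category (index 0 to 4), using a flat prior on a (ln GM, ln GSD) grid."""
    y = [math.log(v) for v in detects]
    lims = [math.log(v) for v in limits]
    cuts = [math.log(f * oel) for f in (0.01, 0.10, 0.50, 1.00)]
    lo_mu, hi_mu = math.log(gm_bounds[0]), math.log(gm_bounds[1])
    lo_s, hi_s = math.log(gsd_bounds[0]), math.log(gsd_bounds[1])
    cells = []
    for i in range(steps):
        mu = lo_mu + (hi_mu - lo_mu) * (i + 0.5) / steps
        for j in range(steps):
            s = lo_s + (hi_s - lo_s) * (j + 0.5) / steps
            # Likelihood: density for detects, cumulative probability for non-detects.
            ll = sum(-math.log(s) - 0.5 * ((v - mu) / s) ** 2 for v in y)
            ll += sum(math.log(max(_STD.cdf((l - mu) / s), 1e-300)) for l in lims)
            log_p95 = mu + Z95 * s
            category = sum(log_p95 >= c for c in cuts)  # number of cut points passed
            cells.append((ll, category))
    top = max(ll for ll, _ in cells)
    probs = [0.0] * 5
    for ll, category in cells:
        probs[category] += math.exp(ll - top)  # flat prior: weight = likelihood
    total = sum(probs)
    return [p / total for p in probs]

# Hypothetical data set: eight detects, two non-detects (< 0.3), OEL = 100.
detects = [0.4, 0.6, 0.8, 1.0, 1.2, 1.6, 2.0, 2.5]
probs = bda_category_probabilities(detects, [0.3, 0.3], 100.0,
                                   gm_bounds=(0.05, 500.0), gsd_bounds=(1.05, 4.0))
```

The five probabilities sum to one, and the category with the largest probability would be the candidate exposure rating. Widening `gm_bounds` or `gsd_bounds` changes which grid cells receive prior weight and therefore can change the decision probabilities.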

Goodness-of-fit Evaluation

The substance-specific data sets were evaluated using both subjective and objective goodness-of-fit procedures to determine if the lognormal distributional model was appropriate for describing the exposure profiles.(14,18) Goodness-of-fit is not an issue whenever non-parametric statistics are calculated, as these statistics do not require a distributional model. Non-parametric statistics, also called large-sample statistics, may be more informative whenever the goodness-of-fit determination is equivocal and the sample size is fairly large. In this study, both parametric and non-parametric statistics are reported, with a general finding that they are consistent, and lead to identical or near identical conclusions.

RESULTS

A total of 190 EC measurements was collected from lead, trailing, and miscellaneous yard locomotives. Sample times for EC ranged between 121 and 669 min, with a median of 403 min. (There was no obvious trend in a plot of the sample times and concentrations. The Pearson and Spearman correlation coefficients were -0.12 and -0.15, respectively.) The EC measurements were collected from 1996 through 2007. Measurements for benzene, toluene, ethylbenzene, and xylene (BTEX) were collected from 1994 through 2007, although sampling did not occur every year. Measurements for aldehydes, PAHs, and gases (CO and NO2) were collected on and off between 1990 and 2007 (see Table SII for additional details).

The data for all substances (except PAHs), normalized to the working OEL, are shown in Figures 2–5. The majority of the measurements for BTEX and aldehydes were non-detects. None of the detects exceeded the working OELs. Nearly 98% of the PAH measurements were non-detects, with a maximum detect of 0.024 mg/m3. All but one of the CO detects were less than 10% of the working OEL. For NO2, 6.4% of the measurements exceeded the working OEL.

Traditional Statistical Analysis

Table III contains the usual descriptive and compliance statistics for all substances (except PAHs). Given the large percentages for non-detects, the calculation of the standard normal and lognormal descriptive statistics was not possible for many substances. For the remainder, the maximum likelihood estimation (MLE) method was used to estimate the geometric mean (GM) and geometric standard deviation (GSD).(16) These were then used to calculate the minimum variance unbiased estimator of the mean. Confidence intervals for the median and mean are provided to assess the uncertainty in the point estimates. (Descriptive statistics for PAHs are shown in Table SIII.)

The usual compliance statistics—exceedance fraction and 95th percentile—and confidence intervals are provided in Table IV. In general, there was good agreement between the non-parametric and parametric (i.e., lognormal distribution) estimates. For BTEX, aldehydes, and the gases, all of the non-parametric and parametric sample 95th percentiles were considerably less than the working OELs. For NO2, the parametric 95th percentile was 0.25 ppm, which exceeded the working OEL (i.e., the new ACGIH TLV) of 0.2 ppm, but was considerably less than the previous ACGIH TLV of 3 ppm.

The EC data for the lead locomotive are displayed in Figure 6. Nearly 58% of the measurements were non-detects. Four measurements exceeded the working OEL, while the majority were less than the EPA Reference Concentration.(12) The lead locomotive data failed a formal goodness-of-fit test (for the lognormal distributional model), which is not surprising considering the large percentage of non-detects and the departures from lognormality in both tails (see Figure 7). Subjectively, however, the lognormal fit does not look unreasonable.

Both the non-parametric and parametric sample 95th percentiles (and their 95%UCLs) for EC in the lead locomotives (see Table IV) were less than or nearly equal to the working OEL, which strongly suggests that the true 95th percentile was less than the working OEL. In addition, both the mean and its 95%UCL were less than the EPA Reference Concentration of 0.005 mg/m3, suggesting that the true mean was less than the EPA limit for general environmental exposures. (The non-parametric Kaplan-Meier mean for left-censored data, which takes into account the non-detects, was 0.0028 mg/m3, which is equal to the mean in Table III calculated using the lognormal model. Helsel(17) recommended calculating the Kaplan-Meier mean whenever a censored data set is suspected to depart substantially from the lognormal model.)

Analysis Using BDA

Table V contains exposure ratings and certainty levels for all substances (except the PAHs). The exposure ratings provide an alternative means for assessing compliance with an OEL. This table allows one to quickly evaluate the exposure ratings and uncertainties in these ratings. BDA was used to determine the probability that data for each substance came from exposure profiles that could be given AIHA exposure ratings of category 0 through 4.(19) The decision probabilities in Table V also reflect the parameter space (i.e., the range of geometric means and geometric standard deviations considered in the BDA analysis) used for each substance, which in several cases had to be expanded beyond the default parameter space recommended for BDA (see Hewett et al.(19) for additional details on the use of BDA). (Expanding parameter space tends to shift the decision probabilities into the higher exposure categories.)

For EC in the lead locomotive the exposure rating could be either category 2 or 3 (which reflects the fact that the sample 95th percentile is roughly half of the working OEL, therefore nearly equal to the dividing line between category 2 and 3). For BTEX and the aldehydes, BDA strongly suggests that an exposure rating of 0 or 1 is appropriate. CO merited a category 2 rating while NO2 received a category 4 rating using the new ACGIH TLV of 0.2 ppm. The exposure rating implies the range that most likely contains the true 95th percentile.

One has to look at the sample statistics to determine the best estimate of the true 95th percentile, which for NO2 is 0.22 ppm using non-parametric methods, and 0.25 ppm using the lognormal distribution assumption (Table IV). (The exposure rating for NO2 would be category 1 when using the former ACGIH TLV.)

Determinants of Exposure

Effect of Locomotive Position and Window Status

For most of the cases there was additional contextual information on the locomotive position (e.g., lead versus trailing) and the status of the locomotive windows during the run (open versus closed). The median level for the lead locomotives appears to be considerably less than that for the trailing locomotives (considering both detects and non-detects): 0.0027 mg/m3 (n=156) versus 0.0073 mg/m3 (n=22), respectively (see Table III). A two-sided t-test comparison of the log-transformed data indicated that the geometric means were significantly different (p < 0.001) (see Figures S1–S3). (This analysis was repeated for TC levels in the lead and trailing locomotives. The levels in the trailing locomotive tended to be slightly greater than the levels in the lead locomotive, but the difference in the geometric means was not significant (p=0.229 and p=0.336 assuming separate and pooled variances, respectively).)
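The comparison of geometric means on the log scale can be sketched as follows. The code computes the separate-variance (Welch) and pooled-variance t statistics and their degrees of freedom using only the Python standard library. Converting these to p-values requires the t distribution from a statistics package, which is omitted here. The function name and data are hypothetical, and the sketch assumes every value is a quantified detect, since the text does not describe how non-detects entered the t-tests.

```python
import math
from statistics import mean, variance

def log_t_statistics(group_a, group_b):
    """Two-sample t statistics for log-transformed data (A versus B)."""
    a = [math.log(v) for v in group_a]
    b = [math.log(v) for v in group_b]
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    va, vb = variance(a), variance(b)          # sample variances (n - 1)
    # Separate variances (Welch) with Welch-Satterthwaite degrees of freedom.
    se_sep = math.sqrt(va / na + vb / nb)
    t_sep = (ma - mb) / se_sep
    df_sep = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    # Pooled variance.
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t_pool = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return {
        "gm_ratio": math.exp(ma - mb),         # ratio of geometric means, A / B
        "separate": (t_sep, df_sep),
        "pooled": (t_pool, na + nb - 2),
    }

# Hypothetical EC values (mg/m3) for two small groups.
lead = [0.001, 0.002, 0.004]
trailing = [0.002, 0.004, 0.008]
result = log_t_statistics(lead, trailing)
```

With equal group sizes and equal log-scale variances, as in this toy example, the two statistics coincide. They diverge when the group sizes and variances differ, which is why both p-values are reported in the text.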

It was not expected that window status (open vs. closed) would greatly affect the EC levels for the lead locomotive. A t-test comparison of the log-transformed data indicates that the geometric means were not significantly different (p =0.318 and p = 0.334 assuming separate and pooled variances, respectively). In contrast, a t-test comparison of the TC levels showed that the TC levels were significantly greater (p < 0.05) when the windows in the lead locomotive were open.

It is logical to expect that window status might have a profound effect on the EC and TC levels in the trailing locomotive. However, a t-test comparison of the log-transformed EC values, by window status, was not significant (p=0.504 and p=0.499 assuming separate and pooled variances, respectively), indicating that the geometric mean EC levels when the windows were open versus closed were not significantly different. This analysis was repeated for TC levels in the trailing locomotive. The TC levels were not significantly greater when the windows in the trailing locomotive were open (p = 0.964 assuming either separate or pooled variances).

Analysis of variance (ANOVA) was used to evaluate the combined effects of locomotive position (lead vs. trailing) and window status (open vs. closed) on the EC levels (results not shown). Position had a significant effect (p < 0.0001), but neither window status nor the interaction term was close to being statistically significant (see the Supplemental Materials). To fully evaluate the effect of window status on the trailing locomotive, additional data will be necessary.

Effect of Tunnels

EC levels for the lead locomotive were plotted versus the number of tunnels encountered per run (see Figure S4). There was no indication of an effect due to the number of tunnels encountered during the run.

(See the Supplemental Materials for additional text as well as an evaluation of the effect of locomotive manufacturer, class, and model on EC levels.)
