The MCM is an irregularly shaped region that is 20,686 km2 in area, # st_area(st_union(pred.area())) and we used the MCM as the prediction area for which we generated and validated our temperature predictions. The longitude ranges from 99.9 to 97.8°W and the latitude ranges from 18.6 to 20.2°N # round(d = 1, st_bbox(st_transform(pred.area(), crs = crs.lonlat))) (see Figure 1). #notebook.html#fig--g/area-map To produce the study area for our temperature model, we expanded the spatial extent by 50 km outwards from each side of the bounding box of the MCM and rounded the limits to the nearest tenth of a degree of longitude or latitude. This allowed us to include additional stations that we used only as training data. The result was a rectangular region in the plate carrée projection that ranged from 100.4 to 97.4°W and 18.1 to 20.6°N. # study.area() This was then reprojected to match the MODIS sinusoidal projection and divided into 101,892 total 1 km × 1 km cells, # nrow(master.grid) and this output served as our master grid for modeling. Fig. 1. Study area showing all available ground meteorological stations (n=153) used for our daily Ta predictions in the Megalopolis of Central Mexico (MCM, shown as indigo-colored regions) from 2003–2018. 1. Data sources 1. Meteorological ground stations We utilized ground station data from the year 2003 through 2018 inclusive. By request, we obtained historical meteorological data from the Servicio Meteorológico Nacional de México (SMN) that integrated three monitoring networks: the Estaciones Meteorológicas Automáticas network (EMAs; 28 stations), the Estaciones Sinópticas Meteorológicas network (ESIMEs; 10 stations), and an apparently unnamed network of observatories (10 stations). We also incorporated two networks for which data is publicly available on the Internet: the Sistema de Monitoreo Atmosférico de la Ciudad de México (SIMAT; 33 stations; http://www.aire.cdmx.gob.mx) and the Programa de Estaciones Meteorológicas del Bachillerato Universitario at the Universidad Nacional Autónoma de México (UNAM; 27 stations; https://www.ruoa.unam.mx/pembu). Finally, we included observations from personal weather stations available from Weather Underground (45 stations; http://wunderground.com). # get.ground()$stations[, by = network, .N] For each station and day, we collected the minimum, mean, and maximum temperature, as well as the mean wind speed. These statistics are provided directly by Weather Underground. The other networks provide observations at a higher temporal resolution: hourly for SIMAT and SMN observatories, one per 30 minutes for UNAM, and once per 10 minutes for ESIMEs and EMAs. We computed the daily temperature and wind speed statistics when at least 75% of the observations for a station and day were not missing. Otherwise, that variable was treated as missing. After removing stations outside the study area (i.e., the area containing the MCM and the 50 km buffer around MCM’s bounding box) and time period, but before further data cleaning, we had 355,028 total daily observations from 175 stations. # invisible(do.call(filter.raw, get.ground.raw())) We removed the following data for their questionable quality: 1) stations with fewer than 100 days of observations over the period of study; 2) station-days with values that are impossible (e.g., a maximum temperature that is less than the minimum temperature or a negative mean wind speed) or implausible (i.e., colder than −30 °C, hotter than 53 °C, or having a mean wind speed greater than 114 m/s); 3) reported temperatures deviating ≥20 K from both of the nearest two stations. After these removals, 324,193 observations from 153 stations remained; the above per-network counts of stations are after data cleaning. Of these, there were 258,954 observations # sum(ground$stn %in% stations[master.grid[stations$mrow, in.pred.area], stn]) from 124 stations # sum(master.grid[stations$mrow, in.pred.area]) in the prediction area (i.e., the area of the MCM). We checked for possible duplicate stations—that is, stations with different identifiers but similar locations and a substantial number of identical observations—but did not find any. 1. Land surface temperature Estimates of Land Surface Temperature (LST), also known as the “skin temperature” of the earth’s surface, were extracted from the daily daytime and nighttime MODIS LST products MOD11A1 (Terra satellite) and MYD11A1 (Aqua satellite) using the most recent reprocessing version (Collection 6). The spatial resolution was 1 × 1 km and data were available for the entire study period. Local overpass times for Terra and Aqua were around 22:30 and 1:30 for night time, and around 10:30 and 13:30 for daytime, respectively. MODIS LST products are derived from channels 31 (10.78–11.28 µm) and 32 (11.77–12.27 µm) in the thermal infrared band, and they are already corrected for emissivity and atmospheric effects using the split-window algorithm. These products have been used before in similar studies due to their high spatiotemporal resolution and free availability (Kloog et al. 2012; Rosenfeld et al. 2017; Shi et al. 2016; Kloog et al. 2014; Benali et al. 2012). 1. Land use terms Vegetation - To calculate a measure of vegetation density, the monthly normalized difference vegetation index (NDVI), from both Terra and Aqua MODIS instruments were averaged (Collection 6 MOD13_A3 and MYD13_A3) at the spatial resolution of 1 × 1 km. Elevation - Elevation from the Shuttle Radar Topography Mission at a 30m spatial resolution was aggregated by applying a Gaussian filter (150m SD) and extracting data to the centroids of the MODIS 1 × 1 km products. 1. Statistical methods Aqua and Terra readings were combined to form a daily LST variable for daytime and nighttime. When one satellite's reading was missing for a given time and place, data from the other was used. When both readings were available, the average was calculated. To impute cases where both satellite readings were missing, we used a simple linear-interpolation algorithm as follows. For a given day t and grid cell g, we found the closest days before and after t in the same year as t, t0 and t1, with a non-missing value of the appropriate variable (daytime LST or nighttime LST) at g, and we computed the imputed value as: (y0(t1 - t) + y1(t - t0))/(t1 - t0) where yi is the LST at day ti and place g. When no such t0 exists for the given year, y1, unaltered, is used as the imputed value instead, and likewise y0 substitutes when no t1 exists. We also imputed missing wind speed. We did this by simply copying the wind speed from the closest station with a value for the same day (or the previous day, if no other station was available that day, or the day before that if necessary, and so on). We calibrated LST to Ta for each 1 x 1 km grid cell day using grid cells for which both Ta measurements and LST values (observed or imputed) were available in order to estimate Ta. The temperature model, fit separately per year and daily Ta outcome (minimum, mean, and maximum), was a mixed-effects regression model implemented with the R package lme4. Its equation was: Air temperatureij = (α + υj) + β1 day surface temperatureij + β2 night surface temperatureij + β3 imputed day surface temperatureij + β4 imputed night surface temperatureij + β5 NDVIij + β6 sin2π timei + β7 cos2π timei + β8 elevationi + β9 mean wind speedij + β10 seasonj + β11 seasonj*day surface temperatureij + β12 seasonj*night surface temperatureij + εij where Air temperatureij is air temperature on location i on day j, (α + υj) are the fixed and random intercepts, imputed day surface temperatureij is an indicator of whether satellite day surface temperature is imputed, imputed night surface temperatureij is an indicator of whether satellite night surface temperature is imputed, NDVIij is the mean NDVI of Aqua & Terra for grid cell i in month j, timej is the time of year calculated as (day of the year - 1)/(total days of the year - 1), elevationi is the mean elevation at site i, and mean wind speedij is daily mean wind speed from the nearest station. The season was defined as a categorical variable with three levels: cold-dry for November through February, warm-dry for March and April, and rainy for May to October. All continuous variables were centered and scaled before fitting. All analyses were conducted in R 3.6.0. 1. Assessment of model performance Within each year, all available stations in the prediction area were randomly split into 10 cross-validation folds. Inside the cross-validation loop, models were trained with all observations (occurring in the year of interest) for stations in the training folds plus all stations outside the prediction area but within the study area, and asked to predict the observations for the stations in the test fold. Mean wind speed for test stations was imputed as if all test stations were missing mean wind speed at all times. The primary measure of model performance was the root mean squared error (RMSE) of cross-validated predictions. Tabular summaries of model results also report the standard deviation (SD) and improvement (SD - RMSE) of each outcome (minimum, mean, or maximum Ta) in a particular year, season, or subregion. To evaluate prediction performance more evenly throughout the study region, our summary tables include spatially weighted versions of the SD and RMSE, for which each day and each sixteenth of a longitude-latitude grid cell (splitting each degree of longitude or latitude into four equal intervals) with at least one observation is given a total weight of 1. These metrics help to evaluate performance when we consider areas with few stations (e.g., the northern parts of the MCM) to be equally important to areas with lots of stations (e.g., Mexico City proper). In addition, we estimate the degree to which our predictions capture spatial and temporal patterns by computing for each temperature outcome y and prediction p, given per-station annual means of the outcome My and the predictions Mp: - R2, the proportion of variance accounted for, as 1 - Mean((y - p)2)/Var(y) - R2spatial, the squared correlation between My and Mp. - R2temporal, the squared correlation between (y - My) and (p - Mp). For summaries on the per-station annual means (R2spatial), the sample size is the number of stations instead of station-days (Kloog et al. 2014). 1. Estimation of at risk population We obtained population density information at the AGEB level (equivalent to US census tracts) from the 2010 Mexican Census that was carried out by the Instituto Nacional de Estadística y Geografía de Mexico (Instituto Nacional de Estadística y Geografía (INEGI), 2016). AGEB data is available as a polygon layer with population data for all metropolitan areas in the MCM. From these records we calculated the number of at risk days (above 30 °C and below 5 °C) experienced by people 65 years old and over in the MCM during the year 2010. 1. Results Approximately one-quarter to one-third of LST data were missing from both Aqua and Terra satellites, and thus had to be imputed for our temperature model. The proportions of missing LST data for both satellites were consistent over time (23% to 34% missing daytime LST and 27% to 37% missing nighttime LST across all years). #notebook.html#tab--sat-temp-missingness For all years and dependent variables ( minimum, mean, and maximum temperature), we observed substantially lower RMSE compared with the SD, indicating that our model was effective in predicting temperature. Table 1 #notebook.html#tab--cv-overall presents cross-validation results for mean Ta. Tables A.1 and A.2 in the Appendix show cross-validation results for minimum and maximum Ta, respectively. Across all results, the range for RMSEs was 0.9 to 2.4 K, # round(range(sr$overall$rmse), 1) with a mean of 1.5 K, # round(mean(sr$overall$rmse), 1) whereas the range for SDs was 2.6 to 5.1 K. # round(range(sr$overall$sd), 1) Values of R2 ranged from 0.68 to 0.93. # round(range(sr$overall$R2), 2) The spatially weighted RMSEs, in which all monitored areas are equally important, were generally worse than the unweighted RMSEs (by 0.4 K on average), but still reasonably small. # mean(sr$overall$rmse.s - sr$overall$rmse) Table 1. Prediction accuracy for the Megalopolis of Central Mexico (MCM): Ten-fold cross validated (CV) results for daily mean Ta predictions for 2003–2018. We examined model performance by metropolitan area in order to compare how our model performed in areas with fewer stations relative to those with more stations. Table 2 #notebook.html#tab--cv-by-region summarizes cross-validated model performance within each metropolitan area of the MCM with >5 stations for the most recent year in 2018. For results for all metropolitan areas, see Table A.3. There were large differences in both SD and RMSE across municipalities. Still, the improvement was at least 1 K in most cases. Table 2. Prediction accuracy by metropolitan area in the MCM: Ten-fold cross-validation (CV) results for Ta predictions in 2018. We evaluated model performance for each ground monitoring network from which we obtained temperature records. Table 3 #notebook.html#tab--cv-by-network shows the accuracy in cross-validated Ta predictions by type of network. The lowest RMSE values were obtained for UNAM records for minimum, mean, and maximum Ta predictions. The highest RMSE values came from Weather Underground records for all types of Ta predictions and from the ESIMES network for minimum Ta. One reason why UNAM had the lowest RMSE despite having fewer stations in 2018 compared to Weather Underground is because UNAM stations were evenly distributed across a smaller and more climatically homogeneous area. Another reason for this pattern might be that UNAM stations receive higher maintenance compared to personal weather stations from Weather Underground, whose placement and data quality can vary widely. Table 3. Prediction accuracy by ground monitoring network in the MCM in 2018. Model performance was compared by season type: cold-dry, warm-dry, and rainy. Table 4 presents the average cross-validated prediction accuracy for each Ta outcome by season for the entire MCM from 2003-2018. The lowest average RMSE values were for mean Ta for all seasons. The highest precision improvements were during the rainy season for minimum, mean, and maximum Ta. The highest RMSE values were observed for minimum and mean Ta during the cold-dry season and for maximum Ta in the warm-dry season. #notebook.html#tab--cv-by-season Table 4. Prediction accuracy by season in the MCM: Average S.D., RMSE and S.D. - RMSE for minimum, mean, and maximum Ta predictions from 2003–2018. Model predictions were made for each 1 × 1 km grid cell in the entire prediction area for summarization. Figure 2 #notebook.html#fig--g/map-temp-quantiles shows the 95th percentile of the minimum and maximum temperature across days for each grid cell in 2018. In general, the inhabited areas were hotter than the mountainous borders between municipalities. The southernmost part of the MCM, comprising the cities of Cuernavaca and Cuautla, was substantially hotter than the rest of the MCM. Figure 2. Spatial pattern of the 95th percentiles of minimum(a) and maximum (b) temperature across days for each grid cell in the Megalopolis of Central Mexico for 2018 We used population data to quantify human exposure to extreme ambient temperatures for people age 65 years and older in the urban AGEBs of the MCM in 2010, Figure 3 (a). #notebook.html#fig--g/map-population The highest number of person-days exposure to daily minimum Ta ≤5 °C, were concentrated in the metropolitan areas of Toluca, MCMA, Puebla-Tlaxcala and Pachuca, Figure 3 (b). #notebook.html#fig--g/map-extreme-person-days As for exposure to daily maximum Ta ≥30 °C, Figure 3 (c) shows that the metropolitan areas with higher number of person-days above this point were Cuernavaca, Cuautla, MCMA and Puebla-Tlaxcala. Figure 3. Spatial distribution of population density (a), person-days exposure to ≤5 °C (b) and ≥30°C (c) for each grid cell in the Megalopolis of Central Mexico for people aged 65 years old and over in 2010