6 Thematic Mapping

6.1 Overview

Once we have downloaded the contextual data and generated the access metrics, we can start visualizing them to identify spatial patterns. A map shows whether a variable is distributed homogeneously across space or whether it exhibits clustering and spatial heterogeneity. In this tutorial we cover methods for plotting data variables spatially, that is, creating thematic maps, technically known as choropleth maps. We cover the choropleth classification techniques most commonly used in R. Please note that the methods covered here are only an introduction to spatial plotting. In this tutorial our objectives are to:

  • Generate a choropleth map
  • Visualize spatial distributions of data
  • Test the sensitivity of various thresholds

6.2 Environment Setup

To replicate the code and functions illustrated in this tutorial, you’ll need to have R and RStudio downloaded and installed on your system. This tutorial assumes some familiarity with the R programming language.

6.2.1 Input/Output

We will use the per capita income data for the City of Chicago that was downloaded and saved as a shapefile in the Census Data Wrangling tutorial. These files can also be found here. Our input is:

  • Chicago Zip Codes with per capita income, chizips_ACS.shp

Our output will be three thematic maps highlighting the distribution of per capita income at a zip code level across the city of Chicago.

6.2.2 Load the packages

We will use the following packages in this tutorial:

  • tidyverse: to manipulate data
  • tmap: to visualize and create maps
  • sf: to read/write and manipulate spatial data

Load the libraries required.
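A minimal sketch of this step, assuming the three packages have already been installed (for example with install.packages()):

```r
# Load the packages used in this tutorial.
# Assumes they were installed beforehand, e.g.
# install.packages(c("tidyverse", "tmap", "sf"))
library(tidyverse)  # data manipulation
library(tmap)       # thematic maps
library(sf)         # reading, writing and manipulating spatial data
```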

6.2.3 Load data

We will read in the shapefile with per capita income at the zip code level for the city of Chicago for the year 2018.
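One way to do this is sketched here. The relative path data/chizips_ACS.shp and the object name chicago_zips are assumptions made for illustration; adjust the path to wherever the file is stored on your system. Running it should print a summary similar to the output that follows.

```r
library(sf)

# Assumed location of the shapefile; change this to match your setup.
chicago_zips <- st_read("data/chizips_ACS.shp")

# Look at the first few rows, including the geometry column.
head(chicago_zips)
```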

## Reading layer `chizips_ACS' from data source `/Users/maryniakolak/code/opioid-environment-toolkit/data/chizips_ACS.shp' using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -87.86962 ymin: 41.62992 xmax: -87.52416 ymax: 42.02313
## CRS:            4269
## Simple feature collection with 6 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -87.64138 ymin: 41.85206 xmax: -87.60586 ymax: 41.88908
## CRS:            4269
##   GEOID totPp18 prCptIn                       geometry
## 1 60601   14675   92125 MULTIPOLYGON (((-87.63396 4...
## 2 60602    1244  100507 MULTIPOLYGON (((-87.63389 4...
## 3 60603    1174  117992 MULTIPOLYGON (((-87.63382 4...
## 4 60604     782  114575 MULTIPOLYGON (((-87.63375 4...
## 5 60605   27519   83408 MULTIPOLYGON (((-87.63311 4...
## 6 60606    3101  132765 MULTIPOLYGON (((-87.63998 4...

Let’s review the dataset structure. In the R sf data object, the ‘geometry’ column holds the geographic boundaries that we can map. This column is what distinguishes a simple features object from an ordinary data frame, and it is a pretty phenomenal concept.

We can do a quick plot using:
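One possibility, assuming the data are held in an sf object named chicago_zips (a name chosen here for illustration), is to pass the geometry to base plot(), which draws the zip code boundaries only:

```r
library(sf)

# Assumed path and object name; adjust to your own setup.
chicago_zips <- st_read("data/chizips_ACS.shp", quiet = TRUE)

# st_geometry() drops the attribute columns so that only the outlines are drawn.
plot(st_geometry(chicago_zips))
```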

6.3 Thematic Plotting

We will use the tmap package to plot spatial data distributions. The package syntax is similar to that of ggplot2 and follows the same idea of a layered grammar of graphics:

  • for each input data layer use tm_shape(),
  • followed by the method to plot it, e.g. tm_fill(), tm_dots(), tm_lines() or tm_borders().

As in ggplot2, aesthetics can be provided for each layer, and the plot layout can be adjusted using tm_layout(). For more details on tmap usage and functionality, check the tmap documentation. The map we previously drew with plot() can be reproduced with tmap as in the code below.
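A minimal sketch of such a map, assuming an sf object named chicago_zips read from an assumed path. tmap_mode("plot") selects static maps and prints the mode message shown next.

```r
library(sf)
library(tmap)

# Assumed path and object name; adjust to your own setup.
chicago_zips <- st_read("data/chizips_ACS.shp", quiet = TRUE)

# Static plotting mode; "view" would produce an interactive map instead.
tmap_mode("plot")

# One data layer (tm_shape) followed by one drawing method (tm_borders).
boundary_map <- tm_shape(chicago_zips) +
  tm_borders()

boundary_map
```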

## tmap mode set to plotting

In tmap, the classification scheme is set by the style option in tm_fill(), and the default style is pretty. Let’s plot the distribution of per capita income by zip code across the city of Chicago with the default style using the code below. We can also change the color palette used to depict the spatial distribution. See Set Color Palette in the Appendix for more details.
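A sketch of the default map, assuming the object name chicago_zips and the column name prCptIn that appears in the data preview above. The style argument is omitted, so the default classification is used. The syntax follows tmap version 3; later versions may emit deprecation messages for these arguments.

```r
library(sf)
library(tmap)

# Assumed path and object name; adjust to your own setup.
chicago_zips <- st_read("data/chizips_ACS.shp", quiet = TRUE)

# Fill each zip code by per capita income using the default classification,
# then overlay the zip code boundaries.
default_map <- tm_shape(chicago_zips) +
  tm_fill("prCptIn") +
  tm_borders()

default_map
```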

We will plot the spatial distribution of the per capita income variable (stored as prCptIn in the shapefile) for the city of Chicago using three methods.

  1. Quantile
  2. Natural Breaks
  3. Standard Deviation

For a more detailed overview of choropleth mapping and methods, check out the related GeoDa Center Documentation.
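The first of these methods, quantile classification, places roughly the same number of zip codes in each bin. It can be sketched as below, assuming style = 'quantile' in tm_fill(), five bins (n = 5 is an illustrative choice, not a recommendation), and the object name chicago_zips.

```r
library(sf)
library(tmap)

# Assumed path and object name; adjust to your own setup.
chicago_zips <- st_read("data/chizips_ACS.shp", quiet = TRUE)

# Quantile classification: each of the n bins holds a similar number of zip codes.
quantile_map <- tm_shape(chicago_zips) +
  tm_fill("prCptIn", style = "quantile", n = 5) +
  tm_borders()

quantile_map
```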

6.3.2 Natural Breaks

Natural breaks, or Jenks, classification uses a nonlinear algorithm to group the data into bins such that within-bin similarity is maximized and between-bin dissimilarity is also maximized. It is obtained by setting style = 'jenks' and n to the number of bins in tm_fill().
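A sketch of a natural breaks map under the same assumptions as before (object name chicago_zips, column prCptIn, and an illustrative choice of five bins):

```r
library(sf)
library(tmap)

# Assumed path and object name; adjust to your own setup.
chicago_zips <- st_read("data/chizips_ACS.shp", quiet = TRUE)

# Natural breaks (Jenks) classification with five bins.
jenks_map <- tm_shape(chicago_zips) +
  tm_fill("prCptIn", style = "jenks", n = 5) +
  tm_borders()

jenks_map
```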

As we can see, the jenks method classifies this dataset better than the quantile method does. There is no single correct method; the choice of classification depends on the problem and the dataset.

6.3.3 Standard Deviation

A standard deviation map standardizes the variable (mean = 0, standard deviation = 1) and classifies each observation by how many standard deviations it lies from the mean. It helps identify outliers in the dataset. It is obtained by setting style = 'sd' in tm_fill(). Because the bins are built around the mean, the lowest bins can extend to negative values, which do not make sense for a variable such as income, but the map still helps identify the outliers.
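A sketch of a standard deviation map, again assuming the object name chicago_zips and the column prCptIn:

```r
library(sf)
library(tmap)

# Assumed path and object name; adjust to your own setup.
chicago_zips <- st_read("data/chizips_ACS.shp", quiet = TRUE)

# Standard deviation classification: bins are defined relative to the mean.
sd_map <- tm_shape(chicago_zips) +
  tm_fill("prCptIn", style = "sd") +
  tm_borders()

sd_map
```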

6.4 Appendix

Use ColorBrewer

To build aesthetically pleasing and easy-to-read maps, we recommend the color palette schemes from ColorBrewer 2.0, developed by Cynthia Brewer. The website distinguishes between sequential (ordered), diverging (spread around a center) and qualitative (categorical) data. Information on these palettes can be displayed in R using the RColorBrewer package.
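For instance, the following sketch (assuming RColorBrewer is installed) plots every available palette and prints a table listing each palette's category and maximum number of colors:

```r
library(RColorBrewer)

# Plot all palettes, grouped as sequential, qualitative and diverging.
display.brewer.all()

# Tabular summary: maximum number of colors, category (seq, div, qual)
# and whether the palette is colorblind friendly.
brewer.pal.info
```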

We can get the hex values for the colors used in a specific palette with n bins and plot the corresponding colors using the code below.
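A sketch using brewer.pal() and display.brewer.pal(). The palette name "PuBuGn" is inferred from the hex codes printed next rather than stated in the text, so treat it as an assumption.

```r
library(RColorBrewer)

# Hex codes for a 5-class sequential palette (palette name assumed).
pal <- brewer.pal(5, "PuBuGn")
pal

# Plot the same five colors as swatches.
display.brewer.pal(5, "PuBuGn")
```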

## [1] "#F6EFF7" "#BDC9E1" "#67A9CF" "#1C9099" "#016C59"

We can update the jenks map by using this sequential color scheme and changing the transparency using alpha = 0.8 as below.
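A sketch of the updated map, assuming the sequential palette "PuBuGn", five bins, and the object name chicago_zips. The palette and alpha arguments follow tmap version 3 syntax; later versions may emit deprecation messages for them.

```r
library(sf)
library(tmap)

# Assumed path and object name; adjust to your own setup.
chicago_zips <- st_read("data/chizips_ACS.shp", quiet = TRUE)

# Jenks map with a sequential ColorBrewer palette and slightly transparent fill.
jenks_pal_map <- tm_shape(chicago_zips) +
  tm_fill("prCptIn", style = "jenks", n = 5,
          palette = "PuBuGn", alpha = 0.8) +
  tm_borders()

jenks_pal_map
```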

We can also update the stdev map by using a diverging color scheme as below.
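A sketch of that update, assuming the diverging palette "RdBu" (any diverging ColorBrewer palette could be substituted) and the object name chicago_zips. As above, the palette and alpha arguments follow tmap version 3 syntax.

```r
library(sf)
library(tmap)

# Assumed path and object name; adjust to your own setup.
chicago_zips <- st_read("data/chizips_ACS.shp", quiet = TRUE)

# Standard deviation map with a diverging palette, so that bins below and
# above the mean are drawn in contrasting hues.
sd_div_map <- tm_shape(chicago_zips) +
  tm_fill("prCptIn", style = "sd",
          palette = "RdBu", alpha = 0.8) +
  tm_borders()

sd_div_map
```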