1 County boundaries

Using the 20m (most generalized) file.

Excluding Puerto Rico:

Final dataset used:

From this set of 3142 counties in 51 states (50 states plus the District of Columbia), a 5% sample would mean 157 polygons (3142 × 0.05 = 157.1, rounded down).
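The sample size is simple rounding. A minimal check, written in Python purely for illustration (the variable names are assumptions, not taken from the analysis code):

```python
# Number of counties after excluding Puerto Rico (figure from the text above).
n_counties = 3142
# Target sampling fraction.
sample_fraction = 0.05

# 3142 * 0.05 = 157.1, which rounds to 157 polygons.
n_sample = round(n_counties * sample_fraction)
print(n_sample)  # 157
```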

Note also that five counties have zero neighbours. Three are in Hawaii:

And two elsewhere:

Also note the following passage from the paper:

Within this dataset, the 5 boroughs/counties of New York are treated as a single entity. We have done the same in these analyses, assigning all 5 counties the values associated with New York County

The counties in question are most likely these:

  • The Bronx is Bronx County (FIPS 36005)
  • Brooklyn is Kings County (FIPS 36047)
  • Manhattan is New York County (FIPS 36061)
  • Queens is Queens County (FIPS 36081)
  • Staten Island is Richmond County (FIPS 36085)

At the moment, with the prepared dataset, these counties are excluded from the analyses, since they cannot be merged to the explanatory variables and no spatial join is possible!
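One possible way to bring these counties back in would be to follow the paper and give all five the value recorded for New York County. The sketch below is in Python and only shows the idea on a plain dictionary. The function name `fill_nyc` and the example values are hypothetical, and the real explanatory variables may be keyed differently.

```python
# FIPS codes of the five New York City counties listed above.
NYC_FIPS = ["36005", "36047", "36061", "36081", "36085"]
NEW_YORK_COUNTY = "36061"


def fill_nyc(values):
    """Return a copy of `values` in which all five NYC counties
    carry the value recorded for New York County (if present)."""
    filled = dict(values)
    if NEW_YORK_COUNTY in values:
        for fips in NYC_FIPS:
            filled[fips] = values[NEW_YORK_COUNTY]
    return filled


# Hypothetical explanatory variable keyed by county FIPS code.
example = {"36061": 0.42, "36001": 0.10}
filled = fill_nyc(example)
print(sorted(filled))
```

After this step every one of the five FIPS codes has a row to merge against, so the counties no longer drop out of a join on FIPS.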

2 Solution 1: picking one state at random

Might be an option. Does the pick have to be the same between runs? Most likely a single state would not have enough counties?

## [1] 51
## [1] "55"
## [1] 72
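If the pick has to be reproducible, fixing the random seed is one way to get it. The sketch below is in Python and is not the code that produced the output above. The helper names, the seed and the short list of codes are assumptions, and the county counts are the two figures reported in this document (72 for state "55", 254 for Texas, FIPS 48).

```python
import random

# County counts reported in this document, keyed by state FIPS code.
county_counts = {"55": 72, "48": 254}
# Target size of the 5% sample.
TARGET = 157


def pick_state(codes, seed):
    """Pick one state code; the same seed always gives the same pick."""
    rng = random.Random(seed)
    return rng.choice(sorted(codes))


def has_enough_counties(n_counties, target=TARGET):
    """True if a state alone could supply the target number of polygons."""
    return n_counties >= target


state = pick_state(county_counts, seed=2018)
print(state, has_enough_counties(county_counts[state]))
```

With a target of 157 polygons, a state with 72 counties falls short, which is the concern raised above.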

3 Solution 2: picking fixed state

Texas, as the state with the most counties, gives 8.1% of the data (254 of 3142 counties)?

## [1] 254

4 Solution 3: st_sample

Unfortunately, the result is non-contiguous :/

5 Solution 4: igraph, spdep and sf custom solution

Solution suggested by @Spacedman here.

## Deleting source `data/cb_2018_us_county_20m_prep_sample.shp' using driver `ESRI Shapefile'
## Writing layer `cb_2018_us_county_20m_prep_sample' to data source `data/cb_2018_us_county_20m_prep_sample.shp' using driver `ESRI Shapefile'
## Writing 157 features with 2 fields and geometry type Multi Polygon.

These are the counties that will be used as input for the 5% sample analyses!
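The general idea of a contiguous sample can be sketched as growing a connected set on the neighbour graph: start from a random polygon, then repeatedly add a random polygon that touches the set. The Python sketch below shows that idea on a toy grid and is not necessarily the method suggested by @Spacedman or the code used here. The function names and the toy adjacency are assumptions; in the real analysis the adjacency would come from the county geometries.

```python
import random


def grid_adjacency(rows, cols):
    """Toy stand-in for a county neighbour list: rook adjacency on a grid."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append(f"r{rr}c{cc}")
            adj[f"r{r}c{c}"] = nbrs
    return adj


def grow_contiguous_sample(adjacency, n, seed=None):
    """Grow a connected set of n nodes from a random start node."""
    rng = random.Random(seed)
    # Nodes with zero neighbours (e.g. islands) cannot seed a larger sample.
    candidates = sorted(node for node, nbrs in adjacency.items() if nbrs)
    start = rng.choice(candidates)
    selected = {start}
    frontier = set(adjacency[start])
    while len(selected) < n:
        if not frontier:
            raise ValueError("connected component is smaller than n")
        nxt = rng.choice(sorted(frontier))
        selected.add(nxt)
        # The frontier is every unselected node touching the selected set.
        frontier.update(adjacency[nxt])
        frontier -= selected
    return selected


adjacency = grid_adjacency(4, 4)
adjacency["island"] = []  # a polygon with zero neighbours

sample = grow_contiguous_sample(adjacency, 5, seed=1)
print(sorted(sample))
```

Because each new polygon is drawn from the current frontier, the result is connected by construction, and polygons with zero neighbours can never be selected when more than one polygon is requested.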